Write Explicit Tests
Where I work, we have an in-house Python course that includes a few exercises. I’ve been checking the same exercise solution for a few years now, for every new programmer in our group. It’s interesting to see how a lot of programmers new to Python solve the very same problem. I get to see what each of them does differently and what they all do the same.
In this exercise, we implement and test a
Path class that represents a file system path. It will be somewhat similar to the
pathlib API, but simplified. The
Path class is initialized with a path and provides an OOPy interface, such as nice printing with both
repr, iteration over directory contents, etc.
For this example, let’s look at the
__repr__ functionality. The exercise calls for it to behave like this:
Invariably, I get for review an implementation of this feature and one version or another of this test, verifying it behaves properly:
On its face, this seems reasonable. Since we create a
Path object in almost all other test methods, it make sense to create this object in the
setUp method. Since we need the path we created the
Path object for, it makes sense to put it in a constant, hence
ROOT_DIR. Since we have
ROOT_DIR, it makes sense to create the expected
repr string dynamically.
Except, I think this test code is bad. It’s bad because you need to look at three different code blocks to understand what’s going on. It’s bad because you can’t clearly see the expected result in a glimpse. It’s bad because you’re building the
repr string in the same way you do in the actual implementation, which can perpetuate bugs from the implementation to the tests.
Since I see this over and over, I can tell you that usually, the
repr implementation does not meet the required behavior. In fact, the test code above has a bug - can you find it 1?
I propose the following test implementation instead:
It’s short and to the point. The entire test is in these two lines (and only one logical line). There’s nothing else. We don’t even define any variables. The object we create, along with its explicit initialization data, is physically close to the
repr result we are testing. It is practically impossible to get the
repr string wrong since it is right there, in full form, on the screen.
An interesting phenomenon I have observed is that it is usually pretty difficult to convince young programmers to adopt this attitude. When I suggest my version of the test to the reviewees, it is usually met with resistance. A common argument is that my version has code duplication. Let me try and rebut this argument.
Code duplication is bad. It’s bad because if you have two pieces of identical code segments and you change one of them, but forget to change the other, you’ll probably get a pretty nasty bug. This is why programmers believe in DRY - Don’t Repeat Yourself. This usually translates to a rule-of-thumb which says that “if you use the same literal twice, put it in a constant”. While this is usually good advice, a rule-of-thumb always has exceptions.
In my version of the test, the duplication is with the path string,
'/path/to/file'. What’s going to happen if we change the first occurrence, but not the second? Well, the test will fail. It will fail fast, it will be easily detectable and we’ll solve it in under 3 seconds, flat.
What will happen if we use this path all over our tests, to build many
Path objects? Well, nothing. If we accidentally change it in only one test, that test might fail, but it has no actual meaning across tests. The thing is, while this string appears in many tests, it is an arbitrary string, which is coincidentally the same. We gain nothing from refactoring it out into a constant, while losing readability.
Look at the way the
reprstring is built - the path is printed using
repr, thus it will be printed without quotes, e.g.,
Follow me on Twitter, Facebook or Google+
Thanks to Hannan Aharonov, Yonatan Nakar, Ram Rachum, Shachar Ohana and Shachar Nudler for reading drafts of this.