Write Explicit Tests
Where I work, we have an in-house Python course that includes a few exercises. I’ve been checking the same exercise solution for a few years now, for every new programmer in our group. It’s interesting to see how a lot of programmers new to Python solve the very same problem. I get to see what each of them does differently and what they all do the same.
In this exercise, we implement and test a Path
class that represents a file system path. It will be somewhat similar to the pathlib
API, but simplified. The Path
class is initialized with a path and provides an OOPy interface, such as nice printing with both str
and repr
, iteration over directory contents, etc.
For this example, let’s look at the __repr__
functionality. The exercise calls for it to behave like this:
Invariably, I get for review an implementation of this feature and one version or another of this test, verifying it behaves properly:
On its face, this seems reasonable. Since we create a Path
object in almost all other test methods, it make sense to create this object in the setUp
method. Since we need the path we created the Path
object for, it makes sense to put it in a constant, hence ROOT_DIR
. Since we have ROOT_DIR
, it makes sense to create the expected repr
string dynamically.
Except, I think this test code is bad. It’s bad because you need to look at three different code blocks to understand what’s going on. It’s bad because you can’t clearly see the expected result in a glimpse. It’s bad because you’re building the repr
string in the same way you do in the actual implementation, which can perpetuate bugs from the implementation to the tests.
Since I see this over and over, I can tell you that usually, the repr
implementation does not meet the required behavior. In fact, the test code above has a bug - can you find it 1?
I propose the following test implementation instead:
It’s short and to the point. The entire test is in these two lines (and only one logical line). There’s nothing else. We don’t even define any variables. The object we create, along with its explicit initialization data, is physically close to the repr
result we are testing. It is practically impossible to get the repr
string wrong since it is right there, in full form, on the screen.
An interesting phenomenon I have observed is that it is usually pretty difficult to convince young programmers to adopt this attitude. When I suggest my version of the test to the reviewees, it is usually met with resistance. A common argument is that my version has code duplication. Let me try and rebut this argument.
Code duplication is bad. It’s bad because if you have two pieces of identical code segments and you change one of them, but forget to change the other, you’ll probably get a pretty nasty bug. This is why programmers believe in DRY - Don’t Repeat Yourself. This usually translates to a rule-of-thumb which says that “if you use the same literal twice, put it in a constant”. While this is usually good advice, a rule-of-thumb always has exceptions.
In my version of the test, the duplication is with the path string, '/path/to/file'
. What’s going to happen if we change the first occurrence, but not the second? Well, the test will fail. It will fail fast, it will be easily detectable and we’ll solve it in under 3 seconds, flat.
What will happen if we use this path all over our tests, to build many Path
objects? Well, nothing. If we accidentally change it in only one test, that test might fail, but it has no actual meaning across tests. The thing is, while this string appears in many tests, it is an arbitrary string, which is coincidentally the same. We gain nothing from refactoring it out into a constant, while losing readability.
Look at the way the
repr
string is built - the path is printed usingstr
and notrepr
, thus it will be printed without quotes, e.g.,Path(/path/to/file)
instead ofPath('/path/to/file')
. ↩︎
Follow me on Twitter and Facebook
Thanks to Hannan Aharonov, Yonatan Nakar, Ram Rachum, Shachar Ohana and Shachar Nudler for reading drafts of this.