Write Explicit Tests

January 14, 2017

Where I work, we have an in-house Python course that includes a few exercises. I’ve been checking the same exercise solution for a few years now, for every new programmer in our group. It’s interesting to see how a lot of programmers new to Python solve the very same problem. I get to see what each of them does differently and what they all do the same.

In this exercise, we implement and test a Path class that represents a file system path. It will be somewhat similar to the pathlib API, but simplified. The Path class is initialized with a path and provides an OOPy interface, such as nice printing with both str and repr, iteration over directory contents, etc.

For this example, let’s look at the __repr__ functionality. The exercise calls for it to behave like this:

>>> path = Path('/path/to/file')
>>> repr(path)
Path('/path/to/file')

Invariably, I get for review an implementation of this feature and one version or another of this test, verifying it behaves properly:

class PathTest(unittest.TestCase):
    ROOT_DIR = '/path/to/file'

    def setUp(self):
        self.path = Path(self.ROOT_DIR)

    def test_repr(self):
        expected = "Path({})".format(self.ROOT_DIR)
        self.assertEquals(expected, repr(self.path))

On its face, this seems reasonable. Since we create a Path object in almost all other test methods, it make sense to create this object in the setUp method. Since we need the path we created the Path object for, it makes sense to put it in a constant, hence ROOT_DIR. Since we have ROOT_DIR, it makes sense to create the expected repr string dynamically.

Except, I think this test code is bad. It’s bad because you need to look at three different code blocks to understand what’s going on. It’s bad because you can’t clearly see the expected result in a glimpse. It’s bad because you’re building the repr string in the same way you do in the actual implementation, which can perpetuate bugs from the implementation to the tests.

Since I see this over and over, I can tell you that usually, the repr implementation does not meet the required behavior. In fact, the test code above has a bug - can you find it ¹?

I propose the following test implementation instead:

class PathTest(unittest.TestCase):
    def test_repr(self):
        self.assertEquals(repr(Path('/path/to/file')),
                          "Path('/path/to/file')")

It’s short and to the point. The entire test is in these two lines (and only one logical line). There’s nothing else. We don’t even define any variables. The object we create, along with its explicit initialization data, is physically close to the repr result we are testing. It is practically impossible to get the repr string wrong since it is right there, in full form, on the screen.

An interesting phenomenon I have observed is that it is usually pretty difficult to convince young programmers to adopt this attitude. When I suggest my version of the test to the reviewees, it is usually met with resistance. A common argument is that my version has code duplication. Let me try and rebut this argument.

Code duplication is bad. It’s bad because if you have two pieces of identical code segments and you change one of them, but forget to change the other, you’ll probably get a pretty nasty bug. This is why programmers believe in DRY - Don’t Repeat Yourself. This usually translates to a rule-of-thumb which says that “if you use the same literal twice, put it in a constant”. While this is usually good advice, a rule-of-thumb always has exceptions.

In my version of the test, the duplication is with the path string, '/path/to/file'. What’s going to happen if we change the first occurrence, but not the second? Well, the test will fail. It will fail fast, it will be easily detectable and we’ll solve it in under 3 seconds, flat.

What will happen if we use this path all over our tests, to build many Path objects? Well, nothing. If we accidentally change it in only one test, that test might fail, but it has no actual meaning across tests. The thing is, while this string appears in many tests, it is an arbitrary string, which is coincidentally the same. We gain nothing from refactoring it out into a constant, while losing readability.

Look at the way the repr string is built - the path is printed using str and not repr, thus it will be printed without quotes, e.g., Path(/path/to/file) instead of Path('/path/to/file'). ↩︎

Discuss this post at the comment section below.
Follow me on Twitter and Facebook
Thanks to Hannan Aharonov, Yonatan Nakar, Ram Rachum, Shachar Ohana and Shachar Nudler for reading drafts of this.

Write Explicit Tests

Similar Posts