Python Tips: Iterate with a Sentinel Value

This post is a few years old now, so some details (or my opinions) might be out of date.
I would still love to hear your feedback in the comments below. Enjoy!

TL;DR 1

When f is a file, a socket, or anything you read from until you get an empty string (or another set value), you can use the iter function in a for loop to loop until that value is returned:

from functools import partial

blocks = []
read_block = partial(f.read, 32)
for block in iter(read_block, ''):
    blocks.append(block)

The Problem

If you have ever written code that reads from a socket, reads blocks of bytes from a file, or runs any other I/O read loop, you probably recognize the structure of the following snippet:

blocks = []
while True:
    block = f.read(32)
    if block == '':
        break
    blocks.append(block)

This boilerplate code is very common, very ugly and - as you’ll find out in a minute - very avoidable. First let’s understand what we’re doing here, in plain language:

  1. Do forever:
    1.1. Read a block from f.
    1.2. If the value was '', break from the loop.
    1.3. Do something with the read value.

Why is this bad? There are two reasons:

  1. Usually, when we iterate over an object or until a condition happens, we understand the scope of the loop from its first line. E.g., when reading a loop that starts with for book in books we realize we’re iterating over all the books. When we see a loop that starts with while not battery.empty() we realize that the loop runs for as long as we still have battery.
    When we say “Do forever” (i.e., while True), it’s obvious that this scope is a lie. So it requires us to hold that thought in our head and search the rest of the code for a statement that’ll get us out of it. We are entering the loop with less information and so it is less readable.

  2. We are essentially iterating over chunks of bytes. Out of the 4 lines in the loop, only one line refers to those bytes. So that’s a bad signal-to-noise ratio, which also affects readability. For a reader unfamiliar with this code form, it’s not clear that if block == '' is a technical, implementation-driven detail. It might seem like a semantic value returned from the read.

The Solution

You might recall there’s a function called iter. It can accept an argument that supports iteration and return an iterator for it. Used like that, it seems pretty useless, as you can just iterate over that collection without iter. But it also accepts another argument - a sentinel value:

In computer programming, a sentinel value [..] is a special value whose presence guarantees termination of a loop that processes structured (especially sequential) data. The sentinel value makes it possible to detect the end of the data when no other means to do so (such as an explicit size indication) is provided. The value should be selected in such a way that it is guaranteed to be distinct from all legal data values, since otherwise the presence of such values would prematurely signal the end of the data.

The sentinel value in this case is an empty string - since any successful read from an I/O device will return a non-empty string, it is guaranteed that no successful read will return this value.
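To see the end-of-data value for yourself, here is a minimal sketch. It uses io.StringIO and io.BytesIO as in-memory stand-ins for real files (my assumption for illustration, not part of the original example). Note that in Python 3 a binary-mode read returns bytes, so there the sentinel would be b'' rather than ''.

```python
import io

# In-memory stand-ins for a text-mode and a binary-mode file (illustrative only).
text_file = io.StringIO("abc")
binary_file = io.BytesIO(b"abc")

# The first read returns the data; once it is exhausted, read() returns an empty value.
text_first = text_file.read(32)      # the available data
text_end = text_file.read(32)        # empty str: nothing left to read

binary_first = binary_file.read(32)  # the available data, as bytes
binary_end = binary_file.read(32)    # empty bytes: nothing left to read

print(repr(text_end), repr(binary_end))
```

So the right sentinel depends on what the read call returns: '' for text, b'' for bytes.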

When a sentinel value is supplied to iter, it will still return an iterator, but it will interpret its first argument differently - it will assume it’s callable (without arguments) and will call it repeatedly until it returns the sentinel value. Afterwards, the iterator would stop.
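If it helps to picture what the two-argument form does, here is a rough pure-Python sketch of the same behavior. The name iter_until is hypothetical (I made it up for this illustration), and io.StringIO stands in for a real file.

```python
import io


def iter_until(func, sentinel):
    """Hypothetical stand-in that mimics iter(func, sentinel)."""
    while True:
        value = func()         # call with no arguments
        if value == sentinel:  # stop as soon as the sentinel shows up
            return
        yield value


f = io.StringIO("hello world")  # stand-in for a real file


def read_block():
    return f.read(4)


manual = list(iter_until(read_block, ""))

f.seek(0)  # rewind so the built-in sees the same data
builtin = list(iter(read_block, ""))
```

Both lists hold the same blocks; the built-in just saves you from writing the loop yourself.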

The trouble is that read functions usually do take an argument - typically the size to read (in bytes, lines, etc.) - so we need to create a new function which takes no input and reads a constant size. We have two main tools for the job: partial (imported from functools) and lambda (a built-in keyword). The two following lines are equivalent 2:

read = partial(f.read, 32)
read = lambda: f.read(32)

partial is specifically designed to take functions that accept arguments and create “smaller” functions with some of those arguments set as constants. lambda is a little bit more flexible, but can also be useful for the same thing.
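Here is a small sketch of one practical difference between the two (I can't say whether it is among the ones the footnote alludes to). partial captures f.read at creation time and keeps its pieces inspectable, while the lambda looks up the name f each time it is called. The demo function and the StringIO objects are just illustrative stand-ins.

```python
import io
from functools import partial


def demo():
    f = io.StringIO("abcdef")        # stand-in for a real file
    read_partial = partial(f.read, 2)
    read_lambda = lambda: f.read(2)

    # Both are zero-argument callables reading from the same object.
    first = read_partial()
    second = read_lambda()

    # partial exposes the frozen arguments; a lambda does not.
    frozen_args = read_partial.args

    # Rebinding the name f affects only the lambda, which looks f up on each call.
    f = io.StringIO("XYZ")
    from_partial = read_partial()    # still reads from the original object
    from_lambda = read_lambda()      # reads from the new object

    return first, second, frozen_args, from_partial, from_lambda


results = demo()
```

For the simple loop in this post the difference rarely matters, so pick whichever reads better to you.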

The only thing left is to tie all this together:

from functools import partial

blocks = []
read_block = partial(f.read, 32)
for block in iter(read_block, ''):
    blocks.append(block)

If you’re not doing anything except appending the blocks together, you can even make it shorter:

blocks = ''.join(iter(partial(f.read, 32), ''))

Remember: Readability counts!

  1. I learned this Python tip from Raymond Hettinger’s excellent talk “Transforming code into Beautiful, Idiomatic Python”. I use his examples as well, and you should really just watch the talk instead of reading this. I’m putting this out there for two reasons: one - because writing about something helps me remember it, and two - because text is more searchable and skimmable than video. ↩︎

  2. There are some minor differences between the two generated functions. Alon Horev’s blog post on the subject is a very interesting read. ↩︎
