Django QuerySets: Fucking Awesome? Yes
I would still love to hear your feedback in the comments below. Enjoy!
Django QuerySets are pretty awesome.
In this post I’ll explain a bit about what they are and how they work (if you’re already familiar with them, you can jump to the second part). I’ll then argue that you should always return a QuerySet object when it’s possible, and I’ll talk about how to do just that.
QuerySets Are Awesome
A QuerySet, in essence, is a list of objects of a given model. I say ‘list’ and not ‘group’ or the more formal ‘set’ because it is ordered. In fact, you’re probably already familiar with how to get QuerySets, because that’s what you get when you call the various Book.objects.XXX() methods. For example, consider the statement Book.objects.all().
What all() returns is a QuerySet of Book instances, which happens to include every Book instance that exists. There are other calls which you probably already know, such as filter(), exclude() and order_by().
The cool thing about QuerySets is that, since each of these functions both operates on and returns a QuerySet, you can chain them up.
And that’s not all! It’s also fast, because QuerySets are lazy. As the Django documentation puts it:
Internally, a QuerySet can be constructed, filtered, sliced, and generally passed around without actually hitting the database. No database activity actually occurs until you do something to evaluate the queryset.
So we’ve established that QuerySets are cool. Now what?
Return QuerySets Wherever Possible
I’ve recently worked on a Django app where I had a Model that represented a tree (the data structure, not the Christmas decoration). It meant that every instance had a link to its parent in the tree, and a get_ancestors method returned the chain of parents as a plain list.
This worked pretty well. Trouble was, I had to add another method, get_larger_ancestors, which should return all the ancestors whose value is larger than the value of the current node. The obvious implementation loops over the list that get_ancestors returns and keeps only the nodes with a larger value.
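The problem with this is that I’m essentially going over the list twice - one time by Django and another time by me. It got me thinking - what if get_ancestors returned a QuerySet instead of a list? Then get_larger_ancestors could be a single filter call, something like self.get_ancestors().filter(value__gt=self.value) (assuming the field is called value).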
Pretty straightforward. The important thing here is that I’m not looping over the objects. I could apply however many filters I want to what get_larger_ancestors returns and feel safe that I’m not re-running over a list of objects of unknown size. The key advantage here is that I keep using the same interface for querying. When the user gets a bunch of objects, we don’t know how they’ll want to slice and dice them. When we return QuerySet objects, we guarantee that the user will know how to handle them.
But how do I implement get_ancestors to return a QuerySet? That’s a little bit trickier. It’s not possible to collect the data we want with a single query, nor is it possible with any pre-determined number of queries. The nature of what we’re looking for is dynamic, and the alternative implementation will look pretty similar to what it is now. Here’s the idea behind the alternative, better implementation: walk up the parents as before, but start from an empty QuerySet and union in each ancestor instead of appending to a list.
Take a while, soak it in. I’ll go over the specifics in just a minute.
The point I’m trying to make here is that whenever you return a bunch of objects, you should always try to return a QuerySet instead of a list. Doing so will allow the user to freely filter, slice and order the result in a way that’s easy, familiar and provides better performance.
(On a side note - I am hitting the database in get_ancestors, since I’m using self.parent recursively. There is an extra hit on the database here - once when executing the function and another in the future, when actually inspecting the results. We do get the performance upside when we perform further filters on the results, which would otherwise have meant more hits on the database or heavy in-memory operations. The example here is to show how to turn non-trivial operations into QuerySets.)
Common QuerySet Manipulations
So, returning a QuerySet where we perform a simple query is easy. When we want to implement something with a little more zazz, we need to perform relational operations (and some helpers, too). Here’s a handy cheat sheet (as an exercise, try to understand my implementation of get_larger_ancestors).
Union - The union operator for QuerySets is |, the pipe symbol. qs1 | qs2 returns a QuerySet with all the items from qs1 and all the items in qs2 while handling duplicates (items that are in both QuerySets will only appear once in the result).
Intersection - You rarely need a special operator for intersection, because you already know how to do it! Chaining functions like filter and exclude in fact performs an intersection between the original QuerySet and the new filter. (To intersect two existing QuerySets, there is also the & operator.)
Difference - A difference (mathematically written as qs1 \ qs2) is all the items in qs1 that do not exist in qs2. Note that this operation is asymmetrical (as opposed to the previous operations). I’m afraid there is no operator for this, but you can do this: qs1.exclude(pk__in=qs2)
Nothing - Seems useless, but it actually isn’t, as the above example shows. A lot of the time, when you’re dynamically building a QuerySet with unions, you need to start off with what would have been an empty list. This is how to get it: MyModel.objects.none()