Python Attribute Access and the Descriptor Protocol
Let’s look at the following snippet:
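A minimal sketch of the kind of class the rest of this post works with. The class body is an assumption: all the text tells us is that foo.bar evaluates to 'hello!'.

```python
class Foo:
    def __init__(self):
        # Assumed initial value, taken from the 'hello!' mentioned later on.
        self.bar = 'hello!'


foo = Foo()
print(foo.bar)  # hello!
```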
We already know how Foo instantiation works. Today, our question is this: what exactly happens when we say foo.bar?
You might already know that most Python objects have an internal dictionary called __dict__ which holds their instance attributes. And what’s amazing about Python is that we can simply inspect even internal implementation details like this one:
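For example, assuming a Foo whose __init__ sets a single bar attribute, the instance dictionary can be printed directly:

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'


foo = Foo()
# The instance dictionary holds exactly what __init__ assigned.
print(foo.__dict__)  # {'bar': 'hello!'}
```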
So we can arrive at the following incomplete hypothesis:
foo.bar is equivalent to foo.__dict__['bar'].
It looks correct:
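A quick check of the hypothesis, again assuming the simple Foo with one instance attribute:

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'


foo = Foo()
# Both spellings hand back the very same object.
print(foo.__dict__['bar'])             # hello!
print(foo.bar is foo.__dict__['bar'])  # True
```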
Now, suppose you’re a sophisticated fancy-pants modern Pythonista and you know you can define dynamic attributes in Python classes. How does that sit with our knowledge about __dict__?
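Here is a sketch of such a class. The 'goodbye!' return value and the baz name are assumptions chosen to match the discussion that follows.

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'

    def __getattr__(self, name):
        # Python calls this only after the normal lookup has failed.
        return 'goodbye!'


foo = Foo()
print(foo.bar)       # hello!
print(foo.baz)       # goodbye!
print(foo.__dict__)  # {'bar': 'hello!'}
```

Note that baz never shows up in __dict__, even though foo.baz happily returns a value.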
Err… okay. We can see that __getattr__ can “fake” attribute access, but it doesn’t work if we already have that variable defined (meaning, foo.bar returns 'hello!' and not 'goodbye!'). So this mechanism is more complex than it seemed, and there’s actual logic involved when accessing attributes. Indeed, there’s a magic method that’s called whenever we access instance attributes, but it’s clearly not __getattr__, as we can see in the above example. This magic method is called __getattribute__, and we’ll try to reverse engineer it by observing its different behaviors. For now, let’s modify our hypothesis:
foo.bar is equivalent to calling foo.__getattribute__('bar'), which is roughly:
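A rough sketch of that hypothesis, written as a free function so it can be read on its own. The name hypothetical_getattribute is made up for illustration, and this is a guess at the logic so far, not how CPython implements it.

```python
def hypothetical_getattribute(obj, name):
    # Step 1: look in the instance dictionary.
    if name in obj.__dict__:
        return obj.__dict__[name]
    # Step 2: fall back to __getattr__ (assumes the class defines one).
    return obj.__getattr__(name)
```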
Let’s test this out by actually implementing this method (with a different name) and calling it directly:
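Something along these lines, with my_getattribute standing in for the real __getattribute__ (the class body is the same assumed Foo as before):

```python
class Foo:
    def __init__(self):
        self.bar = 'hello!'

    def __getattr__(self, name):
        return 'goodbye!'

    def my_getattribute(self, name):
        # Our current guess: instance dictionary first, __getattr__ second.
        if name in self.__dict__:
            return self.__dict__[name]
        return self.__getattr__(name)


foo = Foo()
print(foo.my_getattribute('bar'))  # hello!
print(foo.my_getattribute('baz'))  # goodbye!
```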
Looks good, right?
Great, so let’s just make sure that we also support setting these variables and we can go home and enjoy the rest of the -
Damn.
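What went wrong: assigning to the result of a call is not even valid Python. The sketch below compiles the offending statement from a string so that the rest of the file still runs; the statement itself is an assumed reconstruction of the attempt.

```python
source = "foo.my_getattribute('bar') = 'goodbye!'"

try:
    compile(source, '<example>', 'exec')
    compiled = True
except SyntaxError as error:
    # The exact message differs between Python versions.
    compiled = False
    print('SyntaxError:', error.msg)
```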
In retrospect this seems a bit obvious. my_getattribute returns something that is like a reference [1]. We can mutate it, but we can’t reassign the original value to a new object. So what the hell is going on here? If foo.baz translates to a function call, how can we ever assign to it?
When we look at a statement like foo.bar = 1, there’s something extra going on. It seems that we simply don’t access attributes the same way when we set them as when we get them. Indeed, we can also override __setattr__ in a similar manner:
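A minimal sketch of such an override. The my_dunder_dict name comes from the notes below; the rest of the class is an assumption about what the snippet looked like.

```python
class Foo:
    def __init__(self):
        # Write straight into __dict__: a plain self.my_dunder_dict = {}
        # would go through __setattr__ before my_dunder_dict exists.
        self.__dict__['my_dunder_dict'] = {}
        self.bar = 'hello!'

    def __setattr__(self, name, value):
        # Every attribute assignment lands in our own dictionary.
        self.my_dunder_dict[name] = value

    def __getattr__(self, name):
        # Reached because the name is missing from the real __dict__.
        try:
            return self.my_dunder_dict[name]
        except KeyError:
            raise AttributeError(name) from None


foo = Foo()
foo.bar = 1
print(foo.bar)       # 1
print(foo.__dict__)  # {'my_dunder_dict': {'bar': 1}}
```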
A few things to note about the above snippet:
- There’s intentional asymmetry such that __setattr__ doesn’t have an analogous accompanying method similar to __getattribute__ (i.e., there’s no __setattribute__).
- __setattr__ works in __init__ as well. That’s why we do a weird assignment to my_dunder_dict (self.__dict__['my_dunder_dict'] = {}). Otherwise, we’ll get infinite recursion.
And then… there’s property (and friends). Decorators that make methods behave like members. Sigh.
Let’s try to understand how this is happening.
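A small example to poke at, assuming a class that has both a property named bar and a catch-all __getattr__:

```python
class Foo:
    def __getattr__(self, name):
        return 'goodbye!'

    @property
    def bar(self):
        return 'hello!'


f = Foo()
print(f.bar)  # hello!
print(f.baz)  # goodbye!
```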
Out of curiosity, what’s in f.__dict__ then?
Let me get this straight. bar is not in __dict__, but __getattr__ isn’t called. Wat?
Well, bar is a method and it accepts the class instance, but it’s actually a member of the class object, not the instance. Let’s verify:
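One way to check, using the same assumed Foo with a bar property:

```python
class Foo:
    @property
    def bar(self):
        return 'hello!'


f = Foo()
print(f.__dict__)                 # {}
print('bar' in Foo.__dict__)      # True
print(type(Foo.__dict__['bar']))  # <class 'property'>
```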
We can see bar as the last item in that dictionary. In order to reconstruct __getattribute__ we need to answer another question here: who has precedence, the instance or the class?
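A simple experiment: plant a competing bar in the instance dictionary and see who wins. This only tests the property case; the strings are made up for the example.

```python
class Foo:
    @property
    def bar(self):
        return 'from the class'


f = Foo()
# Go through __dict__ directly, since f.bar = ... would hit the property.
f.__dict__['bar'] = 'from the instance'
print(f.bar)  # from the class
```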
Alright. We now know that the class’s __dict__ is also checked and that it has priority. So it’s just a minor complicati –
Wait wait wait, when did we even call the bar method? I mean, our pseudo-code for __getattribute__ never calls the object, so what’s going on?
Enter The Descriptor Protocol:
descr.__get__(self, obj, type=None) -> value
descr.__set__(self, obj, value) -> None
descr.__delete__(self, obj) -> None
That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.
If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).
Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
To make a read-only data descriptor, define both __get__() and __set__() with the __set__() raising an AttributeError when called. Defining the __set__() method with an exception raising placeholder is enough to make it a data descriptor.
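The precedence rules quoted above can be seen with two tiny hand-written descriptors. The class names NonData and Data are made up for this sketch.

```python
class NonData:
    # Only __get__: a non-data descriptor.
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'


class Data:
    # __get__ plus a raising __set__: a read-only data descriptor.
    def __get__(self, obj, objtype=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class Foo:
    non_data = NonData()
    data = Data()


foo = Foo()
# Plant same-named entries in the instance dictionary.
foo.__dict__['non_data'] = 'instance value'
foo.__dict__['data'] = 'instance value'
print(foo.non_data)  # instance value
print(foo.data)      # data descriptor
```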
TL;DR - if you implement any of __get__, __set__ or __delete__ you have officially, erm… Descripted a Protocol, I guess? Which is exactly what the property decorator is doing. In the case of calling it like we did, it defines a read-only data descriptor, which is then called in __getattribute__.
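We can confirm that a getter-only property behaves like a read-only data descriptor. The was_read_only flag is just a helper for this sketch.

```python
class Foo:
    @property
    def bar(self):
        return 'hello!'


descriptor = Foo.__dict__['bar']
print(hasattr(descriptor, '__get__'))  # True
print(hasattr(descriptor, '__set__'))  # True

foo = Foo()
try:
    foo.bar = 'goodbye!'
    was_read_only = False
except AttributeError:
    # No setter was defined, so the property's __set__ refuses.
    was_read_only = True
print(was_read_only)  # True
```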
One last refactor:
foo.bar as a getter, is equivalent to calling foo.__getattribute__('bar'), which is roughly:
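Here is one way to write that down as a free function. It is a simplified sketch, not CPython’s actual implementation: my_getattribute and _MISSING are names made up here, and it assumes the object has a __dict__.

```python
_MISSING = object()


def my_getattribute(obj, name):
    cls = type(obj)
    # Find the name on the class or its bases without triggering descriptors.
    class_attr = _MISSING
    for klass in cls.__mro__:
        if name in vars(klass):
            class_attr = vars(klass)[name]
            break
    attr_type = type(class_attr)
    has_get = hasattr(attr_type, '__get__')
    is_data = hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__')

    # 1. Data descriptors on the class beat the instance dictionary.
    if has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)
    # 2. Then the instance dictionary.
    if name in vars(obj):
        return vars(obj)[name]
    # 3. Then non-data descriptors (e.g. methods) and plain class attributes.
    if class_attr is not _MISSING:
        if has_get:
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr
    # 4. Finally, the __getattr__ fallback, if the class defines one.
    if hasattr(cls, '__getattr__'):
        return cls.__getattr__(obj, name)
    raise AttributeError(name)
```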
Let’s try to demonstrate all the behaviors we know:
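A compact demonstration using real attribute access. The class and the attribute names are invented for this sketch; each line exercises one rule from the lookup order discussed above.

```python
class Foo:
    class_attr = 'class attribute'

    def __init__(self):
        self.bar = 'hello!'

    def __getattr__(self, name):
        return 'goodbye!'

    @property
    def prop(self):
        return 'property'

    def method(self):
        return 'method'


foo = Foo()
# Try to shadow a data descriptor and a non-data descriptor.
foo.__dict__['prop'] = 'shadow attempt'
foo.__dict__['method'] = 'shadowed'

print(foo.bar)         # hello!           (instance dictionary)
print(foo.class_attr)  # class attribute  (class dictionary)
print(foo.prop)        # property         (data descriptor wins)
print(foo.method)      # shadowed         (instance beats non-data descriptor)
print(foo.missing)     # goodbye!         (__getattr__ fallback)
```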
There’s always more. I’ve just scratched the surface of Python’s internals, and while the general idea is correct, it’s probable that the small details are implemented differently. Please read the official sources below if you need exact implementation details.
My hope is that aside from demonstrating how attribute access works, I’ve also convinced you of how beautiful Python is - a language you can push and prod and experiment with. Settle some knowledge debt today.
Sources
[1] Python is too cool for by-value or by-reference parameters. Check out this article by Robert Heaton on the subject.
Thanks to Yonatan Nakar, Yosef Twaik, Shachar Ohana, Ram Rachum and Hannan Aharonov for reading drafts of this.