Python Attribute Access and the Descriptor Protocol

October 16, 2019

Let’s look at the following snippet:

class Foo:
    def __init__(self):
        self.bar = 'hello!'

foo = Foo()
print(foo.bar)

We already know how Foo instantiation works. Today, our question is this:

What exactly happens when we say foo.bar?

You might already know that most Python objects have an internal dictionary called __dict__ which holds their instance attributes. And what’s amazing about Python is that we can simply inspect even internal implementation details like this one:

>>> foo = Foo()
>>> foo.__dict__
{'bar': 'hello!'}

So we can arrive at the following incomplete hypothesis:

foo.bar is equivalent to foo.__dict__['bar'].

It looks correct:

>>> foo = Foo()
>>> foo.__dict__['bar']
'hello!'
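The word "most" matters here. As a small sketch (the class names `Plain` and `Slotted` are made up for illustration), an ordinary instance satisfies the hypothesis, while a class that declares `__slots__` gives its instances no `__dict__` at all, and attribute access still works:

```python
class Plain:
    def __init__(self):
        self.bar = 'hello!'


class Slotted:
    # Declaring __slots__ means instances get no per-instance __dict__.
    __slots__ = ('bar',)

    def __init__(self):
        self.bar = 'hello!'


plain = Plain()
# vars(obj) returns obj.__dict__, so these all agree for a plain instance.
assert plain.bar == plain.__dict__['bar'] == vars(plain)['bar']

slotted = Slotted()
print(slotted.bar)                    # hello!
print(hasattr(slotted, '__dict__'))   # False
```

So the hypothesis really is incomplete: something other than a dictionary lookup must be able to answer an attribute access.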

Now, suppose you’re a sophisticated, fancy-pants modern Pythonista and you know you can define dynamic attributes in Python classes. How does that sit with our knowledge about __dict__?

>>> class Foo:
...     def __init__(self):
...         self.bar = 'hello!'
...         
...     def __getattr__(self, item):
...         return 'goodbye!'
...         
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.baz
'goodbye!'
>>> foo.__dict__
{'bar': 'hello!'}

Err… okay. We can see that __getattr__ can “fake” attribute access, but it doesn’t work if we already have that variable defined (meaning, foo.bar returns 'hello!' and not 'goodbye!'). So this mechanism is more complex than it seemed, and there’s actual logic involved when accessing attributes. Indeed, there’s a magic method that’s called whenever we access instance attributes, but it’s clearly not __getattr__ as we can see in the above example. This magic method is called __getattribute__ and we’ll try to reverse engineer it by observing its different behaviors. For now, let’s modify our hypothesis:

foo.bar is equivalent to calling foo.__getattribute__('bar'), which is roughly:
def __getattribute__(self, item):
  if item in self.__dict__:
    return self.__dict__[item]
  return self.__getattr__(item)

Let’s test this out by actually implementing this method (with a different name) and calling it directly:

>>> class Foo:
...     def __init__(self):
...         self.bar = 'hello!'
...         
...     def __getattr__(self, item):
...         return 'goodbye!'
...     
...     def my_getattribute(self, item):
...         if item in self.__dict__:
...             return self.__dict__[item]
...         return self.__getattr__(item)
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.baz
'goodbye!'
>>> foo.my_getattribute('bar')
'hello!'
>>> foo.my_getattribute('baz')
'goodbye!'

Looks good, right?

Great, so let’s just make sure that we also support setting these variables and we can go home and enjoy the rest of the -

>>> foo.baz = 1337
>>> foo.baz
1337
>>> foo.my_getattribute('baz') = 'h4x0r'
SyntaxError: can't assign to function call

Damn.

In retrospect this seems a bit obvious. my_getattribute returns something that is like a reference¹. We can mutate it, but we can’t reassign the original value to a new object. So what the hell is going on here? If foo.baz translates to any function call, how can we ever assign to it?

When we look at a statement like foo.bar = 1, there’s an extra something going on. It seems that setting an attribute simply doesn’t go through the same path as getting it. Indeed, we can also override __setattr__ in a similar manner:

>>> class Foo:
...     def __init__(self):
...         self.__dict__['my_dunder_dict'] = {}
...         self.bar = 'hello!'
...         
...     def __setattr__(self, item, value):
...         self.my_dunder_dict[item] = value
...     
...     def __getattr__(self, item):
...         return self.my_dunder_dict[item]
>>> foo = Foo()
>>> foo.bar
'hello!'
>>> foo.bar = 'goodbye!'
>>> foo.bar
'goodbye!'
>>> foo.baz
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in <module>
    foo.baz
  File "<pyshell#70>", line 10, in __getattr__
    return self.my_dunder_dict[item]
KeyError: 'baz'
>>> foo.baz = 1337
>>> foo.baz
1337
>>> foo.__dict__
{'my_dunder_dict': {'bar': 'goodbye!', 'baz': 1337}}

A few things to note about the above snippet:

The asymmetry is intentional: __setattr__ has no accompanying method analogous to __getattribute__ (i.e., there’s no __setattribute__).
__setattr__ works in __init__ as well - that’s why we do a weird assignment to my_dunder_dict (self.__dict__['my_dunder_dict'] = {}). Otherwise, a plain self.my_dunder_dict = {} would call __setattr__, which reads self.my_dunder_dict before it exists, which calls __getattr__, which reads self.my_dunder_dict again, and we’ll get infinite recursion.
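A common way to sidestep that recursion, sketched here with a hypothetical `Logged` class, is to delegate the actual storage to `object.__setattr__` via `super()`. The built-in `setattr()` takes the same path as the assignment statement, which also answers the earlier question of how to assign through a function call:

```python
class Logged:
    """Hypothetical example: record every attribute assignment."""

    def __init__(self):
        # Bypass our own __setattr__ so the log itself is not logged.
        super().__setattr__('log', [])
        self.bar = 'hello!'          # goes through __setattr__ below

    def __setattr__(self, name, value):
        self.log.append((name, value))
        # Let object.__setattr__ do the real work (stores in self.__dict__).
        super().__setattr__(name, value)


obj = Logged()
obj.baz = 1337                 # the assignment statement...
setattr(obj, 'qux', 'h4x0r')   # ...and the built-in both call __setattr__

print(obj.log)
# [('bar', 'hello!'), ('baz', 1337), ('qux', 'h4x0r')]
print(sorted(vars(obj)))
# ['bar', 'baz', 'log', 'qux']
```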

And then… there’s property (and friends). Decorators that make methods behave like members. Sigh.

Let’s try to understand how this is happening.

>>> class Foo(object):
...     def __getattribute__(self, item):
...         print('__getattribute__ was called')
...         return super().__getattribute__(item)
...     
...     def __getattr__(self, item):
...         print('__getattr__ was called')
...         raise AttributeError(item)  # object defines no __getattr__ to delegate to
...     
...     @property
...     def bar(self):
...         print('bar property was called')
...         return 100
>>> f = Foo()
>>> f.bar
__getattribute__ was called
bar property was called
100

Out of curiosity, what’s in f.__dict__ then?

>>> f.__dict__
__getattribute__ was called
{}

Let me get this straight. bar is not in __dict__, but __getattr__ isn’t called. Wat?

Well, bar is a method and it accepts the class instance, but it’s actually a member of the class object, not the instance. Let’s verify:

>>> Foo.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'Foo' objects>,
              '__doc__': None,
              '__getattr__': <function Foo.__getattr__ at 0x038308A0>,
              '__getattribute__': <function Foo.__getattribute__ at 0x038308E8>,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'Foo' objects>,
              'bar': <property object at 0x0381EC30>})

We can see bar as the last item in that dictionary. In order to reconstruct __getattribute__ we need to answer another question here - who has precedence, the instance, or the class?

>>> f.__dict__['bar'] = 'will we see this printed?'
__getattribute__ was called
>>> f.bar
__getattribute__ was called
bar property was called
100

Alright. We now know that the class’s __dict__ is also checked and that, at least for a property, it has priority. So it’s just a minor complicati –

Wait wait wait, when did we even call the bar method? I mean, our pseudo-code for __getattribute__ never calls the object, so what’s going on?

Enter the descriptor protocol, as laid out in Python’s Descriptor HowTo Guide:

descr.__get__(self, obj, type=None) -> value
descr.__set__(self, obj, value) -> None
descr.__delete__(self, obj) -> None
That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.
If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).
Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
To make a read-only data descriptor, define both __get__() and __set__() with the __set__() raising an AttributeError when called. Defining the __set__() method with an exception raising placeholder is enough to make it a data descriptor.

TL;DR - if you implement any of __get__, __set__ or __delete__ you have officially, erm… Descripted a Protocol, I guess? Which is exactly what the property decorator is doing. Used with only a getter, as we did, it creates a read-only data descriptor, whose __get__ is then invoked by __getattribute__.
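To make the data versus non-data distinction concrete, here is a minimal sketch. The `DataDescr`, `NonDataDescr` and `Demo` names are invented for this example and are not part of any library:

```python
class NonDataDescr:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from non-data descriptor'


class DataDescr:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        # Raising here makes the attribute read-only, like a getter-only property.
        raise AttributeError('read-only attribute')


class Demo:
    non_data = NonDataDescr()
    data = DataDescr()


d = Demo()
# Write straight into the instance dictionary, bypassing __setattr__/__set__.
d.__dict__['non_data'] = 'from instance __dict__'
d.__dict__['data'] = 'from instance __dict__'

print(d.non_data)   # from instance __dict__  (instance entry wins)
print(d.data)       # from data descriptor    (data descriptor wins)

try:
    d.data = 'new value'
except AttributeError as exc:
    print('AttributeError:', exc)   # AttributeError: read-only attribute
```

This is the same effect we saw with the property: writing 'bar' into the instance dictionary could not shadow it, because a property is a data descriptor.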

One last refactor:

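foo.bar, as a getter, is equivalent to calling foo.__getattribute__('bar'), which is roughly:

def __getattribute__(self, item):
  cls = type(self)
  # Simplification: the real lookup walks cls.__mro__, not only cls.__dict__.
  in_class = item in cls.__dict__
  v = cls.__dict__.get(item)
  is_descriptor = hasattr(v, '__get__')
  # 1. Data descriptors on the class beat the instance's __dict__.
  if in_class and is_descriptor and (hasattr(v, '__set__') or hasattr(v, '__delete__')):
    return v.__get__(self, cls)
  # 2. Values in the instance's __dict__ are returned as-is.
  if item in self.__dict__:
    return self.__dict__[item]
  # 3. Non-data descriptors and plain class attributes come next.
  if in_class:
    return v.__get__(self, cls) if is_descriptor else v
  # 4. Only when everything else fails do we fall back to __getattr__.
  return self.__getattr__(item)

Let’s try to demonstrate all the behaviors we know:

class Foo:
    class_attr = "I'm a class attribute!"

    def __init__(self):
        self.dict_attr = "I'm in a dict!"

    @property
    def property_attr(self):
        return "I'm a read-only property!"

    def __getattr__(self, item):
        return "I'm dynamically returned!"

    def my_getattribute(self, item):
        cls = type(self)
        # Simplification: the real lookup walks cls.__mro__, not only cls.__dict__.
        in_class = item in cls.__dict__
        v = cls.__dict__.get(item)
        is_descriptor = hasattr(v, '__get__')
        is_data = is_descriptor and (hasattr(v, '__set__') or hasattr(v, '__delete__'))
        # Data descriptors on the class beat the instance's __dict__.
        if in_class and is_data:
            print('Retrieving from self.__class__.__dict__')
            print("Invoking descriptor's __get__")
            return v.__get__(self, cls)
        # Values in the instance's __dict__ are returned as-is.
        if item in self.__dict__:
            print('Retrieving from self.__dict__')
            return self.__dict__[item]
        # Non-data descriptors and plain class attributes come next.
        if in_class:
            print('Retrieving from self.__class__.__dict__')
            if is_descriptor:
                print("Invoking descriptor's __get__")
                v = v.__get__(self, cls)
            return v
        print('Retrieving from self.__getattr__')
        return self.__getattr__(item)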

>>> foo = Foo()
... 
... print(foo.class_attr)
... print(foo.dict_attr)
... print(foo.property_attr)
... print(foo.dynamic_attr)
... 
... print()
... 
... print(foo.my_getattribute('class_attr'))
... print(foo.my_getattribute('dict_attr'))
... print(foo.my_getattribute('property_attr'))
... print(foo.my_getattribute('dynamic_attr'))
I'm a class attribute!
I'm in a dict!
I'm a read-only property!
I'm dynamically returned!

Retrieving from self.__class__.__dict__
I'm a class attribute!
Retrieving from self.__dict__
I'm in a dict!
Retrieving from self.__class__.__dict__
Invoking descriptor's __get__
I'm a read-only property!
Retrieving from self.__getattr__
I'm dynamically returned!
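Plain functions are themselves non-data descriptors, which is how methods get bound to their instance. A short sketch (the `Greeter` class is hypothetical) that performs the binding by hand and then shows an instance entry shadowing a method:

```python
class Greeter:
    def greet(self):
        return 'hello!'


g = Greeter()

# The class dictionary stores a plain function...
func = Greeter.__dict__['greet']
print(type(func).__name__)            # function

# ...and calling its __get__ produces the bound method that g.greet gives us.
bound = func.__get__(g, Greeter)
print(bound())                        # hello!
print(bound == g.greet)               # True

# Functions define no __set__, so an instance entry takes precedence.
g.__dict__['greet'] = lambda: 'goodbye!'
print(g.greet())                      # goodbye!
```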

There’s always more. I’ve just scratched the surface of Python’s internals, and while the general idea is correct, it’s probable that the small details are implemented differently. Please read the official sources below if you need exact implementation details.
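One such detail, sketched with a hypothetical `Fallback` class: `__getattr__` is not called from inside the default `__getattribute__`. Instead, the interpreter calls `__getattribute__` first and falls back to `__getattr__` only if that raises `AttributeError`:

```python
class Fallback:
    def __init__(self):
        self.bar = 'hello!'

    def __getattribute__(self, name):
        if name == 'hidden':
            # Raising AttributeError is what triggers the __getattr__ fallback.
            raise AttributeError(name)
        return super().__getattribute__(name)

    def __getattr__(self, name):
        return f'fallback for {name!r}'


f = Fallback()
f.__dict__['hidden'] = 'never seen'

print(f.bar)       # hello!
print(f.hidden)    # fallback for 'hidden'
print(f.missing)   # fallback for 'missing'
```

Even though 'hidden' sits in the instance dictionary, the overridden `__getattribute__` refuses it, and the fallback answers instead.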

My hope is that aside from demonstrating how attribute access works, I’ve also convinced you of how beautiful Python is - a language you can push and prod and experiment with. Settle some knowledge debt today.

Sources

¹ Python is too cool for by-value or by-reference parameters. Check out this article by Robert Heaton on the subject.

Thanks to Yonatan Nakar, Yosef Twaik, Shachar Ohana, Ram Rachum and Hannan Aharonov for reading drafts of this.
