What is the relationship between the Python data model and built-in functions?

  • The builtins and operators use the underlying datamodel methods or attributes.
  • The builtins and operators have more elegant behavior and are in general more forward compatible.
  • The special methods of the datamodel are semantically non-public interfaces.
  • The builtins and language operators are specifically intended to be the user interface for behavior implemented by special methods.

Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the datamodel.

The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:

  • You may find you have more breaking changes when upgrading your Python executable or switching to other implementations of Python (such as PyPy, IronPython, Jython, or some other unforeseen implementation).
  • Your colleagues will likely think poorly of your language skills and conscientiousness, and consider it a code-smell, bringing you and the rest of your code to greater scrutiny.
  • It is easy to intercept the behavior of the builtin functions. Calling special methods directly limits your ability to introspect and debug your Python code.

In depth

The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.

The builtin functions and operators also can have fallback or more elegant behavior than the more primitive datamodel special methods. For example:

  • next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
  • str(obj) falls back to obj.__repr__() when the class defines no __str__ - whereas calling obj.__str__() directly on a Python 2 old-style instance would raise an AttributeError (in Python 3 the fallback is built into object.__str__).
  • obj != other falls back to not obj == other in Python 3 when no __ne__ is defined - calling obj.__ne__(other) directly can hand back NotImplemented instead of a bool.
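
As a minimal sketch of these fallbacks in Python 3 (the class name Plain is made up for illustration):

```python
class Plain:
    """Defines only __repr__ and __eq__; everything else is left to defaults."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Plain({self.value!r})"

    def __eq__(self, other):
        if not isinstance(other, Plain):
            return NotImplemented
        return self.value == other.value


# next() accepts a default; calling __next__() directly has no such option.
empty = iter([])
fallback_value = next(empty, "no more items")
try:
    empty.__next__()
    direct_call_raised = False
except StopIteration:
    direct_call_raised = True

# str() ends up using __repr__ because Plain defines no __str__.
text = str(Plain(1))

# != is derived from __eq__ because Plain defines no __ne__.
differs = Plain(1) != Plain(2)
```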

(Builtin functions can also be easily overshadowed, if necessary or desirable, on a module's global scope or the builtins module, to further customize behavior.)
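
A rough sketch of that kind of shadowing, assuming we only want to record calls to len within a single module (the calls list is a hypothetical name):

```python
import builtins

calls = []  # hypothetical log of the types that len was called with


def len(obj):
    """Module-level name that shadows the builtin for this module only."""
    calls.append(type(obj).__name__)
    return builtins.len(obj)  # delegate to the real builtin


size = len([1, 2, 3])
```

Code that calls obj.__len__() directly would bypass this hook entirely.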

Mapping the builtins and operators to the datamodel

Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but this is inconsistent enough to warrant giving the map below:

builtins/     special methods/
operators  -> datamodel               NOTES (fb == fallback)

repr(obj)     obj.__repr__()          provides fb behavior for str
str(obj)      obj.__str__()           fb to __repr__ if no __str__
bytes(obj)    obj.__bytes__()         Python 3 only
unicode(obj)  obj.__unicode__()       Python 2 only
format(obj)   obj.__format__('')      format spec optional, defaults to ''
hash(obj)     obj.__hash__()
bool(obj)     obj.__bool__()          Python 3, fb to __len__
bool(obj)     obj.__nonzero__()       Python 2, fb to __len__
dir(obj)      obj.__dir__()
vars(obj)     obj.__dict__            does not include __slots__
type(obj)     obj.__class__           type actually bypasses __class__ -
                                      overriding __class__ will not affect type
help(obj)     obj.__doc__             help uses more than just __doc__
len(obj)      obj.__len__()           provides fb behavior for bool
iter(obj)     obj.__iter__()          fb to __getitem__ w/ indexes from 0 on
next(obj)     obj.__next__()          Python 3
next(obj)     obj.next()              Python 2
reversed(obj) obj.__reversed__()      fb to __len__ and __getitem__
other in obj  obj.__contains__(other) fb to __iter__ then __getitem__
obj == other  obj.__eq__(other)
obj != other  obj.__ne__(other)       fb to not obj.__eq__(other) in Python 3
obj < other   obj.__lt__(other)       get >, >=, <= with @functools.total_ordering
complex(obj)  obj.__complex__()
int(obj)      obj.__int__()
float(obj)    obj.__float__()
round(obj)    obj.__round__()
abs(obj)      obj.__abs__()
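
To illustrate a few of the fallbacks noted in the table, here is a small sketch with a made-up class, Squares, that defines only __len__ and __getitem__:

```python
class Squares:
    """Sequence-like class that defines only __len__ and __getitem__."""

    def __init__(self, count):
        self.count = count

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        return index * index


squares = Squares(4)

as_list = list(iter(squares))        # iter() falls back to __getitem__ from index 0
backwards = list(reversed(squares))  # reversed() falls back to __len__ and __getitem__
has_nine = 9 in squares              # `in` falls back to iterating
is_truthy = bool(squares)            # bool() falls back to __len__
empty_is_truthy = bool(Squares(0))
```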

The operator module has length_hint which has a fallback implemented by a respective special method if __len__ is not implemented:

length_hint(obj)  obj.__length_hint__() 
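
For example, an iterator without __len__ can still advertise an estimate. This is only a sketch, and Countdown is a hypothetical class:

```python
from operator import length_hint


class Countdown:
    """Iterator with no __len__ that can still estimate what is left."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        return self.remaining


hint = length_hint(Countdown(5))     # uses __length_hint__
sized = length_hint([1, 2, 3])       # uses __len__ when it exists
no_hint = length_hint(object(), 10)  # neither method: the given default
```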

Dotted Lookups

Dotted lookups are contextual. Without a special method implementation, Python first looks in the class hierarchy for data descriptors (like properties and slots), then in the instance __dict__ (for instance variables), then in the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:

obj.attr      obj.__getattr__('attr')       provides fb if dotted lookup fails
obj.attr      obj.__getattribute__('attr')  preempts dotted lookup
obj.attr = _  obj.__setattr__('attr', _)    preempts dotted lookup
del obj.attr  obj.__delattr__('attr')       preempts dotted lookup
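
The lookup order and the __getattr__ fallback can be sketched as follows (Demo and its attribute names are made up for illustration):

```python
class Demo:
    @property
    def managed(self):  # property: a data descriptor on the class
        return "from property"

    def method(self):  # plain function: a non-data descriptor on the class
        return "from class"

    def __getattr__(self, name):
        # Only reached when the normal dotted lookup fails.
        return f"fallback for {name}"


demo = Demo()
# Write straight into the instance namespace to set up name clashes.
vars(demo)["managed"] = "from instance"
vars(demo)["method"] = "from instance"

data_descriptor_wins = demo.managed  # data descriptor beats the instance __dict__
instance_wins = demo.method          # instance __dict__ beats a non-data descriptor
missing = demo.anything              # __getattr__ supplies the fallback
```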

Descriptors

Descriptors are a bit advanced - feel free to skip these entries and come back later - recall the descriptor instance is in the class hierarchy (like methods, slots, and properties). A data descriptor implements either __set__ or __delete__:

obj.attr        descriptor.__get__(obj, type(obj)) 
obj.attr = val  descriptor.__set__(obj, val)
del obj.attr    descriptor.__delete__(obj)
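
A minimal sketch of a data descriptor, assuming we want to validate a value on assignment (Positive and Order are hypothetical names):

```python
class Positive:
    """Data descriptor that keeps a validated value in the instance namespace."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        return vars(obj)[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        vars(obj)[self.name] = value

    def __delete__(self, obj):
        del vars(obj)[self.name]


class Order:
    quantity = Positive("quantity")


order = Order()
order.quantity = 3        # -> Positive.__set__(order, 3)
current = order.quantity  # -> Positive.__get__(order, Order)
del order.quantity        # -> Positive.__delete__(order)
```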

When the class is created (that is, when its definition is executed), __set_name__ is called on every descriptor that defines it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) Here cls is the same as type(obj) above, and 'attr' stands in for the attribute name:

class cls:
    @descriptor_type
    def attr(self): pass # -> descriptor.__set_name__(cls, 'attr') 
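
A runnable sketch of the same idea for Python 3.6+, with the made-up names Named and Widget:

```python
class Named:
    """Descriptor that learns its own attribute name via __set_name__."""

    def __set_name__(self, owner, name):
        # Called once, while the owning class is being created.
        self.owner = owner
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return f"{objtype.__name__}.{self.name}"


class Widget:
    label = Named()  # -> Named.__set_name__(Widget, 'label')


descriptor = vars(Widget)["label"]
recorded_name = descriptor.name
recorded_owner = descriptor.owner
looked_up = Widget().label
```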

Items (subscript notation)

The subscript notation is also contextual:

obj[name]         -> obj.__getitem__(name)
obj[name] = item  -> obj.__setitem__(name, item)
del obj[name]     -> obj.__delitem__(name)

As a special case for subclasses of dict, __missing__ is called if __getitem__ doesn't find the key:

obj[name]         -> obj.__missing__(name)  
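
For instance, in this small sketch (ZeroDefault is a hypothetical name), only the subscript notation triggers __missing__:

```python
class ZeroDefault(dict):
    """dict subclass whose missing keys read as 0 without being stored."""

    def __missing__(self, key):
        return 0


counts = ZeroDefault(a=1)

present = counts["a"]           # found normally, __missing__ is not involved
absent = counts["b"]            # not found -> counts.__missing__('b')
still_absent = "b" in counts    # the key was never stored
via_get = counts.get("b")       # dict.get does not consult __missing__
```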

Operators

There are also special methods for +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:

obj + other   ->  obj.__add__(other), fallback to other.__radd__(obj)
obj | other   ->  obj.__or__(other), fallback to other.__ror__(obj)

and in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:

obj += other  ->  obj.__iadd__(other)
obj |= other  ->  obj.__ior__(other)

(If an in-place operator is not defined, Python falls back to the binary operator: for example, obj += other becomes obj = obj + other.)

and unary operations:

+obj          ->  obj.__pos__()
-obj          ->  obj.__neg__()
~obj          ->  obj.__invert__()
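
The binary, reflected, augmented, and unary cases above can be sketched with a made-up Meters class that deliberately defines no __iadd__:

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # lets Python try other.__radd__(self)

    def __radd__(self, other):
        # Reached for e.g. 2 + Meters(3), after int.__add__ gives up.
        return self + other

    def __neg__(self):
        return Meters(-self.value)


total = Meters(1) + Meters(2)  # -> Meters.__add__
reflected = 2 + Meters(3)      # -> Meters.__radd__
negated = -Meters(4)           # -> Meters.__neg__

acc = Meters(1)
before = acc
acc += Meters(4)               # no __iadd__: falls back to acc = acc + Meters(4)
```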

Context Managers

A context manager defines __enter__, which is called on entering the code block (its return value, usually self, is aliased with as), and __exit__, which is guaranteed to be called on leaving the code block, with exception information.

with obj as enters_return_value: #->  enters_return_value = obj.__enter__()
    raise Exception('message')
                                 #->  obj.__exit__(Exception, 
                                 #->               Exception('message'), 
                                 #->               traceback_object)

If __exit__ receives an exception and returns a false value, the exception is re-raised once __exit__ returns; a true return value suppresses it.

If no exception, __exit__ gets None for those three arguments instead, and the return value is meaningless:

with obj:           #->  obj.__enter__()
    pass
                    #->  obj.__exit__(None, None, None)
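
Putting the three cases together in a sketch (Recorder and its suppress flag are hypothetical):

```python
class Recorder:
    """Context manager that logs its calls and optionally swallows errors."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what `as` binds

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append(("exit", exc_type))
        return self.suppress  # a true value suppresses the exception


quiet = Recorder(suppress=True)
with quiet as entered:
    raise ValueError("swallowed")

loud = Recorder(suppress=False)
try:
    with loud:
        raise ValueError("propagates")
except ValueError as error:
    propagated = str(error)

clean = Recorder(suppress=False)
with clean:
    pass
```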

Some Metaclass Special Methods

Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:

isinstance(obj, cls) -> cls.__instancecheck__(obj)
issubclass(sub, cls) -> cls.__subclasscheck__(sub)
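
As a sketch, a metaclass could make these checks structural. The names DuckMeta, Duck, and Robot are made up, and real code would more likely build on abc.ABCMeta:

```python
class DuckMeta(type):
    def __instancecheck__(cls, obj):
        return hasattr(obj, "quack")

    def __subclasscheck__(cls, sub):
        return hasattr(sub, "quack")


class Duck(metaclass=DuckMeta):
    pass


class Robot:  # unrelated to Duck by inheritance
    def quack(self):
        return "beep"


robot_is_duck = isinstance(Robot(), Duck)     # -> Duck.__instancecheck__(robot)
robot_subclasses = issubclass(Robot, Duck)    # -> Duck.__subclasscheck__(Robot)
int_is_duck = isinstance(42, Duck)
```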

An important takeaway is that while builtins like next and bool did not change between Python 2 and 3, the underlying special method names did.

Thus using the builtins also offers more forward compatibility.

When am I supposed to use the special names?

In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."

This is not just cultural; it is also reflected in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, and the subpackage does not provide an __all__, names that start with underscores are excluded. The subpackage's __name__ would also be excluded.

IDE autocompletion tools are mixed in their consideration of names that start with underscores to be non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any of the user created non-public interfaces) when I type the name of an object and a period.

Thus I assert:

The special "dunder" methods are not a part of the public interface. Avoid using them directly.

So when to use them?

The main use-case is when implementing your own custom object or subclass of a builtin object.

Try to only use them when absolutely necessary. Here are some examples:

Use the __name__ special attribute on functions or classes

When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:

from functools import wraps

def decorate(fn): 
    @wraps(fn)
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__) # exception to the rule
        return fn(*args, **kwargs)
    return decorated

Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):

def get_class_name(self):
    return type(self).__name__
          # ^          # ^- must use __name__, no builtin e.g. name()
          # use type, not .__class__

Using special attributes to write custom classes or subclassed builtins

When we want to define custom behavior, we must use the data-model names.

This makes sense, since we are the implementors, these attributes aren't private to us.

class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):      
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other): # docs recommend for Python 2.
        # use the higher level of abstraction here:
        return not self == other  

However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior.) Instead, we should use the higher level of abstraction.
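
One way to see the difference, as a sketch with two hypothetical classes: a direct __eq__ call only asks one side, while == also consults the other operand.

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if not isinstance(other, Celsius):
            return NotImplemented  # "I don't know how to compare these"
        return self.degrees == other.degrees


class Kelvin:
    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __eq__(self, other):
        if isinstance(other, Kelvin):
            return self.kelvin == other.kelvin
        if isinstance(other, Celsius):
            return self.kelvin == other.degrees + 273.15
        return NotImplemented


# The direct call only asks Celsius, which gives up:
direct = Celsius(0).__eq__(Kelvin(273.15))

# The operator goes on to try the reflected Kelvin.__eq__:
via_operator = Celsius(0) == Kelvin(273.15)
```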

Another point at which we'd need to use the special method names is when we are in a child's implementation, and want to delegate to the parent. For example:

class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other) 

Conclusion

The special methods allow users to implement the interface for object internals.

Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.


I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.


I agree with your own answer that for example len(a) should be used, not a.__len__(). I'd put it like this: len exists so we can use it, and __len__ exists so len can use it. Or however that really works internally, since len(a) can actually be much faster, at least for example for lists and strings:

>>> from timeit import timeit
>>> timeit('len(a)', 'a = [1,2,3]', number=10**8)
4.22549770486512
>>> timeit('a.__len__()', 'a = [1,2,3]', number=10**8)
7.957335462257106

>>> timeit('len(s)', 's = "abc"', number=10**8)
4.1480574509332655
>>> timeit('s.__len__()', 's = "abc"', number=10**8)
8.01780160432645

But besides defining these methods in my own classes for usage by builtin functions and operators, I occasionally also use them as follows:

Let's say I need to give a filter function to some function and I want to use a set s as the filter. I'm not going to create an extra function lambda x: x in s or def f(x): return x in s. No. I already have a perfectly fine function that I can use: the set's __contains__ method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as f here, that's just for this timing demo):

>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = s.__contains__', number=10**8)
6.473739433621368
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = lambda x: x in s', number=10**8)
19.940786514456924
>>> timeit('f(2); f(4)', 's = {1, 2, 3}\ndef f(x): return x in s', number=10**8)
20.445680107760325

So while I don't directly call magic methods like s.__contains__(x), I do occasionally pass them somewhere like some_function_needing_a_filter(s.__contains__). And I think that's perfectly fine, and better than the lambda/def alternative.
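
With the builtin filter standing in for some_function_needing_a_filter, a small sketch of what I mean (the variable names are just for illustration):

```python
allowed = {1, 2, 3}
candidates = [0, 1, 2, 5, 3, 8]

# Pass the bound method itself; no wrapper function is needed.
kept_bound = list(filter(allowed.__contains__, candidates))

# The lambda alternative gives the same result through an extra call layer.
kept_lambda = list(filter(lambda x: x in allowed, candidates))
```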


My thoughts on the examples you showed:

  • Example 1: Asked how to get the size of a list, he answered items.__len__(). Even without any reasoning. My verdict: That's just wrong. Should be len(items).
  • Example 2: Does mention d[key] = value first! And then adds d.__setitem__(key, value) with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point, mentioning that that's how we can support the square bracket syntax in our own classes. Which turns it back to a suggestion to use square brackets.
  • Example 3: Suggests obj.__dict__. Bad, like the __len__ example. But I suspect he just didn't know vars(obj), and I can understand it, as vars is less common/known and the name does differ from the "dict" in __dict__.
  • Example 4: Suggests __class__. Should be type(obj). I suspect it's similar to the __dict__ story, although I think type is more well-known.

About privacy: In your own answer you say these methods are "semantically private". I strongly disagree. Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.

  • The two things you use as arguments are importing behaviour and IDE's autocompletion. But importing and these special methods are different areas, and the one IDE I tried (the popular PyCharm) disagrees with you. I created a class/object with methods _foo and __bar__ and then autocompletion didn't offer _foo but did offer __bar__. And when I used both methods anyway, PyCharm only warned me about _foo (calling it a "protected member"), not about __bar__.
  • PEP 8 says 'weak "internal use" indicator' explicitly for single leading underscore, and explicitly for double leading underscores it mentions the name mangling and later explains that it's for "attributes that you do not want subclasses to use". But the comment about double leading+trailing underscores doesn't say anything like that.
  • The data model page you yourself link to says that these special method names are "Python’s approach to operator overloading". Nothing about privacy there. The words private/privacy/protected don't even appear anywhere on that page.

    I also recommend reading this article by Andrew Montalenti about these methods, emphasizing that "The dunder convention is a namespace reserved for the core Python team" and "Never, ever, invent your own dunders" because "The core Python team reserved a somewhat ugly namespace for themselves". Which all matches PEP 8's instruction "Never invent [dunder/magic] names; only use them as documented". I think Andrew is spot on - it's just an ugly namespace of the core team. And it's for the purpose of operator overloading, not about privacy (not Andrew's point but mine and the data model page's).

Besides Andrew's article I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.

Again, we should use len(a), not a.__len__(). But not because of privacy.