python: class vs tuple huge memory overhead (?)

There is another way to reduce the memory occupied by objects: in addition to dropping __dict__ and __weakref__, you can turn off support for cyclic garbage collection. This is implemented in the recordclass library:

$ pip install recordclass

>>> import sys
>>> from recordclass import dataobject, make_dataclass

Create the class:

class Person(dataobject):
    first: str
    last: str

or

>>> Person = make_dataclass('Person', 'first last')

Result (Python 3.9, 64-bit):

>>> print(sys.getsizeof(Person(100,100)))
32

For a __slots__-based class we have (Python 3.9, 64-bit):

class PersonSlots:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(PersonSlots(100,100)))
48

So the dataobject-based class saves a further 16 bytes per instance compared with the __slots__-based one.
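To see where the difference comes from, here is a small stdlib-only check (a sketch, not part of recordclass). It shows that instances of a __slots__ class are still tracked by the cyclic garbage collector. As far as I know, sys.getsizeof() includes the GC header for such objects, which is consistent with the 16-byte gap on 64-bit CPython, but treat that as an implementation detail.

```python
import gc
import sys


class PersonSlots:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = PersonSlots(100, 100)

# A class with non-empty __slots__ still participates in cyclic GC,
# so every instance carries the collector's bookkeeping header.
print(gc.is_tracked(p))

# The exact number depends on the Python version and platform.
print(sys.getsizeof(p))
```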

For dataobject-based:

l = [Person(i, i) for i in range(10000000)]
memory size: 409 MB

For __slots__-based:

l = [PersonSlots(i, i) for i in range(10000000)]
memory size: 569 MB

Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
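A quick illustration of that drawback (the names and values here are made up for the example): assigning an attribute that is not listed in __slots__ raises AttributeError, because the instance has no __dict__ to put it in.

```python
class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person('Ada', 'Lovelace')

error = None
try:
    # 'age' is not listed in __slots__, so there is nowhere to store it.
    p.age = 36
except AttributeError as exc:
    error = exc
    print('cannot add attribute:', exc)

# Listed attributes can still be reassigned as usual.
p.first = 'Augusta'
print(p.first, hasattr(p, '__dict__'))
```

If you need the occasional extra attribute, adding '__dict__' to __slots__ brings the per-instance dict back, at the cost of most of the memory saving.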


As others have said in their answers, you'll have to generate different objects for the comparison to make sense.
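One way identical objects can skew such a measurement, sketched below for CPython: a tuple literal made only of constants is built once by the compiler, so a list "filled" with it holds many references to a single object. I am assuming a comparison of this kind here; the exact code being compared is not shown in this answer.

```python
n = 1000

# The literal (1, 2) is a compile-time constant in CPython, so every
# element of this list refers to the very same tuple object.
same = [(1, 2) for _ in range(n)]

# Here each iteration builds a fresh tuple, so the list holds n objects.
distinct = [(i, i) for i in range(n)]

print(len({id(t) for t in same}))
print(len({id(t) for t in distinct}))
```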

So, let's compare some approaches.

tuple

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

class Person

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

namedtuple (tuple + __slots__)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

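namedtuple is basically a class that extends tuple and sets an empty __slots__ so that instances carry no per-instance __dict__; the values live in the tuple itself, and the class adds field getters and some other helper methods (before Python 3.7 you could see the exact generated code by passing verbose=True; that parameter has since been removed).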
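You can check this structure yourself with a few lines; the field values below are just placeholders.

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('Ada', 'Lovelace')

# The generated class is a tuple subclass...
print(issubclass(Person, tuple))

# ...with an empty __slots__, so instances have no __dict__.
print(Person.__slots__, hasattr(p, '__dict__'))

# Field names are kept on the class; the getters read from the tuple.
print(Person._fields, p.first == p[0], p.last == p[1])
```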

class Person + __slots__

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of namedtuple above. A clear winner, even better than pure tuples.
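If you want to reproduce a comparison like this without watching the process size, here is a minimal sketch using tracemalloc from the standard library. The helper name bytes_allocated and the class names are mine, and the numbers you get will differ from the figures above, since tracemalloc counts only Python-level allocations and results vary by Python version.

```python
import tracemalloc


class PersonPlain:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class PersonSlots:
    __slots__ = ('first', 'last')

    def __init__(self, first, last):
        self.first = first
        self.last = last


def bytes_allocated(factory, n=50_000):
    """Return the bytes tracemalloc attributes to building a list of n objects."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    items = [factory(i, i) for i in range(n)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del items
    return after - before


candidates = [
    ('tuple', lambda a, b: (a, b)),
    ('class', PersonPlain),
    ('class + __slots__', PersonSlots),
]
for name, factory in candidates:
    print(f'{name:>18}: {bytes_allocated(factory) / 1e6:.1f} MB')
```

Each candidate is measured in its own tracemalloc session, so the allocations of one do not leak into the figure for the next.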