Dynamically setting __tablename__ for sharding in SQLAlchemy?

OK, we went with SQLAlchemy's classical (non-declarative) mapping rather than the declarative one.

So we create a dynamic table object like this:

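from sqlalchemy import DATE, Column, MetaData, Table
from sqlalchemy.orm import clear_mappers, mapper  # mapper() is the classical API; removed in SQLAlchemy 2.0

def get_table_object(self, md5hash):
    metadata = MetaData()
    table_name = 'table_' + md5hash
    # NOTE: the mapper needs a primary key; mark one column with primary_key=True.
    table_object = Table(table_name, metadata,
        Column('Column1', DATE, nullable=False),
        Column('Column2', DATE, nullable=False)
    )
    # clear_mappers() discards every existing mapping, so only one
    # shard table is mapped to ActualTableObject at a time.
    clear_mappers()
    mapper(ActualTableObject, table_object)
    return ActualTableObject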

Here, ActualTableObject is the class that gets mapped to the table.
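For comparison, here is a rough sketch of the same idea without clear_mappers(), so several shard tables can stay mapped at once. It swaps in registry.map_imperatively(), which assumes SQLAlchemy 1.4 or later. The names shard_key, table_name_for, get_shard_class and _shard_classes are made up for illustration, as is the added id primary key column; where the hash really comes from is up to the application.

```python
import hashlib

from sqlalchemy import DATE, Column, Integer, Table, inspect
from sqlalchemy.orm import registry

mapper_registry = registry()
_shard_classes = {}  # hypothetical cache: table name -> mapped class


def table_name_for(shard_key):
    # Assumption: the md5 hash is derived from some string shard key.
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return "table_" + digest


def get_shard_class(shard_key):
    """Return a class mapped to the shard table, creating it on first use."""
    name = table_name_for(shard_key)
    if name not in _shard_classes:
        table = Table(
            name,
            mapper_registry.metadata,
            # Hypothetical primary key; the ORM needs one to map the table.
            Column("id", Integer, primary_key=True),
            Column("Column1", DATE, nullable=False),
            Column("Column2", DATE, nullable=False),
        )
        # One fresh class per table, so existing mappings are left alone.
        cls = type("Shard_" + name, (object,), {})
        mapper_registry.map_imperatively(cls, table)
        _shard_classes[name] = cls
    return _shard_classes[name]
```

Calling get_shard_class() twice with the same key returns the same class, and inspect(cls).local_table gives back the Table it is mapped to.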


The "Augmenting the Base" section of the SQLAlchemy documentation shows how to use a custom Base class that can, for example, compute the __tablename__ attribute dynamically:

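from sqlalchemy.ext.declarative import declarative_base, declared_attr

class Base(object):
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

Base = declarative_base(cls=Base)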

The only problem here is that I don't know where your hash comes from, but this should give a good starting point.

If you need this only for one table rather than all of them, you can put the declared_attr on just the model you want to shard.
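As a minimal sketch of the single-table variant, the declared_attr can live on a mixin that only the sharded model inherits. This assumes SQLAlchemy 1.4 or later for the sqlalchemy.orm imports; ShardedName, shard_suffix, Event and Plain are hypothetical names, and the suffix value is just a placeholder.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base, declared_attr

Base = declarative_base()


class ShardedName:
    """Hypothetical mixin: builds the table name from a class-level suffix."""

    shard_suffix = None  # set on each sharded model

    @declared_attr
    def __tablename__(cls):
        return "{}_{}".format(cls.__name__.lower(), cls.shard_suffix)


class Event(ShardedName, Base):
    shard_suffix = "d41d8cd9"  # placeholder value

    id = Column(Integer, primary_key=True)
    payload = Column(String)


class Plain(Base):
    # Models without the mixin keep an ordinary, fixed table name.
    __tablename__ = "plain"

    id = Column(Integer, primary_key=True)
```

Note that the name is fixed when the class is defined, so this does not by itself pick a table per request.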


Because I insisted on using declarative classes whose __tablename__ is set dynamically from a given parameter, after days of failing with other solutions and hours of studying SQLAlchemy internals, I came up with the following solution, which I believe is simple, elegant and free of race conditions.

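from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

def get_model(suffix):
    # Each call builds a fresh base with its own class registry and MetaData.
    DynamicBase = declarative_base(class_registry=dict())

    class MyModel(DynamicBase):
        __tablename__ = 'table_{suffix}'.format(suffix=suffix)

        id = Column(Integer, primary_key=True)
        name = Column(String)
        ...

    return MyModel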

Since each model has its own class_registry, you will not get the warning that says:

This declarative base already contains a class with the same class name and module name as mypackage.models.MyModel, and will be replaced in the string-lookup table.

The flip side is that you cannot reference these models from other models by string lookup. However, the on-the-fly models work perfectly well as foreign key targets when you pass the class or column directly:

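from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

ParentModel1 = get_model(123)
ParentModel2 = get_model(456)

# BaseModel is assumed to be the application's regular declarative base.
class MyChildModel(BaseModel):
    __tablename__ = 'table_child'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Pass the column and class objects directly; string lookup is not available.
    parent_1_id = Column(Integer, ForeignKey(ParentModel1.id))
    parent_2_id = Column(Integer, ForeignKey(ParentModel2.id))
    parent_1 = relationship(ParentModel1)
    parent_2 = relationship(ParentModel2)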

If you only use them to query, insert, update or delete, and nothing else (such as a foreign key on another table) keeps a reference to them, then the models, their base classes and their class_registry will be garbage collected once you drop them, leaving no trace.
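The plain insert and query usage mentioned above might look like the following sketch. It assumes SQLAlchemy 1.4 or later (for Session as a context manager) and an in-memory SQLite database; the suffixes and row values are made up, and get_model is repeated here only so that the snippet runs on its own.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base


def get_model(suffix):
    # Same factory as above: a fresh base and class registry per call.
    DynamicBase = declarative_base(class_registry=dict())

    class MyModel(DynamicBase):
        __tablename__ = "table_{suffix}".format(suffix=suffix)

        id = Column(Integer, primary_key=True)
        name = Column(String)

    return MyModel


engine = create_engine("sqlite://")  # in-memory database, for illustration

ModelA = get_model("a1")
ModelB = get_model("b2")

# Each model sits on its own base, hence its own MetaData,
# so the tables are created per model.
for model in (ModelA, ModelB):
    model.metadata.create_all(engine)

with Session(engine) as session:
    session.add(ModelA(name="alice"))
    session.add(ModelB(name="bob"))
    session.commit()

    names_a = [row.name for row in session.query(ModelA).all()]
    names_b = [row.name for row in session.query(ModelB).all()]
```

Each call to get_model() returns a new class even for the same suffix, so if the same shard is used repeatedly it may be worth keeping the returned class around rather than rebuilding it.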