Collapse multiple submodules to one Cython extension

This answer provides a prototype for Python 3 (which can easily be adapted for Python 2) and shows how several Cython modules can be bundled into a single extension/shared library/pyd file.

I keep it around for historical/didactic reasons - a more concise recipe is given in this answer, which presents a good alternative to @Mylin's proposal of putting everything into the same pyx-file.


The question of multiple modules in the same shared object is also discussed in PEP489, where two solutions are proposed:

  • one similar to this answer and to the answer referenced above: extending the finders with the proper functionality
  • the second is to introduce symlinks with the "right" names, which would point to the common module (but here the advantages of having one common module are somewhat negated).

Preliminary note: Since Cython 0.29, Cython uses multi-phase initialization for Python>=3.5. Multi-phase initialization needs to be switched off (otherwise PyInit_xxx isn't sufficient, see this SO-post), which can be done by passing -DCYTHON_PEP489_MULTI_PHASE_INIT=0 to gcc or another compiler.
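In a setuptools-based build, the same macro can be set per extension via define_macros instead of editing compiler flags by hand (a sketch; the extension name and sources are placeholders matching the example below):

```python
from setuptools import Extension

# setting the macro here is equivalent to passing
# -DCYTHON_PEP489_MULTI_PHASE_INIT=0 on the compiler command line
ext = Extension(
    name="foo.bootstrap",
    sources=["foo/bootstrap.pyx", "foo/bar_a.pyx", "foo/bar_b.pyx"],
    define_macros=[("CYTHON_PEP489_MULTI_PHASE_INIT", "0")],
)
```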


When bundling multiple Cython extensions (let's call them bar_a and bar_b) into one single shared object (let's call it foo), the main problem is the import bar_a operation, because of the way module loading works in Python (obviously simplified, this SO-post has more info):

  1. Look for bar_a.so (or similar), use dlopen to load the shared library and call PyInit_bar_a, which would initialize/register the module; if not successful
  2. Look for bar_a.py and load it, if not successful...
  3. Look for bar_a.pyc and load it, if not successful - error.

Steps 2 and 3 will obviously fail. The issue is that there is no bar_a.so to be found, and although the initialization function PyInit_bar_a is present in foo.so, Python doesn't know to look there and gives up the search.

Luckily, there are hooks available, so we can teach Python to look in the right places.

When importing a module, Python utilizes the finders from sys.meta_path, which return the right loader for a module (for simplicity I'm describing the legacy workflow with loaders rather than module specs). The default finders return None, i.e. no loader, which results in an import error.
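For illustration, the finders currently registered can be inspected directly:

```python
import sys

# the default entries are BuiltinImporter, FrozenImporter and PathFinder;
# PathFinder is the one that searches sys.path for .so/.py files
for finder in sys.meta_path:
    print(finder)
```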

That means we need to add a custom finder to sys.meta_path which recognizes our bundled modules and returns loaders that, in turn, call the right PyInit_xxx-function.

The missing part: how does the custom finder find its way into sys.meta_path? It would be pretty inconvenient if the user had to do it manually.

When a submodule of a package is imported, first the package's __init__.py-module is loaded and this is the place where we can inject our custom finder.

After calling python setup.py build_ext install for the setup presented further below, there is a single shared library installed and the submodules can be loaded as usual:

>>> import foo.bar_a as a
>>> a.print_me()
I'm bar_a
>>> from foo.bar_b import print_me as b_print
>>> b_print()
I'm bar_b

### Putting it all together:

Folder structure:

../
 |-- setup.py
 |-- foo/
      |-- __init__.py
      |-- bar_a.pyx
      |-- bar_b.pyx
      |-- bootstrap.pyx

__init__.py:

# bootstrap is the only module which 
# can be loaded with default Python-machinery
# because the resulting extension is called `bootstrap`:
from . import bootstrap

# injecting our finders into sys.meta_path
# after that all other submodules can be loaded
bootstrap.bootstrap_cython_submodules()

bootstrap.pyx:

import sys
import importlib
import importlib.abc

# custom loader is just a wrapper around the right init-function
class CythonPackageLoader(importlib.abc.Loader):
    def __init__(self, init_function):
        super(CythonPackageLoader, self).__init__()
        self.init_module = init_function
        
    def load_module(self, fullname):
        if fullname not in sys.modules:
            sys.modules[fullname] = self.init_module()
        return sys.modules[fullname]
 
# custom finder just maps the module name to init-function      
class CythonPackageMetaPathFinder(importlib.abc.MetaPathFinder):
    def __init__(self, init_dict):
        super(CythonPackageMetaPathFinder, self).__init__()
        self.init_dict=init_dict
        
    def find_module(self, fullname, path):
        try:
            return CythonPackageLoader(self.init_dict[fullname])
        except KeyError:
            return None

# making init-function from other modules accessible:
cdef extern from *:
    """
    PyObject *PyInit_bar_a(void);
    PyObject *PyInit_bar_b(void);
    """
    object PyInit_bar_a()
    object PyInit_bar_b()
    
# wrapping C-functions as Python-callables:
def init_module_bar_a():
    return PyInit_bar_a()
    
def init_module_bar_b():
    return PyInit_bar_b()


# injecting custom finder/loaders into sys.meta_path:
def bootstrap_cython_submodules():
    init_dict={"foo.bar_a" : init_module_bar_a,
               "foo.bar_b" : init_module_bar_b}
    sys.meta_path.append(CythonPackageMetaPathFinder(init_dict))  

bar_a.pyx:

def print_me():
    print("I'm bar_a")

bar_b.pyx:

def print_me():
    print("I'm bar_b")

setup.py:

from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize

sourcefiles = ['foo/bootstrap.pyx', 'foo/bar_a.pyx', 'foo/bar_b.pyx']

extensions = cythonize(Extension(
            name="foo.bootstrap",
            sources = sourcefiles,
    ))


kwargs = {
      'name':'foo',
      'packages':find_packages(),
      'ext_modules':  extensions,
}


setup(**kwargs)

NB: This answer was the starting point for my experiments; however, it uses PyImport_AppendInittab, and I cannot see how this could be plugged into a normal Python interpreter.


I have written a tool to build a binary Cython extension from a Python package, based on the answers from @DavidW and @ead above. The package can contain subpackages, which will also be included in the binary. Here is the idea.

There are two problems to solve here:

  1. Collapse the whole package (including all subpackages) to a single Cython extension
  2. Allow imports as usual

The above answers work well on a single-layer layout, but when we try to go further with subpackages, there will be name conflicts whenever two modules in different subpackages share the same name. For instance,

foo/
  |- bar/
  |  |- __init__.py
  |  |- base.py
  |- baz/
  |  |- __init__.py
  |  |- base.py

would introduce two PyInit_base functions in the generated C code, resulting in duplicate function definitions.

This tool solves this by flattening all the modules to the root package layer (e.g. foo/bar/base.py -> foo/bar_base.py) before the build.

This leads to the second problem, where we cannot use the original way to import anything from subpackages (e.g. from foo.bar import base). This problem is tackled by introducing a finder (modified from @DavidW's answer) that performs the redirection.

# The aliases below are assumed; the tool's actual import lines are not shown.
# _rename (defined elsewhere in the tool) maps a dotted module path to its
# flattened counterpart, e.g. "foo.bar.base" -> "foo.bar_base".
import copy as _copy
import importlib.abc as _imp_abc
import importlib.machinery as _imp_mac
import importlib.util as _imp_util

class _ExtensionLoader(_imp_mac.ExtensionFileLoader):
  def __init__(self, name, path, is_package=False, sep="_"):
    super(_ExtensionLoader, self).__init__(name, path)
    self._sep = sep
    self._is_package = is_package

  def create_module(self, spec):
    s = _copy.copy(spec)
    s.name = _rename(s.name, sep=self._sep)
    return super(_ExtensionLoader, self).create_module(s)

  def is_package(self, fullname):
    return self._is_package

# Chooses the right init function
class _CythonPackageMetaPathFinder(_imp_abc.MetaPathFinder):
  def __init__(self, name, packages=None, sep="_"):
    super(_CythonPackageMetaPathFinder, self).__init__()
    self._prefix = name + "."
    self._sep = sep
    self._start = len(self._prefix)
    self._packages = set(packages or set())

  def __eq__(self, other):
    return (self.__class__.__name__ == other.__class__.__name__ and
            self._prefix == getattr(other, "_prefix", None) and
            self._sep == getattr(other, "_sep", None) and
            self._packages == getattr(other, "_packages", None))

  def __hash__(self):
    return (hash(self.__class__.__name__) ^
            hash(self._prefix) ^
            hash(self._sep) ^
            hash("".join(sorted(self._packages))))

  def find_spec(self, fullname, path, target=None):
    if fullname.startswith(self._prefix):
      name = _rename(fullname, sep=self._sep)
      is_package = fullname in self._packages
      loader = _ExtensionLoader(name, __file__, is_package=is_package)
      return _imp_util.spec_from_loader(
          name, loader, origin=__file__, is_package=is_package)

It redirects the original (dotted) import path to the corresponding location of the moved module. The set of subpackages has to be provided so the loader loads them as packages rather than as plain modules.


This answer follows the basic pattern of @ead's answer, but uses a slightly simpler approach that eliminates most of the boilerplate code.

The only difference is the simpler version of bootstrap.pyx:

import sys
import importlib
import importlib.abc

# Chooses the right init function     
class CythonPackageMetaPathFinder(importlib.abc.MetaPathFinder):
    def __init__(self, name_filter):
        super(CythonPackageMetaPathFinder, self).__init__()
        self.name_filter = name_filter

    def find_spec(self, fullname, path, target=None):
        if fullname.startswith(self.name_filter):
            # use this extension-file but PyInit-function of another module:
            loader = importlib.machinery.ExtensionFileLoader(fullname, __file__)
            return importlib.util.spec_from_loader(fullname, loader)
    
# injecting custom finder/loaders into sys.meta_path:
def bootstrap_cython_submodules():
    sys.meta_path.append(CythonPackageMetaPathFinder('foo.')) 

Essentially, I look to see if the name of the module being imported starts with foo., and if it does I reuse the standard importlib approach to loading an extension module, passing the current .so filename as the path to look in - the right name of the init function (there are multiple) will be deduced from the module name.

Obviously, this is just a prototype - one might want to make some improvements. For example, right now import foo.bar_c would lead to a somewhat unusual error message: "ImportError: dynamic module does not define module export function (PyInit_bar_c)". To avoid it, one could return None for all submodule names that are not on a whitelist.
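A sketch of such a whitelist-based finder (WhitelistedCythonFinder and its constructor arguments are my own naming; in real use so_path would be the __file__ of the bootstrap module):

```python
import importlib.abc
import importlib.machinery
import importlib.util

class WhitelistedCythonFinder(importlib.abc.MetaPathFinder):
    def __init__(self, whitelist, so_path):
        self.whitelist = set(whitelist)  # names of the bundled submodules
        self.so_path = so_path           # path of the shared object

    def find_spec(self, fullname, path, target=None):
        # answer only for modules known to live in the shared object
        if fullname not in self.whitelist:
            return None
        loader = importlib.machinery.ExtensionFileLoader(fullname, self.so_path)
        return importlib.util.spec_from_loader(fullname, loader)

finder = WhitelistedCythonFinder({"foo.bar_a", "foo.bar_b"}, "foo/bootstrap.so")
```

With this, import foo.bar_c falls through to the normal import machinery and fails with a plain ModuleNotFoundError instead of the confusing message above.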


First off, I should note that it's impossible to compile a single .so file with subpackages using Cython. So if you want subpackages, you're going to have to generate multiple .so files, as each .so can only represent a single module.

Second, it doesn't appear that you can compile multiple Cython/Python files (I'm using the Cython language specifically) and link them into a single module at all.

I've tried to compile multiple Cython files into a single .so every which way, both with distutils and with manual compilation, and it always fails to import at runtime.

It seems that it's fine to link a compiled Cython file with other libraries, or even other C files, but something goes wrong when linking together two compiled Cython files, and the result isn't a proper Python extension.

The only solution I can see is to compile everything as a single Cython file. In my case, I've edited my setup.py to generate a single .pyx file which in turn includes every .pyx file in my source directory:

import os

# generate a master .pyx that textually includes every .pyx in the source dir
includesContents = ""
for f in os.listdir("src-dir"):
    if f.endswith(".pyx"):
        includesContents += "include \"" + f + "\"\n"

with open("src/extension-name.pyx", "w") as includesFile:
    includesFile.write(includesContents)

Then I just compile extension-name.pyx. Of course this breaks incremental and parallel compilation, and you could end up with extra naming conflicts since everything gets pasted into the same file. On the bright side, you don't have to write any .pxd files.

I certainly wouldn't call this a preferable build method, but if everything absolutely has to be in one extension module, this is the only way I can see to do it.