Import vendored dependencies in Python package without modifying sys.path or 3rd party packages

First of all, I'd advice against vendoring; a few major packages did use vendoring before but have switched away to avoid the pain of having to handle vendoring. One such example is the requests library. If you are relying on people using pip install to install your package, then just use dependencies and tell people about virtual environments. Don't assume you need to shoulder the burden of keeping dependencies untangled or need to stop people from installing dependencies in the global Python site-packages location.

At the same time, I appreciate that a plug-in environment of a third-party tool is something different, and if adding dependencies to the Python installation used by that tool is cumbersome or impossible vendorizing may be a viable option. I see that Anki distributes extensions as .zip files without setuptools support, so that's certainly such an environment.

So if you choose to vendor dependencies, then use a script to manage your dependencies and update their imports. This is your option #1, but automated.

This is the path that the pip project has chosen, see their tasks subdirectory for their automation, which builds on the invoke library. See the pip project vendoring README for their policy and rationale (chief among those is that pip needs to bootstrap itself, e.g. have their dependencies available to be able to install anything).

You should not use any of the other options; you already enumerated the issues with #2 and #3.

The issue with option #4, using a custom importer, is that you still need to rewrite imports. Put differently, the custom importer hook used by setuptools doesn't solve the vendorized namespace problem at all, it instead makes it possible to dynamically import top-level packages if the vendorized packages are missing (a problem that pip solves with a manual debundling process). setuptools actually uses option #1, where they rewrite the source code for vendorized packages. See for example these lines in the packaging project in the setuptools vendored subpackage; the setuptools.extern namespace is handled by the custom import hook, which then redirects either to setuptools._vendor or the top-level name if importing from the vendorized package fails.

The pip automation to update vendored packages takes the following steps:

  • Delete everything in the _vendor/ subdirectory except the documentation, the __init__.py file and the requirements text file.
  • Use pip to install all vendored dependencies into that directory, using a dedicated requirements file named vendor.txt, avoiding compilation of .pyc bytecache files and ignoring transient dependencies (these are assumed to be listed in vendor.txt already); the command used is pip install -t pip/_vendor -r pip/_vendor/vendor.txt --no-compile --no-deps.
  • Delete everything that was installed by pip but not needed in a vendored environment, i.e. *.dist-info, *.egg-info, the bin directory, and a few things from installed dependencies that pip would never use.
  • Collect all installed directories and added files sans .py extension (so anything not in the whitelist); this is the vendored_libs list.
  • Rewrite imports; this is simply a series of regexes, where every name in vendored_lists is used to replace import <name> occurrences with import pip._vendor.<name> and every from <name>(.*) import occurrence with from pip._vendor.<name>(.*) import.
  • Apply a few patches to mop up the remaining changes needed; from a vendoring perspective, only the pip patch for requests is interesting here in that it updates the requests library backwards compatibility layer for the vendored packages that the requests library had removed; this patch is quite meta!

So in essence, the most important part of the pip approach, the rewriting of vendored package imports is quite simple; paraphrased to simplify the logic and removing the pip specific parts, it is simply the following process:

import shutil
import subprocess
import re

from functools import partial
from itertools import chain
from pathlib import Path

WHITELIST = {'README.txt', '__init__.py', 'vendor.txt'}

def delete_all(*paths, whitelist=frozenset()):
    for item in paths:
        if item.is_dir():
            shutil.rmtree(item, ignore_errors=True)
        elif item.is_file() and item.name not in whitelist:
            item.unlink()

def iter_subtree(path):
    """Recursively yield all files in a subtree, depth-first"""
    if not path.is_dir():
        if path.is_file():
            yield path
        return
    for item in path.iterdir():
        if item.is_dir():
            yield from iter_subtree(item)
        elif item.is_file():
            yield item

def patch_vendor_imports(file, replacements):
    text = file.read_text('utf8')
    for replacement in replacements:
        text = replacement(text)
    file.write_text(text, 'utf8')

def find_vendored_libs(vendor_dir, whitelist):
    vendored_libs = []
    paths = []
    for item in vendor_dir.iterdir():
        if item.is_dir():
            vendored_libs.append(item.name)
        elif item.is_file() and item.name not in whitelist:
            vendored_libs.append(item.stem)  # without extension
        else:  # not a dir or a file not in the whilelist
            continue
        paths.append(item)
    return vendored_libs, paths

def vendor(vendor_dir):
    # target package is <parent>.<vendor_dir>; foo/_vendor -> foo._vendor
    pkgname = f'{vendor_dir.parent.name}.{vendor_dir.name}'

    # remove everything
    delete_all(*vendor_dir.iterdir(), whitelist=WHITELIST)

    # install with pip
    subprocess.run([
        'pip', 'install', '-t', str(vendor_dir),
        '-r', str(vendor_dir / 'vendor.txt'),
        '--no-compile', '--no-deps'
    ])

    # delete stuff that's not needed
    delete_all(
        *vendor_dir.glob('*.dist-info'),
        *vendor_dir.glob('*.egg-info'),
        vendor_dir / 'bin')

    vendored_libs, paths = find_vendored_libs(vendor_dir, WHITELIST)

    replacements = []
    for lib in vendored_libs:
        replacements += (
            partial(  # import bar -> import foo._vendor.bar
                re.compile(r'(^\s*)import {}\n'.format(lib), flags=re.M).sub,
                r'\1from {} import {}\n'.format(pkgname, lib)
            ),
            partial(  # from bar -> from foo._vendor.bar
                re.compile(r'(^\s*)from {}(\.|\s+)'.format(lib), flags=re.M).sub,
                r'\1from {}.{}\2'.format(pkgname, lib)
            ),
        )

    for file in chain.from_iterable(map(iter_subtree, paths)):
        patch_vendor_imports(file, replacements)

if __name__ == '__main__':
    # this assumes this is a script in foo next to foo/_vendor
    here = Path('__file__').resolve().parent
    vendor_dir = here / 'foo' / '_vendor'
    assert (vendor_dir / 'vendor.txt').exists(), '_vendor/vendor.txt file not found'
    assert (vendor_dir / '__init__.py').exists(), '_vendor/__init__.py file not found'
    vendor(vendor_dir)