Python dependency hell: A compromise between virtualenv and global dependencies?

I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders

Yes, virtualenv supports this. Install the globally-needed packages globally, and then, whenever you create a virtualenv, supply the --system-site-packages option so that the resulting virtualenv will still be able to use globally-installed packages. When using tox, you can set this option in the created virtualenvs by including sitepackages=true in the appropriate [testenv] section(s).
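For example (package, directory, and tool names here are illustrative, not prescribed):

> pip install numpy pandas                   # installed into the global interpreter
> virtualenv --system-site-packages .venv    # the new venv can still see them

The equivalent tox setting, as a minimal tox.ini sketch:

# tox.ini
[testenv]
sitepackages = true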


Problem

You have listed a number of issues that no one approach may be able to completely resolve:

  • space

'I need the "big" packages: numpy, pandas, scipy, matplotlib... Virtually I have about 100+ GB of my HDD filled with python virtual dependencies'

  • time

... installing all of these in each virtual environment takes time

  • publishing

... none of these package managers really help with publishing & testing code ...

  • workflow

I am tempted to move my current workflow from pipenv to conda.

Thankfully, what you have described is not quite the classic dependency problem that plagues package managers: circular dependencies, dependency pinning, versioning, and so on.


Details

I have used conda on Windows for many years now under similar constraints, with reasonable success. Conda was originally designed to make installing scipy-related packages easier. It still does.

If you are using the "scipy stack" (scipy, numpy, pandas, ...), conda is your most reliable choice.

Conda can:

  • install scipy packages
  • install C extensions and other non-Python dependencies (needed to run numpy and other packages)
  • integrate conda packages, conda channels (you should look into this; see the one-liner after this list) and pip to access packages
  • separate dependencies with virtual environments
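As a quick illustration of channels (conda-forge is one popular community channel):

> conda config --add channels conda-forge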

Conda can't:

  • help with publishing code

Reproducible Envs

The following steps should help reproduce virtualenvs if needed:

  • Do not install scipy packages with pip. I would rely on conda to do the heavy lifting. It is much faster and more stable. You can pip install less common packages inside conda environments.
  • On occasion, a pip package may conflict with conda packages within an environment (see release notes addressing this issue).

Avoid pip issues

I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders

Non-conda tools

  • pipx is a pip-like tool that installs Python applications into isolated, globally available environments (see the example after this list).
  • virtualenv traditionally makes virtual environments per project, but thankfully @jwodder's answer explains how to use global packages.
  • virtualenvwrapper manages virtualenvs from a central, global location.
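For instance, a minimal pipx sketch (black is just an example tool):

> pipx install black      # gets its own isolated virtualenv ...
> black --version         # ... but is available globally on PATH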

conda

However, if you want to stay with conda, you can try the following:

A. Make a working environment separate from your base environment, e.g. workenv. Consider this your go-to, "global" env for the bulk of your daily work.

> conda create -n workenv python=3.7 numpy pandas matplotlib scipy
> activate workenv
(workenv)>

B. Test installations of uncommon pip packages (or weighty conda packages) within a clone of the working env

> conda create --name testenv --clone workenv
> activate testenv
(testenv)> pip install pint

Alternatively, make new environments with minimal packages using a requirements.txt file

C. Make a backup of dependencies into a requirements.txt-like file called environment.yml, one per virtualenv. Optionally, make a script to run the export command (shown below) for each environment. See the docs on sharing/creating environment files.
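The export might look like this (the environment name is illustrative):

> conda env export --name workenv > environment.yml

Environments can then be re-created from this file in the future: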

> conda env create --name testenv --file environment.yml
> activate testenv
(testenv)> conda list

Publishing

The packaging problem is an ongoing, separate issue that has gained traction with the advent of the pyproject.toml file via PEP 518 (see the related blog post by author B. Cannon). Packaging tools such as flit and poetry have adopted this modern convention to build distributions and publish them to a server or package index (PyPI). The pyproject.toml concept moves away from traditional setup.py files and their specific dependence on setuptools.
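As a rough sketch of the idea (the project name and metadata are made up), a flit-based pyproject.toml can be as small as:

# pyproject.toml
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "mypackage"
version = "0.1.0"
description = "An example package"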

Dependencies

Tools like pipenv and poetry take a modern approach to the dependency problem via a "lock" file. This file lets you track and reproduce the state of your dependency graph, something relatively novel in the Python packaging world so far (see more on Pipfile vs. setup.py here). Moreover, there are claims that you can still use these tools in conjunction with conda, although I have not tested the extent of these claims. The lock file isn't standardized yet, but according to core developer B. Cannon in an interview on The future of Python packaging (~33m), "I'd like to get us there." (See Updates.)
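As a sketch of that workflow with pipenv (the package name is illustrative), the lock file is written as a side effect of installing:

> pipenv install requests     # updates Pipfile and Pipfile.lock
> pipenv lock                 # regenerates Pipfile.lock explicitly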

Summary

If you are working with any package from the scipy stack, use conda (Recommended):

  • To address the space, time, and workflow issues, use conda or miniconda.
  • To deploy applications or to track dependencies with a "lock" file, consider the following in conjunction with conda:
    • pipenv: use to deploy and make Pipfile.lock
    • poetry: use to deploy and make poetry.lock
  • To publish a library on PyPI, consider:
    • pipenv: develop via pipenv install -e . and publish manually with twine
    • flit: automatically package and publish
    • poetry: automatically package and publish

See Also

  • conda docs on managing environment files.
  • Podcast interview with B. Cannon discussing the general packaging problem, pyproject.toml, lock files and tools.
  • Podcast interview with K. Reitz discussing packaging tools (pipenv vs. pip, 37m) and dev environment.

Updates:

  • A new dependency resolver shipped with pip 21.0.
  • PEP 665 proposes a standardized lock-file (c. 2021)

An update on my progress:

The conda package manager turned out to work better for me than pipenv, for the following reasons:

  • by default, global dependencies are available from within conda virtual envs
  • it is faster than pipenv when installing/updating dependencies
  • combining pip and conda is really not that problematic: wherever a conda package is available, install it with conda; if not, simply install it with pip
  • by using environment.yml, it is possible to re-create an environment and its dependencies on both Linux and Windows in seconds; environment.yml allows specifying pip and conda dependencies separately (see the sketch after this list), which solves the above problems with e.g. Fiona, Shapely, GDAL etc. on Windows by using the conda versions
  • conda solves most of the difficulties of maintaining packages/dependencies across platforms (e.g. linux, mac, win)
  • it was no problem to have conda (e.g. miniconda) installed side by side with an independent Python install and to use conda through conda run
  • if environment.yml is missing, it is possible to create an env from a requirements.txt file (conda create -n newenv --file requirements.txt)
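As a sketch of such a file (names and versions are illustrative), with conda packages listed first and pip-only packages in their own subsection:

# environment.yml
name: workenv
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pandas
  - gdal        # the conda build avoids compiling on Windows
  - pip
  - pip:
      - pint    # only on PyPI, so installed via pip

Keeping this list minimal and letting conda resolve the rest helps keep the file cross-platform (see the next paragraph).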

Unfortunately, the process of creating the environment.yml does not seem to be described consistently anywhere. After a while, I realized that the automatically exported file (conda env export > environment.yml) should be manually edited down to the smallest possible list of dependencies (letting conda resolve the rest on install). Otherwise, the environment.yml will not be cross-platform compatible.

Anyway, this workflow solves most of my problems described above and I am kind of happy that I don't need to use pipenv or virtualenv anymore.

There are still some drawbacks:

  1. One needs to maintain dependencies in multiple files:

    • setup.py
    • environment.yml
  2. It is not possible to execute a program directly (e.g. via a shortcut) in its environment; this works without problems with pipenv run, but:
    • conda run will not automatically activate the env (source activate env); see the sketch after this list
    • this is an open issue and may be solved sometime
  3. cx_Freeze will not correctly include global dependencies from outside the conda env
  4. conda can be difficult if you need dependencies that require compilation (e.g. C extensions); see below or here
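For reference, the conda run invocation mentioned in drawback 2 looks like this (environment and script names are illustrative):

> conda run -n workenv python myscript.py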