How does conda work internally?

I'm no expert on the software, but I have been using conda to maintain an internal repository for several months, so I can share the insight of an "advanced user." There are a lot of questions here, so I'll try to answer them in order.

How does conda (http://conda.pydata.org) work internally? Any details are welcome...

The most concise reference I can share is the conda-build doc, which explains conda recipes in detail.

TL;DR: Recipes are folders containing a config file, meta.yaml, that describes the package in terms of name, version, source location, dependencies (build, test, run), and basic tests to run after installation. The folder also contains the build script(s) (build.sh and/or bld.bat, for Linux/OSX and Windows respectively), which execute any build steps other than downloading the source.
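
To make this concrete, here is a sketch of what a minimal recipe folder might look like (the package name and the one-line build.sh are illustrative, not something to copy verbatim):

mypkg-recipe/
    meta.yaml   # name, version, source URL, build/run requirements, test commands
    build.sh    # build steps for Linux/OSX
    bld.bat     # build steps for Windows

# For a pure Python package, build.sh is often just:
$PYTHON setup.py install

# and you would build and install the result with:
conda build mypkg-recipe
conda install --use-local mypkg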

Building a package from its recipe consists (in short) of downloading the source, creating a build environment, building, creating a test environment, and testing. Once a package is built, you can install it system-wide or only into a particular environment:

conda install -n myenv mypkg # install only in myenv
conda install mypkg # install globally

Activating an environment works exactly the same as with virtualenv:

source activate myenv
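
If the environment does not exist yet, you create it first; for example (the environment name and Python version are arbitrary):

conda create -n myenv python=2.7  # new environment with its own Python
source activate myenv             # activate it
conda install mypkg               # installs into myenv
source deactivate                 # leave the environment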

What are the restrictions of using only conda as package manager? Would it work?

It would work. You can install anything you want with conda, as long as you have a recipe that supports your platform. The issue you will run into is package support. Conda maintainers and users have created an ecosystem of packages on various channels, but support for binary (non-Python) packages is pretty much limited to those commonly needed by Python packages, and many of these are only supported on one or two platforms. The communities behind apt, yum, etc. maintain all kinds of packages for their respective platforms.

In our case, we need to support Ubuntu and OSX, so we maintain many platform-dependent binary packages through puppet and other foolish sorcery, and we use conda to maintain Python packages for the two platforms. If conda packages existed for all the binary packages we use, I might consider using conda instead of apt, brew, etc., but I would risk taking on significant recipe maintenance if the recipes we used became outdated. This is fine for us in the case of Python package management, where conda fills a huge void, but I'm not ready to take that on for packages that we have existing tools to maintain. We'll see if my thinking changes as the conda ecosystem matures. One tool to rule them all would be nice, but I don't think conda is ready for me to make that jump.

Does it use some kind of containerization, or static linking of all the dependencies, why is it so "cross platform"?

"Cross-platform" can have many meanings. For Python packages, cross-platform means you can create environments with any version of python and the packages you need. For Linux/win flavors and distros, you can do as much as you want in your build script based on the environment. As an example, take a look at the conda build script for qt. It has appropriate installations for OSX and Linux. The script can do whatever it wants though. You can switch based on OS version or whatever you want. Many recipes will simply fail if they do not support the installation platform.

Hope you found this helpful.


I see that no one who really understands how conda works is willing to share their knowledge. This is unfortunate...

I can offer a high-level sequence of conda build actions:

  1. Looks at meta.yaml to find the run + build dependencies
  2. Creates a new environment called '_build'
    1. Installs the run + build dependencies from meta.yaml into it
  3. Fetches the source code into conda_bld/work
  4. Builds the package:
    1. Takes 'snapshot1' of the full environment
    2. From conda_bld/work, runs 'sh build.sh' (for Python packages, typically a 'setup.py install'), which installs locally into the _build environment
    3. Takes 'snapshot2' of the full environment
    4. The package contents are 'diff snapshot1 snapshot2' (see the shell sketch after this list)
  5. Runs tests:
    1. Creates a '_test' environment containing the just-built package plus its run dependencies
    2. Runs the tests from meta.yaml
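
A rough shell analogue of the snapshot-diff step in 4 would be something like this (conda build does this internally; the commands and package filename are only illustrative):

find $PREFIX -type f | sort > snapshot1.txt              # _build env before the build
bash build.sh                                            # run the recipe's build script
find $PREFIX -type f | sort > snapshot2.txt              # _build env after the build
comm -13 snapshot1.txt snapshot2.txt > new_files.txt     # files added by the build
tar cjf mypkg-1.0-0.tar.bz2 -T new_files.txt             # those files become the package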

This, along with @asmeurer's YouTube talk, should get you started.


I explain a lot of this in my SciPy 2014 talk. Let me give a little outline here.

First off, a conda package is really simple. It is just a tarball of the files to be installed, along with some metadata in an info directory. For example, the conda package for Python is a tarball of the files

info/
    files
    index.json
    ...
bin/
    python
    ...
lib/
    libpython.so
    python2.7/
        ...
    ...
...

You can see exactly what it looks like by looking at the extracted packages in the Anaconda pkgs directory. The full spec is at https://docs.conda.io/projects/conda-build/en/latest/source/package-spec.html.
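
Because a package is just a tarball, you can also inspect one directly; for instance (the filename here simply follows the name-version-build naming scheme):

tar tjf python-2.7.8-0.tar.bz2 | head   # list the first few files in the package
# info/files
# info/index.json
# bin/python
# ...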

When conda installs this, it extracts the tarball into the pkgs directory and hard links the files into the installation environment. Finally, files that contain hard-coded installation paths (usually shebang lines) have those paths rewritten to point at the environment.
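
You can see the hard linking for yourself by comparing inode numbers (the paths depend on where Anaconda is installed and which package/environment you look at):

ls -i ~/anaconda/pkgs/python-2.7.8-0/bin/python   # note the inode number
ls -i ~/anaconda/envs/myenv/bin/python            # same inode: both paths are the same file on disk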

That's basically it. There is some more that happens in terms of dependency resolution, but once conda knows what packages it's going to install, that's how it does it.

The process of building a package is a little more complicated. @mattexx's answer and the document it links to describe a bit of the canonical way of building a package using conda build.

To answer your other questions:

Furthermore, as it is Python agnostic and apparently works so well and fluently, why is it not used as a general purpose package manager like apt or yum?

You certainly can. The only thing limiting this is the set of packages that have been built for conda. On Windows, this is a very nice option, as there aren't any system package managers like there are on Linux.

What are the restrictions of using only conda as package manager? Would it work?

It would work, assuming you have conda packages for everything you are interested in. The main restriction is that conda only wants to install things into the conda environment itself, so things that require specific installation locations on the system might not be well suited to conda (although it's still doable, if you set that location as your environment path). Or for instance, conda might not be a suitable replacement for "project level" package managers like bower.

Also, conda probably shouldn't be used to manage system level libraries (libraries that must be installed in the / prefix), like kernel extensions or the kernel itself, unless you were to build out a distribution that uses conda as a package manager explicitly.

The main thing I will say about these things is that conda packages are generally made to be relocatable, meaning the installation prefix of the package does not matter. This is why hard coded paths are changed as part of the install process, for instance. It also means that dynamic libraries built with conda build will have their RPATHs (on Linux) and install names (on OS X) changed automatically to use relative paths instead of absolute ones.
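
On Linux you can check this with readelf; a library built by conda build will typically show a relative RPATH such as $ORIGIN/../lib rather than an absolute path (the library name here is just an example):

readelf -d ~/anaconda/envs/myenv/lib/libsomething.so | grep -i rpath
# (RPATH)  Library rpath: [$ORIGIN/../lib]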

Or the other way round, why are e.g. apt and yum not able to provide the functionality conda provides? Is conda "better" than those package manager or just different?

In some ways it's better, and in some ways it's not. Your system package manager knows your system, and there are packages in there that are not going to be in conda (and some, like the kernel, that probably shouldn't be in conda).

The main advantage of conda is its notion of environments. Since packages are made to be relocatable, you can install the same package in multiple places, and effectively have completely independent installs of everything, basically for free.
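
For example, you can keep completely separate Python stacks side by side, and any package shared between them is hard linked from the pkgs directory rather than copied (versions are illustrative):

conda create -n py27 python=2.7 numpy
conda create -n py34 python=3.4 numpy
source activate py27   # two independent installs; switching between them is instant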

Does it use some kind of containerization

No, the only "containerization" is having separate install directories and making packages relocatable.

or static linking of all the dependencies,

The dependency linking is completely up to the package itself. Some packages statically link their dependencies, some don't. The dynamically linked libraries have their load paths changed as I described above to be relocatable.

why is it so "cross platform"?

"Cross platform" in this case means "cross operating system". Although the same binary package can't work across OS X, Linux, and Windows, the point is that conda itself works identically on all three, so if you have the same packages built for all three platforms, you can manage them all the same way regardless of which one you are on.