What does apt-get install do under the hood?

Mostly, apt-get does the following things:

  • checks for dependencies (and asks to install them),
  • downloads the package, verifies it and then tells dpkg to install it.

dpkg will:

  • extract the package and copy the content to the right location, and check for pre-existing files and modifications on them,
  • run the package maintainer scripts: preinst and postinst (plus the old version's prerm and postrm, if a package is being upgraded),
  • execute some actions based on triggers

You might be interested in the maintainer scripts, which are usually located at /var/lib/dpkg/info/<package-name>.{pre,post}{rm,inst}. These are usually shell scripts, but there's no hard rule. For example:

$ ls /var/lib/dpkg/info/xml-core.{pre,post}{rm,inst}
/var/lib/dpkg/info/xml-core.postinst
/var/lib/dpkg/info/xml-core.postrm
/var/lib/dpkg/info/xml-core.preinst
/var/lib/dpkg/info/xml-core.prerm
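
To check which of these scripts a given package actually ships, a small helper can probe the four possible names. This is only a minimal sketch: the function name list_maintainer_scripts and its optional second argument (an alternative info directory, handy for trying it out) are made up for illustration and are not part of dpkg. It also ignores multiarch packages, whose files are stored as <package-name>:<arch>.

```shell
#!/bin/sh
# List the maintainer scripts that exist for a package.
# Usage: list_maintainer_scripts <package> [info-dir]
# The [info-dir] argument is an illustrative assumption; it defaults to
# dpkg's info directory.
list_maintainer_scripts() {
    pkg=$1
    info_dir=${2:-/var/lib/dpkg/info}
    for script in preinst postinst prerm postrm; do
        path="$info_dir/$pkg.$script"
        # Print only the scripts the package actually provides.
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

On a system where xml-core is installed, list_maintainer_scripts xml-core should print the scripts that exist for it, one path per line.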

In short: apt-get install does everything needed so that your system can successfully run the newly installed software.

Longer:

Preliminaries:

From the manpage:

All packages required by the package(s) specified for installation will also be retrieved and installed.

Those packages are stored in a repository on the network. apt-get downloads all the needed ones into a local cache directory (/var/cache/apt/archives/), typically from a web or FTP server. The servers are specified in the so-called sources.list, a list of repositories. From then on, the packages are installed one by one.

The first to be installed are the packages that have no further dependencies, so nothing else has to be installed for them. Once they are in place, other packages that depended on them have no unmet dependencies left and can be installed in turn. This process repeats until the specified packages are installed.
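
This "dependencies first" ordering is essentially a topological sort of the dependency graph. As a rough illustration only (this is not how APT implements it, and all package names are made up), the tsort command from coreutils can compute such an order from "dependency dependent" pairs:

```shell
#!/bin/sh
# Each input line is "<dependency> <dependent>": the package on the left
# must be installed before the package on the right.
# All package names are hypothetical.
install_order() {
    tsort <<'EOF'
libfoo app
libbar app
libbase libfoo
libbase libbar
EOF
}

# Prints one package per line, dependencies before the packages needing them.
install_order
```

Here libbase comes out first and app last; the relative order of libfoo and libbar does not matter, because neither depends on the other.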

Each package undergoes an installation procedure.

Package installation:

In Debian-based Linux distributions such as Ubuntu, those packages come in a standardized format called deb, the Debian binary package format.

Such a package contains the files to be installed on the system, along with control information. The control information includes scripts that the packaging system executes in specific situations, the so-called maintainer scripts. Those scripts are split into:

  • preinst: before the files are installed into the system's file hierarchy
  • postinst: after the installation
  • prerm: before the uninstallation
  • postrm: after the uninstallation
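
To make these roles more concrete, here is a minimal sketch of what a postinst script might look like. dpkg passes the action as the first argument (configure for a normal installation). The package and the messages are hypothetical, and the body is wrapped in a function, postinst_sketch, only so that it can be tried out without installing anything.

```shell
#!/bin/sh
# Sketch of a postinst maintainer script for a hypothetical package.
# Wrapped in a function purely for demonstration purposes.
postinst_sketch() {
    case "$1" in
        configure)
            # Normal path: the files are already unpacked, finish the setup.
            echo "configuring (previously configured version: ${2:-none})"
            ;;
        abort-upgrade|abort-remove|abort-deconfigure)
            # Error-recovery paths: called when dpkg unwinds a failed operation.
            echo "rolling back after $1"
            ;;
        *)
            echo "postinst called with unknown argument '$1'" >&2
            return 1
            ;;
    esac
}

postinst_sketch configure
```

A real script would do its work (creating users, registering files, restarting services) in the configure branch instead of printing a message.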

There is an interesting picture showing the procedure for installing a new package:

[image: installation]

There are more control files as well; the most important are:

  • control: A list of the dependencies, and other useful information to identify the package
  • conffiles: A list of config files (usually those in /etc)
  • debian-binary: contains the deb format version, currently 2.0 (strictly speaking a member of the .deb archive itself, not a control file)
  • md5sums: A list of md5sums of each file in the package for verifying
  • templates: A file with the texts of the questions and messages shown in dialogs during installation
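
The md5sums list uses the same "checksum, two spaces, path" layout that the md5sum tool reads, with paths relative to the filesystem root, so installed files can be checked with md5sum -c. A minimal sketch follows. The function name verify_md5sums and its optional root argument are illustrative assumptions, and not every package ships an md5sums file.

```shell
#!/bin/sh
# Verify installed files against a package's md5sums list.
# Usage: verify_md5sums <absolute-path-to-md5sums-file> [root]
# Paths inside the list are relative, so the check runs from the root
# directory (default: /). The [root] argument is an illustrative assumption.
verify_md5sums() {
    list=$1
    root=${2:-/}
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$root" && md5sum -c "$list" )
}
```

For an installed package the list lives next to the maintainer scripts, so a call might look like verify_md5sums /var/lib/dpkg/info/xml-core.md5sums, provided that file exists. The function returns a non-zero status if any file fails the check.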

For the actual under-the-hood stuff, you'll need to grab the Apt source. Fairly simple if you have source repositories enabled:

apt-get source apt

The apt-get command itself lives in cmdline/apt-get.cc. It's a pain to read through, but most of apt-get's actions are spelled out quite extensively in there. Installation, however, is mapped through a DoInstall function, which lives in apt-private/private-install.{cc,h}.

You have to remember that apt-get is merely one side of the coin. dpkg handles the actual installation, but DoInstall doesn't know about dpkg directly. apt-get is actually surprisingly package-manager agnostic: all the functionality is abstracted through apt-pkg/packagemanager.cc.

I'm only looking briefly but even there I can't see where this actually attaches to the dpkg systems. Some of this seems to be autoconfigured through apt-pkg/aptconfiguration.cc but this is a deep well. You could spend days unravelling this.

The source documentation is good though. You could do worse things than to go through each file and read the header to work out what's actually happening.