How to deal with persistent storage (e.g. databases) in Docker

Since Docker v1.0, you can bind mount a file or directory from the host machine into a container with the following command:

$ docker run -v /host:/container ...

A bind mount like the one above can be used as persistent storage on the host running Docker.
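
A small sketch of the bind-mount approach, wrapped in a shell function. The function name and its three arguments are hypothetical placeholders, not part of Docker. It insists on an absolute host path because, on Docker 1.9 and above, a non-absolute name before the colon is treated as a named volume rather than a host directory.

```shell
# Hypothetical helper: start a detached container with a host directory
# bind-mounted into it. All three arguments are placeholders.
run_with_host_dir() {
  host_dir=$1
  container_dir=$2
  image=$3

  # Bind mounts need an absolute host path.
  case "$host_dir" in
    /*) ;;
    *) echo "host path must be absolute: $host_dir" >&2; return 1 ;;
  esac

  # Create the host directory up front so its ownership is predictable.
  mkdir -p "$host_dir" || return 1
  docker run -d -v "$host_dir:$container_dir" "$image"
}
```

For example, run_with_host_dir /srv/pgdata /var/lib/postgresql/data postgres:9.4 would keep the database files under /srv/pgdata on the host (the host path here is an assumed example).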


Docker 1.9.0 and above

Use the volume API:

docker volume create --name hello
docker run -d -v hello:/container/path/for/volume container_image my_command

This means that the data-only container pattern must be abandoned in favour of the new volumes.

In fact, the volume API is simply a better way to achieve what the data-container pattern was used for.

If you create a container with -v volume_name:/container/fs/path, Docker will automatically create a named volume for you that can be:

  1. Listed with docker volume ls
  2. Identified with docker volume inspect volume_name
  3. Backed up as a normal directory
  4. Backed up as before through a --volumes-from connection
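
The backup points in the list can be sketched with a throwaway container that mounts the named volume directly. This is only a minimal sketch: the function name backup_volume, the read-only /data mount point, and the archive naming are assumptions for illustration.

```shell
# Hypothetical helper: archive the contents of a named volume into
# <volume>.tar in the current directory.
backup_volume() {
  volume=$1
  # Mount the volume read-only so the backup cannot modify the data.
  docker run --rm \
    -v "$volume:/data:ro" \
    -v "$(pwd):/backup" \
    busybox tar cvf "/backup/$volume.tar" -C /data .
}
```

Calling backup_volume hello would write hello.tar next to where the command was run.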

The new volume API adds a useful command that lets you identify dangling volumes:

docker volume ls -f dangling=true

You can then remove a volume by name:

docker volume rm <volume name>

As @mpugach underlines in the comments, you can get rid of all the dangling volumes with a nice one-liner:

docker volume rm $(docker volume ls -f dangling=true -q)
# Or, with Docker 1.13.x and above:
docker volume prune
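
Note that the docker volume rm one-liner fails with a usage error when there are no dangling volumes, because docker volume rm is then called without arguments. A hedged sketch of a guard for that case follows; the function name is hypothetical.

```shell
# Hypothetical helper: remove dangling volumes, doing nothing if there are none.
remove_dangling_volumes() {
  dangling=$(docker volume ls -f dangling=true -q)
  if [ -z "$dangling" ]; then
    echo "no dangling volumes"
    return 0
  fi
  # Intentionally unquoted: each volume name becomes its own argument.
  docker volume rm $dangling
}
```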

Docker 1.8.x and below

The approach that seems to work best for production is to use a data-only container.

The data-only container is run on a barebones image and does nothing except expose a data volume.

You can then run any other container with access to the data container's volumes:

docker run --volumes-from data-container some-other-container command-to-execute
  • Here you can get a good picture of how to arrange the different containers.
  • Here there is a good insight on how volumes work.

In this blog post there is a good description of the so-called container-as-volume pattern, which clarifies the main point of having data-only containers.

The Docker documentation now has the definitive description of the container-as-volume pattern.
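
As a rough sketch of the pattern described above, the two helpers below create a data-only container and then attach its volumes to another container. The function names, the busybox image, and the example names are assumptions for illustration. docker create is used so the data container is never started; it only has to exist for --volumes-from to work.

```shell
# Hypothetical helper: create (but do not start) a data-only container
# that exposes a single volume.
create_data_container() {
  name=$1
  volume_path=$2
  docker create -v "$volume_path" --name "$name" busybox /bin/true
}

# Hypothetical helper: run a command in another image with the data
# container's volumes attached.
run_with_data() {
  data_container=$1
  image=$2
  shift 2
  docker run --rm --volumes-from "$data_container" "$image" "$@"
}
```

For example, create_data_container data-container /data followed by run_with_data data-container busybox ls /data lists the (initially empty) shared volume.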

Following is the backup/restore procedure for Docker 1.8.x and below.

BACKUP:

sudo docker run --rm --volumes-from DATA -v $(pwd):/backup busybox tar cvf /backup/backup.tar /data
  • --rm: remove the container when it exits
  • --volumes-from DATA: attach to the volumes shared by the DATA container
  • -v $(pwd):/backup: bind mount the current directory into the container, so the tar file can be written to it
  • busybox: a small, simple image that is good for quick maintenance
  • tar cvf /backup/backup.tar /data: creates an uncompressed tar file of all the files in the /data directory

RESTORE:

# Create a new data container
$ sudo docker run -v /data --name DATA2 busybox true
# untar the backup files into the new container's data volume
$ sudo docker run --rm --volumes-from DATA2 -v $(pwd):/backup busybox tar xvf /backup/backup.tar
data/
data/sven.txt
# Compare to the original container
$ sudo docker run --rm --volumes-from DATA -v `pwd`:/backup busybox ls /data
sven.txt

Here is a nice article from the excellent Brian Goff explaining why it is good to use the same image for a container and a data container.


As of Docker Compose 1.6, there is improved support for data volumes in Docker Compose. A compose file can now declare a named data volume which will persist between restarts (or even removal) of the parent containers.

Here is the blog announcement: Compose 1.6: New Compose file for defining networks and volumes

Here's an example compose file:

version: "2"

services:
  db:
    restart: on-failure:10
    image: postgres:9.4
    volumes:
      - "db-data:/var/lib/postgresql/data"
  web:
    restart: on-failure:10
    build: .
    command: gunicorn mypythonapp.wsgi:application -b :8000 --reload
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    links:
      - db

volumes:
  db-data:

As far as I can understand, this will create a named data volume (db-data) which will persist between restarts.

If you run docker volume ls, you should see your volume listed:

local               mypthonapp_db-data
...

You can get some more details about the data volume:

docker volume inspect mypthonapp_db-data
[
  {
    "Name": "mypthonapp_db-data",
    "Driver": "local",
    "Mountpoint": "/mnt/sda1/var/lib/docker/volumes/mypthonapp_db-data/_data"
  }
]
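
If only the mount point is needed, docker volume inspect accepts a Go template through its -f/--format flag. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print only the host mount point of a named volume.
volume_mountpoint() {
  docker volume inspect -f '{{ .Mountpoint }}' "$1"
}
```

Calling volume_mountpoint with the volume name shown by docker volume ls should print just the Mountpoint field from the JSON above.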

Some testing:

# Start the containers
docker-compose up -d

# .. input some data into the database
docker-compose run --rm web python manage.py migrate
docker-compose run --rm web python manage.py createsuperuser
...

# Stop and remove the containers:
docker-compose stop
docker-compose rm -f

# Start it back up again
docker-compose up -d

# Verify the data is still there
...
(it is)

# Stop and remove with the -v (volumes) flag:

docker-compose stop
docker-compose rm -f -v

# Up again ..
docker-compose up -d

# Check the data is still there:
...
(it is).
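
The data survives the second test because the -v flag of docker-compose rm only removes anonymous volumes attached to the containers; named volumes such as db-data are left alone. As far as I know, Compose 1.6 also introduced docker-compose down, whose -v flag does remove the named volumes declared in the compose file. A hedged sketch follows; the COMPOSE variable and the function name are assumptions for illustration.

```shell
# Compose command; overridable, e.g. COMPOSE="docker compose" on newer installs.
: "${COMPOSE:=docker-compose}"

# Hypothetical helper: tear the project down and delete its named volumes.
# This is destructive: the database contents in db-data are lost.
compose_destroy_with_volumes() {
  # $COMPOSE is intentionally unquoted so "docker compose" splits into two words.
  $COMPOSE down -v
}
```

Only use this when you really want to start from an empty database.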

Notes:

  • You can also specify various drivers in the volumes block. For example, you could specify the Flocker driver for db-data:

    volumes:
      db-data:
        driver: flocker
    
  • As they improve the integration between Docker Swarm and Docker Compose (and possibly start integrating Flocker into the Docker ecosystem; I heard a rumor that Docker has bought Flocker), I think this approach should become increasingly powerful.

Disclaimer: This approach is promising, and I'm using it successfully in a development environment. I would be apprehensive about using this in production just yet!