Assign physical interface to docker exclusively

Solution 1:

In my search I came across old solutions that invovled passing lxc-config parameters to docker, but newer versions of docker don't use the lxc tools any more, so that cannot work.

Following the suggestion here: https://groups.google.com/d/msg/docker-user/pL8wlmiuAEU/QfcoFcKI3kgJ a solution was found. I did not look into modifying the pipework script as mentioned above, instead using the required commands directly. Also see subsequent blog post: http://jason.digitalinertia.net/exposing-docker-containers-with-sr-iov/.

The following low-level (i.e. not docker specific) network namespace tool commands can be used to transfer an interface from the host to a docker container:

CONTAINER=slave-play # Name of the docker container
HOST_DEV=ethHOST     # Name of the ethernet device on the host
GUEST_DEV=test10gb   # Target name for the same device in the container
ADDRESS_AND_NET=10.101.0.5/24

# Next three lines hooks up the docker container's network namespace 
# such that the ip netns commands below will work
mkdir -p /var/run/netns
PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
ln -s /proc/$PID/ns/net /var/run/netns/$PID

# Move the ethernet device into the container. Leave out 
# the 'name $GUEST_DEV' bit to use an automatically assigned name in 
# the container
ip link set $HOST_DEV netns $PID name $GUEST_DEV

# Enter the container network namespace ('ip netns exec $PID...') 
# and configure the network device in the container
ip netns exec $PID ip addr add $ADDRESS_AND_NET dev $GUEST_DEV

# and bring it up.
ip netns exec $PID ip link set $GUEST_DEV up

# Delete netns link to prevent stale namespaces when the docker
# container is stopped
rm /var/run/netns/$PID

A minor caveat on the interface naming if your host has a lot of ethX devices (mine had eth0 -> eth5). E.g. say you move eth3 into the container as eth1 in the containers namespace. When you stop the container, the kernel will try to move the container's eth1 device back to the host, but notice that there is already an eth1 device. It will then rename the interface to something arbitrary; took me a while to find it again. For this reason I edited /etc/udev/rules.d/70-persistent-net.rules (I think this filename is common to most popular Linux distros; I am using Debian) to give the interface in question a unique, unmistakable name, and use that in both the container and on the host.

Since we are not using docker to do this configuration, the standard docker lifecycle tools (e.g. docker run --restart=on-failure:10 ...) cannot be used. The host machine in question runs Debian Wheezy, so I wrote the following init script:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          slave-play
# Required-Start:    $local_fs $network $named $time $syslog $docker
# Required-Stop:     $local_fs $network $named $time $syslog $docker
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Description:       some slavishness
### END INIT INFO

CONTAINER=slave-play
SCRIPT="docker start -i $CONTAINER"
RUNAS=root

LOGFILE=/var/log/$CONTAINER.log
LOGFILE=/var/log/$CONTAINER.log

HOST_DEV=test10gb
GUEST_DEV=test10gb
ADDRESS_AND_NET=10.101.0.5/24


start() {
  if [ -f /var/run/$PIDNAME ] && kill -0 $(cat /var/run/$PIDNAME); then
echo 'Service already running' >&2
return 1
  fi
  echo 'Starting service…' >&2
  local CMD="$SCRIPT &> \"$LOGFILE\" &"
  su -c "$CMD" $RUNAS 
  sleep 0.5 # Nasty hack so that docker container is already running before we do the rest
  mkdir -p /var/run/netns
  PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
  ln -s /proc/$PID/ns/net /var/run/netns/$PID
  ip link set $HOST_DEV netns $PID name $GUEST_DEV
  ip netns exec $PID ip addr add $ADDRESS_AND_NET dev $GUEST_DEV
  ip netns exec $PID ip link set $GUEST_DEV up
  rm /var/run/netns/$PID
  echo 'Service started' >&2
}

stop() {
  echo "Stopping docker container $CONTAINER" >&2
  docker stop $CONTAINER
  echo "docker container $CONTAINER stopped" >&2
}


case "$1" in
  start)
start
;;
  stop)
stop
;;
  restart)
stop
start
;;
  *)
echo "Usage: $0 {start|stop|restart}"
esac

Slightly hacky, but it works :)

Solution 2:

pipework can move a physical network interface from the default to the container network namespace:

    $ sudo pipework --direct-phys eth1 $CONTAINERID 192.168.1.2/24

For more information see here.


Solution 3:

I write a docker network plugin to do this.

https://github.com/yunify/docker-plugin-hostnic

docker pull qingcloud/docker-plugin-hostnic
docker run -v /run/docker/plugins:/run/docker/plugins -v /etc/docker/hostnic:/etc/docker/hostnic --network host --privileged qingcloud/docker-plugin-hostnic docker-plugin-hostnic
docker network create -d hostnic --subnet=192.168.1.0/24 --gateway 192.168.1.1 hostnic
docker run -it --ip 192.168.1.5 --mac-address 52:54:0e:e5:00:f7 --network hostnic ubuntu:14.04 bash