Is it possible to have multiple default gateways for outbound connections?

Solved it myself. There seems to be very little information about what you can do with Linux networking (policy routing in particular), so I have decided to document and explain my solution in detail. This is my final setup:

  • 3 NICs: eth0 (wired), wlan0 (built-in Wi-Fi, weak signal), wlan1 (USB Wi-Fi adapter, stronger signal than wlan0)
  • All of them on a single subnet, each with its own IP address.
  • eth0 should be used for both incoming and outgoing traffic by default.
  • If eth0 fails, wlan1 should be used.
  • If wlan1 fails, wlan0 should be used.

First step: Create a new routing table for every interface in /etc/iproute2/rt_tables. Let's call them rt1 (eth0), rt2 (wlan0) and rt3 (wlan1):

#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1  inr.ruhep
1 rt1
2 rt2
3 rt3
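
If you prefer to script this step, here is a minimal sketch in sh. The helper name add_rt_table is hypothetical (it is not part of iproute2); it appends an "ID NAME" line only if neither the ID nor the name is already in use, so re-running it does not create duplicates. Writing to the real /etc/iproute2/rt_tables requires root.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): add "ID NAME" to an rt_tables
# file unless that ID or NAME is already in use, so re-running is harmless.
# Usage (as root):
#   for entry in "1 rt1" "2 rt2" "3 rt3"; do
#       add_rt_table /etc/iproute2/rt_tables $entry   # unquoted on purpose
#   done
add_rt_table() {
    file=$1 id=$2 name=$3
    # awk exits 0 if a non-comment line already uses this ID or NAME.
    if awk -v id="$id" -v name="$name" '
        /^[ \t]*#/ { next }
        $1 == id || $2 == name { found = 1 }
        END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}
```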

Second step: Network configuration in /etc/network/interfaces. This is the main part and I'll try to explain as much as I can:

auto eth0 wlan0
allow-hotplug wlan1

iface lo inet loopback

iface eth0 inet static
    address 192.168.178.99
    netmask 255.255.255.0
    dns-nameservers 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev eth0 src 192.168.178.99 table rt1
    post-up ip route add default via 192.168.178.1 dev eth0 table rt1
    post-up ip rule add from 192.168.178.99/32 table rt1
    post-up ip rule add to 192.168.178.99/32 table rt1
    post-up ip route add default via 192.168.178.1 metric 100 dev eth0
    post-down ip rule del from 0/0 to 0/0 table rt1
    post-down ip rule del from 0/0 to 0/0 table rt1

iface wlan0 inet static
    wpa-conf /etc/wpa_supplicant.conf
    wireless-essid xyz
    address 192.168.178.97
    netmask 255.255.255.0
    dns-nameservers 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev wlan0 src 192.168.178.97 table rt2
    post-up ip route add default via 192.168.178.1 dev wlan0 table rt2
    post-up ip rule add from 192.168.178.97/32 table rt2
    post-up ip rule add to 192.168.178.97/32 table rt2
    post-up ip route add default via 192.168.178.1 metric 102 dev wlan0
    post-down ip rule del from 0/0 to 0/0 table rt2
    post-down ip rule del from 0/0 to 0/0 table rt2

iface wlan1 inet static
    wpa-conf /etc/wpa_supplicant.conf
    wireless-essid xyz
    address 192.168.178.98
    netmask 255.255.255.0
    dns-nameservers 8.8.8.8 8.8.4.4
    post-up ip route add 192.168.178.0/24 dev wlan1 src 192.168.178.98 table rt3
    post-up ip route add default via 192.168.178.1 dev wlan1 table rt3
    post-up ip rule add from 192.168.178.98/32 table rt3
    post-up ip rule add to 192.168.178.98/32 table rt3
    post-up ip route add default via 192.168.178.1 metric 101 dev wlan1
    post-down ip rule del from 0/0 to 0/0 table rt3
    post-down ip rule del from 0/0 to 0/0 table rt3
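
The three stanzas differ only in interface name, address, table and metric. As a sketch, the repeated lines could be generated by a small function. policy_routing_lines is a hypothetical helper, and its gateway and subnet defaults are simply the values used in this setup.

```shell
#!/bin/sh
# Hypothetical generator for the per-interface policy-routing lines above.
# Usage: policy_routing_lines IFACE ADDRESS TABLE METRIC [GATEWAY] [SUBNET]
# GATEWAY and SUBNET default to this setup's values (assumption).
policy_routing_lines() {
    dev=$1 addr=$2 table=$3 metric=$4
    gw=${5:-192.168.178.1}
    net=${6:-192.168.178.0/24}
    # Per-interface table: the connected subnet plus a default route.
    echo "    post-up ip route add $net dev $dev src $addr table $table"
    echo "    post-up ip route add default via $gw dev $dev table $table"
    # Rules: traffic from or to this interface's address uses its own table.
    echo "    post-up ip rule add from $addr/32 table $table"
    echo "    post-up ip rule add to $addr/32 table $table"
    # Default route in the main table; the metric sets the failover order.
    echo "    post-up ip route add default via $gw metric $metric dev $dev"
    # Each call removes one rule pointing at this table; two were added.
    echo "    post-down ip rule del from 0/0 to 0/0 table $table"
    echo "    post-down ip rule del from 0/0 to 0/0 table $table"
}
```

For example, policy_routing_lines wlan1 192.168.178.98 rt3 101 prints the post-up and post-down lines of the wlan1 stanza above.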

If you type ip rule show you should see the following:

0:  from all lookup local 
32756:  from all to 192.168.178.98 lookup rt3 
32757:  from 192.168.178.98 lookup rt3 
32758:  from all to 192.168.178.99 lookup rt1 
32759:  from 192.168.178.99 lookup rt1 
32762:  from all to 192.168.178.97 lookup rt2 
32763:  from 192.168.178.97 lookup rt2 
32766:  from all lookup main 
32767:  from all lookup default 
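
To verify this from a script rather than by eye, a small sketch like the following could be used. has_policy_rules is a hypothetical name; it only parses the text format shown above, which may differ slightly between iproute2 versions.

```shell
#!/bin/sh
# Hypothetical check: reads `ip rule show` output on stdin and succeeds only
# if both the "from ADDR" and the "to ADDR" rule look up TABLE.
# Usage: ip rule show | has_policy_rules 192.168.178.99 rt1
has_policy_rules() {
    awk -v addr="$1" -v table="$2" '
        # Expected layout: "PRIO:  from SRC [to DST] lookup TABLE"
        $NF != table || $(NF - 1) != "lookup" { next }
        $2 == "from" && $3 == addr { from_ok = 1 }
        $2 == "from" && $3 == "all" && $4 == "to" && $5 == addr { to_ok = 1 }
        END { exit !(from_ok && to_ok) }'
}
```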

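This tells us that traffic to or from the IP address 192.168.178.99 uses the rt1 routing table (and likewise rt2 and rt3 for the two wlan addresses). So far so good. But traffic that is generated locally (for example when you ping or ssh from the machine to somewhere else) needs special treatment (see the big quote in the question): unless the application binds to a specific address, such traffic has no source address yet at the first route lookup, so the "from" rules don't match and the lookup falls through to the main table.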

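The first four post-up lines in /etc/network/interfaces are the standard policy-routing recipe: the first two fill the interface's own table with a route to the local subnet and a default route via 192.168.178.1, and the next two add the rules that send traffic from or to the interface's address to that table. The fifth post-up line is the one that makes the magic happen: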

post-up ip route add default via 192.168.178.1 metric 100 dev eth0

Note that this post-up line doesn't specify a routing table. Without one, the route goes into the main table, which ip rule show lists as "from all lookup main" (priority 32766). So this line puts a default route into the main table, and that route is used for locally generated traffic that is not a reply to incoming traffic (for example, an MTA on your server trying to send an e-mail).

All three interfaces put a default route into the main table, each with a different metric. Let's take a look at the main table with ip route show:

default via 192.168.178.1 dev eth0  metric 100 
default via 192.168.178.1 dev wlan1  metric 101 
default via 192.168.178.1 dev wlan0  metric 102 
192.168.178.0/24 dev wlan0  proto kernel  scope link  src 192.168.178.97 
192.168.178.0/24 dev eth0  proto kernel  scope link  src 192.168.178.99 
192.168.178.0/24 dev wlan1  proto kernel  scope link  src 192.168.178.98

The main table now holds three default routes. Lower metric numbers mean higher priority, so eth0 comes first, then wlan1, then wlan0. As long as eth0 is up, its default route (metric 100) is used; if eth0 goes down, outgoing traffic switches to wlan1.
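
The "lowest metric wins" choice can be illustrated with a sketch that reads ip route show output and prints the device whose default route has the lowest metric. preferred_default_dev is a hypothetical helper; it ignores link state and anything else the kernel may take into account, so treat it as an illustration rather than a replacement for ip route get.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip route show` output on stdin and prints the
# device of the default route with the lowest metric. Link state is ignored.
# Usage: ip route show | preferred_default_dev
preferred_default_dev() {
    awk '
        $1 != "default" { next }
        {
            dev = ""; metric = 0    # no "metric" keyword is shown for metric 0
            for (i = 2; i < NF; i++) {
                if ($i == "dev") dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1) + 0
            }
            if (best == "" || metric < best_metric) { best = dev; best_metric = metric }
        }
        END { if (best == "") exit 1; print best }'
}
```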

With this setup you can run ping 8.8.8.8 in one terminal and ifdown eth0 in another. The ping should keep working: ifdown eth0 removes eth0's default route, so outgoing traffic switches to wlan1.

The post-down lines remove the interface's two rules from the routing policy database (the list shown by ip rule show) when the interface goes down, to keep everything tidy.

The remaining problem: when you pull the plug on eth0, the interface stays configured, its default route stays in the main table, and outgoing traffic fails. We need something that monitors the interfaces and runs ifdown eth0 when there is a problem with the interface (a NIC failure or someone pulling the plug).
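
For background, the kernel exposes the physical link state of each interface in /sys/class/net/IFACE/carrier ("1" means link up, "0" means no link), while the routes stay where they are. The sketch below only illustrates that check; ifplugd does its own link detection. has_carrier and the SYSFS_NET override are hypothetical names, the latter added only so the function can be tried outside /sys.

```shell
#!/bin/sh
# Hypothetical illustration: succeed if IFACE reports a physical link.
# SYSFS_NET is an assumption for testing; it defaults to /sys/class/net.
# Usage: has_carrier eth0 || echo "eth0 has no link"
has_carrier() {
    f="${SYSFS_NET:-/sys/class/net}/$1/carrier"
    # Reading carrier fails while the interface is administratively down,
    # which is treated here as "no link".
    [ "$(cat "$f" 2>/dev/null)" = 1 ]
}
```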

Last step: enter ifplugd, a daemon that watches interfaces and runs ifup/ifdown when you pull the plug or when the Wi-Fi connection has a problem. Its configuration goes in /etc/default/ifplugd:

INTERFACES="eth0 wlan0 wlan1"
HOTPLUG_INTERFACES=""
ARGS="-q -f -u0 -d10 -w -I"
SUSPEND_ACTION="stop"
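
The same /etc/default/ifplugd with the flags annotated. The comments paraphrase the option descriptions in ifplugd(8); check the man page of your installed version, since option sets and defaults can differ between releases.

```shell
# /etc/default/ifplugd: same values as above, flags annotated
# (meanings paraphrased from ifplugd(8)).
INTERFACES="eth0 wlan0 wlan1"  # interfaces watched from boot
HOTPLUG_INTERFACES=""          # none started via hotplug events
# -q   don't run the action script (ifdown) when the daemon itself quits
# -f   treat a failed link detection as "no link" and keep retrying
# -u0  run ifup as soon as the link comes back (0 s delay)
# -d10 wait 10 s after the link disappears before running ifdown
# -w   wait until the daemon has forked before returning
# -I   keep running even if the executed program returns non-zero
ARGS="-q -f -u0 -d10 -w -I"
SUSPEND_ACTION="stop"
```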

You can now pull the plug on eth0: outgoing traffic switches to wlan1, and when you plug it back in, outgoing traffic switches back to eth0. Your server stays online as long as any of the three interfaces works. To connect to your server, use the IP address of eth0 and, if that fails, the IP address of wlan1 or wlan0.


Linux provides a better solution than your scripted workaround: active-backup bonding.

This way your machine has only one IP address (and one MAC address) and switches interfaces automatically and transparently if one of them becomes unavailable. No TCP connection is disrupted, neither to your internal LAN nor to the internet.

I'm using this setup myself to fail over automatically from eth0 to wlan0 on my Debian laptop when I disconnect it from the docking station.

My /etc/network/interfaces:

# The primary network interface
allow-hotplug eth0
iface eth0 inet manual
        bond-master bond0
        bond-primary eth0

# The secondary network interface
allow-hotplug wlan0
iface wlan0 inet manual
        pre-up sleep 5
        wpa-conf /etc/wpa_supplicant.conf
        bond-master bond0
        bond-primary eth0

# The bonding interface
allow-hotplug bond0
iface bond0 inet dhcp
        bond-slaves eth0 wlan0
        bond-primary eth0
        bond-mode active-backup
        bond-miimon 10
        bond-downdelay 10
        bond-updelay 4000
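
To see which slave is currently carrying the traffic, the bonding driver reports it in /proc/net/bonding/bond0 (and in /sys/class/net/bond0/bonding/active_slave). A minimal sketch that extracts it from the /proc text, using the hypothetical helper name active_slave:

```shell
#!/bin/sh
# Hypothetical helper: print the active slave from bonding status text.
# Usage: active_slave < /proc/net/bonding/bond0
active_slave() {
    # Keep only the value of the "Currently Active Slave:" line.
    sed -n 's/^Currently Active Slave: *//p'
}
```

With the laptop undocked it should print wlan0, and eth0 again once the dock is reconnected and the up delay has passed.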

You can easily extend this setup to include multiple wlan devices. The primary_reselect option may help here: with the value better, the primary slave becomes active again after it recovers only if its speed and duplex are better than those of the current active slave.
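
primary_reselect can also be changed at runtime through sysfs; according to the kernel bonding documentation it accepts always (the default), better and failure. The sketch below uses the hypothetical helper set_primary_reselect and a SYSFS_NET override (an assumption, only for trying it outside /sys). Writing the real file requires root, and the setting does not persist across reboots.

```shell
#!/bin/sh
# Hypothetical helper: set a bond's primary_reselect policy via sysfs.
# SYSFS_NET is an assumption for testing; it defaults to /sys/class/net.
# Usage (as root): set_primary_reselect bond0 better
set_primary_reselect() {
    bond=$1 policy=$2
    # Reject anything that is not one of the documented policy names.
    case $policy in
        always|better|failure) ;;
        *) echo "invalid policy: $policy" >&2; return 2 ;;
    esac
    printf '%s\n' "$policy" > "${SYSFS_NET:-/sys/class/net}/$bond/bonding/primary_reselect"
}
```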

For more information see https://wiki.linuxfoundation.org/networking/bonding and https://wiki.debian.org/Bonding

And (of course) the linux kernel documentation at https://www.kernel.org/doc/Documentation/networking/bonding.txt