Both servers running keepalived become MASTER and hold the same virtual IP

Solution 1:

VRRP packets are not passing between the machines on the em1 interface, which causes a split-brain scenario (as Mike states).

  • check your firewall to ensure the packets aren't being dropped
  • check your networking to ensure em1 is on the same network on both machines
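
One way to tell these two cases apart is to capture VRRP (IP protocol 112) on em1 on the node that should be BACKUP and check whether its peer's advertisements show up. This is a minimal sketch: it assumes tcpdump's default one-line output ("IP src > dst: ..."), the helper name peer_adverts_seen is hypothetical, and 10.0.10.11 is only the sender from the capture below, so substitute your peer's em1 address. Because tcpdump sees incoming packets before the local firewall filters them, adverts that appear here while the node still goes MASTER point at the local firewall; adverts that never appear point at the network or the peer's outbound rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tcpdump text output on stdin and succeeds if it
# contains at least one packet from the given peer to the VRRP multicast
# group 224.0.0.18.
peer_adverts_seen() {
  local peer_ip=$1
  # -F: fixed string, so the dots in the address are not regex wildcards
  grep -qF "IP ${peer_ip} > 224.0.0.18"
}

# Example on the node that should be BACKUP (tcpdump needs root):
#   timeout 10 tcpdump -i em1 -nn -c 5 'ip proto 112' 2>/dev/null \
#     | peer_adverts_seen 10.0.10.11 \
#     && echo "peer adverts arrive on em1" \
#     || echo "no adverts from peer on em1"
```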

Here's an example of what one of these VRRP advertisements looks like (Wireshark packet details):

Frame 2: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
    Arrival Time: Jun  1, 2013 03:39:50.709520000 UTC
    Epoch Time: 1370057990.709520000 seconds
    [Time delta from previous captured frame: 0.000970000 seconds]
    [Time delta from previous displayed frame: 0.000970000 seconds]
    [Time since reference or first frame: 0.000970000 seconds]
    Frame Number: 2
    Frame Length: 54 bytes (432 bits)
    Capture Length: 54 bytes (432 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ip:vrrp]
Ethernet II, Src: 00:25:90:83:b0:07 (00:25:90:83:b0:07), Dst: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
    Destination: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
        Address: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 00:25:90:83:b0:07 (00:25:90:83:b0:07)
        Address: 00:25:90:83:b0:07 (00:25:90:83:b0:07)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 10.0.10.11 (10.0.10.11), Dst: 224.0.0.18 (224.0.0.18)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
    Total Length: 40
    Identification: 0x8711 (34577)
    Flags: 0x00
        0... .... = Reserved bit: Not set
        .0.. .... = Don't fragment: Not set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 255
    Protocol: VRRP (112)
    Header checksum: 0x4037 [correct]
        [Good: True]
        [Bad: False]
    Source: 10.0.10.11 (10.0.10.11)
    Destination: 224.0.0.18 (224.0.0.18)
Virtual Router Redundancy Protocol
    Version 2, Packet type 1 (Advertisement)
        0010 .... = VRRP protocol version: 2
        .... 0001 = VRRP packet type: Advertisement (1)
    Virtual Rtr ID: 254
    Priority: 151 (Non-default backup priority)
    Addr Count: 1
    Auth Type: No Authentication (0)
    Adver Int: 1
    Checksum: 0x3c01 [correct]
    IP Address: 10.0.0.254 (10.0.0.254)
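
The VRRP part of this frame is a 20-byte VRRPv2 advertisement as defined in RFC 3768 (IP Total Length 40 = 20-byte IP header + 8-byte VRRP header + one 4-byte address + 8 bytes of authentication data). As a worked check of how the fields above map to bytes, this sketch recomputes the checksum Wireshark reports. The function name is hypothetical, and the authentication data is assumed to be all zeros, consistent with Auth Type 0.

```shell
#!/usr/bin/env bash
# Internet checksum as used by VRRPv2 (RFC 3768): the 16-bit one's complement
# of the one's complement sum of the VRRP message, computed with the checksum
# field itself set to zero. Arguments are the message bytes (hex or decimal).
vrrp2_checksum() {
  local -a b=("$@")
  local sum=0 i
  # Pad an odd-length message with a trailing zero byte
  if (( ${#b[@]} % 2 )); then b+=(0); fi
  # Add the message up as big-endian 16-bit words
  for (( i = 0; i < ${#b[@]}; i += 2 )); do
    sum=$(( sum + (b[i] << 8) + b[i+1] ))
  done
  # Fold any carries back into the low 16 bits
  while (( sum >> 16 )); do
    sum=$(( (sum & 0xffff) + (sum >> 16) ))
  done
  printf '0x%04x\n' $(( ~sum & 0xffff ))
}

# VRRP payload from the capture above, checksum bytes zeroed:
#   0x21        version 2 (high nibble), type 1 = Advertisement (low nibble)
#   0xfe 0x97   Virtual Rtr ID 254, Priority 151
#   0x01 0x00   Addr Count 1, Auth Type 0 (none)
#   0x01        Adver Int 1 second
#   0x00 0x00   checksum placeholder
#   10 0 0 254  the virtual IP 10.0.0.254
#   8 x 0       authentication data (assumed zero)
vrrp2_checksum 0x21 0xfe 0x97 0x01 0x00 0x01 0x00 0x00 \
               10 0 0 254  0 0 0 0 0 0 0 0
```

Run as is, it prints 0x3c01, matching the Checksum line in the capture. Feeding in the captured checksum bytes (0x3c 0x01) instead of the zeros yields 0x0000, because an intact packet sums to 0xffff.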

Solution 2:

In my case, I had to allow multicast traffic to 224.0.0.18 (the VRRP multicast group) through the firewall. For ufw:

ufw allow from 224.0.0.18
ufw allow to 224.0.0.18

This helped me.


Solution 3:

In my case, on CentOS/RHEL 8, I only had to add a firewall rich rule allowing the VRRP protocol to solve this Keepalived split-brain issue, where both servers held the VIP. Separately, I had to set a sysctl kernel flag to allow HAProxy to bind to the non-local VIP.

For sysctl, add net.ipv4.ip_nonlocal_bind = 1 to /etc/sysctl.conf and then run sysctl -p to reload the sysctl config. I needed this NOT for the Keepalived split-brain scenario, but so that HAProxy could bind to its own IP address for stats (e.g. bind 192.168.0.10:1492, with the stats page at /stats) and to the VIP (virtual IP) address for load-balancing web traffic (bind 192.168.0.34:80 and bind 192.168.0.34:443). Without it, the HAProxy service failed to start, reporting that it could not bind to ports 80 and 443 on the VIP, because a node that does not currently hold the VIP cannot bind to an address it does not own. I was doing this to avoid using bind *:80 and bind *:443.

Also, it feels like a no-brainer but is easily overlooked: if you cannot reach the stats page, check that you have allowed the stats port through the firewall.
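
The sysctl step described above can be made safe to re-run so that repeated provisioning does not keep appending duplicate lines. A minimal bash sketch; ensure_sysctl is a hypothetical helper, and the file path is a parameter so it can be tried on a scratch file first. It deliberately leaves an existing entry for the key alone, even if that entry sets a different value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "key = value" to a sysctl config file unless
# the key is already assigned there (an existing entry is left untouched).
ensure_sysctl() {
  local key=$1 value=$2 file=$3
  # Split each line on "=", strip whitespace from the left side and compare
  # it with the key; comment lines ("#...") never match.
  if ! awk -F= -v k="$key" '
        { gsub(/[[:space:]]/, "", $1) }
        $1 == k { found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# On each HAProxy node (as root):
#   ensure_sysctl net.ipv4.ip_nonlocal_bind 1 /etc/sysctl.conf
#   sysctl -p                          # reload /etc/sysctl.conf
#   sysctl net.ipv4.ip_nonlocal_bind   # expect: net.ipv4.ip_nonlocal_bind = 1
```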

For the firewall, execute the following commands:

# firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
# firewall-cmd --reload
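
To confirm the rule is active after the reload, the runtime rich-rule list can be checked on both nodes. A minimal sketch; has_vrrp_rich_rule is a hypothetical helper, and it assumes firewall-cmd prints one rich rule per line in roughly the form it was added. The commands above target the default zone; if em1 belongs to a different zone, they need --zone=<zone>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "firewall-cmd --list-rich-rules" output on stdin
# and succeeds if one of the rules accepts the vrrp protocol.
has_vrrp_rich_rule() {
  grep -qE 'protocol value="vrrp".* accept'
}

# On each keepalived node, after the reload (as root):
#   firewall-cmd --list-rich-rules | has_vrrp_rich_rule \
#     && echo "VRRP accepted" || echo "VRRP rule missing"
```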

I found these flags and other information in the Red Hat documentation for HAProxy and Keepalived:

Firewall reference: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/load_balancer_administration/s1-lvs-connect-vsa

Non-local bind flag reference (I used this for HAProxy, though, since I was not using Keepalived for load balancing): https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/load_balancer_administration/s1-initial-setup-forwarding-vsa

An HAProxy-only FYI: if HAProxy still fails to bind to ports, check whether good ol' SELinux is blocking it. On CentOS 8, I had to run semanage port -a -t http_port_t -p tcp 1492 for my HAProxy stats page.
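
To check whether a port already carries the SELinux type before (or after) adding it, the output of semanage port -l can be searched. A minimal sketch; port_has_type is a hypothetical helper, and it assumes the usual semanage port -l layout of type, protocol, then a comma-separated list of ports and port ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "semanage port -l" output on stdin and succeeds
# if TCP port $1 is labelled with SELinux type $2, either as a single port
# or inside a range such as 8000-8010.
port_has_type() {
  local port=$1 want=$2
  awk -v port="$port" -v want="$want" '
    $1 == want && $2 == "tcp" {
      for (i = 3; i <= NF; i++) {
        p = $i
        sub(/,$/, "", p)                 # drop the trailing comma
        if (p == port) found = 1
        if (split(p, r, "-") == 2 && port + 0 >= r[1] + 0 && port + 0 <= r[2] + 0)
          found = 1
      }
    }
    END { exit !found }'
}

# On the HAProxy node (as root), add the label only if it is missing:
#   semanage port -l | port_has_type 1492 http_port_t \
#     || semanage port -a -t http_port_t -p tcp 1492
```

Checking first avoids the error semanage port -a raises when the port is already defined.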