DRBD terrible sync performance on 10GigE

Solution 1:

DRBD 8.3.9 and newer include a dynamic resync controller that needs tuning. In older versions, setting the syncer {rate;} was enough; now that rate only serves as a starting point for the dynamically adjusted resync speed.

The dynamic sync controller is tuned with the "c-settings" (c-fill-target, c-max-rate, c-plan-ahead and c-min-rate) in the disk section of DRBD's configuration; see man drbd.conf for details on each of these settings.

With 10GbE between these nodes, and assuming low latency since protocol C is used, the following config should get things moving more quickly:

resource rd0 {
        protocol C;
        disk {
                c-fill-target 10M;
                c-max-rate   700M;
                c-plan-ahead    7;
                c-min-rate     4M;
        }
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }

        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}

If you're still not happy, try turning max-buffers (in the net section) up to 12k. If that still isn't enough, try increasing c-fill-target in 2M increments.

Solution 2:

Someone elsewhere suggested that I use these settings:

        disk {
                on-io-error             detach;
                c-plan-ahead 0;
        }
        net {
                max-epoch-size          20000;
                max-buffers             131072;
        }

And the performance is excellent.

Edit: Following suggestions from @Matt Kereczman and others, I've finally changed to this:

disk {
        on-io-error             detach;
        no-disk-flushes;
        no-disk-barrier;
        c-plan-ahead            0;
        c-fill-target           24M;
        c-min-rate              80M;
        c-max-rate              720M;
}
net {
        # max-epoch-size        20000;
        max-buffers             36k;
        sndbuf-size             1024k;
        rcvbuf-size             2048k;
}

Resync speed is high:

# cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:133246146 nr:0 dw:2087494 dr:131187797 al:530 bm:0 lo:0 pe:5 ua:106 ap:0 ep:1 wo:d oos:4602377004
        [>....................] sync'ed:  2.8% (4494508/4622592)M
        finish: 1:52:27 speed: 682,064 (646,096) K/sec
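
As a rough sanity check on the output above, the finish estimate can be approximated by dividing oos (the out-of-sync amount, which /proc/drbd reports in KiB) by the current speed. The Python sketch below is only an illustration: the helper names parse_resync and hms are made up for this example, I'm assuming the figure in parentheses is the overall average speed, and DRBD's own estimate may be computed differently.

```python
import re

# Status text copied from the /proc/drbd output above.
SAMPLE = """\
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:133246146 nr:0 dw:2087494 dr:131187797 al:530 bm:0 lo:0 pe:5 ua:106 ap:0 ep:1 wo:d oos:4602377004
        [>....................] sync'ed:  2.8% (4494508/4622592)M
        finish: 1:52:27 speed: 682,064 (646,096) K/sec
"""


def parse_resync(text):
    """Return (oos_kib, current_speed, average_speed) from /proc/drbd text."""
    oos_kib = int(re.search(r"oos:(\d+)", text).group(1))
    speed = re.search(r"speed:\s*([\d,]+)\s*\(([\d,]+)\)", text)
    current = int(speed.group(1).replace(",", ""))   # K/sec, thousands separators removed
    average = int(speed.group(2).replace(",", ""))   # assumed: average since resync start
    return oos_kib, current, average


def hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    oos_kib, current, average = parse_resync(SAMPLE)
    print("ETA at current speed:", hms(oos_kib // current))  # 1:52:27
    print("ETA at average speed:", hms(oos_kib // average))
```

With the numbers above, the current-speed estimate matches the finish value that DRBD printed.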

Write speed is excellent during resync with these settings (80% of local write speed, full wire speed):

# dd if=/dev/zero of=./testdd bs=1M count=20k
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.3731 s, 731 MB/s

Read speed is OK:

# dd if=testdd bs=1M count=20k of=/dev/null
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.4538 s, 729 MB/s
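
When comparing these dd figures with the DRBD rate settings, keep in mind that dd reports decimal megabytes (1 MB = 1,000,000 bytes), while, as far as I understand drbd.conf, the M suffix on the c-* rates means MiB/s. The short Python sketch below converts the two dd results; the function name throughput is made up for this example, and the MiB interpretation of DRBD's units is an assumption worth checking against the man page.

```python
MIB = 1024 * 1024


def throughput(total_bytes, seconds):
    """Return (decimal MB/s, binary MiB/s) for a transfer."""
    bytes_per_s = total_bytes / seconds
    return bytes_per_s / 1_000_000, bytes_per_s / MIB


if __name__ == "__main__":
    total = 20 * 1024 * MIB  # bs=1M count=20k -> 21474836480 bytes
    # Durations taken from the two dd runs above.
    for label, secs in (("write", 29.3731), ("read", 29.4538)):
        mb, mib = throughput(total, secs)
        print(f"{label}: {mb:.0f} MB/s = {mib:.0f} MiB/s")
```

Under that assumption, both runs come out just under 700 MiB/s, which is below the c-max-rate of 720M used here.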

Later edit:

After a full resync, the performance is very good (wire speed writing, local speed reading). Resync is quick (5 to 6 hours) and doesn't hurt performance too much (wire speed reading, wire speed writing). I'll definitely stay with c-plan-ahead at zero. With non-zero values, resync takes far too long.


Solution 3:

c-plan-ahead has to be set to a positive value to enable the dynamic sync rate controller:

disk {
        c-plan-ahead 15;        # 5 * RTT, in units of 0.1 s; 15 in my case
        c-fill-target 24;
        c-max-rate 720M;
}
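
The rule of thumb quoted here (five times the round-trip time, expressed in units of 0.1 s) can be written out as a small calculation. This is only a sketch of that rule as stated in this answer: the function name plan_ahead_from_rtt is hypothetical, and clamping to a minimum of 1 is my own choice so that the result never falls to 0, which would switch the dynamic controller off.

```python
def plan_ahead_from_rtt(rtt_us, factor=5):
    """Return a c-plan-ahead value (units of 0.1 s) for an RTT in microseconds."""
    # Ceiling division by 100,000 us (= 0.1 s), done in integers to avoid float rounding.
    tenths = -((-factor * rtt_us) // 100_000)
    # Assumption: keep at least 1 so the dynamic controller stays enabled.
    return max(1, tenths)


if __name__ == "__main__":
    print(plan_ahead_from_rtt(300_000))  # RTT of 0.3 s -> 15
    print(plan_ahead_from_rtt(200))      # RTT of 0.2 ms -> clamped to 1
```

Read backwards, the value 15 above corresponds to a round-trip time of about 0.3 s under this rule; on a low-latency LAN the same rule gives the minimum value.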