Readahead Settings for LVM, Device-Mapper, Software RAID and Block Devices - what wins?

Solution 1:

How does the RA setting get passed down the virtual block device chain?

It depends. Let's assume you are inside a Xen domU and have RA=256. Your /dev/xvda1 is actually an LV on the dom0, visible there as /dev/dm1. So you have RA(domU(/dev/xvda1)) = 256 and RA(dom0(/dev/dm1)) = 512. The effect is that the dom0 kernel will access /dev/dm1 with a different RA than the domU kernel uses for /dev/xvda1. Simple as that.

A different situation arises with /dev/md0 built on /dev/sda1 and /dev/sda2.

blockdev --report | grep sda
rw   **512**   512  4096          0   1500301910016   /dev/sda
rw   **512**   512  4096       2048      1072693248   /dev/sda1
rw   **512**   512  4096    2097152   1499227750400   /dev/sda2
blockdev --setra 256 /dev/sda1
blockdev --report | grep sda
rw   **256**   512  4096          0   1500301910016   /dev/sda
rw   **256**   512  4096       2048      1072693248   /dev/sda1
rw   **256**   512  4096    2097152   1499227750400   /dev/sda2

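Note that setting the RA on /dev/sda1 changed the value reported for /dev/sda and /dev/sda2 as well. Setting the RA of /dev/md0, however, won't affect the /dev/sdX block devices: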

rw   **256**   512  4096       2048      1072693248   /dev/sda1
rw   **256**   512  4096    2097152   1499227750400   /dev/sda2
rw   **512**   512  4096          0      1072627712   /dev/md0
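To compare the values side by side on your own system, something like the following sketch may help. It assumes a Linux host that exposes `queue/read_ahead_kb` under `/sys/block`; the helper name `ra_sectors_to_kib` is made up for this example. `blockdev` reports RA in 512-byte sectors, while sysfs reports KiB, so the helper converts between the two.

```shell
#!/bin/sh
# Hypothetical helper: convert a readahead value given in 512-byte sectors
# (the unit blockdev uses) to KiB (the unit sysfs uses).
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Print the readahead of every block device that exposes it in sysfs.
# Devices without a readable queue/read_ahead_kb file are skipped.
for f in /sys/block/*/queue/read_ahead_kb; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(cat "$f") KiB"
done

# RA=256 sectors, as in the example above, expressed in KiB
ra_sectors_to_kib 256
```

If md0 and its member disks show different values here, that matches the observation that each device keeps its own setting.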

So generally, in my opinion, the kernel accesses a block device using the RA that is actually set on the device being accessed. One logical volume can be accessed via the RAID it is part of or via its device-mapper device, each with a different RA, and each setting will be respected.

So the answer is: the RA setting is, IMHO, not passed down the block device chain. Whatever RA is set on the top-level device is what will be used to access the constituent devices.

Does dm-0 trump all because that is the top level block device you are actually accessing?

If by "trump all" you mean deep propagation, then, as per my previous comment, I think you can have different RAs for different devices in the system.

Would lvchange -r have an impact on the dm-0 device and not show up here?

Yes, but this is a special case. Let's assume that we have /dev/dm0, which is LVM's /dev/vg0/blockdevice. If you do:

lvchange -r 512 /dev/vg0/blockdevice

then /dev/dm0 will also change, because /dev/dm0 and /dev/vg0/blockdevice are exactly the same block device as far as kernel access is concerned.

But let's assume that /dev/vg0/blockdevice (that is, /dev/dm0) also backs /dev/xvda1 in a Xen domU. Setting the RA of /dev/xvda1 will take effect inside the domU, but dom0 will still have its own RA for the device.
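One way to check whether two paths really name the same device node is to resolve both and compare, since LVM paths are normally symlinks to the device-mapper node. This is only a sketch: `same_device` is a hypothetical helper, and the two paths are the example names used in this answer, not ones that will exist on your machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths exist and resolve to the same file.
same_device() {
    [ -e "$1" ] && [ -e "$2" ] || return 1
    [ "$(readlink -f -- "$1")" = "$(readlink -f -- "$2")" ]
}

# Example paths taken from the text above; substitute your own.
if same_device /dev/vg0/blockdevice /dev/dm0; then
    echo "same node: one RA setting applies to both names"
else
    echo "different nodes, or one of the paths is missing"
fi
```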

What do you use, equivalent to the sector size above to determine the actual readahead value for a virtual device:

I typically find a good RA by experimenting with different values and benchmarking each one with hdparm.
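A rough sketch of such a sweep is below. The device name and the list of RA values are placeholders, and `ra_sweep` is a made-up helper. By default it only prints the commands it would run; setting `RUN=""` in the environment executes them for real, which needs root and changes the device's readahead.

```shell
#!/bin/sh
# Hypothetical helper: for each RA value, set it and time buffered reads.
# With RUN unset, commands are echoed instead of executed (dry run).
ra_sweep() {
    dev=$1
    shift
    for ra in "$@"; do
        ${RUN-echo} blockdev --setra "$ra" "$dev"
        ${RUN-echo} hdparm -t "$dev"
    done
}

# /dev/md0 and the values below are assumptions for illustration only.
ra_sweep /dev/md0 128 256 512 1024
```

`hdparm -t` only measures sequential reads, so a value that wins here is not necessarily the best one for a random-access workload.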

The stripe size of the RAID (for md0)?

Same as above.

Does the FS play a part (I am primarily interested in ext4 and XFS)?

Sure, this is a very big topic. I recommend you start here: http://archives.postgresql.org/pgsql-performance/2008-09/msg00141.php

Solution 2:

The answer is harder to explain in the abstract, so I will use an example. Say you have 3 block devices and you set the RA on each to 4 (4 × 512 bytes, assuming standard sectors). If you then use, say, a RAID-5 scheme across the 3 disks, any read that touches a stripe on a given disk compounds the RA by the factor you set on that block device. So if your read spans exactly all 3 disks, your effective RA would be 12 × 512 bytes. This can be compounded further by setting RA at the various levels, e.g. MD or LVM. As a rule of thumb, if my app benefits from RA, I set it on the highest layer possible so I don't compound the RA unnecessarily. I then start the filesystem on sector 2049 and offset each sector start on a number divisible by 8. I may be way off on what you are asking, but this is my 2¢.
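The arithmetic in that example can be written out as a small sketch. It only encodes the model described in this answer (per-device RA multiplied by the number of member disks a read touches); `effective_ra_bytes` is a hypothetical helper, not something the kernel exposes.

```shell
#!/bin/sh
# Hypothetical helper modelling the compounding described above.
# $1 = RA per member device, in sectors
# $2 = number of member devices the read touches
# $3 = sector size in bytes (defaults to 512)
effective_ra_bytes() {
    echo $(( $1 * $2 * ${3:-512} ))
}

# RA of 4 sectors on each of 3 disks, read spans all 3: 12 * 512 bytes
effective_ra_bytes 4 3
```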