Why is ZFS on Linux unable to fully utilize 8x SSDs on AWS i2.8xlarge instance?

Solution 1:

This setup may not be tuned well. When using SSDs, you need to adjust both the parameters in the /etc/modprobe.d/zfs.conf file and the pool's ashift value.

Try ashift=12 or 13 and test again.
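ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, so ashift=12 means 4 KiB blocks and ashift=13 means 8 KiB. It is fixed when a vdev is created, so it has to be chosen up front. Below is a minimal sketch (Python 3, hypothetical helper names) that derives it from the physical block size Linux reports in /sys/block/<dev>/queue/physical_block_size. The floor of 12 is an assumption reflecting the common case of SSDs reporting 512-byte sectors even though their internal pages are larger.

```python
from pathlib import Path


def ashift_for(sector_bytes: int) -> int:
    """Return ashift (log2 of the sector size) for a power-of-two size >= 512."""
    if sector_bytes < 512 or sector_bytes & (sector_bytes - 1):
        raise ValueError(f"sector size must be a power of two >= 512, got {sector_bytes}")
    # For a power of two, bit_length() - 1 is its exponent: 4096 -> 12.
    return sector_bytes.bit_length() - 1


def reported_ashift(device: str, sysfs: Path = Path("/sys/block")) -> int:
    """Ashift implied by the kernel's reported physical block size for `device`."""
    size = int((sysfs / device / "queue" / "physical_block_size").read_text())
    return ashift_for(size)


def recommended_ashift(device: str, floor: int = 12,
                       sysfs: Path = Path("/sys/block")) -> int:
    """Never go below `floor` (12 => 4 KiB), since SSDs often under-report."""
    return max(floor, reported_ashift(device, sysfs))


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        print(size, "->", ashift_for(size))
```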


Edit:

This is still a virtualized solution, so we don't know much about the underlying hardware or how everything is interconnected. I'm not sure you'll get better performance out of this setup.


Edit:

I guess I don't see the point of trying to optimize a cloud instance in this manner: if top performance were the aim, you'd be using dedicated hardware, right?

But remember that ZFS has a lot of tunable settings, and the defaults are nowhere near optimal for your use case.

Try the following in your /etc/modprobe.d/zfs.conf and reboot. It's what I use on my all-SSD data pools for application servers. Your ashift should be 12 or 13. Benchmark with compression=off, but use compression=lz4 in production. Set atime=off. I'd leave recordsize at the default (128K).
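The pool and dataset settings above could be applied roughly like this. This is a minimal sketch, assuming Python 3.9+ and a pool created fresh for benchmarking. The pool name `tank`, the `/dev/xvd*` device names, and the plain-stripe vdev layout are hypothetical placeholders, not details from the question. The script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def create_pool_cmd(pool: str, vdevs: list[str], ashift: int = 12) -> list[str]:
    """argv for a benchmark pool: compression off, atime off, default recordsize."""
    return [
        "zpool", "create",
        "-o", f"ashift={ashift}",
        "-O", "compression=off",  # benchmark without compression
        "-O", "atime=off",        # skip access-time updates on reads
        pool, *vdevs,             # recordsize is left at the 128K default
    ]


def production_cmds(pool: str) -> list[list[str]]:
    """argv lists to run once benchmarking is done."""
    return [["zfs", "set", "compression=lz4", pool]]


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical pool and device names; substitute your own.
    devices = [f"/dev/xvd{c}" for c in "bcdefghi"]
    run(create_pool_cmd("tank", devices, ashift=12))
    for cmd in production_cmds("tank"):
        run(cmd)
```

Switching to compression=lz4 afterwards only affects blocks written from then on. Data written during benchmarking stays uncompressed.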

options zfs zfs_vdev_scrub_min_active=48
options zfs zfs_vdev_scrub_max_active=128
options zfs zfs_vdev_sync_write_min_active=64
options zfs zfs_vdev_sync_write_max_active=128
options zfs zfs_vdev_sync_read_min_active=64
options zfs zfs_vdev_sync_read_max_active=128
options zfs zfs_vdev_async_read_min_active=64
options zfs zfs_vdev_async_read_max_active=128
options zfs zfs_top_maxinflight=320
options zfs zfs_txg_timeout=30
options zfs zfs_dirty_data_max_percent=40
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=64
options zfs zfs_prefetch_disable=1
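After rebooting, it's worth confirming that the module actually picked these values up. Below is a rough sketch (Python 3.9+, hypothetical helper names) with three steps. It parses the `options zfs` lines, flags any `*_min_active` larger than its matching `*_max_active`, and compares each value against what ZFS on Linux exposes under /sys/module/zfs/parameters/. Values are compared as plain strings, and a parameter your ZFS version doesn't have shows up as `<missing>`.

```python
import re
from pathlib import Path

OPTION_LINE = re.compile(r"^\s*options\s+zfs\s+(.*)$")


def parse_zfs_conf(text: str) -> dict[str, str]:
    """Collect name=value pairs from 'options zfs ...' lines, ignoring comments."""
    params: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        m = OPTION_LINE.match(line)
        if not m:
            continue
        for token in m.group(1).split():
            name, sep, value = token.partition("=")
            if sep:
                params[name] = value
    return params


def min_max_violations(params: dict[str, str]) -> list[str]:
    """Names of *_min_active settings that exceed their matching *_max_active."""
    bad = []
    for name, value in params.items():
        if name.endswith("_min_active"):
            partner = name[: -len("_min_active")] + "_max_active"
            if partner in params and int(value) > int(params[partner]):
                bad.append(name)
    return bad


def live_mismatches(params: dict[str, str],
                    sysfs: Path = Path("/sys/module/zfs/parameters")) -> dict[str, str]:
    """Map each parameter whose loaded value differs from the config to that value."""
    out = {}
    for name, wanted in params.items():
        path = sysfs / name
        actual = path.read_text().strip() if path.exists() else "<missing>"
        if actual != wanted:
            out[name] = actual
    return out


if __name__ == "__main__":
    conf = parse_zfs_conf(Path("/etc/modprobe.d/zfs.conf").read_text())
    print("min > max:", min_max_violations(conf))
    print("not applied:", live_mismatches(conf))
```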

Solution 2:

It seems likely that you're waiting on a Linux kernel mutex lock that in turn may be waiting on a Xen ring buffer. I can't be certain of this without access to a similar machine, but I'm not interested in paying Amazon $7/hour for that privilege.

A longer write-up is here: https://www.reddit.com/r/zfs/comments/4b4r1y/why_is_zfs_on_linux_unable_to_fully_utilize_8x/d1e91wo. I'd rather keep it in one place than two.