Effectively handling 2+ million files

You probably just want to use XFS.

It's quite capable of what you're asking for, and does the job.

There's no reason to complicate this with lesser-used filesystems, which can come with other tradeoffs.

Please see: How does the number of subdirectories impact drive read / write performance on Linux? and The impact of a high directory-to-file ratio on XFS
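A minimal sketch of that setup, assuming the device name and mount point are placeholders you would replace with your own (default mkfs.xfs options are generally sensible for lots of small files):

    # Create an XFS filesystem and mount it without atime updates;
    # /dev/sdb1 and /mnt/data are placeholders for illustration.
    mkfs.xfs /dev/sdb1
    mkdir -p /mnt/data
    mount -o noatime /dev/sdb1 /mnt/data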

If you want something more esoteric, ZFS zvols with a filesystem on top could provide an interesting alternative (for compression, integrity and portability purposes).

See here: Transparent compression filesystem in conjunction with ext4
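A rough sketch of the zvol approach, where the pool name (tank), volume size, and mount point are all assumptions for illustration:

    # Create a compressed zvol, put a regular filesystem on top, and mount it;
    # tank, 50G, and /mnt/data are placeholders.
    zfs create -V 50G -o compression=lz4 tank/manyfiles
    mkfs.xfs /dev/zvol/tank/manyfiles
    mkdir -p /mnt/data
    mount -o noatime /dev/zvol/tank/manyfiles /mnt/data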


If it is read-only, why not use an ISO file? You can create it with genisoimage or mkisofs.
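For example (paths are placeholders; -R adds Rock Ridge extensions so long names and permissions survive, -J adds Joliet names):

    # Build an ISO image from a directory tree and mount it read-only;
    # /path/to/files, data.iso and /mnt/iso are placeholders.
    genisoimage -R -J -o data.iso /path/to/files
    mkdir -p /mnt/iso
    mount -o loop,ro data.iso /mnt/iso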

If you want to compress the whole thing, you can also use squashfs, another read-only filesystem with a very high compression ratio.
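A sketch of that, with placeholder paths and an assumed choice of xz as the compressor (xz tends to give the best ratio; gzip or lz4 decompress faster):

    # Pack the directory into a compressed SquashFS image and mount it;
    # /path/to/files, data.sqsh and /mnt/sq are placeholders.
    mksquashfs /path/to/files data.sqsh -comp xz
    mkdir -p /mnt/sq
    mount -t squashfs -o loop,ro data.sqsh /mnt/sq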


Given the number of small files, I would consider using SquashFS, especially if you have a powerful enough CPU (meaning no Pentium III or 1 GHz ARM).

Depending on the type of data stored, SquashFS can greatly reduce its size and thus the I/O when reading it. The only downside is CPU usage on read. On the other hand, any modern CPU can decompress at speeds that far outperform an HDD and probably even an SSD.

As another advantage, you save space and bandwidth when transferring the image, and/or the time spent decompressing it after transfer.
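How much you actually gain depends entirely on your data, so it is worth a quick size comparison (paths and image name are placeholders):

    # Compare the on-disk size of the original tree with the compressed image;
    # /path/to/files and data.sqsh are placeholders.
    du -sh /path/to/files
    du -sh data.sqsh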

Some benchmarks compare it to ISO and other similar approaches. As with every benchmark, take them with a grain of salt or, better, make your own. ;-)

Edit: depending on your circumstances (and I'm not daring to guess here), SquashFS without compression (mksquashfs -noD) could outperform ext4, as the code for reading should be much simpler and optimized for read-only operation. But that is really up to you to benchmark for your use case. Another advantage is that the SquashFS image ends up only a little larger than your data, whereas with ext4 you always have to create a larger loop device. The disadvantage, of course, is that it is rather inconvenient when you need to change the data; that is much easier with ext4.
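If you want to benchmark both variants, a sketch along these lines should do (paths and the 12G size are guesses for illustration; -noD only skips compression of data blocks):

    # Uncompressed SquashFS image, only slightly larger than the data itself;
    # /path/to/files and data-nocomp.sqsh are placeholders.
    mksquashfs /path/to/files data-nocomp.sqsh -noD

    # Ext4 alternative: the loop file must be pre-sized larger than the data.
    truncate -s 12G data.ext4.img
    mkfs.ext4 -F data.ext4.img
    mkdir -p /mnt/ext4
    mount -o loop data.ext4.img /mnt/ext4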