Configuring NTFS file system for performance

Solution 1:

Disable last access time stamp and reserve space for the MFT.

  • NTFS Performance Hacks
  • Disable the NTFS Last Access Time Stamp
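
If you would rather script those two tweaks than click through the articles, here is a minimal sketch (Python calling the standard fsutil switches; run it from an elevated prompt on Windows, and note both settings are registry-backed and may need a reboot to take effect). The wrapper itself is just illustrative:

    import subprocess

    # Illustrative wrapper only - the real work is done by the two fsutil commands.
    commands = [
        # Stop NTFS updating the "last access" timestamp on every file read.
        ["fsutil", "behavior", "set", "disablelastaccess", "1"],
        # Reserve a larger MFT zone (values 1-4, each step reserving roughly
        # another 12.5% of the volume) so the MFT itself is less likely to
        # fragment as hundreds of thousands of small files pile up.
        ["fsutil", "behavior", "set", "mftzone", "2"],
    ]

    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "->", (result.stdout or result.stderr).strip())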

Solution 2:

I would also add:

Turn off scheduled disk defragmentation, and change the cluster (block) size to 16 KB so each file is written into a single cluster.
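
A rough sketch of scripting both of those, assuming Windows 7 / Server 2008 R2 or later, an elevated prompt, and a dedicated data volume. The drive letter is a placeholder, and the format line is deliberately commented out because it wipes the volume:

    import subprocess

    # 1. Disable the built-in scheduled defrag task.
    subprocess.run([
        "schtasks", "/Change",
        "/TN", r"\Microsoft\Windows\Defrag\ScheduledDefrag",
        "/Disable",
    ])

    # 2. Reformat the data volume with a 16 KB cluster (allocation unit) size
    #    so a typical ~8.5 KB file fits in one cluster. Destructive - only run
    #    this against an empty, dedicated data volume (E: is a placeholder).
    # subprocess.run(["format", "E:", "/FS:NTFS", "/A:16K", "/Q"])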

Rationale for this:

You want to write 1.7GB of data a day, in 200,000 files. Assuming these files are written over a 24-hour day, that is around two to three files a second (see the quick calculation below). That does not seem like a significant problem for a single SATA disk, so my guess is that you have other problems as well as disk performance.

(i.e. do you have enough memory? or are you paging memory to disk as well?)
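
A quick back-of-the-envelope in Python using the figures above (treating 1.7GB as decimal gigabytes):

    bytes_per_day = 1.7e9
    files_per_day = 200_000
    seconds_per_day = 24 * 60 * 60

    print(files_per_day / seconds_per_day)         # ~2.3 files written per second
    print(bytes_per_day / files_per_day)           # ~8,500 bytes (~8.5 KB) per file
    print(bytes_per_day / seconds_per_day / 1000)  # ~20 KB/s sustained write rate

The sustained data rate is tiny; if the disk is struggling, it is the per-file metadata and seek overhead (and possibly paging) that hurts, not raw throughput.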

However

  1. Windows, by default, attempts to defragment NTFS file systems in the background via a scheduled task. Defragmentation will kill performance whilst the disk is being defragmented. Since performance already seems to be an issue, this will only make matters worse for you.

  2. There is a balance between using small cluster sizes and I/O performance when writing large numbers of files. Files and the file-allocation metadata will not be on the same part of the disk, so having to allocate blocks as you are writing files will cause the disk head to constantly move around. Using a cluster size large enough to store 95% of your files in one cluster each will improve your I/O write performance (see the sketch after this list).

  3. As other people have pointed out, using a tiny cluster size of 2k will cause fragmentation over time. Think of it like this: during the first 18 months you will be writing files onto a clean, empty disk, but the OS doesn't know that once a file is closed no more data will be added to it, so it leaves some blocks available at the end of each file in case that file is extended later. Long before you fill the disk, you will find that the only free space is in gaps between other files. Not only that, when it's selecting a gap for your file, the OS does not know whether you are writing a 5-block file or a 2-block file, so it can't make a good choice about where to save your file.
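
To illustrate points 2 and 3 with the ~8.5 KB average file size implied by the question, a small Python sketch:

    import math

    # Clusters needed per average-sized file, and slack wasted in the last
    # cluster, for a few candidate cluster sizes (8.5 KB average file).
    avg_file_size = 8_500  # bytes

    for cluster in (2_048, 4_096, 16_384, 65_536):
        clusters_needed = math.ceil(avg_file_size / cluster)
        slack = clusters_needed * cluster - avg_file_size
        print(f"{cluster // 1024:>2} KB clusters: {clusters_needed} cluster(s) per file, "
              f"~{slack / 1024:.1f} KB slack per file")

With 2 KB clusters an average file needs five separate allocations; with 16 KB it needs one, at a cost of roughly 7-8 KB of slack per file (around 1.5 GB per day of wasted space at 200,000 files a day). That is the disk-space-versus-speed trade-off described above, and why a bigger disk may be the cheapest fix.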

At the end of the day, engineering is about handling conflicting needs and choosing the lowest-cost solution that balances them. My guess is that buying a larger hard drive is probably cheaper than buying faster hard drives.


Solution 3:

To elaborate on my comment on Ptolemy's answer...

By setting your block size so that the very large majority of files each fit within one block, you do get I/O efficiencies. With a 2K block size and an 8.5K average file size, 50% of your I/O operations will be to 5 blocks or more. By setting a 16K block size, it sounds like the very large majority of writes would be to a single block, which would also make those 3% of reads much more efficient when they happen.
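
To put rough numbers on that claim, here is a toy Python simulation. The log-normal distribution centred on 8.5 KB is purely an assumption (the real file-size histogram should be measured), but it shows the shape of the effect:

    import math
    import random

    # Assumed file-size distribution: log-normal with a median of ~8.5 KB.
    random.seed(1)
    sizes = [random.lognormvariate(math.log(8_500), 0.5) for _ in range(100_000)]

    for cluster in (2_048, 16_384):
        multi = sum(1 for s in sizes if math.ceil(s / cluster) > 1)
        five_plus = sum(1 for s in sizes if math.ceil(s / cluster) >= 5)
        print(f"{cluster // 1024} KB clusters: "
              f"{100 * multi / len(sizes):.0f}% of files span more than one cluster, "
              f"{100 * five_plus / len(sizes):.0f}% span five or more")

Under that assumption, roughly half the files span five or more 2 KB clusters, while around nine in ten fit in a single 16 KB cluster.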

One thing to consider is backup I/O. If you are backing up the data, every file will get read at least once, and their directory entries will be trawled on every backup pass. If you are going to back this up, please consider backup I/O in your designs.
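
Even a crude estimate makes the point: at 200,000 new files a day, the number of files (and directory entries) a full backup pass has to touch grows quickly:

    # Files a full backup pass must enumerate and read, as the archive grows.
    files_per_day = 200_000
    for months in (1, 6, 12, 18):
        total_files = files_per_day * 30 * months
        print(f"after {months:>2} months: ~{total_files / 1e6:.0f} million files per pass")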

Caveats: if your underlying storage system is one that already does some storage virtualization (such as an HP EVA disk array, or other arrays of that class), then this doesn't matter so much. Fragmentation of this type will not be noticed, as the data already physically exists in a highly fragmented state on the actual drives. In that case, the 2k block size is just fine and won't affect performance as much. There will still be performance gains from selecting a block size large enough to hold the majority of your expected file sizes, but the magnitude won't be as significant.


Solution 4:

Late for this party, but might benefit others, so...

Re. cluster size: first and most important, you need to look at the distribution of file sizes, so you can optimize for both low fragmentation and low disk-space waste, and size the clusters close to the dominant file size, not the overall average. For example: if most files fall near 2k, a 2k cluster size would be optimal; if near 4k, then a 4k cluster would be optimal; and so forth. If, on the other hand, file sizes are evenly/randomly distributed, then the best you can do is pick a cluster size close to the average file size, or store files in partitions with different cluster sizes for different file sizes, like some larger systems do, but you'd need software/filesystem support for that.
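
If you want to actually measure that distribution before choosing, a minimal Python sketch along these lines works; the sample path and candidate cluster sizes are placeholders:

    import math
    import os
    from collections import Counter

    SAMPLE_DIR = r"D:\sample_data"              # placeholder: a representative slice of the data
    CANDIDATES = (2_048, 4_096, 8_192, 16_384)  # candidate cluster sizes in bytes

    # Collect file sizes.
    sizes = []
    for root, _dirs, names in os.walk(SAMPLE_DIR):
        for name in names:
            sizes.append(os.path.getsize(os.path.join(root, name)))
    assert sizes, "no files found under SAMPLE_DIR"

    # Coarse histogram in 4 KB buckets (everything over 32 KB lumped together).
    histogram = Counter(min(s // 4_096, 8) for s in sizes)
    print("size histogram (4 KB buckets):", dict(sorted(histogram.items())))

    # For each candidate cluster size: how many files fit in one cluster, and
    # how much space is lost to slack in the last cluster. (Ignores very small
    # files that NTFS can store resident in the MFT.)
    for cluster in CANDIDATES:
        one_cluster = sum(1 for s in sizes if s <= cluster)
        slack = sum(math.ceil(s / cluster) * cluster - s for s in sizes)
        print(f"{cluster // 1024:>2} KB: {100 * one_cluster / len(sizes):.0f}% of files in one cluster, "
              f"{slack / 1024 ** 2:.0f} MB total slack")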