How much space to leave free on HDD or SSD?

I can't speak to "research" published in "peer reviewed journals", and I wouldn't want to have to rely on those for day-to-day work, but I can talk about the realities of hundreds of production servers running a variety of OSes over many years:

There are three reasons why a full disk reduces performance:

  • Free space starvation: Think of temp files, updates, etc.
  • File system degradation: Most file systems suffer in their ability to optimally lay out files if not enough room is present
  • Hardware level degradation: SSDs and SMR disks without enough free space will show decreased throughput and - even worse - increased latency (sometimes by many orders of magnitude)

The first point is trivial, especially since no sane production system would ever use swap space in dynamically expanding and shrinking files.

The second point differs highly between file systems and workload. For a Windows system with mixed workload, a 70% threshold turns out to be quite usable. For a Linux ext4 file system with few but big files (e.g. video broadcast systems), this might go up to 90+%.

The third point is hardware and firmware dependent, but SSDs with a Sandforce controller in particular can fall behind in free-block erasure on high-write workloads, leading to write latencies going up by thousands of percent. We usually leave 25% free at the partition level, then keep the fill rate below 80%.

Recommendations

I realize that I haven't yet mentioned how to make sure a maximum fill rate is enforced. Some random thoughts, none of them "peer reviewed" (paid, faked or real) but all of them from production systems.

  • Use filesystem boundaries: /var doesn't belong in the root file system.
  • Monitoring, monitoring, monitoring. Use a ready-made solution if it fits you, else parse the output of df -h and let alarm bells go off when a threshold is crossed. This can save you from 30 kernels on a root fs with automatic-upgrades installed and running without the autoremove option.
  • Weigh the potential disruption of a fs overflow against the cost of making it bigger in the first place: If you are not on an embedded device, you might just double those 4G for root.
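
For the monitoring point, here is a minimal sketch of a home-grown check in Python, using shutil.disk_usage from the standard library instead of parsing df output. The mount point and the 80% limit are placeholders, not recommendations, and the function names are made up for this example.

```python
import shutil


def fill_ratio(total, used):
    """Return the used fraction of a file system (0.0 to 1.0)."""
    return used / total if total else 0.0


def over_threshold(limits):
    """Return (path, ratio) for every path whose fill ratio exceeds its limit.

    limits maps a mount point to its maximum allowed fill ratio.
    """
    alerts = []
    for path, limit in limits.items():
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        ratio = fill_ratio(usage.total, usage.used)
        if ratio > limit:
            alerts.append((path, ratio))
    return alerts


if __name__ == "__main__":
    # Hypothetical threshold; tune it per file system and workload.
    for path, ratio in over_threshold({"/": 0.80}):
        print(f"WARNING: {path} is {ratio:.0%} full")
```

Run it from cron or a systemd timer and hook the warning into whatever alerting you already have. The ratio can differ slightly from the Use% column of df, because df leaves the blocks reserved for the superuser out of its calculation.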

Has there been any research...into either the percentage or absolute amount of free space required by specific combinations of operating systems, filesystem, and storage technology...?

In 20 years of system administration, I've never encountered research detailing the free space requirements of various configurations. I suspect this is because computers are so variously configured it would be difficult to do because of the sheer number of possible system configurations.

To determine how much free space a system requires, one must account for two variables:

  1. The minimum space required to prevent unwanted behavior, which itself may have a fluid definition.

    Note that it's unhelpful to define required free space by this definition alone, as that's the equivalent of saying it's safe to drive 80 mph toward a brick wall until the very point at which you collide with it.

  2. The rate at which storage is consumed, which dictates an additional variable amount of space to be reserved, lest the system degrade before the admin has time to react.
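
To make the interplay of those two variables concrete, here is a rough Python sketch that estimates how long an admin has before a volume crosses a chosen fill limit. It assumes a constant growth rate, which real workloads rarely have, and the function name and all numbers are hypothetical.

```python
def hours_until_threshold(total_bytes, used_bytes, growth_bytes_per_hour,
                          max_fill=0.80):
    """Hours left before usage crosses max_fill at a constant growth rate."""
    limit_bytes = total_bytes * max_fill
    if used_bytes >= limit_bytes:
        return 0.0  # already at or over the limit
    if growth_bytes_per_hour <= 0:
        return float("inf")  # not growing, so the limit is never reached
    return (limit_bytes - used_bytes) / growth_bytes_per_hour


GIB = 1024 ** 3
# Hypothetical volume: 500 GiB, 350 GiB used, growing by 2 GiB per hour.
print(hours_until_threshold(500 * GIB, 350 * GIB, 2 * GIB))
```

With these made-up numbers the answer is about 25 hours. If that is shorter than your realistic reaction time, either the threshold or the volume is too small.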

The specific combination of OS, filesystems, and underlying storage architecture, along with application behavior, virtual memory configuration, etc., creates quite a challenge for anyone wishing to provide definitive free space requirements.

That's why there are so many "nuggets" of advice out there. You'll notice that many of them make a recommendation around a specific configuration. For example, "If you have an SSD that's subject to performance issues when nearing capacity, stay above 20% free space."

Because there is no simple answer to this question, the correct approach to identify your system's minimum free space requirement is to consider the various generic recommendations in light of your system's specific configuration, then set a threshold, monitor it, and be willing to adjust it as necessary.

Or you could just keep at least 20% free space. Unless of course you have a 42 TB RAID 6 volume backed by a combination of SSDs and traditional hard disks and a pre-allocated swap file... (that's a joke for the serious folks.)


Has there been any research, preferably published in a peer-reviewed journal […]?

One has to go back a lot further than 20 years, of system administration or otherwise, for this. This was a hot topic, at least in the world of personal computer and workstation operating systems, over 30 years ago; the time when the BSD people were developing the Berkeley Fast File System and Microsoft and IBM were developing the High Performance File System.

The literature on both by their creators discusses the ways that these filesystems were organized so that the block allocation policy yielded better performance by trying to make consecutive file blocks contiguous. You can find discussions of this, and of the fact that the amount and location of free space left to allocate blocks affects block placement and thus performance, in the contemporary articles on the subject.

It should be fairly obvious, for example, from the description of the block allocation algorithm of the Berkeley FFS that, if there is no free space in the current and secondary cylinder group and the algorithm thus reaches the fourth level fallback ("apply an exhaustive search to all cylinder groups"), allocating disc blocks will become slower and the file will become more fragmented (and hence slower to read).
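
As a toy illustration only, the following Python sketch models that fallback chain at the granularity of whole cylinder groups. It is not the BSD code: the function name is invented, the rotational placement within a cylinder is folded into the first check, and the exact rehash step sequence is an assumption made for the example. It only shows why the number of groups probed climbs once the preferred groups are full.

```python
def choose_cylinder_group(free_blocks, preferred):
    """Pick a cylinder group with free blocks; return (group, groups_probed).

    free_blocks: list holding the free block count of each cylinder group.
    """
    ncg = len(free_blocks)
    probed = 1
    # Levels 1-2: the preferred cylinder group itself.
    if free_blocks[preferred] > 0:
        return preferred, probed
    # Level 3: rehash to other groups, doubling the step on each try.
    cg = preferred
    step = 1
    while step < ncg:
        cg = (cg + step) % ncg
        probed += 1
        if free_blocks[cg] > 0:
            return cg, probed
        step *= 2
    # Level 4: exhaustive search of all cylinder groups.
    for offset in range(1, ncg):
        cg = (preferred + offset) % ncg
        probed += 1
        if free_blocks[cg] > 0:
            return cg, probed
    return None, probed  # the volume is completely full


# Nearly empty volume: the preferred group is used straight away.
print(choose_cylinder_group([100] * 16, 0))  # (0, 1)

# Nearly full volume: only group 11 has space left.
nearly_full = [0] * 16
nearly_full[11] = 5
print(choose_cylinder_group(nearly_full, 0))  # (11, 16)
```

In the second case the block lands eleven groups away from where the allocator wanted it, after sixteen probes instead of one, which is the allocation cost and fragmentation described above.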

It is these and similar analyses (these being far from the only filesystem designs that aimed to improve on the layout policies of the filesystem designs of the time) that the received wisdom of the past 30 years has built upon.

For example: The dictum in the original paper that FFS volumes be kept less than 90% full, lest performance suffer, which was based upon experiments made by the creators, can be found uncritically repeated even in books on Unix filesystems published this century (e.g., Pate2003 p. 216). Few people question this, although Amir H. Majidimehr actually did the century before, saying that xe has in practice not observed a noticeable effect; not least because of the customary Unix mechanism that reserves that final 10% for superuser use, meaning that a 90% full disc is effectively 100% full for non-superusers anyway (Majidimehr1996 p. 68). So did Bill Calkins, who suggests that in practice one can fill up to 99%, with 21st century disc sizes, before observing the performance effects of low free space because even 1% of modern size discs is enough to have lots of unfragmented free space still to play with (Calkins2002 p. 450).

This latter is an example of how received wisdom can become wrong. There are other examples of this. Just as the SCSI and ATA worlds of logical block addressing and zoned bit recording rather threw out of the window all of the careful calculations of rotational latency in the BSD filesystem design, so the physical mechanics of SSDs rather throw out of the window the free space received wisdom that applies to Winchester discs.

With SSDs, the amount of free space on the device as a whole, i.e., across all volumes on the disc and in between them, has an effect both upon performance and upon lifetime. And the very basis for the idea that a file needs to be stored in blocks with contiguous logical block addresses is undercut by the fact that SSDs do not have platters to rotate and heads to seek. The rules change again.

With SSDs, the recommended minimum amount of free space is actually more than the traditional 10% that comes from experiments with Winchester discs and Berkeley FFS 33 years ago. Anand Lal Shimpi gives 25%, for example. This difference is compounded by the fact that this has to be free space across the entire device, whereas the 10% figure is within each single FFS volume, and thus is affected by whether one's partitioning program knows to TRIM all of the space that is not allocated to a valid disc volume by the partition table.

It is also compounded by complexities such as TRIM-aware filesystem drivers that can TRIM free space within disc volumes, and the fact that SSD manufacturers themselves also already allocate varying degrees of reserved space that is not even visible outwith the device (i.e., to the host) for various uses such as garbage collection and wear levelling.
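
The difference between per-volume and whole-device free space can be sketched with some simple arithmetic in Python. This assumes that every block outside the volumes, and every free block inside them, has actually been TRIMmed, and it ignores the manufacturer's hidden reserve; the function name and the sizes are hypothetical.

```python
def device_free_fraction(device_bytes, volumes):
    """Fraction of the whole device that holds no file data.

    volumes: iterable of (size_bytes, used_bytes), one pair per volume.
    Space outside all volumes counts as free, which assumes it was TRIMmed.
    """
    used = sum(used_bytes for _size, used_bytes in volumes)
    return (device_bytes - used) / device_bytes


GB = 10 ** 9
# Hypothetical 1000 GB SSD: one 750 GB volume that is 80% full (600 GB used),
# with the remaining 250 GB left unpartitioned.
print(device_free_fraction(1000 * GB, [(750 * GB, 600 * GB)]))  # 0.4

# Same SSD with a single volume spanning the device, filled to 90%.
print(device_free_fraction(1000 * GB, [(1000 * GB, 900 * GB)]))  # 0.1
```

The first layout looks 80% full from inside the volume, yet 40% of the device is free; the second meets the traditional 10% rule within the volume but falls well short of a 25% device-wide figure.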

Bibliography

  • Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry (1984-08). A Fast File System for UNIX. ACM Transactions on Computer Systems. Volume 2 issue 3. pp.181–197. Archived at cornell.edu.
  • Ray Duncan (1989-09). Design goals and implementation of the new High Performance File System. Microsoft Systems Journal. Volume 4 issue 5. pp. 1–13. Archived at wisc.edu.
  • Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman (1996-04-30). "The Berkeley Fast Filesystem". The Design and Implementation of the 4.4 BSD Operating System. Addison-Wesley Professional. ISBN 0201549794.
  • Dan Bridges (1996-05). Inside the High Performance File System — Part 4: Fragmentation, Diskspace Bitmaps and Code Pages. Significant Bits. Archived at Electronic Developer Magazine for OS/2.
  • Keith A. Smith and Margo Seltzer (1996). A Comparison of FFS Disk Allocation Policies. Proceedings of the USENIX Annual Technical Conference. Archived at harvard.edu.
  • Steve D. Pate (2003). "Performance analysis of the FFS". UNIX Filesystems: Evolution, Design, and Implementation. John Wiley & Sons. ISBN 9780471456759.
  • Amir H. Majidimehr (1996). Optimizing UNIX for Performance. Prentice Hall. ISBN 9780131115514.
  • Bill Calkins (2002). "Managing File Systems". Inside Solaris 9. Que Publishing. ISBN 9780735711013.
  • Anand Lal Shimpi (2012-10-04). Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs. AnandTech.
  • Henry Cook, Jonathan Ellithorpe, Laura Keys, and Andrew Waterman (2010). IotaFS: Exploring File System Optimizations for SSDs. IEEE Transactions on Consumer Electronics. Archived at stanford.edu.
  • https://superuser.com/a/1081730/38062
  • Accela Zhao (2017-04-10). A Summary on SSD & FTL. github.io.
  • Does Windows trim unpartitioned (unformatted) space on an SSD?