When should I use /dev/shm/ and when should I use /tmp/?

/dev/shm is a temporary file storage filesystem, i.e., tmpfs, that uses RAM for the backing store.  It can function as a shared memory implementation that facilitates IPC.
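
For illustration, here is a minimal C sketch of the IPC use just described, assuming a typical glibc/Linux system where POSIX shared memory objects show up under /dev/shm (the object name "/demo_shm" is made up for the example):

    /* Minimal sketch: POSIX shared memory, which glibc backs with a file
     * in /dev/shm.  The object name "/demo_shm" is an arbitrary example. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Creates /dev/shm/demo_shm on a typical glibc/Linux system. */
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }

        if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from shared memory");  /* visible to any process mapping the same name */

        munmap(p, 4096);
        close(fd);
        shm_unlink("/demo_shm");                /* removes the /dev/shm entry */
        return 0;
    }

(On older glibc you may need to link with -lrt for shm_open.)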

From Wikipedia:

Recent 2.6 Linux kernel builds have started to offer /dev/shm as shared memory in the form of a ramdisk, more specifically as a world-writable directory that is stored in memory with a defined limit in /etc/default/tmpfs.  /dev/shm support is completely optional within the kernel config file.  It is included by default in both Fedora and Ubuntu distributions, where it is most extensively used by the Pulseaudio application. (Emphasis added.)

/tmp is the location for temporary files as defined in the Filesystem Hierarchy Standard, which is followed by almost all Unix and Linux distributions.

Since RAM is significantly faster than disk storage, you can use /dev/shm instead of /tmp for a performance boost if your process is I/O-intensive and makes extensive use of temporary files.

To answer your questions: No, you cannot always rely on /dev/shm being present, certainly not on machines strapped for memory. You should use /tmp unless you have a very good reason for using /dev/shm.

Remember that /tmp can be part of the / filesystem instead of a separate mount, and hence can grow as required. The size of /dev/shm is limited by excess RAM on the system, and hence you're more likely to run out of space on this filesystem.
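
If you want to check this before choosing a scratch directory, a small sketch using statvfs() shows how much space each mount actually has available (the 1 MiB conversion and the two paths are just illustrative):

    /* Sketch: compare the free space on /dev/shm and /tmp before
     * deciding where to put large temporary files. */
    #include <stdio.h>
    #include <sys/statvfs.h>

    static void report(const char *path)
    {
        struct statvfs vfs;
        if (statvfs(path, &vfs) == 0) {
            unsigned long long avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
            printf("%-10s %llu MiB available\n", path, avail / (1024 * 1024));
        }
    }

    int main(void)
    {
        report("/dev/shm");  /* typically capped at about half of RAM */
        report("/tmp");      /* often part of /, so usually much larger */
        return 0;
    }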


In descending order of tmpfs likelihood:

┌───────────┬──────────────┬────────────────┐
│ /dev/shm  │ always tmpfs │ Linux specific │
├───────────┼──────────────┼────────────────┤
│ /tmp      │ can be tmpfs │ FHS 1.0        │
├───────────┼──────────────┼────────────────┤
│ /var/tmp  │ never tmpfs  │ FHS 1.0        │
└───────────┴──────────────┴────────────────┘

Since you are asking about a Linux-specific tmpfs mountpoint versus a portably defined directory that may be tmpfs (depending on your sysadmin and on your distro's defaults), your question has two aspects, which other answers have emphasized differently:

  1. Appropriate use of various tmp directories
  2. Appropriate use of tmpfs

Appropriate use of various tmp directories

These recommendations are based on the venerable Filesystem Hierarchy Standard and on what systemd says about the matter.

  • When in doubt, use /tmp.
  • Use /var/tmp for data that should persist across reboots.
  • Use /var/tmp for large data that may not easily fit in RAM (assuming that /var/tmp has more available space – usually a fair assumption).
  • Use /dev/shm only as a side-effect of calling shm_open(). The intended audience is bounded buffers that are endlessly overwritten. So this is for long-lived files whose content is volatile and not terribly large.
  • Definitely don't use /dev/shm for executables (of any kind), as it's commonly mounted noexec.
  • If still in doubt, provide a way for the user to override. For the least amount of surprise, do like mktemp and honor the TMPDIR environment variable (see the sketch after this list).
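
As a rough C sketch of that last point, assuming the conventional fallback to /tmp and an arbitrary file-name template:

    /* Sketch of honoring TMPDIR the way mktemp(1) does: fall back to /tmp
     * when the variable is unset.  The "myapp.XXXXXX" template is made up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dir = getenv("TMPDIR");
        if (dir == NULL || *dir == '\0')
            dir = "/tmp";                      /* the portable default */

        char path[4096];
        snprintf(path, sizeof path, "%s/myapp.XXXXXX", dir);

        int fd = mkstemp(path);                /* replaces XXXXXX, creates the file mode 0600 */
        if (fd == -1) { perror("mkstemp"); return 1; }

        printf("temporary file: %s\n", path);
        close(fd);
        unlink(path);
        return 0;
    }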

Where tmpfs excels

It is important to say that where tmpfs really excels, above all else, is at hiding a performance bug that is painfully significant on a spinning disk. So if fixing the bug is an option, this is of course an inappropriate use of tmpfs:

fsync is a no-op on tmpfs. This syscall tells the OS to flush its page cache associated with a file, all the way down to flushing the write cache of the relevant storage device, while blocking the program that issued it from making any progress at all – a very crude write barrier. It is a necessary tool in the box only because storage protocols aren't made with transactions in mind. And the caching is there in the first place to let programs perform millions of small writes to a file without noticing how slow it actually is to write to a storage device – all actual writing happens asynchronously, until fsync is called, which is the only place where write performance is directly felt by the program.
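
To make the pattern concrete, here is a small sketch of that write barrier; the path /tmp/journal.dat and the record count are arbitrary, and on tmpfs the fsync() calls would return immediately:

    /* Sketch: each write() only dirties the page cache; fsync() blocks
     * until the data has reached the storage device. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/journal.dat", O_CREAT | O_WRONLY | O_APPEND, 0600);
        if (fd == -1) { perror("open"); return 1; }

        const char record[] = "one small transaction\n";
        for (int i = 0; i < 1000; i++) {
            if (write(fd, record, sizeof record - 1) == -1) { perror("write"); return 1; }
            /* Forcing durability after every record is what makes a spinning
             * disk crawl; batching several records per fsync() is the usual fix. */
            if (fsync(fd) == -1) { perror("fsync"); return 1; }
        }

        close(fd);
        return 0;
    }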

So if you find yourself using tmpfs (or eatmydata) just to defeat fsync, then you (or some other developer in the chain) are doing something wrong. It means that the transactions toward the storage device are unnecessarily fine-grained for your purpose – you are clearly willing to skip some savepoints for performance, since you have now gone to the extreme of sabotaging them all – seldom the best compromise. Also, it is here in transaction-performance land where some of the greatest benefits of having an SSD are – any SSD worth its salt is going to perform out of this world compared to what a spinning disk can possibly take (7200 rpm = 120 Hz, if nothing else is accessing it). Flash memory cards also vary widely on this metric (it is a tradeoff with sequential performance, and the SD card class rating only considers the latter). So beware, developers with blazing fast SSDs, not to force your users into this use case!

Wanna hear a ridiculous story? My first fsync lesson: I had a job that involved routinely "upgrading" a bunch of SQLite databases (kept as test cases) to an ever-changing current format. The "upgrade" framework would run a bunch of scripts, making at least one transaction each, to upgrade one database. Of course, I upgraded my databases in parallel (8 in parallel, since I was blessed with a mighty 8-core CPU). But as I found out, there was no parallelization speedup whatsoever (rather a slight hit), because the process was entirely I/O bound. Hilariously, wrapping the upgrade framework in a script that copied each database to /dev/shm, upgraded it there, and copied it back to disk was something like 100 times faster (still with 8 in parallel). As a bonus, the PC was usable too, while upgrading databases.

Where tmpfs is appropriate

The appropriate use of tmpfs is to avoid unnecessary writing of volatile data. In effect it disables writeback, like setting /proc/sys/vm/dirty_writeback_centisecs to infinity on a regular filesystem would.
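
For reference, a small sketch (needs root) that inspects this knob and then disables periodic writeback by writing 0, which is the closest thing to "infinity" the interface accepts:

    /* Sketch: read and change the writeback interval referred to above.
     * The value is in centiseconds; the usual default of 500 = 5 seconds. */
    #include <stdio.h>

    int main(void)
    {
        const char *knob = "/proc/sys/vm/dirty_writeback_centisecs";

        FILE *f = fopen(knob, "r");
        if (f) {
            long centisecs = 0;
            if (fscanf(f, "%ld", &centisecs) == 1)
                printf("current writeback interval: %ld cs\n", centisecs);
            fclose(f);
        }

        /* Writing 0 disables the periodic writeback threads entirely.
         * Don't do this casually on a machine you care about. */
        f = fopen(knob, "w");
        if (f) {
            fprintf(f, "0\n");
            fclose(f);
        } else {
            perror("fopen (need root?)");
        }
        return 0;
    }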

This has very little to do with performance, and failing this is a much smaller concern than abusing fsync: The writeback timeout determines how lazily the disk content is updated after the pagecache content, and the default of 5 seconds is a long time for a computer – an application can overwrite a file as frequently as it wants, in pagecache, but the content on disk is only updated about once every 5 seconds. Unless the application forces it through with fsync, that is. Think about how many times an application can output a small file in this time, and you see why fsyncing every single one would be a much bigger problem.

What tmpfs cannot help you with

  • Read performance. If your data is hot (which it better be if you consider keeping it in tmpfs), you will hit the pagecache anyway. The difference is when not hitting the pagecache; if this is the case, go to "Where tmpfs sux", below.
  • Short lived files. These can live their entire lives in the pagecache (as dirty pages) before ever being written out. Unless you force it with fsync of course.

Where tmpfs sux

Keeping cold data. You might be tempted to think that serving files out of swap is just as efficient as a normal filesystem, but there are a couple of reasons why it isn't:

  • The simplest reason: There is nothing that contemporary storage devices (be they hard-disk or flash based) love more than reading fairly sequential files neatly organized by a proper filesystem. Swapping in 4 KiB pages is unlikely to improve on that.
  • The hidden cost: Swapping out. Tmpfs pages are dirty – they need to be written somewhere (to swap) to be evicted from the pagecache, as opposed to file-backed clean pages, which can be dropped instantly. This is an extra write penalty on everything else that competes for memory – it affects something else, at a different time than the use of those tmpfs pages.

Okay, here's the reality.

Both tmpfs and a normal filesystem are a memory cache over disk.

A tmpfs uses memory and swap space as its backing store; a filesystem uses a specific area of disk. Neither limits how big the filesystem can be: it is quite possible to have a 200 GB tmpfs on a machine with less than a GB of RAM, if you have enough swap space.

The difference is in when data is written to the disk. For a tmpfs the data is written ONLY when memory gets too full or the data is unlikely to be used soon. OTOH most normal Linux filesystems are designed to always have a more or less consistent set of data on the disk, so if the user pulls the plug they don't lose everything.

Personally, I'm used to having operating systems that don't crash and UPS systems (e.g., laptop batteries), so I think the ext2/3 filesystems are too paranoid with their 5–10 second checkpoint interval. The ext4 filesystem is better with its 10-minute checkpoint, except that it treats user data as second class and doesn't protect it. (ext3 is the same, but you don't notice it because of the 5 second checkpoint.)

This frequent checkpointing means that unnecessary data is being continually written to disk, even for /tmp.

So the result is that you need to create swap space as big as you need your /tmp to be (even if you have to create a swap file) and use that space to mount a tmpfs of the required size onto /tmp.
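
In practice this is normally a one-line /etc/fstab entry, but as a sketch of the same operation from code (needs root; the "size=8G" cap is an arbitrary example value):

    /* Sketch: mount a size-capped tmpfs onto /tmp via mount(2). */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (mount("tmpfs", "/tmp", "tmpfs",
                  MS_NOSUID | MS_NODEV, "size=8G,mode=1777") == -1) {
            perror("mount");
            return 1;
        }
        puts("tmpfs mounted on /tmp");
        return 0;
    }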

NEVER use /dev/shm.

Unless you're using it for very small (probably mmap'd) IPC files and you are sure that it exists (it's not a standard) and that the machine has more than enough memory + swap available.
