Why does closing a file wait for sync when overwriting a file, but not when creating?

That sounds like a reminder of the O_PONIES fiasco, which just recently had its 11th birthday.

Before ext4 came along, ext3 had acquired something of a reputation for being stable in the face of power loss: it seldom broke, and it seldom lost data from files. Then ext4 added delayed allocation of data blocks, meaning that it didn't even try to write file data to disk immediately. Normally, that's not a problem as long as the data gets there at some point, and for temporary files, it might turn out that there was no need to write the data to disk at all.

But ext4 did write metadata changes, and recorded that something had changed with the file. Now, if the system crashed, the file was marked as truncated, but the writes after that weren't stored on disk (because no blocks were allocated for them). Hence, on ext4, you'd often see recently-modified files truncated to a zero length after a crash.

That, of course, was not exactly what most users wanted, but the argument was made that application programs that cared so much about their data should have called fsync(), and if they actually cared about renames, they should fsync() (or at least fdatasync()) the containing directory too. Next to no one did that, though, partly because on ext3, an fsync() synced the whole disk, possibly including large amounts of unrelated data. (Or so close to the whole disk that the difference doesn't matter anyway.)
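To make the fsync() dance above concrete, here is a minimal sketch in Python. The function name write_durably is made up for illustration, and the code assumes a Linux-like system where a directory can be opened read-only and passed to fsync():

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then flush the file and its directory entry.

    Hypothetical helper for illustration, not part of any library.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file data to disk

    # A newly created name lives in the containing directory, so the
    # directory is synced as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Something like write_durably("settings.conf", b"key=value\n") would then only return once both syncs have completed, which is also why doing this was so expensive on ext3.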

Now, on one hand, you had ext3, which performed poorly with fsync(), and on the other, ext4, which required fsync() to not lose files. Not a nice situation, considering that most application programs would care to implement filesystem-specific behavior even less than the rigid dance of calling fsync() at just the right moments. Apparently it wasn't even easy to figure out if a filesystem was mounted as ext3 or ext4 in the first place.

In the end, the ext4 developers made some changes to the most common critical-seeming cases:

  • Renaming a file on top of another. On a running system, this is an atomic update and is commonly used to put a new version of a file in place.
  • Overwriting an existing file (your case). This isn't atomic on a running system, but usually means the application wants the file replaced, not truncated. If an overwrite is botched, you'd lose the old version of the file too, so this is a bit different from creating a completely new file where a power-out would only lose the most recent data.
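The first case, renaming a new version on top of the old one, is usually written along these lines. This is only a sketch: replace_atomically is a hypothetical name, and the explicit fsync() calls are the belt-and-braces version that doesn't rely on the filesystem's own workarounds:

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Put a new version of path in place via a temporary file and rename.

    Hypothetical helper for illustration, not part of any library.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # so it is created in the same directory. Note that mkstemp creates
    # it with mode 0600, which the renamed file keeps.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new contents on disk before the rename
        os.replace(tmp_path, path)  # atomic on a running POSIX system
    except BaseException:
        # Don't leave the temporary file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern, a reader on a running system sees either the old file or the complete new one. The second case, opening the existing file with truncation and writing over it, has no such guarantee, which is why a botched overwrite can cost you both versions.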

As far as I can remember, XFS also exhibited similar zero-length files after a crash even before ext4. I never followed that, though, so I don't know what sorts of fixes they'd have done.

See, e.g., the LWN article "ext4 and data loss" (March 2009), which mentions the fixes.

There were other writings about that at the time, of course, but I'm not sure it's useful to link to them, as it's mostly a question of pointing fingers.
