When a PC edits a file, does it delete the original file?

Could be either – it depends on the text editor that was used.

The concept of a 'text file' isn't built into computers – each operating system may manage files differently, and each text editor may use those files differently.

In practice, you'll find text editors that use either mechanism. Practically all operating systems allow direct overwriting of an existing file's contents, so simple editors such as Notepad usually just ask the OS to write directly into the original file, as that's easiest to implement – but risky if you lose power mid-write. So for reliability reasons, many editors deliberately save the updated data to a new file and delete the original.
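
For illustration, the direct-overwrite save boils down to very little (a rough Python sketch; the filename and contents are made up):

    new_text = "hello, world\n"  # whatever the editor holds in memory

    # Notepad-style save: overwrite the original file directly.
    # If power is lost mid-write, the file can be left truncated.
    with open("notes.txt", "w", encoding="utf-8") as f:  # "w" truncates first
        f.write(new_text)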

(I think in-place updates are more common among hex editors, where most edits don't insert or delete bytes but only change existing locations, so a full rewrite of the file is not needed.)
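
As an illustration, an in-place patch of a couple of bytes might look like this (Python sketch; the filename and offset are made up, and the file must already exist):

    # Hex-editor-style edit: overwrite bytes at a fixed offset without
    # touching the rest of the file. "r+b" opens for update, no truncation.
    with open("image.bin", "r+b") as f:
        f.seek(0x200)          # jump to the location being edited
        f.write(b"\xDE\xAD")   # replace exactly two bytes in place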

There's even a third mode of operation – the editor might first make a backup copy of the old file, then directly write new data into the file.
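
A rough sketch of that backup-first approach (Python; the function name and the .bak suffix are just for illustration):

    import shutil

    def save_with_backup(path, new_text):
        shutil.copy2(path, path + ".bak")  # 1. keep a copy of the old file
        with open(path, "w") as f:         # 2. overwrite the original in place
            f.write(new_text)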


It also depends on the filesystem that holds the file. With most traditional filesystems, if a program asks to write to an existing file, the filesystem will just overwrite the old data in place.

However, some filesystems do work in "copy-on-write" mode, where any new data is always written to a different location, whether the program wants it or not. Again, this has the possible advantage of increased reliability because an interrupted change can be fully reverted.

In some filesystems (such as Btrfs or ext4) this is an optional feature; in others (e.g. log-structured filesystems) it is part of the core design.


Since you are talking about "saving the file", the file will not be edited in place on disk.

With a file in a usual filesystem, there are two things to consider: the directory entry, and the actual file data somewhere on the disk.

When you edit a file in a normal editor, it will load the file data into RAM, and any editing will just happen on that copy of the data. Then when you save the file, there are basically two options:

Option 1: the original file is renamed, so both the original directory entry and the original data remain on the disk. The rename might, for example, change the file suffix to .bak (usually removing any previous .bak file). Then a new file is created and the data from memory is written there.

Option 2: the original directory entry is modified so the file is truncated to zero length. The area on disk used for the file data is marked as unused, but the old contents remain on disk until they are overwritten. Then the new data is written. In this case the directory entry remains; only the data it points to is changed.
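
Both options are easy to sketch in Python (a simplification; the function names are made up):

    import os

    def save_option_1(path, new_text):
        # Rename the original aside; its directory entry and data survive
        # as the .bak file. os.replace also removes any previous .bak.
        os.replace(path, path + ".bak")
        with open(path, "w") as f:   # new directory entry, new data blocks
            f.write(new_text)

    def save_option_2(path, new_text):
        # Keep the directory entry; "w" truncates the file to zero length,
        # then the new data is written into it.
        with open(path, "w") as f:
            f.write(new_text)

On Unix you can tell the two apart with os.stat(): option 2 keeps the same inode number, while option 1 produces a new one.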

There are a few possible variations. A common one: the edited data is first stored to a temporary file, so if your computer crashes at this point, the original file will likely not be damaged. Then the original file is deleted and the new file is renamed to the correct name. Or the original file could simply be deleted before the new one is written.
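
That temporary-file variation, often called an "atomic save", might look like this (a sketch; the .tmp suffix is arbitrary):

    import os

    def atomic_save(path, new_text):
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk first
        os.replace(tmp, path)     # then swap it in; atomic on POSIX systems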

So your theory 1 is close to what most editors do.


Then there are special cases. The most obvious one is a disk editor, which allows reading and overwriting bytes directly on disk. Another might be a database file, where records might be fixed size, so it's easy to just overwrite a record. But data can't be inserted into the middle of a file, so for text files – or any other files where the length of data in the middle commonly changes – these tricks can't really be used.
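
For example, with fixed-size records, overwriting one record is just a seek and a write (a sketch assuming a made-up 32-byte record layout):

    import struct

    RECORD_FMT = "<i28s"                    # 4-byte id + 28-byte name = 32 bytes
    RECORD_SIZE = struct.calcsize(RECORD_FMT)

    def overwrite_record(path, index, rec_id, name):
        packed = struct.pack(RECORD_FMT, rec_id, name.encode())
        with open(path, "r+b") as f:        # update in place, no truncation
            f.seek(index * RECORD_SIZE)     # fixed size makes the offset trivial
            f.write(packed)                 # replaces exactly one record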

So your theory 2 is possible in some cases, but normal text editors and such don't do it.


Historically, drives were directly controlled by the OS, which in turn was controlled by the application. In that context, Theory 2 was the primary way PCs worked: the OS specified a physical location to put data, and it had full control over this process. As a result, early filesystems had a "bad sector" table, so after a sector failed, the computer could tell you the data was lost and mark the sector as unusable to avoid further loss. Disk scans and defragmentation were the order of the day.

Eventually, drives moved to LBA (logical block addressing), so the OS would simply reference the "logical" block it wanted to read or write. The hard drive itself now had the intelligence to shuffle data around behind the OS's back without it noticing. This meant better reliability, since sectors that failed to verify could simply be moved to a new physical location without affecting the OS's knowledge of where that data was located.

In modern hardware, "platter" (hard disk) drives typically just overwrite whatever was there before with the new incoming data, optionally remapping the LBA if the sector looks like it might not retain the data (because it is damaged or worn). "Flash" drives typically write the data to fresh cells and erase the old ones, a process known as wear-leveling.

In both cases, this is possible because there is always unused capacity beyond the reported value. This overprovisioning gives the drive a longer usable life than the rather unreliable drives of the previous century had. LBA allows the physical medium to be abstracted away from the OS, so the drive itself can take whatever measures it thinks necessary to prevent data loss.

At the application level, you typically open a file in "WRITE" mode, which tells the OS to clear the file ("delete" the contents, but not the file itself), then write new data. All of this is buffered at the OS level, then "flushed" to the drive, which makes the requested changes.
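
In Python, for example, those modes map directly onto open() (a sketch; the filename is made up):

    import os

    with open("notes.txt", "w") as f:  # "WRITE": contents cleared, file kept
        f.write("new contents\n")
        f.flush()                      # push the app's buffer to the OS...
        os.fsync(f.fileno())           # ...and ask the OS to flush to the drive

    with open("notes.txt", "a") as f:  # "append": existing contents kept
        f.write("one more line\n")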

Given that information, Theory 1 is what technically happens at the application programming level, at least by default, as there is also a "write with append" mode to avoid clearing the file contents. The OS itself will present the changes to be made more like Theory 2, but abstracted via LBA. The drive itself will then probably do something that's a mix of Theory 1 and Theory 2.

Yep. It's complicated, and very part-manufacturer/OS-developer/application-developer dependent. However, all of this complexity is aimed at making data storage more reliable while improving power usage/battery life.
