Replacing the contents of a file on the UNIX command line using the standard commands is surprisingly tricky. In the examples below, $filter is any command that reads from stdin and writes to stdout.
  1. $filter < file > file
    
    • Newbie fail. Immediately lose all your data, as the shell truncates "file" when it sets up the output redirection, before the filter ever reads it.
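This failure is deterministic and easy to reproduce: the shell processes the redirections before running the filter, and opening "file" for output truncates it. A minimal demonstration (tr stands in for $filter; the scratch dir is just for illustration):

```shell
cd "$(mktemp -d)"              # scratch dir so nothing real is harmed
printf 'hello\n' > file
tr a-z A-Z < file > file       # shell truncates "file" before tr runs
wc -c < file                   # 0 -- all data lost
```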
  2. cat file | $filter > file
    
    • Novice fail. `cat` may be scheduled quickly enough to read the data before the shell truncates the file, but depending on the kernel, the size of the file and the load on the system, you may lose some or all of your data.
  3. cp file file.tmp
    $filter < file.tmp > file
    rm file.tmp
    
    • The current directory might not be writeable
    • Lose your data if $filter fails or is interrupted
    • Data inconsistent for a while
    • Slower as file written twice
  4. { rm file && $filter > file; } < file
    
    • The current directory might not be writeable
    • Lose your data if $filter fails or is interrupted
    • Data inconsistent for a while
    • Lose all attributes of original file
  5. If we don't care about atomicity then we can use another writeable dir.
    $filter < file > /tmp/file.tmp &&
    mv /tmp/file.tmp file
    
    • Lose all attributes of original file
    Also if /tmp is a different file system, then:
    • Slower as file written twice
    • mv still needs a writeable dir, as it recreates the file
    • File missing for a while
    • Data inconsistent for a while
    • Limited to attributes supported by the other file system
  6. We can get around most of the previous issues by using cp and rm rather than mv, as cp does a truncate(); write(); on the original file, so all attributes are maintained. Note this method is functionally equivalent to the one used by the sponge utility.
    $filter < file > /tmp/file.tmp &&
    cp /tmp/file.tmp file
    rm /tmp/file.tmp
    
    • Slower as file written twice
    • Data inconsistent for a while
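The attribute-preserving behaviour can be verified by checking that the inode of "file" is unchanged after the copy-back (a hedged sketch assuming GNU stat; tr stands in for $filter and mktemp provides the other writeable dir):

```shell
cd "$(mktemp -d)"
printf 'hello\n' > file
tmp=$(mktemp)                  # temp file in another (writeable) dir
before=$(stat -c %i file)      # inode before the replacement
tr a-z A-Z < file > "$tmp" &&
cp "$tmp" file                 # truncate+write in place: same inode
rm "$tmp"
[ "$before" = "$(stat -c %i file)" ] && echo inode preserved
cat file                       # HELLO
```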
  7. If we do need atomicity then we need to have a dir on the same filesystem that's writeable, so that we can do a rename(old,new) which is atomic. Usually that's the current dir so assuming that...
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Lose all attributes of original file
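The attribute loss is easy to observe with file permissions (a hedged sketch assuming GNU stat; the umask is set explicitly so the result is deterministic):

```shell
cd "$(mktemp -d)"
umask 022
printf 'data\n' > file
chmod 600 file                 # original is private to the owner
tr a-z A-Z < file > file.tmp &&
mv file.tmp file               # rename replaces the inode
stat -c %a file                # 644: the mode came from the temp file
```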
  8. If we want to maintain all attributes, which is increasingly important with selinux, capabilities, etc., we'd have to:
    cp -a file file.tmp
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Slower as file written twice
  9. If we want to be more efficient, then we would need cp to support only copying the attributes. I've proposed a patch to do just that:
    cp --attributes-only file file.tmp
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Note certain attributes are only maintainable by root. For example, a non-root user updating another user's file (by first creating a new temporary file) will silently change the ownership of the original file when it's replaced.
  10. What if you want a backup though?
    cp --attributes-only file file.tmp
    $filter < file > file.tmp &&
    mv -b file.tmp file
    
    • This is no longer atomic, as the file is not present for a short while: mv implements the backup as rename(old,bak); rename(tmp,old);
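The backup behaviour itself is straightforward to see (a hedged sketch assuming GNU mv, whose default backup suffix is "~"), though the brief window in which "file" is absent is hard to observe from a demo:

```shell
cd "$(mktemp -d)"
printf 'old\n' > file
printf 'new\n' > file.tmp
mv -b file.tmp file            # rename(file,file~); rename(file.tmp,file)
cat file~                      # old contents survive in the backup
cat file                       # new contents are in place
```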
  11. Therefore, if you want to support atomic replacement with backups using cp/mv, you need to:
    cp --attributes-only file file.tmp
    cp -a -b -f file file
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Slower as file written twice
  12. We can make the extra backup step in the previous method more efficient by using hardlinks (thanks reddit!).
    cp --attributes-only file file.tmp
    cp -l -b -f file file
    $filter < file > file.tmp &&
    mv file.tmp file
    
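The hard-link trick can be illustrated in isolation with ln (a simplified sketch: the cp invocation above additionally handles the backup naming and attribute edge cases). Because the backup is a hard link, no data is copied, and the subsequent rename simply detaches "file" from the old inode:

```shell
cd "$(mktemp -d)"
printf 'v1\n' > file
ln -f file file~               # backup is a hard link: no data copied
tr a-z A-Z < file > file.tmp &&
mv file.tmp file               # rename detaches "file" from the old inode
cat file~                      # v1 -- the backup keeps the old contents
cat file                       # V1
```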
So if one were to implement a general replacement script (as I'm currently considering for GNU coreutils), one could apply the following logic:
   if --atomic && --backup; then
       12
   elif --atomic; then
       9 || 8
   else
       9 || 8 || 6
   fi
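That logic might look something like the following minimal sketch (a hypothetical `replace` shell function using method 8 only; a real implementation would fall back between methods and handle many more edge cases):

```shell
# Hypothetical sketch, not the actual coreutils proposal
replace() {
    file=$1; shift               # remaining args: the filter command
    tmp=$file.tmp.$$             # same dir, so mv is an atomic rename(2)
    cp -a -- "$file" "$tmp" &&   # method 8: copy data and attributes
    "$@" < "$file" > "$tmp" &&   # run the filter over the original
    mv -- "$tmp" "$file" ||      # atomically swap in the result
    { rm -f -- "$tmp"; return 1; }
}

# example usage
cd "$(mktemp -d)"
printf 'abc\n' > file
replace file tr a-z A-Z
cat file                         # ABC
```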
Note there are many other edge cases to consider which are mainly handled within cp/mv, which one can see by looking at the complexity of copy.c.

There is also the general caveat of how to deal with interruptions at various parts of the above. I.e. if a script implementing the above is killed, are there tmp files left on disk?
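One hedged way to address that is to create the temp file with mktemp and remove it in an EXIT trap, so an interrupted run doesn't leave droppings (a sketch; tr stands in for $filter):

```shell
cd "$(mktemp -d)"
printf 'abc\n' > file
tmp=$(mktemp file.XXXXXX) || exit 1  # temp in same dir: mv stays atomic
trap 'rm -f "$tmp"' EXIT             # clean up on any exit
trap 'exit 1' INT TERM               # ensure the EXIT trap runs on interruption
tr a-z A-Z < file > "$tmp" &&
mv "$tmp" file                       # after this, the trap's rm is a no-op
cat file                             # ABC
```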

There is also the more general caveat of the variances in consistency guarantees provided by various file systems.

[Update June 2015:

More abstractly what we're discussing here is how to apply ACID principles when updating files. ACID being an acronym for: Atomicity, Consistency, Isolation, and Durability. There are interesting discussions on ACID implementation at the file system level, in the "amino" and "TxFS" file systems, though we can get ACID semantics for single files, using the traditional primitives provided by journalling file systems.

The methods described above only consider "AC" semantics, and would need to be augmented with flock(1) or sync(1) etc. for "ID" semantics. Given the portability issues and trickiness involved in the above methods, it would make sense to abstract all these implementation details away in a separate `replace` or similarly named utility.
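For example, "I" (isolation between concurrent updaters) could be layered on with flock(1) from util-linux, and a crude form of "D" with sync (a hedged sketch; the lock file name is arbitrary, and sync(1) flushes far more than this one file):

```shell
cd "$(mktemp -d)"
printf 'abc\n' > file
(
  flock 9 || exit 1              # exclusive lock on fd 9
  tr a-z A-Z < file > file.tmp &&
  mv file.tmp file               # atomic rename, done under the lock
  sync                           # crude durability: flush to disk
) 9> file.lock
cat file                         # ABC
```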

As a concrete example, the crudini utility (written in python), was improved in stages to apply ACID file update principles, in the following commits: ]
© Mar 24 2010