Replacing the contents of a file on the UNIX command line using the standard commands is surprisingly tricky. In the examples below, $filter is any command that reads from stdin and writes to stdout.
  1. $filter < file > file
    
    • Newbie fail. Immediately lose all your data, as the shell truncates "file" for the > redirection before $filter starts reading it.
  2. cat file | $filter > file
    
    • Novice fail. `cat` may happen to read all the data before the shell truncates the file, but this is a race: depending on the kernel, the size of the file and the load on the system, you can lose some or all of the data.
  3. cp file file.tmp
    $filter < file.tmp > file
    rm file.tmp
    
    • The current directory might not be writeable
    • Lose your data if $filter fails or is interrupted
    • Data inconsistent for a while
    • Slower as file written twice
  4. { rm file && $filter > file; } < file
    
    • The current directory might not be writeable
    • Lose your data if $filter fails or is interrupted
    • Data inconsistent for a while
    • Lose all attributes of original file
  5. If we don't care about atomicity then we can use another writeable dir.
    $filter < file > /tmp/file.tmp &&
    mv /tmp/file.tmp file
    
    • Lose all attributes of original file
    Also if /tmp is a different file system, then:
    • Slower as file written twice
    • mv still needs writeable dir as it recreates the file
    • File missing for a while
    • Data inconsistent for a while
    • Limited to attributes supported by the other file system
  6. We can get around most of the previous issues by using cp and rm rather than mv, as cp truncates and rewrites the original file in place (truncate(); write();), so all its attributes are maintained. Note that this method is functionally equivalent to the one used by the sponge utility.
    $filter < file > /tmp/file.tmp &&
    cp /tmp/file.tmp file
    rm /tmp/file.tmp
    
    • Slower as file written twice
    • Data inconsistent for a while
  7. If we do need atomicity, then we need a writeable directory on the same file system, so that we can do a rename(old,new), which is atomic. Usually that's the current directory, so assuming that:
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Lose all attributes of original file
  8. If we want to maintain all attributes, which is increasingly important with SELinux contexts, file capabilities, etc., we'd have to:
    cp -a file file.tmp
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Slower as file written twice
  9. If we want to be more efficient, then we would need cp to support copying only the attributes. I've proposed a patch to do just that:
    cp --attributes-only file file.tmp
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Note that certain attributes can only be maintained by root. For example, when a non-root user updates another user's file by first creating a new temporary file, the ownership of the original file silently changes when it's replaced.
  10. What if you want a backup though?
    cp --attributes-only file file.tmp
    $filter < file > file.tmp &&
    mv -b file.tmp file
    
    • This is no longer atomic: mv implements the backup as rename(old,bak); rename(tmp,old); so the file is missing for a short while.
  11. Therefore, to support atomic replacement with backups using cp/mv, you need to:
    cp --attributes-only file file.tmp
    cp -a -b -f file file
    $filter < file > file.tmp &&
    mv file.tmp file
    
    • Slower as file written twice
  12. We can make the extra backup step of the previous method more efficient by using hard links (thanks reddit!).
    cp --attributes-only file file.tmp
    cp -l -b -f file file
    $filter < file > file.tmp &&
    mv file.tmp file
    
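The difference between method 1 (data loss), method 6 (attributes kept) and method 7 (attributes lost) can be seen with a small throwaway script. This is a sketch under stated assumptions: `tr '[:lower:]' '[:upper:]'` stands in for $filter, the file starts with mode 640, `umask 022` applies to newly created files, and the permission bits serve as a visible stand-in for "all attributes".

```shell
#!/bin/sh
# Compare methods 1, 6 and 7 on a throwaway file.
# Assumptions: tr '[:lower:]' '[:upper:]' stands in for $filter, the
# file starts with mode 640, and umask 022 governs newly created files.
set -eu
umask 022
demo_dir=$(mktemp -d)
cd "$demo_dir"

mode() { ls -l "$1" | cut -c1-10; }   # permission string, e.g. -rw-r-----

# Method 1: the shell truncates "file" for the > redirection before
# tr starts, so tr reads an already-empty file.
printf 'hello\n' > file
tr '[:lower:]' '[:upper:]' < file > file
m1=$(cat file)

# Method 6: filter via a temp file elsewhere, then cp it back; cp
# truncates and rewrites the existing file, so its mode survives.
printf 'hello\n' > file; chmod 640 file
t=$(mktemp)
tr '[:lower:]' '[:upper:]' < file > "$t" &&
cp "$t" file
rm "$t"
m6="$(cat file) $(mode file)"

# Method 7: file.tmp is a brand new file (mode from the umask) that
# mv renames over the original, so the original mode is lost.
printf 'hello\n' > file; chmod 640 file
tr '[:lower:]' '[:upper:]' < file > file.tmp &&
mv file.tmp file
m7="$(cat file) $(mode file)"

cd / && rm -rf "$demo_dir"
printf 'method 1: "%s"\nmethod 6: %s\nmethod 7: %s\n' "$m1" "$m6" "$m7"
# prints:
# method 1: ""
# method 6: HELLO -rw-r-----
# method 7: HELLO -rw-r--r--
```

The same harness can be pointed at any of the other methods by swapping in their commands.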
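So if one were to implement a general replacement script (as I'm currently considering for GNU coreutils), one could apply the following logic, where the numbers refer to the methods above and "a || b" means use method a if possible, else fall back to method b: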
   if --atomic && --backup; then
       12
   elif --atomic; then
       9 || 8
   else
       9 || 8 || 6
   fi
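The selection logic above could be sketched as a shell function. This is a hypothetical sketch, not a coreutils command: `replace_file` and its `--atomic`/`--backup` options are illustrative names. Assumptions: the temporary file is FILE.tmp next to FILE, as in the methods above; `cp --preserve=all --attributes-only` is tried first (per the GNU cp documentation, --attributes-only copies the attributes selected by --preserve), falling back to `cp -a`; as in the pseudocode, --backup only takes effect together with --atomic; and for method 12's hard-link backup the sketch swaps in a plain `ln -f FILE FILE~` in place of `cp -l -b -f`.

```shell
#!/bin/sh
# Hypothetical sketch of the selection logic; "replace_file" is an
# illustrative name, not an existing command.
#   usage: replace_file [--atomic] [--backup] FILE FILTER [ARG...]
# The body is a ( ) subshell, so its variables stay local and "exit"
# only ends the function call.
replace_file() (
    atomic=false backup=false
    while :; do
        case ${1-} in
            --atomic) atomic=true; shift ;;
            --backup) backup=true; shift ;;
            *) break ;;
        esac
    done
    file=$1; shift
    tmp=$file.tmp   # same directory as FILE, so mv can rename() atomically

    # Method 9, else method 8: create TMP carrying FILE's attributes.
    copy_attrs() {
        cp --preserve=all --attributes-only "$file" "$tmp" 2>/dev/null ||
            cp -a "$file" "$tmp"
    }

    # Filter FILE into TMP (the > keeps TMP's attributes), then rename
    # TMP over FILE; on failure remove TMP and leave FILE untouched.
    filter_and_rename() {
        if "$@" < "$file" > "$tmp"; then
            mv "$tmp" "$file"
        else
            rm -f "$tmp"
            return 1
        fi
    }

    if $atomic; then
        copy_attrs || { rm -f "$tmp"; exit 1; }
        if $backup; then
            # Method 12's hard-link backup, made here with ln -f
            # instead of cp -l -b -f.
            ln -f "$file" "$file~" || { rm -f "$tmp"; exit 1; }
        fi
        filter_and_rename "$@"
    elif copy_attrs; then
        filter_and_rename "$@"
    else
        # Method 6: FILE's directory is not writeable, so filter via a
        # temp file under $TMPDIR or /tmp and cp it back over FILE.
        rm -f "$tmp" 2>/dev/null
        t6=$(mktemp) || exit 1
        "$@" < "$file" > "$t6" && cp "$t6" "$file"
        status=$?
        rm -f "$t6"
        exit "$status"
    fi
)
```

For example, `replace_file --atomic --backup notes.txt sed 's/old/new/'` would leave the previous contents in `notes.txt~` as a hard link to the old inode, while the new contents appear under `notes.txt` in a single rename().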
Note that there are many other edge cases to consider, mostly handled within cp/mv, as one can see from the complexity of copy.c.

There is also the general caveat of how to deal with interruptions at various points in the above, i.e. if a script implementing the above is killed, are temporary files left on disk?
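One common way to reduce leftover temporary files is a shell trap that deletes the temporary file whenever the replacing shell exits. Below is a minimal sketch of method 7 along those lines; `replace_with_cleanup` is a hypothetical name, the temporary name is assumed to come from mktemp next to FILE, and SIGKILL (or a crash or power failure) can still leave a temporary file behind, since it cannot be trapped.

```shell
#!/bin/sh
# Hypothetical helper: replace_with_cleanup FILE FILTER [ARG...]
# Method 7 plus a trap that removes the temporary file on exit or on a
# catchable signal. SIGKILL cannot be trapped, so a leftover temp file
# is still possible in that case.
replace_with_cleanup() (
    file=$1; shift
    # A unique name next to FILE, on the same file system, so the final
    # mv is still an atomic rename(). As in method 7, FILE's attributes
    # are not kept: the result has mktemp's 0600 mode.
    tmp=$(mktemp "$file.XXXXXX") || exit 1

    # Runs whenever this subshell exits; after a successful mv the temp
    # name no longer exists, so rm -f is then a no-op.
    trap 'rm -f "$tmp"' EXIT
    # Turn catchable signals into a normal exit so the EXIT trap runs.
    trap 'exit 129' HUP
    trap 'exit 130' INT
    trap 'exit 143' TERM

    "$@" < "$file" > "$tmp" &&
    mv "$tmp" "$file"
)
```

Running the function body in a ( ) subshell keeps the traps from leaking into the caller's shell.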

© Mar 24 2010