A while ago I deleted a file by mistake, for the first time in years. It was a source code file that I had been working on for a couple of hours, so it was not yet in my daily backup. I tried the undelete feature of `mc`, which had worked for me when ext2 was the default Linux filesystem, but it didn't work at all on ext3, which is a journalling filesystem. Then I tried the ext3grep tool, which I had coincidentally noticed a few days previously. It churned away on my years-old 38GB filesystem and crashed after about 20 minutes. Update June 2011: I noticed the extundelete utility, which may do better?

Since I had been working on the source for a couple of hours, I could remember strings that were specific to that file. So I wrote a quick script to grep successive chunks of the disk and report any that contained those strings. It actually found a few, I suppose due to vim swap files etc. I then used dd to write a few of those chunks to another filesystem, and opened a chunk directly with vim. Then it was trivial to search for my source text and copy & paste it to a new file. What a relief!

Note the script found the pertinent disk chunks quickly, but didn't seem to terminate. The script was quite simple and essentially did:
```shell
DISK=/dev/sda8
CHUNK_SIZE=$((8*1024*1024))

i=0
while true; do
    { dd if=$DISK bs=$CHUNK_SIZE count=1 skip=$i 2>/dev/null || break; } |
    grep -qF "$STRING" && echo "chunk $i"
    i=$(($i+1))
done
```
This actually had two problems. The first is that if you ask dd to skip beyond the end of the disk or partition, it doesn't give an error. Even more surprisingly, it will read the whole disk or partition in this case! This is a bug IMHO, and I submitted a patch to stop the erroneous reading at least. To work around the issue in the script, I first determine the size of the passed device or file, and get dd to read only the required number of chunks. This also has the advantage of allowing a progress meter to be shown for this potentially long operation.
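The size-based workaround could look roughly like the following. This is a minimal sketch, not the actual disk_grep script: the function names `size_of`, `chunks_in` and `disk_grep_bounded` are made up for illustration, and it assumes GNU/Linux tools (`blockdev --getsize64` from util-linux for block devices, `stat -c %s` from GNU coreutils for regular files).

```shell
#!/bin/sh
# Sketch only: bound the loop by the real size of the device or file,
# so dd is never asked to skip past the end.
# Assumes GNU/Linux: blockdev (util-linux) and stat -c (coreutils).

size_of() {  # size_of PATH -> size in bytes
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # block device (usually needs root)
    else
        stat -c %s "$1"             # regular file, e.g. a disk image
    fi
}

chunks_in() {  # chunks_in SIZE CHUNK_SIZE -> number of chunks, rounded up
    echo $(( ($1 + $2 - 1) / $2 ))
}

disk_grep_bounded() {  # disk_grep_bounded PATH STRING [CHUNK_SIZE]
    disk=$1
    string=$2
    chunk_size=${3:-$((8*1024*1024))}
    chunks=$(chunks_in "$(size_of "$disk")" "$chunk_size")
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # Simple progress meter on stderr; matches go to stdout.
        printf '\r%d/%d chunks' "$((i+1))" "$chunks" >&2
        dd if="$disk" bs="$chunk_size" count=1 skip="$i" 2>/dev/null |
            grep -F -e "$string" >/dev/null && echo "chunk $i"
        i=$((i+1))
    done
    echo >&2
}
```

For example, `disk_grep_bounded /dev/sda8 'some unique string'` would print the index of each matching 8MiB chunk and then stop at the end of the partition. Because the chunk count is known up front, the same number doubles as the total for the progress meter.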

The other gotcha is that if grep matches the string and exits without having read the whole chunk from dd, then dd gets a SIGPIPE when it tries to write the rest of its data to the now exited grep. Since we want to exit the loop on any dd error, we need to get grep to read all of its input.
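One possible way to handle this in plain sh is sketched below. It is an illustration only, and not necessarily how disk_grep does it: the helper name `chunk_status` is hypothetical. It drops `-q` so that grep reads to end of input, discarding the output instead, and it passes each command's exit status out through file descriptor 3, since a pipeline otherwise reports only the status of its last command.

```shell
#!/bin/sh
# Sketch only: search one chunk, letting grep consume all of dd's output,
# and report both exit statuses so the caller can stop on a dd error.

chunk_status() {  # chunk_status PATH INDEX CHUNK_SIZE STRING
    # Prints two lines: dd's exit status, then grep's exit status.
    {
        # Left side of the pipe: run dd, then report its status on fd 3.
        { dd if="$1" bs="$3" count=1 skip="$2" 2>/dev/null; echo "$?" >&3; } |
            # No -q: grep keeps reading until end of input, so dd is
            # never left writing to a closed pipe.
            grep -F -e "$4" >/dev/null
        echo "$?" >&3
    } 3>&1
}
```

A caller might use it as `set -- $(chunk_status "$DISK" "$i" "$CHUNK_SIZE" "$STRING")`, then stop the loop if `$1` is non-zero and report a match if `$2` is zero.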

So I incorporated these enhancements into a disk_grep script that hopefully others will find useful.

© Jun 10 2008