Recover txt files based on known strings

I don't know of any file recovery tool that selects files based on a specific string they contain, but these three methods should work:

  1. When a file on a FAT32 partition is deleted, its filename doesn't get overwritten. Only the first byte of the 8.3 filename in the directory entry is set to 0xE5, marking the file as deleted. This doesn't affect the extension, so TXT files are still easily recognizable.

    You can use any file recovery tool that lets you specify an extension (e.g., Recuva) to recover all TXT files, and then search the recovered files for diary.

    Since text files are (usually) small, recovering the text files shouldn't take much time (probably less than finding them). For a 150 GB partition, this should be rather quick.
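
    The search step could look roughly like this. It is only a sketch: find_matches and the directory name are made-up names, and it assumes the recovery tool wrote its output to a single directory tree.

    ```shell
    # Hypothetical helper: list every recovered file that contains a string.
    # $1 = string to look for, $2 = directory the recovery tool wrote to (assumed).
    find_matches() {
        # -r recurse, -l print file names only, -i ignore case,
        # -a treat binary-looking files as text, -F match a fixed string
        grep -rliaF -- "$1" "$2"
    }
    ```

    For example, find_matches diary ./recovered prints one path per matching file.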

  2. Programs like PhotoRec identify files by their content and attempt to recover them. While it is true that text files don't have any headers, PhotoRec still manages to identify them (by exclusion, I suppose).

    Again, you could recover all text files and then search for diary in all recovered files.

    Identifying text files by their content will take longer than by their extension, but it will also find files whose directory entry has been overwritten.

  3. Since you don't expect the text files to be big, you could also search for diary in the partition dump and recover the cluster containing it:

    sudo bash -c '
        # -P matches the command explained below (works around slow -i searches)
        for OFFSET in $(grep -Pabio diary /dev/sda3 | cut -d: -f 1); do
            ((CLUSTER = OFFSET / 4096))
            dd if=<imgfile> of=cluster$CLUSTER.txt bs=4096 skip=$CLUSTER count=1
        done
    '
    

    How it works:

    • grep -Pabio diary /dev/sda3 | cut -d: -f 1 prints the byte offset of every occurrence of the string diary on the partition (or in an image file of it).

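      The -a switch makes grep treat the binary data as text, -b prints the byte offset of each match, and -o prints only the matching part, so the offset refers to the match itself rather than to the start of the "line" containing it. The -i switch makes the search case-insensitive. The -P switch turns on Perl-compatible regular expressions (PCRE). This is needed because of a bug in some versions of (GNU) grep that makes case-insensitive searches unbearably slow unless you use PCRE.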

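    • ((CLUSTER = OFFSET / 4096)) calculates the offset in clusters from the offset in bytes, assuming a cluster size of 4096 bytes. Bash arithmetic is integer-only, so the result is rounded down.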

    • dd if=<imgfile> of=cluster$CLUSTER.txt bs=4096 skip=$CLUSTER count=1 writes the cluster at offset X to a file named clusterX.txt.
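
    To see the arithmetic in action without touching a real partition, something like the following should do. It is a sketch on a scratch file; IMG and BS are made-up variable names and the 4096-byte block size is the same assumption as above.

    ```shell
    # Scratch demo of the offset-to-cluster arithmetic; IMG and BS are made-up names.
    BS=4096
    IMG=$(mktemp)
    # Build a 3-block file (12288 bytes) with "Dear Diary" starting at byte 8200.
    { head -c 8200 /dev/zero; printf 'Dear Diary'; head -c 4078 /dev/zero; } > "$IMG"
    # With -o, -b reports the offset of the match itself, not of the line start.
    OFFSET=$(grep -abio diary "$IMG" | cut -d: -f 1 | head -n 1)
    CLUSTER=$((OFFSET / BS))    # integer division rounds down
    echo "offset=$OFFSET cluster=$CLUSTER"
    # Copy that one block out, just as the loop does.
    dd if="$IMG" of="$IMG.cluster$CLUSTER" bs="$BS" skip="$CLUSTER" count=1 2>/dev/null
    ```

    This prints offset=8205 cluster=2: the word starts 5 bytes into "Dear Diary", and 8205 / 4096 rounds down to 2.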

    By its nature, this will work only for files that fit in one cluster. You can increase count to recover more than one cluster and decrease CLUSTER to recover previous clusters as well.

    To recover three clusters (one before and one after the cluster containing diary), make the following changes:

    ((CLUSTER = OFFSET / 4096 - 1))
    
    dd ... count=3
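
If you expect to tweak the number of surrounding clusters a lot, the loop could be wrapped in a function along these lines. This is a sketch, not a tested recovery tool: extract_around and all its parameter names are hypothetical, it uses -F (fixed-string matching) instead of -P, and it clamps the start so that a match in the first cluster doesn't produce a negative skip.

```shell
# Hypothetical helper: for each match, copy the block containing it plus
# some neighbouring blocks into OUTDIR/clusterN.txt.
# Usage: extract_around STRING IMAGE OUTDIR [BEFORE] [AFTER] [BLOCKSIZE]
extract_around() {
    local needle=$1 img=$2 outdir=$3
    local before=${4:-1} after=${5:-1} bs=${6:-4096}
    local offset cluster start count
    grep -abioF -- "$needle" "$img" | cut -d: -f 1 | while read -r offset; do
        cluster=$((offset / bs))
        start=$((cluster - before))
        # Don't try to seek before the start of the image.
        if [ "$start" -lt 0 ]; then start=0; fi
        count=$((cluster + after - start + 1))
        dd if="$img" of="$outdir/cluster$cluster.txt" \
           bs="$bs" skip="$start" count="$count" 2>/dev/null
    done
}
```

For example, extract_around diary sda3.img ./out 1 1 would be the equivalent of the three-cluster variant above, with sda3.img standing in for your own image file.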