Delete all but 1000 random files in a directory

Code:

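find /path/to/dir -type f -print0 | sort -zR | tail -zn +1001 | xargs -0r rm

Explanation:

  1. List all regular files under /path/to/dir with find (this recurses into subdirectories; add -maxdepth 1 right after the path to stay in the top-level directory);
    • -print0: terminate each path with \0 (the null character) instead of a newline, so paths containing spaces/newlines don't break the pipeline
  2. Shuffle the file list with sort;
    • -z: use \0 (null character) as the delimiter instead of \n (newline)
    • -R: random order
  3. Drop the first 1000 entries of the shuffled list with tail; those are the files that are kept;
    • -z: treat the list as zero-delimited (same as with sort)
    • -n +1001: output entries starting from the 1001st (i.e. omit the first 1000)
  4. Remove the remaining files with xargs -0r rm;
    • -0: zero-delimited input, again
    • -r (GNU xargs): don't run rm at all when the list is empty (1000 files or fewer); without it, rm is called with no arguments and fails with "missing operand"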
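To check that the pipeline behaves as the steps above describe before pointing it at real data, you can run it on a throwaway directory. This is a sketch that assumes GNU find, sort, tail, tr and xargs (for -print0, -z and -r); the temporary directory from mktemp stands in for /path/to/dir and the file names are made up.

```shell
#!/bin/sh
# Demo of the pipeline on a throwaway directory.
# Assumes GNU find, sort, tail, tr and xargs (-print0, -z, -r).
set -eu

dir=$(mktemp -d)    # stands in for /path/to/dir

# Create 1500 empty files: file-1 ... file-1500
for i in $(seq 1 1500); do
    : > "$dir/file-$i"
done

# Dry run: count how many paths would be handed to rm
# (count the \0 terminators rather than lines)
to_delete=$(find "$dir" -type f -print0 | sort -zR | tail -zn +1001 | tr -dc '\0' | wc -c)

# The real deletion
find "$dir" -type f -print0 | sort -zR | tail -zn +1001 | xargs -0r rm

remaining=$(find "$dir" -type f | wc -l)
echo "would delete: $to_delete, remaining: $remaining"

rm -rf "$dir"
```

It should print "would delete: 500, remaining: 1000". To eyeball the list instead of counting it, replace the last stage with tr '\0' '\n'; keep in mind that every sort -R run draws a fresh shuffle, so a preview and a later real run won't pick the same files.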

Why it's better than quixotic's solution*:

  1. Works with filenames containing spaces/newlines.
  2. Doesn't try to create any directories (which may already exist, btw.)
  3. Doesn't move any files, doesn't even touch the 1000 "lucky files" besides listing them with find.
  4. Avoids missing a file in case the output of find doesn't end with \n (newline) for some reason.
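Point 1 is easy to test with deliberately awkward names. The sketch below (same GNU tool assumptions; the names are invented) creates 1200 files whose names contain both a space and an embedded newline, runs the pipeline, and counts files by their \0 terminators, since a plain find | wc -l miscounts such names.

```shell
#!/bin/sh
# Filenames with spaces and newlines vs. the -print0 / -z / -0 pipeline.
# Assumes GNU find, sort, tail, tr and xargs.
set -eu

dir=$(mktemp -d)
nl='
'    # a literal newline, used inside the invented file names

# 1200 files named "name with space N<newline>second line"
for i in $(seq 1 1200); do
    : > "$dir/name with space $i${nl}second line"
done

# Count files by their \0 terminators, which is correct for any name
count_files() {
    find "$1" -type f -print0 | tr -dc '\0' | wc -c
}

before=$(count_files "$dir")

find "$dir" -type f -print0 | sort -zR | tail -zn +1001 | xargs -0r rm

after=$(count_files "$dir")
naive=$(find "$dir" -type f | wc -l)   # counts 2 lines per file here

echo "before: $before, after: $after, naive line count: $naive"
rm -rf "$dir"
```

It should report before: 1200, after: 1000, naive line count: 2000; the last number shows why a newline-delimited version of the pipeline would go wrong with these names.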

* Credit to quixotic for | sort -R | head -1000, which gave me a starting point.

Tags:

Linux