Deleting billions of files from a directory while seeing the progress as well

You can use rm -v to have rm print one line per deleted file. That way you can see that rm is indeed working, but if you have billions of files, all you will see is that rm is still working: you will have no idea how many files are already deleted and how many are left.

The tool pv can give you a progress estimate.

http://www.ivarch.com/programs/pv.shtml

Here is how you would invoke rm with pv, with example output:

$ rm -rv dirname | pv -l -s 1000 > logfile
562  0:00:07 [79,8 /s] [====================>                 ] 56% ETA 0:00:05

In this contrived example I told pv to expect 1000 files. The output from pv shows that 562 files have already been deleted, elapsed time is 7 seconds, and the estimated time to completion is 5 seconds.

Some explanation:

  • pv -l makes pv count lines (newlines) instead of bytes.
  • pv -s number tells pv the expected total so that it can give you an estimate and ETA.
  • The redirect to logfile at the end keeps the output clean. Otherwise the status line from pv gets mixed up with the output from rm -v. Bonus: you get a logfile of what was deleted. But beware: the file will get huge. You can also redirect to /dev/null if you don't need a log, as shown below.
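
For example, the same command with the log discarded (dirname and the total of 1000 are placeholders, as above):

$ rm -rv dirname | pv -l -s 1000 > /dev/null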

To get the number of files you can use this command:

$ find dirname | wc -l

This can also take a long time if there are billions of files. You can use pv here as well to see how many files have been counted so far:

$ find dirname | pv -l | wc -l
278k 0:00:04 [56,8k/s] [     <=>                                              ]
278044

Here pv says that it took 4 seconds to count 278k files. The exact count on the last line (278044) is the output from wc -l.
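
If you want to chain the two steps, a minimal sketch (assuming dirname does not change between counting and deleting) is to capture the count in a shell variable and feed it to pv -s:

$ total=$(find dirname | wc -l)
$ rm -rv dirname | pv -l -s "$total" > /dev/null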

If you don't want to wait for the counting, you can either guess the number of files or use pv without an estimate:

$ rm -rv dirname | pv -l > logfile

This way you will get no ETA, but at least you will see how many files have already been deleted. Redirect to /dev/null if you don't need the logfile.


Nitpick:

  • Do you really need sudo?
  • Usually rm -r is enough to delete recursively; there is no need for rm -f.

Check out lesmana's answer; it's much better than mine, especially the last pv example, which won't take much longer than the original silent rm if you specify /dev/null instead of logfile.

Assuming your rm supports the option (it probably does since you're running Linux), you can run it in verbose mode with -v:

sudo rm -rfv bolands-mills-mhcptz

As has been pointed out by a number of commenters, this could be very slow because of the amount of output being generated and displayed by the terminal. You could instead redirect the output to a file:

sudo rm -rfv bolands-mills-mhcptz > rm-trace.txt

and watch the size of rm-trace.txt.
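
Or, to track the number of files deleted rather than bytes written, you could watch the line count of the trace from another terminal:

watch -n 5 wc -l rm-trace.txt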


Another option is to watch the number of files on the filesystem decrease. In another terminal, run:

watch df -ih pathname

The used-inodes count will decrease as rm makes progress. (Unless the files mostly had multiple links, e.g. if the tree was created with cp -al). This tracks deletion progress in terms of number-of-files (and directories). df without -i will track in terms of space used.
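
For illustration, the df -ih output looks roughly like this (the numbers and path here are made up); IUsed is the column that will shrink:

df -ih /srv/data
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sdb1         30M  2.4M   28M    8% /srv/data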

You could also run iostat -x 4 to see I/O operations per second (as well as KiB/s, but that's not very relevant for pure metadata I/O).


If you get curious about what files rm is currently working on, you can attach an strace to it and watch as the unlink() (and getdents) system calls spew on your terminal. e.g. sudo strace -p $(pidof rm). You can ^c the strace to detach from rm without interrupting it.
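
To cut the output down to just the deletions, you can limit which system calls strace reports (modern rm usually calls unlinkat rather than unlink):

sudo strace -p "$(pidof rm)" -e trace=unlink,unlinkat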

I forget if rm -r changes directory into the tree it's deleting; if so you could look at /proc/<PID>/cwd. Its /proc/<PID>/fd might often have a directory fd open, so you could look at that to see what your rm process is currently looking at.
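
For example, assuming a single rm process is running:

pid=$(pidof rm)
sudo ls -l /proc/$pid/cwd /proc/$pid/fd

cwd is a symbolic link to rm's current directory, and the entries under fd are symbolic links to its open files and directories.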