Why does moving some files in a folder take longer than moving the whole folder?

TL;DR: Moving the files cannot be made as fast as moving the whole folder.

With fewer files you would not need find at all, but even in that simpler case, if you just run

mv *.jpg ../../dst/

it will take more time than moving the whole directory at once.


Why? The point is to understand what mv does.

Briefly, a directory is a list of entries, each mapping a name to a number (the inode number) that identifies a file or another directory. mv removes the entry from the source directory and adds it to the destination directory. On a journaling file system this update is recorded in the journal; on FAT, which has no inodes, the directory entry itself is moved in the same way.

If source and destination are on the same file system, no file data is copied: only the point where the file is attached to the directory tree changes. (Across file systems, mv has to copy the data and then delete the original.)
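One way to check that only the directory entry changes is to compare a file's inode number before and after an mv on the same file system. This is a minimal POSIX sh sketch; it assumes the scratch directory created by mktemp -d sits on a single file system, and the file name photo.jpg is hypothetical.

```shell
#!/bin/sh
# Sketch: mv within one file system keeps the inode number, so only the
# directory entry pointing at the file changes. photo.jpg is hypothetical.
set -eu

work=$(mktemp -d)                 # scratch area, assumed to be on one file system
mkdir "$work/src" "$work/dst"
printf 'hello\n' > "$work/src/photo.jpg"

# ls -di prints "<inode> <path>"; keep only the inode number.
before=$(ls -di "$work/src/photo.jpg" | awk '{print $1}')
mv "$work/src/photo.jpg" "$work/dst/"
after=$(ls -di "$work/dst/photo.jpg" | awk '{print $1}')

echo "inode before: $before, after: $after"
rm -rf "$work"
```

If both numbers match, the file itself never moved; only the name-to-inode entry was removed from src and added to dst.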

So, when you mv one directory, you are doing this operation one time.

But when you move 1 million files, you are doing this operation 1 million times.

As an analogy, picture a tree with many branches, where one node has 1 million branches attached to it.
To move those branches somewhere else, you can either cut each of them (1 million cuts) or cut once just before the node (a single cut). That is the difference between moving the files and moving the directory.
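The one-cut versus many-cuts difference can be observed with a rough timing sketch that mirrors the mv *.jpg case. It assumes GNU coreutils (date +%s%N prints nanoseconds), a scratch directory from mktemp -d on a single file system, and a hypothetical count of 10000 empty files; absolute timings will vary by machine and file system.

```shell
#!/bin/sh
# Rough timing sketch, assuming GNU coreutils (date +%s%N prints nanoseconds).
# Compares N renames (mv *.jpg) against one rename (mv of the parent directory).
set -eu

n=10000                                    # hypothetical file count
work=$(mktemp -d)                          # scratch area on one file system
mkdir "$work/a" "$work/b" "$work/dst_files" "$work/dst_dir"

# Create the same N empty .jpg files in two identical directories.
(cd "$work/a" && seq 1 "$n" | sed 's/$/.jpg/' | xargs touch)
(cd "$work/b" && seq 1 "$n" | sed 's/$/.jpg/' | xargs touch)

# Case 1: a single mv process, but one directory-entry update per file.
start=$(date +%s%N)
(cd "$work/a" && mv -- *.jpg "$work/dst_files/")
files_ns=$(( $(date +%s%N) - start ))

# Case 2: one directory-entry update for the whole directory.
start=$(date +%s%N)
mv "$work/b" "$work/dst_dir/"
dir_ns=$(( $(date +%s%N) - start ))

moved_files=$(find "$work/dst_files" -name '*.jpg' | wc -l)
moved_dir=$(find "$work/dst_dir/b" -name '*.jpg' | wc -l)
echo "mv *.jpg moved $moved_files files in $(( files_ns / 1000000 )) ms"
echo "mv of the directory moved $moved_dir files in $(( dir_ns / 1000000 )) ms"
rm -rf "$work"
```

Both cases end with the same files under a new parent, but the first performs one rename per file while the second performs a single rename.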


It will still be slow because, as noted, the file system has to relink each file name to its new location.

However, you can speed it up from what you have now.

Your find command runs the -exec action once for each file, so for 12 million files it launches mv 12 million times. This can be improved in two ways:

  • End the -exec action with {} + instead:
    find . -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +
    Check the man pages to make sure your find supports {} + and your mv supports -t (a GNU coreutils option). The effect is to run a series of mv commands, each with as many file names as fit on one command line.

  • Use find and xargs together:
    find . -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t ../../dst/
    -print0 separates the file names with NUL (zero) bytes. Combined with xargs -0, this avoids the problems xargs would otherwise have with spaces, quotes or newlines in file names. xargs reads the list of file names from find and runs mv with as many of them as fit on each command line.
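Both batched forms can be tried on a throwaway copy of the layout from the question. This sketch assumes GNU find and GNU mv (for -maxdepth, -print0 and mv -t); the directory names and sample files, including ones with spaces, are hypothetical and arranged so that ../../dst/ resolves the same way as in the commands above.

```shell
#!/bin/sh
# Sketch: the "-exec ... {} +" and "find -print0 | xargs -0" forms.
# Assumes GNU find and GNU mv; all names below are hypothetical.
set -eu

work=$(mktemp -d)
mkdir -p "$work/x/y" "$work/dst"
cd "$work/x/y"                      # so that ../../dst/ is "$work/dst"

touch 'one.jpg' 'two words.jpg' 'three.jpg' 'notes.txt'
# Form 1: find packs many names into each mv invocation.
find . -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +

touch 'four.jpg' 'five six.jpg'
# Form 2: NUL-separated names stay intact through xargs, even with spaces.
find . -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t ../../dst/

moved=$(find ../../dst -name '*.jpg' | wc -l)
left=$(find . -maxdepth 1 -name '*.jpg' | wc -l)
echo "moved: $moved, .jpg files left: $left"
cd / && rm -rf "$work"
```

If find might match nothing, GNU xargs -r (--no-run-if-empty) keeps xargs from running mv without any file names, which would otherwise make mv complain about a missing operand.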


Your confusion comes from the file system abstraction, which makes you believe that a folder contains files and other folders in a tree-like fashion. That is not how they are stored: all files and directories within a file system sit on the same level and are identified by numbers of some sort, depending on the implementation (inode numbers on Unix-like file systems such as ext4). Directories are just special files that contain lists of other files.

When you "move" files inside a file system, the actual files don't go anywhere. Instead, the lists inside the directories are updated to reflect the change.

mv src ../dst moves a single list entry from directory . to directory ../dst, so it's fast.
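The idea that a directory is just a list of name-to-number entries can be seen with ls -i, which prints each entry's inode number next to its name. In this minimal POSIX sh sketch (the photos directory and its files are hypothetical), moving the directory leaves every entry inside it, and the directory's own inode number, unchanged; only the lists in src and dst differ afterwards.

```shell
#!/bin/sh
# Sketch: a directory is a list of (name, inode number) entries, and
# mv of the directory changes only the lists in src and dst.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/photos" "$work/dst"
touch "$work/src/photos/1.jpg" "$work/src/photos/2.jpg"

# ls -i prints "<inode> <name>" for each entry in the list.
inside_before=$(cd "$work/src/photos" && ls -i | sort)
dir_before=$(ls -di "$work/src/photos" | awk '{print $1}')

mv "$work/src/photos" "$work/dst/"   # one entry leaves src, one appears in dst

inside_after=$(cd "$work/dst/photos" && ls -i | sort)
dir_after=$(ls -di "$work/dst/photos" | awk '{print $1}')
src_left=$(ls "$work/src" | wc -l)

printf 'entries inside photos before:\n%s\n' "$inside_before"
printf 'entries inside photos after:\n%s\n' "$inside_after"
echo "photos inode: $dir_before -> $dir_after; entries left in src: $src_left"
rm -rf "$work"
```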

find . -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} \; has to move millions of entries, so it is slower. It can be sped up somewhat by calling mv once for many files instead of once per file, and the mv command itself could in principle be optimized to move several directory entries in one step, but there is no way to make it as fast as moving a single directory.
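The cost of launching one process per file can be made visible by counting launches. This sketch assumes GNU find and uses 500 hypothetical sample files; it replaces mv with a stand-in, sh -c 'echo launch', that prints one line each time it is started, so nothing is actually moved.

```shell
#!/bin/sh
# Sketch: count command launches for find -exec with "\;" versus "+".
# sh -c 'echo launch' stands in for mv and prints one line per launch.
set -eu

work=$(mktemp -d)
cd "$work"
i=1
while [ "$i" -le 500 ]; do    # 500 hypothetical sample files
  touch "$i.jpg"
  i=$((i + 1))
done

# "\;": one launch per file found.
per_file=$(find . -maxdepth 1 -name '*.jpg' -exec sh -c 'echo launch' sh {} \; | wc -l)
# "+": many file names per launch, so far fewer launches.
batched=$(find . -maxdepth 1 -name '*.jpg' -exec sh -c 'echo launch' sh {} + | wc -l)

printf 'launches with \\; : %s\n' "$per_file"
printf 'launches with +  : %s\n' "$batched"
cd / && rm -rf "$work"
```

Batching removes the per-file process start-up, but each file still needs its own directory-entry update, which is why even the batched form stays slower than a single directory move.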
