What is the principle of "code moving to data" rather than data to code?

The basic idea is easy: if code and data are on different machines, one of them must be moved to the other machine before the code can be executed on the data. If the code is smaller than the data, better to send the code to the machine holding the data than the other way around, if all the machines are equally fast and code-compatible. [Arguably you can send the source and JIT compile as needed].

In the world of Big Data, the code is almost always smaller than the data.

On many supercomputers, the data is partitioned across many nodes, and all the code for the entire application is replicated on all nodes, precisely because the entire application is small compared to even the locally stored data. Then any node can run the part of the program that applies to the data it holds. No need to send the code on demand.