Chemistry - Practical differences between storing 2-electron integrals and calculating them as needed?

Solution 1:

It depends on system size. There is a crossover point beyond which it becomes faster to calculate the integrals as needed: as the system grows larger and larger, the number of disk operations required to store and later retrieve the integrals becomes very large, which greatly slows down the calculation, since disk operations are slow compared to memory manipulation.

Thus, most software packages have an option to calculate the integrals directly and not store them. In Molpro I believe this directive is `gdirect`.

To be more specific, two aspects of the storage approach become slow when there are many integrals. First, disk operations themselves are slower than recomputation, which involves only memory manipulation. Second, the integrals are stored in some clever way to minimize the time it takes to find a specific integral, but when there are very many integrals, a non-negligible amount of time still goes into looking up each value. When these two components (plus perhaps others I am missing) cost more than the integral calculation itself, it is faster to compute the integrals directly.

I'll let someone else be more quantitative about the timing of disk operations and lookup tables, as these are more computer science questions, but quite interesting ones.

For a useful reference on how many integrals we are talking about storing on disk, see this question.
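As a rough back-of-the-envelope sketch (my own estimate, not taken from the linked question): with 8-fold permutational symmetry the number of unique two-electron integrals grows as roughly $N^4/8$ in the number of basis functions $N$, so at $N = 1000$ a naive uncompressed store of 8-byte doubles already runs to about a terabyte. The function names below are hypothetical, purely for illustration.

```python
def unique_eri_count(n_basis: int) -> int:
    """Number of symmetry-unique two-electron integrals (ij|kl),
    exploiting the 8-fold permutational symmetry: i >= j, k >= l,
    and (ij) >= (kl) as compound pair indices."""
    pairs = n_basis * (n_basis + 1) // 2   # unique (ij) pairs
    return pairs * (pairs + 1) // 2        # unique pairs of pairs

def disk_gigabytes(n_basis: int, bytes_per_value: int = 8) -> float:
    """Naive storage estimate: one 8-byte double per unique integral,
    ignoring index overhead, compression, and screening of near-zero
    integrals (all of which real programs use)."""
    return unique_eri_count(n_basis) * bytes_per_value / 1e9

for n in (100, 500, 1000):
    print(f"N = {n:5d}: {unique_eri_count(n):15d} integrals, "
          f"{disk_gigabytes(n):8.1f} GB")
```

This ignores that most integrals in a large molecule are negligibly small, so real disk usage is lower, but the steep quartic growth is the point.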

Also, as a practical guide: in my experience this crossover point was around 14 heavy atoms with a triple-zeta basis set, but this means almost nothing, as it is heavily dependent on the computing architecture you are using. I would expect the crossover point to be much higher on most supercomputing systems, since I did not have much disk to work with comparatively.

Solution 2:

Several factors go into deciding:

  • calculation type: for most post-SCF calculations, disk-based storage is indispensable
  • storage speed (solid-state disk?) vs. calculation speed (GPU?)
  • predictability of which integrals are needed: screening by density matrix value makes some integrals irrelevant for SCF, but for post-SCF methods this is different
  • core memory vs. amount of disk memory allotted
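The density-based screening mentioned above is usually combined with the Cauchy–Schwarz bound $|(\mu\nu|\lambda\sigma)| \le \sqrt{(\mu\nu|\mu\nu)}\sqrt{(\lambda\sigma|\lambda\sigma)}$: an integral whose bound, multiplied by the largest density element it would touch, falls below a threshold can be skipped entirely. A minimal sketch of that test (the arrays `Q` and `D` and the function name are assumptions for illustration):

```python
import numpy as np

def should_compute(mu, nu, lam, sig, Q, D, tol=1e-10):
    """Schwarz/density screening test for one integral (mu nu|lam sig).

    Q[i, j] = sqrt((ij|ij)) holds precomputed Schwarz factors and D is
    the density matrix; both are hypothetical inputs for this sketch.
    The Cauchy-Schwarz inequality bounds |(mu nu|lam sig)| by
    Q[mu, nu] * Q[lam, sig], so the integral's largest possible Fock
    contribution is that product times the biggest density element it
    multiplies in the Coulomb and exchange terms.
    """
    bound = Q[mu, nu] * Q[lam, sig]
    d_max = max(abs(D[lam, sig]), abs(D[mu, nu]),
                abs(D[nu, sig]), abs(D[mu, lam]))
    return bound * d_max > tol
```

This is why screening is so much less effective post-SCF: there, contributions are weighted by amplitudes or perturbed densities rather than the converged SCF density, so a different (and usually weaker) skip criterion applies.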

In fact, many programs do not decide at all, but do both. ORCA, for instance, allows a hybrid approach, trying to store integrals on disk in the order in which they will eventually be needed. ORCA and Turbomole have algorithms that decide on a per-category basis, essentially by total angular momentum, because the recursive evaluation makes high-angular-momentum integrals expensive. Some integrals may be needed only every few iterations, when a full recalculation of the Fock matrix is done instead of the incremental update $\mathbf{F}_{n+1} = \mathbf{F}_{n} + \Delta \mathbf{F}$, while others are needed every iteration, which may lead to amending what is stored.
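The incremental update can be sketched in a few lines. This toy version (restricted closed-shell convention $\mathbf{F} = \mathbf{h} + 2\mathbf{J} - \mathbf{K}$; the function name and the use of a full in-memory integral tensor are simplifications for illustration) builds the Fock correction from the density *difference*, which is what makes screening effective near convergence:

```python
import numpy as np

def fock_update(F_old, D_old, D_new, eri):
    """Incremental Fock build: F_{n+1} = F_n + G(D_new - D_old).

    eri holds the full integral tensor (ij|kl), kept in memory here
    for simplicity; a direct code would instead assemble G(dD) batch
    by batch, skipping batches whose dD elements are negligible --
    and near SCF convergence dD is almost entirely negligible.
    """
    dD = D_new - D_old
    J = np.einsum('ijkl,kl->ij', eri, dD)   # Coulomb part from dD
    K = np.einsum('ikjl,kl->ij', eri, dD)   # exchange part from dD
    return F_old + 2.0 * J - K
```

Because $G$ is linear in the density, chaining incremental updates reproduces a full build, but each step only "pays" for the integrals that the density difference actually touches.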

Note that calculating all integrals and storing them is sometimes called the "conventional" way, whereas calculation on-the-fly is often called "direct".