Chemistry - Why can't equations of state be replaced by machine learning models?

Solution 1:

For n data points with distinct x values, there exists a polynomial of degree at most n-1 that fits the data exactly.

Therefore, fitting the data well is not in itself a basis for "a neural network or whatever" being better.
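
The interpolation claim can be checked with a minimal sketch. The data values below are made up for illustration and follow no physical law, and `lagrange_interpolate` is a hypothetical helper name. Exact rational arithmetic is used so that "fits exactly" is not blurred by rounding.

```python
from fractions import Fraction

def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the polynomial of degree <= n-1 through the n points (xs, ys)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                # Each factor is 1 at x = xi and 0 at x = xj.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# Made-up "measurements" (assumption: arbitrary integers, no underlying model).
xs = [0, 1, 2, 3, 4]
ys = [3, -1, 4, 1, -5]

fitted = [lagrange_interpolate(xs, ys, x) for x in xs]
print(fitted == ys)  # True: five points, degree <= 4, zero residual
```

The fit is exact for any choice of ys, which is why a zero residual on the fitted points says nothing about whether the model captures the underlying behaviour.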

Furthermore, it simply isn't true that the Peng-Robinson equation "has no underlying physical meaning".

The Peng-Robinson equation (like the van der Waals equation) recognizes that atoms and molecules occupy space and that attractive forces exist between them.

As Peng and Robinson say in their article:

Since two-constant equations have their inherent limitations, and the equation obtained in this study is no exception, the justification for the new equation is the compromise of its simplicity and accuracy.
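
To make the two constants concrete, here is a minimal sketch of the Peng-Robinson pressure in its commonly quoted form, with a repulsive (excluded-volume) term and an attractive term. The numerical coefficients are the ones usually given in textbooks and should be checked against the original article; the methane properties are approximate values used only for illustration, and the function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def peng_robinson_pressure(T, Vm, Tc, Pc, omega):
    """Pressure in Pa; T and Tc in K, Vm in m^3/mol, Pc in Pa."""
    a = 0.45724 * R**2 * Tc**2 / Pc   # strength of intermolecular attraction
    b = 0.07780 * R * Tc / Pc         # co-volume: space taken up by the molecules
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(T / Tc)))**2
    repulsion = R * T / (Vm - b)      # less free volume than the ideal gas
    attraction = a * alpha / (Vm**2 + 2.0 * b * Vm - b**2)
    return repulsion - attraction

# Approximate constants for methane (assumption, illustration only).
TC, PC, OMEGA = 190.56, 4.599e6, 0.011

T, Vm = 300.0, 1.0e-3
p_pr = peng_robinson_pressure(T, Vm, TC, PC, OMEGA)
p_ideal = R * T / Vm
print(p_pr, p_ideal)
```

At this state point the attractive term outweighs the excluded-volume correction, so the predicted pressure falls below the ideal-gas value. At large molar volume both corrections vanish and the ideal-gas law is recovered.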

That being said, neural networks have been used to find equations of state. See, for example, "Equation of state and artificial neural network to predict the thermodynamic properties of pure and mixture of liquid alkali metals":

Generally, ANN is powerful and successful method for complex non-linear systems due to unique advantages such as high speed, simplicity and large capacity which reduce engineering attempt. In recent years, ANN modeling has been successfully used for prediction of thermophysical properties of pure and mixture fluids [24], [25], [26] and [27].

Solution 2:

It is true that a high-order polynomial can fit any training set. But that is not a strength: a model that can fit anything is unfalsifiable, and it overfits. In particular, a polynomial of order n is only likely to be predictive if the true function is n times differentiable. Since chemical space is discrete, and for many purposes some molecules are special cases, polynomial models are a poor choice in cheminformatics.
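
A small sketch of the gap between fitting and predicting. The target function here (the classic Runge example) and the equally spaced sample points are illustrative assumptions, not anything from a chemical data set. The polynomial reproduces every training point, yet it is badly wrong between them.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the polynomial of degree <= n-1 through the n points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def runge(x):
    """Smooth test function (assumption: chosen only for illustration)."""
    return 1.0 / (1.0 + 25.0 * x * x)

# 11 equally spaced training points on [-1, 1], fitted by a degree-10 polynomial.
n = 11
train_x = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
train_y = [runge(x) for x in train_x]

# Dense grid of held-out points on the same interval.
test_x = [-1.0 + 2.0 * k / 200 for k in range(201)]

train_err = max(abs(lagrange_interpolate(train_x, train_y, x) - y)
                for x, y in zip(train_x, train_y))
test_err = max(abs(lagrange_interpolate(train_x, train_y, x) - runge(x))
               for x in test_x)
print(train_err, test_err)
```

The training error is zero to rounding, while the worst held-out error is larger than the full range of the function itself, which lies between 0 and 1.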

Neural nets can work, provided overfitting is controlled with an explicit regularizer or with dropout. It's true that the resulting model is a black box: after building it, there is the challenge of understanding the predictions. However, running the model to make predictions is cheaper than doing a lot of experiments. After gaining some understanding of the model, you can do more targeted experiments to check it against reality.

With some problems, we have had good results with a fingerprint-based SVM.