Applications of algebraic geometry to machine learning

One useful remark: dimension reduction is a critical problem in data science, and there are a variety of good approaches to it. It matters because a great many machine learning algorithms have complexity that depends on the number of parameters used to describe the data (sometimes exponentially!), so reducing the dimension can turn an impractical algorithm into a practical one.

This has two implications for your question. First, if you invent a cool new algorithm then don't worry too much about the dimension of the data at the outset - practitioners already have a bag of tricks for dealing with it (e.g. Johnson-Lindenstrauss embeddings, principal component analysis, various sorts of regularization). Second, it seems to me that dimension reduction is itself an area where more sophisticated geometric techniques could be brought to bear - many of the existing algorithms already have a geometric flavor.
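To illustrate the simplest of those tricks, here is a minimal sketch of random-projection (Johnson-Lindenstrauss style) and principal component analysis dimension reduction using scikit-learn; the data and the target dimensions are made up purely for illustration, not taken from any particular application.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection

# Fake high-dimensional data: 1000 points in 10,000 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10_000))

# Random projection: a data-independent linear map; the Johnson-Lindenstrauss
# lemma guarantees pairwise distances are roughly preserved if the target
# dimension is large enough (here it is simply fixed at 300 for illustration).
rp = GaussianRandomProjection(n_components=300, random_state=0)
X_rp = rp.fit_transform(X)

# PCA: a data-dependent projection onto the top principal components.
pca = PCA(n_components=50, random_state=0)
X_pca = pca.fit_transform(X)

print(X_rp.shape, X_pca.shape)  # (1000, 300) (1000, 50)
```

Both reductions are cheap relative to whatever downstream algorithm they feed, which is exactly why practitioners reach for them first.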

That said, there are a lot of barriers to entry for new machine learning algorithms at this point. One problem is that the marginal benefit of a great answer over a good answer is not very high (and sometimes even negative!) unless the data has been studied to death. The marginal gains are much higher for feature extraction, which is really more of a domain-specific problem than a math problem. Another problem is that many new algorithms which draw upon sophisticated mathematics wind up answering questions that nobody was really asking, often because they were developed by mathematicians who tend to focus more on techniques than applications. Topological data analysis is a typical example of this: it has generated a lot of excitement among mathematicians but most data scientists have never heard of it, and the reason is simply that higher order topological structures have not so far been found to be relevant to practical inference / classification problems.


Check the websites of some former students/postdocs of Bernd Sturmfels: Pablo Parrilo at MIT and Rekha Thomas at the University of Washington. They tackle real (meaning relatively large) problems with algebraic-geometric insight, mostly in connection with semidefinite programming. You can look at Pablo's MIT course to get some idea: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-972-algebraic-techniques-and-semidefinite-optimization-spring-2006/
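To get a feel for that semidefinite-programming side, here is a minimal sketch of the basic computation behind sum-of-squares methods: certifying that a polynomial is nonnegative by finding a positive semidefinite Gram matrix. The cvxpy library and the test polynomial are my own choices for illustration, not something taken from the course.

```python
import cvxpy as cp

# Test polynomial p(x) = x^4 - 2x^3 + 3x^2 - 2x + 1, which equals (x^2 - x + 1)^2.
# Coefficients indexed by degree: [const, x, x^2, x^3, x^4].
coeffs = [1.0, -2.0, 3.0, -2.0, 1.0]

# Gram matrix Q for the monomial vector z = (1, x, x^2), so p(x) = z^T Q z.
Q = cp.Variable((3, 3), symmetric=True)

constraints = [
    Q >> 0,                              # Q positive semidefinite
    Q[0, 0] == coeffs[0],                # constant term
    2 * Q[0, 1] == coeffs[1],            # x
    Q[1, 1] + 2 * Q[0, 2] == coeffs[2],  # x^2
    2 * Q[1, 2] == coeffs[3],            # x^3
    Q[2, 2] == coeffs[4],                # x^4
]

# Feasibility problem: any PSD Q satisfying the constraints is an SOS certificate.
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("SOS certificate found" if prob.status == cp.OPTIMAL else "no certificate")
```

The point of Parrilo-style algebraic techniques is that questions like "is this polynomial nonnegative on this set?" relax to semidefinite programs of exactly this shape.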

There is also a huge line of research on computational algebraic geometry. You can start by visiting the website of Jon Hauenstein: http://www3.nd.edu/~jhauenst/ He and his collaborators work on large industrial projects.
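As a toy illustration of the computational side (a small symbolic sketch using sympy, chosen here for convenience; it is not Hauenstein's numerical homotopy-continuation machinery), here is a Gröbner basis computation that solves a tiny polynomial system exactly:

```python
from sympy import symbols, groebner, solve

x, y = symbols('x y')

# Toy system: intersection of the unit circle with the line y = x.
F = [x**2 + y**2 - 1, x - y]

# A lexicographic Groebner basis eliminates variables, exposing a univariate
# polynomial whose roots give the solutions.
G = groebner(F, x, y, order='lex')
print(G)
print(solve(F, [x, y]))  # the two intersection points
```

Real industrial systems are far too large for purely symbolic methods like this, which is exactly where numerical algebraic geometry comes in.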

Algebraic geometry also shows up in topological data analysis, which is now becoming a big thing in data science.
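For the curious, here is a minimal sketch of the basic TDA computation, persistent homology of a point cloud. It assumes the ripser and numpy packages, which are simply my choice of tooling for illustration, and a synthetic circle-shaped data set.

```python
import numpy as np
from ripser import ripser

# Sample points noisily from a circle; its persistent H1 should show one
# long-lived loop, reflecting the circle's single 1-dimensional hole.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(200, 2))

# Persistence diagrams in homological dimensions 0 and 1.
diagrams = ripser(X, maxdim=1)['dgms']
h1 = diagrams[1]
longest = h1[np.argmax(h1[:, 1] - h1[:, 0])]
print("most persistent 1-cycle (birth, death):", longest)
```

Whether such higher-order topological summaries help on practical inference problems is, as the first answer notes, still an open question.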


I'm going to assume that by 'high dimensional problems' you specifically mean tasks like speech recognition and image recognition. If so, I'm afraid the answer seems to be 'sadly no'. Real high dimensional problems of this kind are almost always best solved by neural nets, specifically 'deep learning', which amounts to building neural nets with many layers that predict very effectively. I'm not personally aware of any approach that comes close to deep learning for these kinds of problems. I might add that I teach data mining, so my impression is somewhat stronger than a casual one.

There are two main issues with high dimensional data and neural nets. The first is feature selection: the variables one is given are likely not the variables one wants. The second is architecture: how should the nodes in the various layers be connected? The basic flow chart of a neural net can be described by a directed acyclic graph, i.e. a DAG. One could fantasize that choosing the correct DAG is somehow an algebraic geometry problem, along the lines of the uses of algebraic geometry in phylogenetics. But as far as I know, that is just a fantasy.
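To make the "layers as a DAG" remark concrete, here is a minimal sketch of a small feedforward net in PyTorch (my choice of framework for illustration; the answer does not name one). A plain chain of layers like this is the simplest possible computation DAG; the architecture-search question is which, among the vastly many other DAGs, to use instead.

```python
import torch
import torch.nn as nn

# A small multi-layer perceptron: each Linear/ReLU node is a vertex in the
# computation DAG, and activations flow along its (acyclic) edges.
model = nn.Sequential(
    nn.Linear(784, 128),  # e.g. a flattened 28x28 image -> 128 hidden units
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),    # 10 class scores
)

x = torch.randn(32, 784)   # a batch of 32 fake inputs
logits = model(x)
print(logits.shape)        # torch.Size([32, 10])
```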