Roadmap for learning Topological Data Analysis?

Before answering your question I would like to discuss some points:

  1. Topological data analysis is roughly, as you write, (algebraic) topology applied to the study of data. While you certainly will need to learn some topology, the type of topology that you should learn really depends on the type of applications you are interested in. For this reason I will not give you a roadmap, but a suggestion on how to draw your own roadmap.
  2. You should also not forget the second part in the definition of topological data analysis, namely that you are studying data. For this it would be good to learn some general facts about data analysis, and in particular statistics (more about this below). For a statistician’s viewpoint on topological data analysis, there is a nice series of columns by Robert Adler on what he calls TOPOS, available here.
  3. You have to know your data. This might go without saying, but too often I have seen people throwing some method at data to see what comes out of it, without even asking themselves why they are using that specific method. While depending on your job conditions you might be given more or less time to work on a specific project, I think that you should really try to make sure that you understand the data and the context as best as possible before even starting to think about which method you want to use. While topology gives a wealth of different methods that can be applied to the study of data, these might not always be the best tools to use, and there might be other techniques which are better suited. The bottom line is: there is no method or set of methods that fits all problems.

And here comes my suggestion for how to draw your own roadmap:

  • Topology. Robert Ghrist’s book Elementary Applied Topology gives a succinct overview of the main methods and ideas from topology that are used in applications. Every chapter covers a certain topic in topology and then gives examples of its applications. While there are other texts on applied topology that delve into more detail from the mathematical point of view, I would suggest using Ghrist’s book to get an idea of the applications and set of ideas, and then drawing your own roadmap of topics that you would like to cover from there. Since the text is succinct, you might also need other texts to learn more about the mathematics covered in each chapter. For example, to learn more about (smooth) manifolds (Chapter 1) you might want to read further in Lee’s Introduction to Smooth Manifolds, or to learn more about cohomology (Chapter 6) you might want to consult Hatcher’s Algebraic Topology. Again, I don’t think that there is a "one size fits all" answer to which texts you should use for this, but once you have a good grasp of what exactly you would like to understand better, you could again ask people with more experience for advice.
  • Statistics. A book that, analogously to Ghrist’s book, could help you in designing your own roadmap is Larry Wasserman’s All of Statistics. Also, note that the application of statistical methods to techniques from topological data analysis is an active area of research, and while there are some tools and libraries that can be used for applications, this area is still in its infancy. I list here the libraries and relevant references for statistical tools for topological data analysis that I know off the top of my head (these are all related to persistent homology):

    • Persistence Landscapes and the corresponding toolbox
    • The TDA package tutorial and the package
    • Persistence images and library
  • Data science. Finally, as for data science more broadly, I don’t know of any good text, but you might get an idea of some of the general themes from the book Mathematical Problems in Data Science.


Aside: to finish off, I give some additional references to books/papers and software packages.

  1. References for topological data analysis, and computational topology:

    • Topology and data, Carlsson
    • Computational Topology, Edelsbrunner and Harer
    • Topology for Computing, Zomorodian
    • Persistence Theory, Oudot (this might be too specific, but this would be useful if you want to learn more about the theory behind persistent homology)
    • Computational homology, Kaczynski, Mischaikow, Mrozek
  2. Open source libraries that implement some of the methods from topological data analysis:

    • Mapper: Python Mapper
    • Persistent homology: a few of the most recent (and best performing) libraries are Ripser, GUDHI, and DIPHA. Note that there is also an overview of the different libraries for persistent homology available here. (Disclaimer: I am one of the authors of this paper. Also, the version on the arXiv is outdated, and will be replaced by an up-to-date version in the coming weeks, so it might be better to look at this once it is updated.)
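
To get a feeling for what these persistent homology libraries compute before installing any of them, here is a minimal sketch in plain Python of the simplest case: degree-0 persistence (connected components) of a Vietoris-Rips filtration, computed with a union-find structure. This is not how Ripser, GUDHI, or DIPHA are implemented, and it does not use their APIs. The function name `h0_persistence` and the sample points are hypothetical, chosen only for illustration. It also assumes the convention that an edge enters the filtration at the full distance between its endpoints (some texts use half that distance).

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for the connected components (H0)
    of the Vietoris-Rips filtration of a finite point cloud.

    Assumption: an edge enters the filtration at the Euclidean
    distance between its endpoints.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, shortening paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))

    # One component survives at every scale.
    pairs.append((0.0, math.inf))
    return pairs


# Hypothetical sample: a cluster of three points and a cluster of two.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
diagram = h0_persistence(points)
for birth, death in diagram:
    print(birth, death)
```

In the resulting diagram, the short bars correspond to points merging within each cluster, and the one long finite bar records the scale at which the two clusters join. Higher-degree homology (loops, voids) needs the boundary-matrix reduction that the libraries above implement efficiently.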

Geometric and Topological Inference is an excellent book for an introduction to persistent homology. If you have not taken an algebraic topology course, it should be easier than Edelsbrunner and Harer's book. I also found it more approachable since it has more exercises and gives more detail on the construction of complexes.