Proofs shown to be wrong after formalization with a proof assistant

First of all, to explain my perspective: I'm a PhD student working on the formalisation of mathematics with Isabelle/HOL, and I've been working with that system for about 7 years. I was introduced to it in a lecture when I was an undergraduate and I got hooked immediately. I do think it is useful, but that is not why I do it. I do it because it's fun.

Anyway, your question is a bit tricky to answer because it depends a lot on what you mean by a ‘wrong proof’ and by ‘shown wrong by formalizing them’. A lot of the time, it's something of a grey area.

Normally, one needs a very thorough understanding of the paper proof in order to formalise it and one has to think of a way of formalising the argument. Conceptual problems with the paper proof often become apparent at this stage already, when there is no theorem prover involved as such yet.

Secondly, of course, if you formalise something like the Prime Number Theorem or Cauchy's Integral Theorem, you're probably not going to find out that it's all wrong and everything collapses. But you might well find problems in particular proofs in textbooks.

I do find a lot of ‘mistakes’ in proofs, including textbook proofs and published papers. Most of these mistakes are easily fixed and most mathematicians would likely brush them off as insignificant. Some take me a few days to figure out. Some actually require changing definitions, adding assumptions, or altering the statement of the theorem.

Most ‘mistakes’ are something like this:

  • surprisingly non-trivial arguments being declared trivial or left to the reader

  • going very quickly and vaguely over a part of the proof that is perceived as uninteresting, and thereby missing a subtle flaw that would have become apparent had one done it more thoroughly

  • missing cases that have probably been overlooked by the author

  • arithmetic mistakes (my favourite being multiplying an inequality by a constant without checking that the constant is non-negative)

  • missing assumptions that are still implicitly used
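To make the "favourite" arithmetic mistake concrete: from a ≤ b one may conclude c·a ≤ c·b only when c ≥ 0, and a proof assistant will not accept the step until that side condition has been proved. The following is a minimal Python sketch that simply tests the naive step on concrete numbers; the function names are hypothetical and chosen for illustration only.

```python
def scaled_inequality_holds(a, b, c):
    """Check the step 'a <= b implies c*a <= c*b' on concrete numbers."""
    assert a <= b, "premise a <= b must hold"
    return c * a <= c * b


def find_counterexamples(a, b, constants):
    """Return the constants c for which the naive scaling step fails."""
    return [c for c in constants if not scaled_inequality_holds(a, b, c)]


# With a < b, every negative constant flips the inequality; 0 and positive constants are fine.
print(find_counterexamples(1, 2, [-2, -1, 0, 1, 2]))
```

Running it prints `[-2, -1]`: the step fails exactly for the negative constants, which is the case a hand-written proof tends to skip and a formal proof forces you to discharge.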

Let me give you a few examples (I won't mention the exact authors; my intent is not to shame anybody for making these mistakes, only to show that this is quite common):

  1. I recently had a case where a theorem from a widely-used textbook from the 70s was simply plain wrong, which I realised when I wanted to find out how to formalise it. I am not an expert in that field, but apparently the people who do work in that field know that this is wrong.

  2. One of the first algorithms (working on non-deterministic automata) that I formalised apparently assumed that the automaton is total (i.e. it has an outgoing transition for every letter in the alphabet from every state). In my opinion, that should absolutely have been mentioned in the paper, but one could possibly argue that this was just implicit in their idea of an automaton.

  3. A colleague of mine found a subtle problem with some complicated automata algorithm that had been used in state-of-the-art software for years. It is still not known if and how this problem can be fixed.

  4. In one instance, I had formalised a kind of program transformation technique from a published paper. The authors then extended that paper to a more detailed journal version and also added some new transformation rules. One of them dealt with multiplication by a constant, but they did not realise that multiplication by 0 is a special case that makes their rule unsound.

  5. I worked on formalising a new result that had just been published in a journal and found out that one part of the proof, which the authors didn't explain in much detail due to page limits, had a subtle problem that only became apparent when I had already formalised everything in Isabelle and got stuck at this part. The authors immediately admitted that this was a problem that could not be fixed in any apparent way except by adding an additional, somewhat technical assumption to the entire argument. However, they later managed to prove a stronger result that subsumes it, but the proof of that was much more involved. (More details on this at the end of this answer.)

  6. I don't remember the exact details about the Kepler conjecture that somebody mentioned before, but off the top of my head, I seem to recall that several smaller problems were found in the original program code, and Nipkow found one problem that actually caused Hales to revise a part of the proof completely.
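The hidden assumption in item 2 (totality) is the kind of side condition a formalisation makes explicit. As a rough illustration, here is a Python sketch that checks totality for a non-deterministic automaton and, if needed, establishes it with the usual completion by a fresh non-accepting sink state. This is not the algorithm from the paper; the dictionary encoding of the transition relation and all function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical NFA encoding: delta maps (state, symbol) to a set of successor states.
Delta = Dict[Tuple[Hashable, str], Set[Hashable]]


def missing_transitions(states, alphabet, delta: Delta) -> List[Tuple[Hashable, str]]:
    """All (state, symbol) pairs with no outgoing transition."""
    return [(q, a) for q in states for a in alphabet if not delta.get((q, a))]


def is_total(states, alphabet, delta: Delta) -> bool:
    """True iff every state has at least one successor for every letter."""
    return not missing_transitions(states, alphabet, delta)


def complete(states, alphabet, delta: Delta, sink="sink"):
    """Route every missing transition to a fresh sink state that loops on all letters.

    The recognised language is unchanged as long as the sink is not made accepting.
    """
    assert sink not in states, "sink name must be fresh"
    new_states = list(states) + [sink]
    new_delta = {k: set(v) for k, v in delta.items()}
    # The sink itself has no transitions in delta yet, so it receives self-loops here.
    for q, a in missing_transitions(new_states, alphabet, delta):
        new_delta[(q, a)] = {sink}
    return new_states, new_delta
```

For example, with states `[0, 1]`, alphabet `["a", "b"]` and `delta = {(0, "a"): {0, 1}, (1, "b"): {1}}`, the pairs `(0, "b")` and `(1, "a")` are missing, so the automaton is not total; after `complete` it is. Whether an algorithm may silently assume totality, or must first apply such a completion, is exactly the kind of detail a paper should state.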

As a theorem proving guy, my reaction to this is ‘This shows that formalising stuff in a theorem prover is worthwhile’. I am aware that mathematicians often have a different perspective. It is not an uncommon view to say that the so-called ‘mistakes’ I mentioned above are insignificant; that someone would have found them eventually even without a theorem prover; that the theorems were still correct (in some sense) and it was only the proofs that had some minor problems.

However, I disagree with that. I want my proofs to be as rigorous as possible. I want to be sure I didn't miss any assumptions. And I think that cases like the Kepler conjecture show that there are instances where it is just infeasible for humans to check the correctness of a proof with a reasonable amount of certainty.

EDIT: As requested, some more details on point 5.

The paper in question is The impossibility of extending random dictatorship to weak preferences. They also published a corrigendum. The purpose of that paper is to show that no Social Decision Scheme (SDS) for at least 4 agents and alternatives exists that is an extension of Random Dictatorship (RD) and fulfils four nice properties.

It works by first showing that none exists for 4 agents and 4 alternatives, and then showing that an SDS for more than 4 agents/alternatives can be turned into one for exactly 4 agents and 4 alternatives while preserving the nice properties, so that none can exist for more than 4 either. Typically, in this kind of proof, the base case is the difficult one and lifting it to a larger number of agents/alternatives is pretty trivial. However, in this case, the property "the SDS is an extension of RD" does not survive the lifting to more agents/alternatives, which completely breaks that step. I myself only noticed that when I had already typed most of the proof into Isabelle and it just didn't go through.

The proof for the base case here was based on 12 particular preference profiles and is, as you can see, relatively short. The authors later found a proof of the same statement without the RD extension assumption, but that one needed 47 preference profiles and was much longer. I formalised that proof in Isabelle without any problems (see my BSc thesis).


Since this question was asked in January there have been some developments. I would like to argue that the scenario raised in the question has now actually happened. Indeed, Sébastien Gouëzel, when formalising Vladimir Shchur's work on bounds for the Morse Lemma for Gromov-hyperbolic spaces, found an inequality that had been transposed at some point, causing the proof (which had been published in 2013 in J. Funct. Anal., a good journal) to collapse. Gouëzel then worked with Shchur to find a new and correct (and in places far more complex) argument, which they wrote up as a joint paper.

http://www.math.sciences.univ-nantes.fr/~gouezel/articles/morse_lemma.pdf

The details are in the introduction. Anyone who reads it will see that this is not a "mistake" in the literature in the weak sense defined by Manuel Eberl's very clear answer; this was an actual error which was discovered because of a formalization process.


This question was raised on the Foundations of Mathematics mailing list back in 2014, and the short answer is no, there are no examples of this. [EDIT: Although this may have been true at the time I originally wrote this, it is no longer true, as other answers amply demonstrate. But I think that this answer is worth leaving here nonetheless.]

The longer answer is that the process of formalizing any non-trivial mathematical argument is likely to reveal some minor gaps, such as degenerate cases for which the main argument doesn't quite work as stated. If you're sufficiently pedantic, then you might claim that in such cases, the original proof is "wrong," but I suspect that this is not the sort of thing you're asking for.

The Flyspeck project did turn up one gap in the original proof of the Kepler conjecture that was large enough that the authors felt the need to write a few pages of human explanation about it. There is also an interesting paper by Fleuriot and Paulson where they undertook to formalize Newton's Propositio Kepleriana with Isabelle using nonstandard analysis to implement Newton's use of infinitesimals. There was one step where Fleuriot and Paulson could not find a plausible way to imitate Newton's reasoning exactly and found themselves having to use a different argument. Again, it is debatable whether this means that Newton's proof was "wrong."