Isabelle: Sledgehammer finds a proof but it fails

You ask a number of questions, but I'll take your title and the second paragraph as the essence of your main complaint, where I end up giving a long-winded answer which can be summarized with,

  • Sledgehammer is part of a three-pronged arsenal,
  • you becoming more experienced, with never ending experimentation, along with trial and error is the heuristic,
  • not using many of the proofs which Sledgehammer returns is a big part of using Sledgehammer, and
  • the minimize and preplay_timeout options can save you some time and frustration by automatically playing the proofs back, which gives you timing information, and sometimes shows that a found proof will fail.

Starting with your second paragraph, you say:

Often times I have the problem that Sledgehammer finds a proof. But then I try it, but the proof doesn't terminate. I guess Sledgehammer is one of most important parts of Isabelle,...

Sledgehammer is important, but I consider it part of a three-pronged arsenal, where the three parts would be:

  1. Detailed proof steps using natural deduction.
  2. Automatic proof methods, such as auto, simp, rule, etc. A big part of this would be creating your own simp rewrite rules, and learning to use theorems with rule and the myriad of other automatic proof methods.
  3. Sledgehammer calling automatic theorem provers (ATPs). Using steps 1 and 2, with experience, are used to set up Sledgehammer. Experience counts for a lot. You might use auto to simplify things so that Sledgehammer succeeds, but you might not use auto because it will expand formulas to where Sledgehammer has no chance of succeeding.

...but then it gets annoying if a proof fails.

So here, your expectations and my expectations for Sledgehammer diverge. These days, if I get annoyed, I get annoyed that I will have to work more than 30 seconds to prove a theorem. If I'm hugely disappointed that a particular Sledgehammer proof fails, it's because I've been trying to prove a theorem for hours or days without success.

Using Sledgehammer not to find proofs, but to find good proofs

Automation can sometimes alleviate frustration. Clicking on a Sledgehammer proof, only to find out that it fails, would be frustrating. Here is the way I currently use Sledgehammer, unless I start becoming desperate for a proof:

sledgehammer_params[minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
                    max_relevant=smart,provers="
  remote_vampire  metis  remote_satallax  z3_tptp  remote_e
  remote_e_tofof  spass  remote_e_sine    e        z3       yices
"]

The options minimize=smart and preplay_timeout=10 are related to Sledgehammer playing back proofs, after it finds them. Not using many of the proofs that Sledgehammer finds is a big part of using Sledgehammer, and proof playback is a big part of culling out proofs.

Myself, I don't deal much with Sledgehammer proofs that don't terminate, but that's probably because I'm selective to begin with.

My first criteria for a Sledgehammer proof is that it be reasonably fast, and so when Sledgehammer reports that it's found a proof that's greater than 3 seconds long, I don't even try using it, unless I'm desperate to find out whether a theorem can be proved.

The use of Sledgehammer for me usually goes like this:

  • State a theorem and see if I get lucky with Sledgehammer.
  • If Sledgehammer gives me a proof that's 30 milliseconds or less, then I consider that good proof, but I still experiment with try and the automated proof methods of section 9.4.4, page 208, of isar-ref.pdf. Many times I can get a proof down to 5ms or less.
  • A metis proof of total time over 100ms, I'm willing to work 30 minutes or more to try and get a faster proof.
  • A metis proof of 200ms to 500ms, I'll resort to everything I know to try and get it down to below 100ms, which many times means converting to a detailed proof.
  • A smt or metis proof of greater than 1 second I only consider good as a temporary proof.
  • A proof in the output panel that Sledgehammer reports as being greater than 3 seconds, I usually don't even try, because even if it ends up working, I'm going to have to work to find another proof anyway, so I'd rather spend my time up front trying to find a good proof.

The option 3 heuristic

You say,

So what about the third option. Are there any easy to follow heuristics that I can apply?

The heuristic is:

  • "as appropriate",

which is to say that the heuristic is "use Sledgehammer as part of a three-pronged arsenal".

The heuristic is also "read lots of tutorials and documentation so that you have lots of other things to use with Sledgehammer". Sledgehammer is powerful, but it's not infinitely powerful, and for some theorems, you can use your own simp rules to prove in 0ms with apply(simp) or apply(auto) what Sledgehammer will never prove.

For myself, I'm up to about 150 to 200 theorems, so the "as appropriate" has much more meaning to me that it used to have. Basically, you try and set up Sledgehammer the way it needs to be set up.

The way Sledgehammer needs to be set up will sometimes mean running auto or simp first, but sometimes not, because many times running auto or simp will doom Sledgehammer to failure.

But sometimes, you don't even want a metis proof from Sledgehammer, except as a preliminary proof until you can find a better proof, which, for me, generally means a faster proof using the automatic proof methods.

I'm no authority on Sledgehammer, but it seems Sledgehammer is good at matching up hypotheses and conclusions from old theorems, with hypotheses and conclusions being used for a new theorem. What it's not good at is proving formulas which I've greatly expanded by using simp and auto.

I continue with the long-winded heuristic that is Sledgehammer centric:

  • Use Sledgehammer to jump-start the proof process, by proving some theorems with Sledgehammer that you otherwise don't know how to prove.
  • Turn your theorems which are equivalencies into simp rewrite rules for use with automatic proof methods like simp, auto, fastforce, etc., as described in chapter 9 of tutorial.pdf.
  • Use some of your theorems for conditional rewrite rules for use with intro and rule.
  • The last two steps are used to completely solve a proof step or used to set up Sledgehammer "as appropriate". Sledgehammer never ceases to be useful, no matter how much you know, and it's extremely useful when you don't know much, but Sledgehammer alone is not the road to success.
  • If Sledgehammer can't prove a theorem, then resort to a detailed proof, starting with a bare-bones detailed proof. Sometimes, breaking up an if-and-only-if into two conditionals allows Sledgehammer to easily prove the two conditionals, when it couldn't prove the if-and-only-if.
  • After you've proved lots of stuff, go back and optimize your proofs. Sometimes, with all the rewrite rules you've created, simp and auto will magically prove things, and you will get rid of some metis proofs that Sledgehammer found for you. Sometimes, you'll use Sledgehammer to find a metis proof that's even faster.

Use this command to optimize timing:

ML_command "Toplevel.timing := true"

There's another SO post giving more detail about it.


Concerning your statement sledgehammer is one of the most important parts of Isabelle: You never need sledgehammer to succeed with a proof. But of course sledgehammer is very convenient and can save a lot of tedious reasoning. Thus it is definitely a very important part for making Isabelle more usable for people who did not spend many years using it (and even for those sledgehammer makes everyday proving more productive).

Coming to your question

Try the blast or auto proof methods, passing the necessary facts via unfolding, using, intro:, elim:, dest:, or simp:, as appropriate.
[...]
So what about [this] option. Are there any easy to follow heuristics that I can apply?

Indeed there are:

unfolding: This (recursively) unfolds equations, i.e., it is very similar to apply (simp only: ...). The heuristic is, when you do not get the expected result with simp: ... try unfolding ... instead (it might be the case that other equations are interfering).

using: This is used to add additional assumptions to the current subgoal. The heuristic is, whenever a fact does not fit one of the patterns below, try using instead.

intro:: This is used for introduction rules, i.e., of the form that whenever certain assumptions are satisfied some connective (or more generally constant) may be introduced.
Example: A ==> B ==> A & B (where the introduced constant is (&)).

elim:: This is used for elimination rules, i.e., of the form that from the presence of a certain connective (or more generally constant) some facts may be concluded as additional assumptions.
Example: A & B ==> (A ==> B ==> P) ==> P (where the constant (&) is eliminated in favor of explicitly having A and B as assumptions). Note the general form of the conclusion (which is not related to the major premise A & B), this is important to not loose provability (see also dest:).

dest:: This is used for destruction rules, i.e., of the form that from the presence of a certain constant some facts may be concluded directly.
Example: A & B ==> B (Note that the information that A holds is lost in the conclusion, unlike in the elim: example.)

simp:: This is used for simplification rules, i.e., (conditional) equations, which are always applied from left to right (thus it is sometimes useful to add [symmetric] to a fact, in order to apply it from right to left, but beware of nontermination, for it is easy to introduce looping derivations in this way).

Having said this, often it is just experience that lets you decide in which way best to employ a given fact inside a proof. What I usually do when I got a proof by sledgehammer which is too slow in Isar is to inspect the facts that where used by the found proof. Then categorize them as above, invoke auto appropriately and if that did not completely solve the goal, apply sledgehammer once more (hopefully delivering an "easier" proof this time).


I can answer your subquestion "What's the difference between unfolding and using?". Roughly speaking, it works like this.

Suppose lemma foo is of the form x = a+b+c. If you write

unfolding foo

in your proof, then all occurrences of x will be replaced with a+b+c. On the other hand, if you write

using foo

then x=a+b+c will be added to your list of assumptions.