Getting better at proofs

I'd like to second one part of Qiaochu Yuan's answer: the recommendation to read Polya's book. Unlike many other books I've seen (albeit none of the others recommended above), it actually does contain guidance on how to construct a proof "out of nothing".

And that's one problem with the "practise, practise, practise" mantra. Practise what? Where are the lists of similar-but-not-quite-identical things to prove to practise on? I can find lists of integrals to do and lists of matrices to solve, but it's hard coming up with lists of things to prove.

Of course, practise is correct. But just as with anything else in mathematics, there's guidelines to help get you started.

The first thing to realise is that reading others proofs is not guaranteed to give you any insight as to how the proof was developed. A proof is meant to convince someone of a result, so a proof points to the theorem (or whatever) and knowing how the proof was constructed does not (or at least, should not) lend any extra weight to our confidence in the theorem. Proofs can be written in this way, and when teaching we should make sure to present some proofs in this way, but to do it every time would be tedious.

So, what are the guidelines for constructing a proof? You'll probably get different answers from different mathematicians so these should be construed as being my opinion and not a(n attempt at a) definitive answer.

My recommendation is that you take the statement that you want to prove and apply the following steps to it as often as you can:

  1. Expand out unfamiliar terms.
  2. Replacing generic statements by statements about generic objects.
  3. Including implicit information.

Once you've done all that, the hope is that the proof will be much clearer.

Here's an example.

  1. Original statement:

    The composition of linear transformations is again linear.

  2. Replace generic statements:

    If $S$ and $T$ are two composable linear transformations then their composition, $S T$, is again linear.

    It is important to be precise here. The word "composable" could have been left out, as the statement only makes sense if $S$ and $T$ are composable, but until you are completely familiar with this kind of process, it is better to be overly precise than otherwise. In this case, leaving in the word "composable" reminds us that there is a restriction on the domains and codomains which will be useful later. (However, one has to draw the line somewhere: even the word "composable" is not quite enough since it leaves open the question as to whether it is $S T$ or $T S$!)

  3. Include implicit information:

    If $S \colon V \to W$ and $T \colon U \to V$ are linear transformations then $S T \colon U \to W$ is again linear.

    Here's where remembering that $S$ and $T$ are composable in the previous step helps keep things clear. As $S$ and $T$ are composable, we only need $3$ vector spaces. Then, since we explicitly have the vector spaces the fact that $S$ and $T$ are composable is plain, though some may prefer to keep that fact in the statement. Also, some may like to have the fact that $U$, $V$, and $W$ are vector spaces explicitly stated.

  4. Expand out definitions:

    If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$ for all $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$.

    Note that I have been careful not to repeat myself with the newly introduced symbols. It would be technically alright to reuse $u_1$ and $u_2$ in place of $x_1$ and $x_2$ since these are local declarations (restricted by the phrases "for all ..."). However, humans are not good at differentiating between local and global declarations so it is best not to reuse symbols unless the scope is very clear.

  5. Replace generic statements:

    If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then whenever $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$, $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.

    Up to now, the rephrasing has not taken into account the fact that there is a conclusion and a hypothesis. This rephrasing modifies a part of the conclusion to turn it from a generic statement "$P(p)$ is true for all $p \in Q$" to a statement about a generic object "whenever $p \in Q$ then $P(p)$ is true". We do not do this for the similar statements in the hypothesis. This is because these two pieces are treated differently in the proof.

  6. Replace generic statements, and reorganise to bring choices to the fore:

    Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. Then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.

    In this form, the distinction between hypothesis and conclusion is all the clearer. Parts of the hypothesis use the word "Let", parts of the conclusion use the word "Then".

With this formulation, the proof essentially writes itself. With all it's gory details:

Proof

Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$.[^quick] Then:

$$ S T(x_1 + \eta x_2) = S \big( T(x_1) + \eta T(x_2)\big) $$

using the hypothesis on $T$ as $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. So:

$$ S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2) $$

using the hypothesis on $S$ as $T(x_1), T(x_2) \in V$ and $\eta \in \mathbb{R}$. Hence the conclusion is true.

Notes: 1. This could be condensed, but the important thing here is how to find it, not what the final form should be. 2. Notice that I wrote "as $x_1, x_2 \in U$" rather than "with $u_1 = x_1$ and $u_2 = x_2$". This is partly style, and partly because in the statement of linearity, $u_1$ and $u_2$ are placeholders into which we put $x_1$ and $x_2$. So saying $u_1 = x_1$ is semantically incorrect as it equates a virtual vector with an actual vector. This is a very minor point, though.


Finally, I would like to disagree with one part of Qiaochu's answer. I actually like the imagery of a steel trap. A proof is a bit like a trap: we want to capture the theorem in a trap so that it can't wriggle out. We construct the proof so that there is no possibility of escape. Eventually, yes, we want the proof to be beautiful but when it's first constructed we just want it to do the job. Only once the theorem is caught can we spend a little time decorating the cage to make it look pretty and set it off to its best advantage. So build the trap because theorems can be dangerous! An escaped theorem can do untold damage, rampaging across the countryside, laying waste like an unchecked viking.

(Okay, not quite finally. The step-by-step proof above was taking from a page I wrote for my students on the nature of proof. The original can be found here.)


You get better at proofs the same way you get better at basketball or carpentry: lots and lots of practice. (In particular, like in basketball and carpentry, you can only get so far by reading books.) Of course, there's good practice and bad practice. For general experience writing and coming up with proofs I think the same kind of material that people use to prepare for Olympiad-style mathematics is very helpful. To that end, here are a few references you might find helpful:

  • Problem-Solving Strategies, Engel (somewhat high-level)
  • How to Solve It, Polya
  • The Art and Craft of Problem Solving, Zeitz

For writing proofs specific to a course (e.g. real analysis) you'll generally find that the same basic ideas are being used over and over again, and once you learn to recognize when these ideas will be useful your life will be much easier. But I think this is a hard skill to teach.

It doesn't help that many proofs in textbooks are written in a style that makes it nearly impossible to see how someone could have come up with the proof from first principles. This is an unfortunate tendency, and you should find a different textbook if this is too much of an issue. (Alternately, you should see if there are better proofs online, for example on someone's blog. Tim Gowers and Terence Tao are fond of writing up conceptual proofs of things, and more generally their blogs are a great source of insight into how mathematicians think.) Once you build up a little problem-solving skill, another way to fix this is to reprove things yourself. It helps if you can't remember what the proof in the textbook is.

But I'm serious about the practice. $10000$ hours and all that.

Edit: I also find this "steel trap" analogy very depressing. While it might seem that way in some textbooks, a properly presented proof should feel more like a poem.

Edit #2: I'd also like to mention that the general outline of Polya's advice is on Wikipedia.


It seems you are asking two questions. The first is how to get better at doing proofs, and the previous answers are better than I can do. The second is why to do them, which has not been addressed. From your questions on this site, you seem more an engineer than a mathematician. This is not a negative-I am educated as a physicist and practicing as an engineer(ing manager), but I enjoy proof-type math. It just is a different view.

In response to your question "How can I characterize the type of solution vector that comes out of a matrix?" Greg Graviton not only gave an answer of what matrices met your criterion, he proved that matrices that did not met his answer did not work. This is a major improvement as you know there aren't others out there.

Proofs are designed to make sure you covered all the loose ends. Ideally, each statement in a proof would be derived from the previous statements through an accepted rule of logic. But aside from the automated theorem provers, this is unachievable because it would make the proofs too long. So you only need to display enough steps to convince the audience it can be done. How big a leap is acceptable varies with the audience.

As physicists/engineers our numbers and functions are better behaved than the ones the mathematicians worry about. The mathematical world is not uniformly continuous, but ours is. As such, we interchange limits, integrals, derivatives, and sums freely.