Is there an intuitive explanation for an extremal property of Chebyshev polynomials?

Alex, I hope you will allow for the option that there are natural explanations for the Chebyshev polynomials which are unrelated to optimization on $[-1,1]$. Here is what I think is the most natural characterizing feature of this family of polynomials.

Theorem (Ritt): If $F_n(x)$ is a family of monic polynomials with coefficients in a field of characteristic 0 such that $\deg F_n(x) = n$ and $F_m(F_n(x)) = F_n(F_m(x))$ for all $m$ and $n$, then up to a simple change of variables $F_n(x) = x^n$ for all $n$ or $F_n(x) = (1/2^{n-1})T_n(x)$ for all $n$.

Proof: See Ritt, "Prime and Composite Polynomials," Trans. Amer. Math. Soc. 23 (1922), 51--66.

Also see the chapter "Commuting Polynomials" in Kvant Selecta: Algebra and Analysis Volume II.

The link provided in the original question mentions that the monic Chebyshev polynomials commute, but doesn't emphasize that this is an incredibly special property of theirs. Pity.
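To see the commuting property concretely, here is a quick sketch (my addition, not part of the answer) that verifies $T_m(T_n(x)) = T_n(T_m(x))$ for small $m, n$ using exact integer coefficient arithmetic; in fact both compositions equal $T_{mn}$.

```python
# Verify that Chebyshev polynomials commute under composition,
# working with exact integer coefficient lists (lowest degree first).

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def chebyshev(n):
    """T_n via the recurrence T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = [1], [0, 1]
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, poly_add(poly_mul([0, 2], t_cur),
                                        [-c for c in t_prev])
    return t_cur

def compose(p, q):
    """Compute p(q(x)) by Horner's rule on coefficient lists."""
    out = [p[-1]]
    for c in reversed(p[:-1]):
        out = poly_add(poly_mul(out, q), [c])
    return out

# T_2(T_3(x)) = T_3(T_2(x)) = T_6(x) = 32x^6 - 48x^4 + 18x^2 - 1
T2, T3 = chebyshev(2), chebyshev(3)
assert compose(T2, T3) == compose(T3, T2) == chebyshev(6)
```

The same check passes for any pair $(m, n)$, since $T_m(T_n(\cos t)) = \cos(mnt) = T_n(T_m(\cos t))$.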


Let $\Pi_n$ denote the vector space of polynomial functions on $I:=[-1,1]$ of degree less than or equal to $n$. The optimization problem you quoted is equivalent to:

Problem (2): Find $q\in\Pi_{n-1}$ that minimizes the uniform distance on $I$ from the function $f(x)=x^n$.

While it is clear by compactness that Problem (2) has a solution, it is not obvious that the solution is unique, because the uniform norm is not strictly convex (non-uniqueness already appears in the analogous problem of point-line distance in $\mathbb{R}^2$ with the max norm). The magic of polynomials is that the solution is always unique, for any continuous function $f$. Not only that: Chebyshev also gave a characterization of the minimizer. It is the unique polynomial $q$ such that $f(x)-q(x)$ attains its maximum absolute value at no fewer than $n+1$ points, with alternating signs. (Polynomials are not the only functions with this property; more generally one defines "Haar families," which span finite-dimensional spaces analogous to the $\Pi_n$, and the argument works for them as well.)

Going back to your optimization problem: $p(x)$ is the unique monic polynomial of degree $n$ whose zeros are all real and simple and which attains its maximum absolute value, with alternating signs, between consecutive zeros. With a bit more work one arrives at $T_n/2^{n-1}$ (or, at this point, one pulls it out of the hat; by now I'd consider that quite fair and would have no objections, especially if the hat is the most noble one of Pafnuty Lvovich).
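A quick numerical sketch of this characterization (my addition): the monic polynomial $T_n(x)/2^{n-1}$ attains its maximal absolute value $2^{-(n-1)}$ on $[-1,1]$ at the $n+1$ points $x_k=\cos(k\pi/n)$, with alternating signs, and never exceeds that value in between.

```python
# Check the equioscillation of the monic Chebyshev polynomial T_n / 2^(n-1).
import math

def monic_cheb(n, x):
    """T_n(x) / 2^(n-1) via the three-term recurrence (n >= 1)."""
    t_prev, t_cur = 1.0, x
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur / 2 ** (n - 1)

n = 5
peak = 2 ** -(n - 1)
# Alternation: the values at cos(k*pi/n) are +peak, -peak, +peak, ...
vals = [monic_cheb(n, math.cos(k * math.pi / n)) for k in range(n + 1)]
assert all(abs(v - (-1) ** k * peak) < 1e-12 for k, v in enumerate(vals))
# Sup-norm check on a dense grid: |T_n / 2^(n-1)| never exceeds the peak.
grid = [-1 + 2 * i / 10000 for i in range(10001)]
assert max(abs(monic_cheb(n, x)) for x in grid) <= peak + 1e-12
```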

PS: Trying to answer the "psychology of mathematics" part of your question (how one arrives at the $T_n$ from the optimization problem). Once the problem is reduced to determining the degree-$n$ polynomial $T$ (say with positive leading coefficient) that oscillates $n+1$ times between its maximum value $1$ and its minimum value $-1$ on the interval $[-1,1]$, I guess one very soon suspects that circles and trigonometric functions are around (I can't say so for certain, because I already know the answer!). But if one computes the first few such polynomials and looks at their graphs, they clearly resemble sinusoidal curves drawn on a cylinder, and at that point one could recall from high-school memories that $\cos(nx)$ is a trigonometric polynomial in $\cos(x)$, and observe that it clearly has the desired property.
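That high-school observation can be checked directly (a sketch of mine): expanding $\cos((k+1)t)+\cos((k-1)t)=2\cos(t)\cos(kt)$ gives exactly the Chebyshev recurrence, so $T_n(\cos t)=\cos(nt)$.

```python
# Numerically confirm the identity T_n(cos t) = cos(n t).
import math

def T(n, x):
    """T_n(x) via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = 1.0, x
    for _ in range(n):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_prev  # after n steps, t_prev holds T_n(x)

for n in range(8):
    for t in (0.1, 0.7, 1.3, 2.9):
        assert abs(T(n, math.cos(t)) - math.cos(n * t)) < 1e-10
```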


Well, let's try to avoid the hat.

Consider the dual (and obviously equivalent) problem: find the polynomial $p:[-1,1]\rightarrow [-1,1]$ of degree $n$ with the greatest possible leading coefficient. We have some information on the values of $p$, and need something about its coefficients. Let's try Lagrange interpolation. Take some $n+1$ values $t_1 < t_2 < \dots < t_{n+1}$ in $[-1,1]$ and write down (for $u(x)=(x-t_1)\dots(x-t_{n+1})$) the formula $$ p(x)=\sum p(t_i) \frac{u(x)/(x-t_i)}{u'(t_i)}. $$ Then look at the coefficient of $x^n$. It equals $$ \sum \frac{p(t_i)}{u'(t_i)}. $$ We know that $|p(t_i)|\leq 1$, so the leading coefficient does not exceed $\sum 1/|u'(t_i)|$. OK, when does equality occur? The answer: $p$ should take the value $(-1)^{n-i+1}$ at $t_i$. That is, we have to find a polynomial of degree $n$ with $n+1$ extremal values $\pm 1$ on $[-1,1]$. This can hold only if $t_1=-1$, $t_{n+1}=1$, and $t_2,\dots,t_n$ are roots of $p'$. So $1-p^2(x)$ should be divisible by $(1-x^2)p'(x)$. At this point the trigonometric substitution $x=\cos t$, $p=\cos f$ is very natural, since we know that $1-f^2$ is divisible by $f'$ for $f=\cos$. So we invent the Chebyshev polynomials.
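A numerical check of this bound (my own sketch, with the nodes written as $t=\cos(k\pi/n)$, the extremal points of $T_n$ on $[-1,1]$): the bound $\sum_i 1/|u'(t_i)|$ on the leading coefficient works out to exactly $2^{n-1}$, the leading coefficient of $T_n$, so the Chebyshev polynomial attains it.

```python
# With nodes at the extremal points of T_n, the Lagrange bound on the
# leading coefficient equals 2^(n-1), the leading coefficient of T_n.
import math

n = 6
nodes = [math.cos(k * math.pi / n) for k in range(n + 1)]  # n+1 points

def u_prime(i):
    """u'(t_i) = prod_{j != i} (t_i - t_j) for u(x) = prod_j (x - t_j)."""
    p = 1.0
    for j, t in enumerate(nodes):
        if j != i:
            p *= nodes[i] - t
    return p

bound = sum(1 / abs(u_prime(i)) for i in range(n + 1))
assert abs(bound - 2 ** (n - 1)) < 1e-9  # leading coefficient of T_n
```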

Also, it is seen from the Lagrange formula that they are extremal in many other problems with the restriction $|p(x)|\leq 1$ on $[-1,1]$. For example, the value at any fixed point $x_0>1$ is also maximized by the Chebyshev polynomial; this is proved in exactly the same way.
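Here is a numerical sketch of that last claim (my addition, with the same extremal nodes as above): for $x_0>1$ every factor $u(x_0)/(x_0-t_i)$ is positive, so the Lagrange formula gives $|p(x_0)|\leq\sum_i u(x_0)/\big((x_0-t_i)\,|u'(t_i)|\big)$, and $T_n$ attains this bound because $T_n(t_i)=(-1)^i$ matches the sign of $u'(t_i)$.

```python
# Check that T_n attains the Lagrange-formula bound at a point x0 > 1.
import math

n, x0 = 6, 1.5
nodes = [math.cos(k * math.pi / n) for k in range(n + 1)]

def u(x):
    p = 1.0
    for t in nodes:
        p *= x - t
    return p

def u_prime(i):
    p = 1.0
    for j, t in enumerate(nodes):
        if j != i:
            p *= nodes[i] - t
    return p

def T(m, x):
    """T_m(x) via the three-term recurrence (m >= 1)."""
    t_prev, t_cur = 1.0, x
    for _ in range(m - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

bound = sum(u(x0) / ((x0 - t) * abs(u_prime(i)))
            for i, t in enumerate(nodes))
assert abs(T(n, x0) - bound) < 1e-9  # T_n attains the bound at x0 > 1
```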