Formal proof that the mean minimizes the squared-error function

Assume $X_0$ is a space of "values" to optimize over, so the values $x_k$ themselves are constants. Note that $\sum_{k=1}^n \|x_k - m\|^2$ is constant because it does not depend on $x_0$ (both $x_k$ and $m$ are computed from $X_0$). So in the last line of the proof you have $J_0(x_0) = \sum_{k=1}^n \|x_0 - m \|^2 + \sum_{k=1}^n \|x_k - m \|^2$, i.e., $$J_0(x_0) = \sum_{k=1}^n \|x_0 - m \|^2 + C,$$ for some constant $C \ge 0$. Thus you only need to minimize $\sum_{k=1}^n \|x_0 - m \|^2$, and clearly this is minimized when $x_0 = m$.
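
If you want to convince yourself numerically, here is a minimal sketch (not part of the proof; the data, the sample size, and the use of numpy are my own illustration) that compares $J_0$ at the mean with $J_0$ at many other candidate points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))              # hypothetical samples x_1, ..., x_n
m = X.mean(axis=0)                        # sample mean

def J0(x0):
    """Sum of squared distances from x0 to every sample x_k."""
    return np.sum(np.linalg.norm(X - x0, axis=1) ** 2)

# J0 at the mean never exceeds J0 at any other candidate point.
print(J0(m) <= min(J0(rng.normal(size=3)) for _ in range(1000)))   # True
```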


Edit 1. The term $2(x_0-m)^T\sum_{k=1}^n(x_k-m)$ vanishes because $$\sum_{k=1}^n (x_k - m) = \sum_{k=1}^n x_k - \sum_{k=1}^n m = \sum_{k=1}^n x_k - m \times n.$$ Since $m = \frac{1}{n} \sum_{k=1}^n x_k$ we have $$ \sum_{k=1}^n x_k = m \times n. $$ Thus $$ \begin{array}{rcl} 2(x_0-m)^T\sum_{k=1}^n(x_k-m) & = & 2(x_0-m)^T(m \times n - m \times n) \\ & = & 2(x_0-m)^T \times 0 \\ & = & 0. \end{array} $$
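
Put differently, the cross term vanishes because deviations from the sample mean always sum to zero. A tiny numerical sanity check (made-up data again, not part of the argument):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(10, 3))   # hypothetical samples
m = X.mean(axis=0)                                   # sample mean

# Deviations from the mean sum to zero, so the cross term
# 2 (x0 - m)^T sum_k (x_k - m) vanishes for every x0.
print(np.allclose((X - m).sum(axis=0), 0.0))         # True
```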


Edit 2. If instead you expand around some other point $v$ with $v \ne m$, the same algebra gives $$ \begin{array}{rcl} J_0(x_0) & = & \sum_{k=1}^n \|(x_0 - v)-(x_k - v) \|^2 \\ & = & \sum_{k=1}^n \|x_0 - v \|^2 -2(x_0-v)^T\sum_{k=1}^n(x_k-v) + \sum_{k=1}^n \|x_k - v \|^2. \end{array} $$ This time the cross term does not vanish for every $x_0$; it vanishes only when $x_0 = v$, in which case $$ \begin{array}{rcl} J_0(x_0) & = & \sum_{k=1}^n \|x_0 - x_0 \|^2 -2(x_0-x_0)^T\sum_{k=1}^n(x_k-v) + \sum_{k=1}^n \|x_k - x_0 \|^2 \\ & = & 0 + 0 + \sum_{k=1}^n \|x_k - x_0 \|^2 \\ & = & \sum_{k=1}^n \|x_0 - x_k \|^2, \end{array} $$ which is the same expression $\sum_{k=1}^n \|x_0 - x_k \|^2$ we started from, so nothing new is obtained.
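
To see the contrast numerically (hypothetical data again; `v` and `x0` are arbitrary made-up points): the cross term is zero at the mean for every $x_0$, but generally nonzero when you center at some other $v$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))             # hypothetical samples
m = X.mean(axis=0)                       # sample mean
v = m + 1.0                              # some point v != m
x0 = rng.normal(size=3)                  # arbitrary candidate x0

cross = lambda c: -2.0 * (x0 - c) @ (X - c).sum(axis=0)
print(cross(m))   # ~0 for every x0: centering at the mean kills the term
print(cross(v))   # generally nonzero when centering at v != m
```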

Your proof is a proof of existence. This kind of proof can be done by picking some value $m$ and showing that it satisfies the claim, but that does not establish uniqueness: one could imagine another value that also satisfies the claim. Your proof does not address uniqueness, perhaps because it is considered obvious.
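
For completeness, uniqueness does follow directly from the decomposition above (this step is not spelled out in the original argument): the only term that depends on $x_0$ is strictly positive whenever $x_0 \ne m$, $$ J_0(x_0) - J_0(m) = \sum_{k=1}^n \|x_0 - m\|^2 = n\,\|x_0 - m\|^2 > 0 \quad \text{for } x_0 \ne m, $$ so $m$ is the unique minimizer.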


Additional. To simplify your problem, I will assume that $X_0$ is a scalar space (a collection of numbers) and give an alternative proof.

If we have $J_0(x_0) = \sum_{k=1}^n (x_0 - x_k)^2$, then to minimize it we differentiate with respect to $x_0$:

$$ \begin{array}{rcl} \frac{\partial}{\partial x_0} J_0(x_0) & = & \frac{\partial}{\partial x_0} \sum_{k=1}^n (x_0 - x_k)^2 \\ & = & 2 \times \sum_{k=1}^n (x_0 - x_k) \\ & = & 2 \times \left(\sum_{k=1}^n x_0 - \sum_{k=1}^n x_k\right) \\ & = & 2 \times \left(n x_0 - \sum_{k=1}^n x_k\right). \end{array} $$

Now, to minimize $J_0(x_0)$, we need $\partial J_0(x_0) /\partial x_0 = 0$; and since $\partial^2 J_0(x_0)/\partial x_0^2 = 2n > 0$, this critical point is indeed a minimum. Thus $$ \begin{array}{rcl} 2 \times \left(n x_0 - \sum_{k=1}^n x_k\right) & = & 0 \\ n x_0 & = & \sum_{k=1}^n x_k \\ x_0 & = & \frac{1}{n} \sum_{k=1}^n x_k \\ x_0 & = & m. \qquad\qquad \Box \end{array} $$
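
A short symbolic sanity check of this scalar derivation, assuming sympy and a made-up sample; it just differentiates $J_0$, solves for the critical point, and confirms the second derivative is positive:

```python
import sympy as sp

x0 = sp.symbols('x0')
data = [2, 5, 7, 11]                         # hypothetical scalar samples x_k
J = sum((x0 - xk) ** 2 for xk in data)       # J_0(x_0) = sum_k (x_0 - x_k)^2

critical = sp.solve(sp.diff(J, x0), x0)      # solve dJ_0/dx_0 = 0
print(critical)                              # [25/4], the sample mean
print(sp.Rational(sum(data), len(data)))     # 25/4 as well
print(sp.diff(J, x0, 2))                     # 2n = 8 > 0, so it is a minimum
```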

To obtain the same result for vectors, you use the gradient instead and proceed in the same way. Later you will surely learn about the normal equations, which extend this idea to least-squares problems.
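
For the vector case, the gradient is $\nabla_{x_0} J_0(x_0) = 2\sum_{k=1}^n (x_0 - x_k) = 2\left(n x_0 - \sum_{k=1}^n x_k\right)$, which vanishes exactly at the sample mean. A minimal numerical sketch (hypothetical data, not part of the proof):

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(20, 3))    # hypothetical samples
m = X.mean(axis=0)                                    # sample mean

# Gradient of J0(x0) = sum_k ||x0 - x_k||^2 is 2 * (n*x0 - sum_k x_k).
grad = lambda x0: 2.0 * (len(X) * x0 - X.sum(axis=0))
print(np.allclose(grad(m), 0.0))          # True: gradient is zero at the mean
print(np.linalg.norm(grad(m + 0.1)))      # nonzero away from the mean
```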


I believe that this expression is constant. The only variable anywhere in sight is $x_0$. You are supposed to pick $x_0$ that minimizes $J_0(x_0)$. The only expression involving $x_0$ is nonnegative, so it is minimized if it is zero. Choosing $x_0=m$ does this because $$J_0(x_0) = \underbrace{\sum_{k=1}^n \|x_0 - m \|^2}_{\textrm{zero when }x_0=m} + \underbrace{\sum_{k=1}^n \|x_k - m \|^2}_{\textrm{constant w.r.t. }x_0} $$