Why is $\nabla f(x)$ in the direction orthogonal to $f(x)$?

The correct statement is that the gradient vector $\nabla f(x_0,y_0)=\left(\frac{\partial}{\partial x} f(x_0,y_0),\frac{\partial}{\partial y} f(x_0,y_0) \right)$ is orthogonal to any level curve $\alpha(t)=(x(t),y(t))$ of the function $f$ at the point $\alpha(t_0)=(x_0,y_0)$.

That is, the gradient vector $\nabla f(x_0,y_0)$ of $f$ at the point $(x_0,y_0)$ is orthogonal to the tangent vector $\alpha'(t_0)=\big(x'(t_0),y'(t_0)\big)$ of the level curve at that point.


Let us explain this in more detail. It is in fact the statement of a theorem of multivariable calculus.

Theorem. Let $f:\Omega\subset\mathbb{R}^2\to \mathbb{R}$ be a differentiable function on an open set $\Omega\subset \mathbb{R}^2$ and suppose that all partial derivatives of $f$ are continuous. Consider a curve $\alpha:I\to \mathbb{R}^2$, differentiable on the interval $I=(a,b)$ and contained in
$$ f^{-1}(c)=\{ (x,y)\in\Omega : f(x,y)=c\}. $$ A curve with this property is called a level curve at level $c$. Suppose that the coordinates $x(t)$ and $y(t)$ of the curve $\alpha(t)=(x(t),y(t))$ have continuous derivatives $x'(t)$ and $y'(t)$. Then, for all $t_0\in I$ with $\alpha(t_0)=(x_0,y_0)$, $$ \langle \nabla f(x_0,y_0), \alpha'(t_0)\rangle = \frac{\partial}{\partial x} f(x_0,y_0)\cdot x'(t_0)+\frac{\partial}{\partial y} f(x_0,y_0)\cdot y'(t_0)=0. $$

Proof. The function $(a,b)\ni t\mapsto f\circ \alpha(t)= f\big(x(t),y(t)\big)\in\mathbb{R}$ is constant and equal to $c$, that is, $f\circ \alpha(t)= f\big(x(t),y(t)\big)=c$ for all $t\in(a,b)$. Viewing $f\circ \alpha$ as a function of the variable $t$, we therefore have $$ \frac{d}{dt}(f\circ \alpha)(t)= \frac{d}{dt} f\big(x(t),y(t)\big)=0. $$ On the other hand, by the chain rule and the expression of the directional derivative through the gradient, we have $$ \left.\frac{d}{dt}(f\circ \alpha)(t)\right|_{t=t_0} = \frac{\partial}{\partial \vec{v}}f(x_0,y_0) = \langle \nabla f(x_0,y_0), \vec{v}\rangle = \langle \nabla f(\alpha(t_0)), \alpha'(t_0)\rangle $$ for $\vec{v}=(v_1,v_2)=(x'(t_0),y'(t_0))=\alpha'(t_0)$. Combining the two identities gives $$ \langle \nabla f(x_0,y_0), \alpha'(t_0)\rangle = \frac{\partial}{\partial x} f(x_0,y_0)\cdot x'(t_0)+\frac{\partial}{\partial y} f(x_0,y_0)\cdot y'(t_0)=0. $$
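
As a quick numerical sanity check of the theorem, here is a minimal sketch. The function $f(x,y)=x^2+y^2$, the level $c=1$ and the parametrization $\alpha(t)=(\cos t,\sin t)$ are my own choices for illustration and do not come from the statement above; the helper names are hypothetical as well.

```python
import math

def f(x, y):
    # Illustrative function (an assumption): f(x, y) = x^2 + y^2
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of the illustrative function
    return (2.0 * x, 2.0 * y)

def alpha(t):
    # A level curve of f at level c = 1: the unit circle
    return (math.cos(t), math.sin(t))

def alpha_prime(t):
    # Tangent vector alpha'(t) of the curve above
    return (-math.sin(t), math.cos(t))

def inner(u, v):
    # Euclidean inner product in R^2
    return u[0] * v[0] + u[1] * v[1]

def orthogonality_defect(t0):
    # <grad f(alpha(t0)), alpha'(t0)>, which the theorem says is 0
    x0, y0 = alpha(t0)
    return inner(grad_f(x0, y0), alpha_prime(t0))

for t0 in (0.0, 0.7, 2.0, 4.5):
    print(t0, orthogonality_defect(t0))
```

Up to floating-point rounding, every printed inner product should vanish, whatever value of $t_0$ is chosen.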


The vector field $\operatorname{grad}(f)$ realizes the total derivative of $f$, in the sense that the inner product of $\operatorname{grad}(f)$ and a vector $v$ gives $df(v)$. But if $v$ is tangent to a level curve, then $df(v) = 0$, so $v$ is perpendicular to the gradient.

Let $f:M\to \mathbb{R}$ be a smooth function with $n_0$ a regular value. Equip $M$ with a Riemannian metric. Suppose $f(m) = n_0$ and $v\in T_mM$ is tangent to the level set $f^{-1}(n_0)$. Then the trick is:

$$\langle \operatorname{grad}(f),v\rangle = df(v) = 0$$

because $v$ is annihilated by $df$.

So in a very real sense, this is a defining property of the gradient. We've constructed the gradient so that it is perpendicular to level sets.


Let's take an example to illustrate. Let $f(\boldsymbol{x}) = f(x,y)$. You may think of $z = f(x,y)$ and plot the graph of the function in $\mathbb{R}^3$.

Now take a contour of $f(x,y)$, i.e. $f(x,y) = c$. Along this contour the total derivative vanishes: $$\mathrm{d}f = \frac{\partial f}{\partial x}\mathrm{d}x+ \frac{\partial f}{\partial y}\mathrm{d}y = 0,$$ where you can think of $\mathrm{d}x, \mathrm{d}y$ as infinitesimal increments of $x,y$ along the contour. Noticing that the above equation can be written as $$\nabla f\cdot\mathrm{d}\boldsymbol{x} = \begin{bmatrix} \frac{\partial f}{\partial x}\\ \frac{\partial f}{\partial y} \end{bmatrix} \cdot \begin{bmatrix} \mathrm{d} x\\ \mathrm{d} y \end{bmatrix} =0,$$ we have shown that $\nabla f(x,y)$ is orthogonal to the contour $f(x,y) = c $ for all $(x,y)$ on it and for all $c$.
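
To make this concrete, here is a small worked example; the function $f(x,y)=xy$ is my own choice for illustration, not part of the argument above. On the contour $xy=c$ the total derivative gives $$\mathrm{d}f = y\,\mathrm{d}x + x\,\mathrm{d}y = 0,$$ so, wherever $x\neq 0$, an increment along the contour is proportional to $(\mathrm{d}x,\mathrm{d}y)\propto(x,-y)$. Since $\nabla f = (y, x)$, $$\nabla f\cdot\begin{bmatrix} x\\ -y \end{bmatrix} = yx - xy = 0,$$ which is the claimed orthogonality.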

Let $\boldsymbol{v}$ be a unit vector; then $\nabla f\cdot \boldsymbol{v}$ measures how fast $f$ changes in that direction. By the Cauchy-Schwarz inequality, $|\nabla f\cdot \boldsymbol{v}|\leq |\nabla f||\boldsymbol{v}| = |\nabla f| $, and equality holds only for $\boldsymbol{v} = \pm\nabla f/|\nabla f|$, with the plus sign giving the largest increase of $f$. (You can work out what happens when $|\nabla f| = 0$, e.g. for a plane $z = const.$)

So $\nabla f$ tells us in which direction $f$ changes the most, and by how much.
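
The steepest-direction claim can also be checked by brute force. The sketch below assumes an illustrative function $f(x,y)=x^2+3y$ and the point $(1,2)$, neither of which appears in the answer; it samples unit vectors $\boldsymbol{v}=(\cos\theta,\sin\theta)$ and compares the one maximizing $\nabla f\cdot\boldsymbol{v}$ with $\nabla f/|\nabla f|$.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + 3*y (an assumption)
    return (2.0 * x, 3.0)

def directional_derivative(g, theta):
    # grad f . v for the unit vector v = (cos theta, sin theta)
    return g[0] * math.cos(theta) + g[1] * math.sin(theta)

def best_direction(g, samples=3600):
    # Brute-force search over evenly spaced unit vectors
    thetas = [2.0 * math.pi * k / samples for k in range(samples)]
    best = max(thetas, key=lambda th: directional_derivative(g, th))
    return (math.cos(best), math.sin(best))

g = grad_f(1.0, 2.0)                      # gradient at the point (1, 2)
norm = math.hypot(g[0], g[1])             # |grad f|
unit_grad = (g[0] / norm, g[1] / norm)    # grad f / |grad f|
v_best = best_direction(g)

print("sampled maximizer:", v_best)
print("normalized gradient:", unit_grad)
print("max sampled rate:", g[0] * v_best[0] + g[1] * v_best[1], "vs |grad f|:", norm)
```

The sampled maximizer should agree with the normalized gradient up to the angular resolution of the search, and the maximal sampled rate should be close to, and never exceed, $|\nabla f|$.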