Gradient is NOT the direction that points to the minimum or maximum

Although the gradient vector is defined at every point, it is really a local concept.

At any given point, it tells you the direction in which the function increases at the greatest rate. If you think of the function as height, then it gives the direction in which the ground rises most steeply.

As soon as you move an inch, the ground changes and the steepest direction changes.

Instead of the black zig-zag, you need an integral curve of the gradient vector field, that is, a path whose tangent at every point is parallel to the gradient at that point.
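As a rough illustration, here is a minimal sketch that approximates such an integral curve with many tiny Euler steps. The function $f(x,y) = x^2 + 10y^2$, the starting point, and the step size are assumptions chosen for the example; they do not come from the question.

```python
import math

def f(x, y):
    # Hypothetical example: an elongated bowl with its minimum at the origin.
    return x**2 + 10.0 * y**2

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return 2.0 * x, 20.0 * y

def cosine(u, v):
    # Cosine of the angle between two 2D vectors.
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))

def gradient_flow(x, y, dt=1e-3, steps=5000):
    # Euler approximation of the curve that follows the negative gradient.
    # The gradient is re-evaluated after every tiny step.
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - dt * gx, y - dt * gy
        path.append((x, y))
    return path

start = (10.0, 1.0)
gx, gy = grad_f(*start)
downhill = (-gx, -gy)                 # steepest-descent direction at the start
to_minimum = (-start[0], -start[1])   # straight line from the start to the minimum

start_cosine = cosine(downhill, to_minimum)
path = gradient_flow(*start)
end = path[-1]

print(f"cosine between steepest descent and the straight line: {start_cosine:.3f}")
print(f"f at the end of the curve: {f(*end):.2e}")
```

At the starting point the steepest-descent direction is visibly different from the straight line to the minimum (the cosine is well below 1), yet the curve still arrives at the minimum because the direction is updated continuously along the way.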


The gradient $\nabla f(x)$ points in the unit direction $u$ for which the directional derivative $D_u f(x)$ is as large as possible. Likewise, when you walk downhill you probably head in the direction of steepest descent, even though the lowest point on land is the shore of the Dead Sea and you are probably walking in completely the wrong direction to reach it.
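To spell out why the gradient maximizes the directional derivative, here is a short derivation. It assumes $f$ is differentiable at $x$, $\nabla f(x) \neq 0$, and $u$ is a unit vector; $\theta$ denotes the angle between $u$ and $\nabla f(x)$ (notation introduced here, not in the original answer):

$$D_u f(x) = \nabla f(x) \cdot u = \|\nabla f(x)\| \cos\theta \le \|\nabla f(x)\|,$$

with equality exactly when $\theta = 0$, that is, when $u = \nabla f(x) / \|\nabla f(x)\|$. The statement involves only derivatives at the single point $x$, which is why it says nothing about where a distant minimum lies.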

Edit:

Maybe walking down a hill is not a perfect analogy because it makes it seem like the "direction of steepest descent" should be a vector in $\mathbb R^3$, with a $z$ component as well as $x$ and $y$ components.

Perhaps a better analogy is a bug walking on a hot (painfully hot!) sidewalk. The bug moves in the direction of steepest descent (the direction in which temperature decreases most quickly), but the bug does not realize that the coolest spot on the sidewalk is ten meters in the opposite direction, where there is shade. Hopefully in this analogy it's clear that the temperature is a function $f(x,y)$, and the direction of steepest descent is a vector with an $x$ component and a $y$ component, but no $z$ component.
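The bug's predicament can be sketched numerically. The temperature function below is a made-up double-well chosen only for illustration (it is not from the question), and `crawl`, `bug`, and `shade` are hypothetical names:

```python
def temperature(x, y):
    # Hypothetical sidewalk temperature with two cool spots:
    # one near x = 1 and a cooler one near x = -1.
    return (x**2 - 1.0)**2 + 0.3 * x + y**2

def grad_temperature(x, y):
    # Gradient of the temperature: only x and y components, no z component.
    return 4.0 * x * (x**2 - 1.0) + 0.3, 2.0 * y

def crawl(x, y, step=0.01, iterations=5000):
    # The bug repeatedly takes a small step in the direction of steepest descent.
    for _ in range(iterations):
        gx, gy = grad_temperature(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

bug = crawl(0.5, 0.5)   # where the bug ends up
shade = (-1.0, 0.0)     # a point in the cooler region, in the opposite direction

print("bug stops at:", bug, "temperature:", temperature(*bug))
print("temperature in the shade:", temperature(*shade))
```

Starting at $x = 0.5$, the bug settles in the nearby cool spot on the right, even though the spot on the left is cooler. At every step it only uses the local gradient, so it never learns about the other spot.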


Your black lines are not gradient lines at all. The gradient is perpendicular to the contour lines at every point. Even in an ellipsoidal valley, the negative gradient will generally not point to the lowest point, but it will point much closer to it than your picture indicates.
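One way to see the perpendicularity, as a sketch: assume $f$ is differentiable and $r(t)$ is a differentiable parametrization of a contour line $f = c$ (the names $r$ and $c$ are introduced here for illustration). Then

$$f(r(t)) = c \quad\Longrightarrow\quad \frac{d}{dt} f(r(t)) = \nabla f(r(t)) \cdot r'(t) = 0,$$

so the gradient is orthogonal to the tangent $r'(t)$ of the contour at every point.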

A function minimizer that follows the local gradient has to take a finite-sized step in the direction of the negative gradient, then evaluate the gradient at the new location before taking the next step. Evaluating the gradient is often very expensive, so you want to do it as few times as possible. One approach is to follow the negative gradient from your current point until the function stops decreasing, then stop, evaluate the local gradient, and set off in the new downhill direction. With that strategy, each new direction is at a right angle to the previous one. If the new gradient were not perpendicular to the old direction of travel, you could decrease the function further by moving a little farther, or not quite so far, along the old direction. You only change direction when you are at a local minimum along the line you are traveling.
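The right-angle behavior can be checked with a minimal sketch of steepest descent with exact line search. It assumes a quadratic $f(p) = \tfrac12 p^T A p$, for which the best step length along the direction $d = -g$ has the closed form $\alpha = (g \cdot g)/(d \cdot A d)$; for a general function you would need a numerical line search instead. The matrix `A` and the starting point are hypothetical choices.

```python
def matvec(A, v):
    # Matrix-vector product for small dense matrices stored as lists of rows.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hessian of the hypothetical example f(x, y) = x**2 + 10*y**2.
A = [[2.0, 0.0], [0.0, 20.0]]

def f(p):
    return 0.5 * dot(p, matvec(A, p))

def grad(p):
    return matvec(A, p)

def steepest_descent(p, iterations=10):
    directions, values = [], [f(p)]
    for _ in range(iterations):
        g = grad(p)
        d = [-gi for gi in g]                     # steepest-descent direction
        alpha = dot(g, g) / dot(d, matvec(A, d))  # exact minimizer along d (quadratic only)
        p = [pi + alpha * di for pi, di in zip(p, d)]
        directions.append(d)
        values.append(f(p))
    return p, directions, values

p, directions, values = steepest_descent([10.0, 1.0])

# Dot products of consecutive search directions: zero up to rounding error.
consecutive_dots = [dot(d1, d2) for d1, d2 in zip(directions, directions[1:])]
print(consecutive_dots)
print(values)
```

Each step stops where the derivative of $f$ along the current direction vanishes, which means the new gradient has no component along the old direction. That is exactly the zig-zag of perpendicular segments described above.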