What is the difference between the Jacobian, Hessian and the Gradient?

A good resource on this is any introductory vector calculus text. I'll try to be as consistent as I can with Stewart's Calculus, perhaps the most popular calculus textbook in North America.

The Gradient

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a scalar field. The gradient, $\nabla f: \mathbb{R}^n \rightarrow \mathbb{R}^n$, is a vector such that $(\nabla f)_j = \partial f/ \partial x_j$. Because every point in $\text{dom}(f)$ is mapped to a vector, $\nabla f$ is a vector field.
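
To make this concrete, here is a minimal sketch using JAX's automatic differentiation; the particular $f$ below is a made-up example, not anything special:

```python
import jax
import jax.numpy as jnp

# A hypothetical scalar field f: R^3 -> R, chosen only for illustration.
def f(x):
    return x[0] ** 2 + x[1] * x[2]

grad_f = jax.grad(f)  # nabla f: R^3 -> R^3, itself a vector field
print(grad_f(jnp.array([1.0, 2.0, 3.0])))  # [2. 3. 2.]
```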

The Jacobian

Let $\operatorname{F}: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a vector field. The Jacobian can be considered the derivative of a vector field. Considering each component of $\operatorname{F}$ as a single function (like $f$ above), the Jacobian is a matrix in which the $i^{th}$ row is the gradient of the $i^{th}$ component of $\operatorname{F}$. If $\mathbf{J}$ is the Jacobian, then

$$\mathbf{J}_{i,j} = \dfrac{\partial \operatorname{F}_i}{\partial x_j}$$
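
Again as a sketch in JAX (the $\operatorname{F}$ below is a made-up example):

```python
import jax
import jax.numpy as jnp

# A hypothetical vector field F: R^2 -> R^2, chosen only for illustration.
def F(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[0])])

J = jax.jacfwd(F)(jnp.array([1.0, 2.0]))
# Row i is the gradient of F_i, i.e. J[i, j] = dF_i/dx_j.
print(J)
# [[2.     1.    ]
#  [0.5403 0.    ]]   (second row is [cos(1), 0])
```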

The Hessian

Simply put, the Hessian is the matrix of second-order mixed partial derivatives of a scalar field.

$$\mathbf{H}_{i, j}=\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}$$
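
A JAX sketch of this as well, with another made-up scalar field:

```python
import jax
import jax.numpy as jnp

# A hypothetical scalar field, chosen only for illustration: f(x1, x2) = x1^2 * x2.
def f(x):
    return x[0] ** 2 * x[1]

H = jax.hessian(f)(jnp.array([1.0, 2.0]))
print(H)
# [[4. 2.]
#  [2. 0.]]  -- H[i, j] = d^2 f / (dx_i dx_j); symmetric here, since the
#             mixed partials are continuous (Clairaut's theorem)
```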

In summary:

  • Gradient: Vector of first-order partial derivatives of a scalar field

  • Jacobian: Matrix whose rows are the gradients of the components of a vector field

  • Hessian: Matrix of second-order mixed partial derivatives of a scalar field.

Example

Squared error loss $f(\beta_0, \beta_1) = \sum_i (y_i - \beta_0 - \beta_1x_i)^2$ is a scalar field. We map every pair of coefficients to a loss value.

  • The gradient of this scalar field is $$\nabla f = \left< -2 \sum_i( y_i - \beta_0 - \beta_1x_i), -2\sum_i x_i(y_i - \beta_0 - \beta_1x_i) \right>$$

  • Now, each component of $\nabla f$ is itself a scalar field. Take gradients of those and set them to be rows of a matrix, and you've got yourself the Jacobian of $\nabla f$:

$$ \left[\begin{array}{cc} \sum_{i=1}^{n} 2 & \sum_{i=1}^{n} 2 x_{i} \\ \sum_{i=1}^{n} 2 x_{i} & \sum_{i=1}^{n} 2 x_{i}^{2} \end{array}\right]$$

  • The Hessian of $f$ is the same as the Jacobian of $\nabla f$; the sketch below checks this numerically for this example. It would behoove you to prove this to yourself.
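
If you want to verify these closed forms numerically, here is a minimal sketch with JAX; the data and the evaluation point are made up purely for illustration:

```python
import jax
import jax.numpy as jnp

# Made-up data and coefficients, just to check the formulas above.
x = jnp.array([0.0, 1.0, 2.0])
y = jnp.array([1.0, 3.0, 5.0])

def loss(beta):
    beta0, beta1 = beta
    return jnp.sum((y - beta0 - beta1 * x) ** 2)

beta = jnp.array([0.5, 1.5])
print(jax.grad(loss)(beta))     # [-6. -8.], matching the gradient formula
print(jax.hessian(loss)(beta))  # [[ 6.  6.]
                                #  [ 6. 10.]], i.e. [[2n, 2*sum(x)],
                                #                    [2*sum(x), 2*sum(x^2)]]
```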

Resources: Calculus: Early Transcendentals by James Stewart (or earlier editions), as well as Wikipedia, which is surprisingly good for these topics.


If you have a function that maps a 1D number to a 1D number, then you can take the derivative of it:

$f(x) = x^2, f'(x) = 2x$

If you have a function that maps an ND vector to a 1D number, then you take the gradient of it:

$f(x) = x^Tx, \quad \nabla f(x) = 2x$, where $x = (x_1, x_2, \ldots, x_N)$

If you have a function that maps an ND vector to an ND vector, then you take the Jacobian of it:

$f(x_1, x_2) = \begin{bmatrix} x_1x_2^2 \\ x_1^2x_2\end{bmatrix}, \quad J_f(x_1, x_2) = \begin{bmatrix} x_2^2 & 2x_1x_2 \\ 2x_1x_2 & x_1^2\end{bmatrix}$
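
A quick numerical check of that Jacobian, as a JAX sketch (the evaluation point $(2, 3)$ is arbitrary):

```python
import jax
import jax.numpy as jnp

def f(x):  # the f(x1, x2) from the example above
    return jnp.array([x[0] * x[1] ** 2, x[0] ** 2 * x[1]])

print(jax.jacfwd(f)(jnp.array([2.0, 3.0])))
# [[ 9. 12.]
#  [12.  4.]]  -- matches [[x2^2, 2*x1*x2], [2*x1*x2, x1^2]] at (2, 3)
```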

The Hessian is the Jacobian of the gradient of a function that maps from ND to 1D.
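
That identity is easy to check numerically. A minimal JAX sketch, reusing $f(x) = x^Tx$ from above:

```python
import jax
import jax.numpy as jnp

def f(x):
    return x @ x  # the ND -> 1D example f(x) = x^T x from above

x0 = jnp.array([1.0, 2.0, 3.0])
H_direct = jax.hessian(f)(x0)
H_via_grad = jax.jacfwd(jax.grad(f))(x0)   # Jacobian of the gradient
print(jnp.allclose(H_direct, H_via_grad))  # True; both are 2*I
```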

So the gradient, Jacobian, and Hessian are different operations for different functions. You literally cannot take the gradient of an ND $\to$ ND function. That's the difference.