Gradient of the TV norm of an image

Let $f$ have 3 indices corresponding to voxels: $f_{ijk}$, with maximal indices $i_\text{max}, j_\text{max}, k_\text{max}$. The gradient of $f$ has an additional Cartesian index $\alpha$: \begin{align} g^\alpha &=\left(\nabla f\right)^\alpha=\partial_\alpha f\\ g^\alpha_{ijk} & = \partial_\alpha f_{ijk}. \end{align}

The TV norm is the sum of the 2-norms of this quantity with respect to Cartesian indices:

\begin{align} \lVert f \rVert_\text{TV}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(g^\alpha_{ijk}\right)^2}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{ijk}\right)^2}, \end{align} which is a scalar.

Now, consider the gradient of this quantity (in essence a scalar function over an $i_\text{max}\cdot j_\text{max}\cdot k_\text{max}$-dimensional space) with respect to the voxel intensity components (I think this is a bit like a functional derivative, since we're differentiating with respect to $f_{ijk}$ rather than $i$ or $j$ or $k$). The result is a 3-index quantity, one for each $f_{ijk}$: \begin{align} \left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&= \frac{\partial}{\partial f_{ijk}} \lVert f \rVert_\text{TV} = \partial_{f_{ijk}} \sum\limits_{i^\prime j^\prime k^\prime} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}\\ &=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}. \end{align}

Now this gets a bit tricky, since $\partial_\alpha$ and $\partial_{f_{ijk}}$ don't seem to commute: the former, a derivative with respect to Cartesian indices, looks something like this: \begin{align} \partial_x f_{i^\prime j^\prime k^\prime} = \lim\limits_{h\to 0} \frac{f_{i^\prime+h,j^\prime,k^\prime}-f_{i^\prime-h, j^\prime, k^\prime}}{2h} \end{align} with an analytic continuation of the discrete indices ;) My point is that it's non-trivial to act on this with the derivative with respect to $f$.

So I'll consider a very special case: discretization with central differences (update: we actually need forward or backward differences instead, see remarks at the end), using, for instance, gradient() in MATLAB. In this case, \begin{align} \partial_\alpha f_{i^\prime j^\prime k^\prime}=\delta_{\alpha x} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2} + \delta_{\alpha y} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2} + \delta_{\alpha z} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime ,j^\prime, k^\prime-1}}{2} \end{align} where I'm just trying to say that each Cartesian derivative involves the two neighbouring $f$ components (at $\pm 1$) along the given Cartesian axis.
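For concreteness, here is a minimal NumPy sketch of this discretization and of the TV norm defined above, using np.gradient as the analogue of MATLAB's gradient() (note that np.gradient, like gradient(), switches to one-sided differences at the array edges, which is a detail not covered by the formulas here):

```python
import numpy as np

def tv_norm_central(f):
    """TV norm of a 3-D array using central differences (one-sided at the edges)."""
    gx, gy, gz = np.gradient(f)  # g^alpha_{ijk}: one array per Cartesian index alpha
    return np.sum(np.sqrt(gx**2 + gy**2 + gz**2))

# Example: a random 3-D "image"
f = np.random.rand(8, 9, 10)
print(tv_norm_central(f))
```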

Let's first compute the derivative of the gradient with respect to $f$. We could do it based on the previous equation, keeping track of Kronecker deltas like $\delta_{\alpha x}$, but it's probably clearer if we treat the 3 Cartesian axes separately. Since we're differentiating with respect to $f_{ijk}$, we have to consider each component of $f$ as an individual variable. So when computing $\partial_{f_{ijk}} f_{i^\prime,j^\prime,k^\prime}$, this quantity will be zero unless $i=i^\prime \wedge j=j^\prime \wedge k=k^\prime$, in which case the derivative is simply 1. This leads to \begin{align} \partial_{f_{ijk}} \partial_x f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2}=\frac{\delta_{i^\prime+1,i}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime-1,i}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i-1}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime,i+1}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\\ \partial_{f_{ijk}} \partial_y f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime+1,j}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime-1,j}\delta_{k^\prime,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j-1}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime,j+1}\delta_{k^\prime,k}}{2}\\ \partial_{f_{ijk}} \partial_z f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime, j^\prime, k^\prime-1}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime+1,k} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime-1,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k-1} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k+1}}{2}. \end{align}

These terms will essentially select those $i^\prime,j^\prime,k^\prime$ indices from the sum on the right hand side of $\nabla_f \lVert f\rVert_\text{TV}$ which match the given indices $i\pm1,j\pm1,k\pm1$. We have to be careful though: indices like $i^\prime=0$ or $i^\prime=i_\text{max}+1$ must not appear, so we need additional Kronecker-delta factors (such as $\left(1-\delta_{i,1}\right)$ and $\left(1-\delta_{i,i_\text{max}}\right)$, respectively) to exclude them.

So here's the deal: \begin{align} \left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\notag\\ &=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\partial_x f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_x f_{i^\prime j^\prime k^\prime}\right) + \partial_y f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_y f_{i^\prime j^\prime k^\prime}\right) + \partial_z f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_z f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\\ &=\frac{\left(1-\delta_{i,1}\right) \partial_x f_{i-1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i-1, j, k}\right)^2}} - \frac{\left(1-\delta_{i,i_\text{max}}\right) \partial_x f_{i+1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i+1, j, k}\right)^2}}\notag\\ &\quad + \frac{\left(1-\delta_{j,1}\right) \partial_y f_{i,j-1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j-1, k}\right)^2}} - \frac{\left(1-\delta_{j,j_\text{max}}\right) \partial_y f_{i,j+1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j+1, k}\right)^2}}\notag\\ &\quad + \frac{\left(1-\delta_{k,1}\right) \partial_z f_{i,j,k-1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k-1}\right)^2}} - \frac{\left(1-\delta_{k,k_\text{max}}\right) \partial_z f_{i,j,k+1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k+1}\right)^2}} \end{align} which seems to be the end result. Again, the Kronecker deltas in the numerators essentially just ensure that we don't violate the array bounds of $f$ in the numerator and denominator.
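For reference, here is a minimal NumPy sketch of this end result. Two caveats that are not part of the derivation itself: np.gradient falls back to one-sided differences at the array edges (so the boundary handling differs slightly from the Kronecker-delta bookkeeping above), and a small eps is added to avoid dividing by zero where the local gradient vanishes.

```python
import numpy as np

def tv_grad_central(f, eps=1e-12):
    """Sketch of the formula above: d/df_{ijk} of the central-difference TV norm."""
    gx, gy, gz = np.gradient(f)                 # partial_alpha f (one-sided at the edges)
    mag = np.sqrt(gx**2 + gy**2 + gz**2) + eps  # the common denominator, kept away from 0
    tx, ty, tz = gx / mag, gy / mag, gz / mag

    out = np.zeros_like(f)
    # x terms: +t_x(i-1,j,k)/2 and -t_x(i+1,j,k)/2, dropping out-of-range neighbours
    out[1:, :, :]  += 0.5 * tx[:-1, :, :]
    out[:-1, :, :] -= 0.5 * tx[1:, :, :]
    # y terms
    out[:, 1:, :]  += 0.5 * ty[:, :-1, :]
    out[:, :-1, :] -= 0.5 * ty[:, 1:, :]
    # z terms
    out[:, :, 1:]  += 0.5 * tz[:, :, :-1]
    out[:, :, :-1] -= 0.5 * tz[:, :, 1:]
    return out
```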


As @Ander noted early on, the exact procedure above doesn't actually work, and one has to use forward or backward differences instead (the procedure is exactly the same, of course, and the minor index changes are straightforward to handle). He later figured out that the reason is that central differences are not sensitive to the local pixel value, so using them will not lead to the minimization of the TV norm (imagine an image with a staggered binary pattern: its central-difference gradient is zero everywhere, even though the image is markedly non-uniform).
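A quick NumPy illustration of that last point (my example, not part of the original discussion): on a checkerboard image the central differences cancel exactly away from the edges, so the central-difference TV norm cannot see the oscillation at all.

```python
import numpy as np

ii, jj = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
f = ((ii + jj) % 2).astype(float)      # staggered binary (checkerboard) pattern

gx, gy = np.gradient(f)                # central differences away from the edges
print(np.abs(gx[1:-1, 1:-1]).max(), np.abs(gy[1:-1, 1:-1]).max())  # both print 0.0
```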


Since the link to the paper was deleted, I hereby copy the procedure the paper uses; one can still search for the paper's title and find it easily.

https://arxiv.org/abs/0904.4495 "Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT" (by Emil Y. Sidky, Chien-Min Kao, Xiaochuan Pan)

The result should be essentially the same as Andras Deak's. Personally I prefer fewer equations :)

Here we use backward differences, and the index is $(s,t)$ instead of $(i,j)$ only to stick to the paper's notation.

$ \|f\|_{TV} =\sum_{s,t} |\nabla f_{s,t}| =\sum_{s,t}\sqrt{(f_{s,t} - f_{s-1,t})^2 + (f_{s,t} - f_{s,t-1})^2}. $

The two terms under the square root are the pixel differences in the row and column directions. In the 3-D case, there would be a third term for the $z$ direction.
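As a small illustration (my sketch, not code from the paper), the backward-difference TV norm of a 2-D image can be computed like this, with the differences that would reach outside the image at the first row/column simply set to zero:

```python
import numpy as np

def tv_backward(f):
    """Backward-difference TV norm of a 2-D image (boundary differences set to zero)."""
    dx = np.zeros_like(f)
    dy = np.zeros_like(f)
    dx[1:, :] = f[1:, :] - f[:-1, :]   # f_{s,t} - f_{s-1,t}
    dy[:, 1:] = f[:, 1:] - f[:, :-1]   # f_{s,t} - f_{s,t-1}
    return np.sum(np.sqrt(dx**2 + dy**2))
```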

To calculate $v_{s,t} =\frac{\partial\|f\|_{TV}}{\partial f_{s,t}},$ note that there are three terms in the sum containing the index $(s,t)$:

(1) $\sqrt{(f_{s,t} − f_{s−1,t})^2 + (f_{s,t} − f_{s,t−1})^2} $

(2) $\sqrt{(f_{s+1,t} − f_{s,t})^2 + (f_{s+1,t} − f_{s+1,t-1})^2}$

(3) $\sqrt{(f_{s,t+1} − f_{s-1,t+1})^2 + (f_{s,t+1} − f_{s,t})^2}$

(The original answer refers to a figure of the backward-difference stencils for the terms involving $(s,t)$, with term (1) highlighted in red, term (2) in green, and term (3) in blue.)

The partial derivatives $\frac{\partial}{\partial f_{s,t}}$ of these three terms are:

(a) $ \frac{(f_{s,t}-f_{s-1,t})+(f_{s,t}-f_{s,t-1})}{\sqrt{(f_{s,t} - f_{s-1,t})^2 + (f_{s,t} - f_{s,t-1})^2}},$

(b) $ \frac{-(f_{s+1,t} - f_{s,t})}{\sqrt{(f_{s+1,t} - f_{s,t})^2 + (f_{s+1,t} - f_{s+1,t-1})^2}},$

(c) $ \frac{-(f_{s,t+1} - f_{s,t})}{\sqrt{(f_{s,t+1} - f_{s-1,t+1})^2 + (f_{s,t+1} - f_{s,t})^2}},$

respectively (the factor of $2$ from differentiating the squares cancels against the $2$ in the derivative of the square root).

$v_{s,t} =\frac{\partial\|f\|_{TV}}{\partial f_{s,t}}$ is then the sum of terms (a), (b), and (c).
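Putting (a), (b), and (c) together, here is a minimal NumPy sketch of $v_{s,t}$ for a 2-D image (my implementation, not code from the paper). The small eps that keeps the denominators finite in flat regions and the edge-replicating boundary treatment (which makes out-of-range differences vanish) are assumptions on top of the formulas above:

```python
import numpy as np

def tv_grad_backward(f, eps=1e-8):
    """v_{s,t} = d||f||_TV / d f_{s,t} for the backward-difference TV norm (2-D)."""
    fp = np.pad(f, 1, mode="edge")        # replicate edges: out-of-range diffs become 0
    c = fp[1:-1, 1:-1]                    # f_{s,t}
    dxm  = c - fp[:-2, 1:-1]              # f_{s,t}   - f_{s-1,t}
    dym  = c - fp[1:-1, :-2]              # f_{s,t}   - f_{s,t-1}
    dxp  = fp[2:, 1:-1] - c               # f_{s+1,t} - f_{s,t}
    dyp  = fp[1:-1, 2:] - c               # f_{s,t+1} - f_{s,t}
    dxpy = fp[2:, 1:-1] - fp[2:, :-2]     # f_{s+1,t} - f_{s+1,t-1}
    dypx = fp[1:-1, 2:] - fp[:-2, 2:]     # f_{s,t+1} - f_{s-1,t+1}

    term_a = (dxm + dym) / np.sqrt(eps + dxm**2 + dym**2)
    term_b = -dxp / np.sqrt(eps + dxp**2 + dxpy**2)
    term_c = -dyp / np.sqrt(eps + dyp**2 + dypx**2)
    return term_a + term_b + term_c
```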


I know the question is old and closed, but I faced the same problem and this question helped me. In the end I came up with a different derivation, which I share below.

Let $\mathbf{f}\in\mathbb{R}^{n_1 \cdots n_m}$ be a vector (e.g. when $m=2$, a linearized black and white image). We define $TV(\mathbf{f})$ by

$$TV(\mathbf{f}) = \sum\limits_{i=1}^{n_1 \cdots n_m} [|\nabla \mathbf{f}|]_i.$$

I am abusing notation somewhat, and what I mean by the weird gradient is the following:

$$([|\nabla \mathbf{f}|]_i)^2 = ([D^1 \mathbf{f}]_i)^2 + ... + ([D^m \mathbf{f}]_i)^2$$

$i=1,\ldots,n_1 \cdots n_m$, where $D^\ell$, $\ell=1,\ldots,m$, is the discrete linear difference operator (e.g. forward difference) along the $\ell$-th direction. In a black and white image, they would be $D^x$ and $D^y$, for example. Therefore, $|\nabla \mathbf{f}|\in\mathbb{R}^{n_1 \cdots n_m}$ is a vector like $\mathbf{f}$, whose coordinates are the norms of the discrete local gradient of $\mathbf{f}$.
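For a small 2-D example, these operators can be assembled explicitly as sparse matrices via Kronecker products; a minimal NumPy/SciPy sketch, assuming row-major flattening of the image and that the last forward difference in each direction is simply set to zero (a Neumann-like boundary):

```python
import numpy as np
import scipy.sparse as sp

def forward_diff(n):
    """1-D forward-difference matrix; the last row is zeroed (no neighbour past the end)."""
    D = sp.diags([-np.ones(n), np.ones(n - 1)], [0, 1], format="lil")
    D[-1, :] = 0
    return D.tocsr()

ny, nx = 4, 5
Dx = sp.kron(sp.identity(ny), forward_diff(nx)).tocsr()  # D^x: differences across columns
Dy = sp.kron(forward_diff(ny), sp.identity(nx)).tocsr()  # D^y: differences across rows

f = np.random.rand(ny, nx).ravel()                       # the linearized image
grad_norm = np.sqrt((Dx @ f) ** 2 + (Dy @ f) ** 2)       # |∇f|, one entry per pixel
tv = grad_norm.sum()                                     # TV(f)
```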

With this understanding, let's compute the gradient of the $TV$ norm:

$$\partial_j TV(\mathbf{f}) = \partial_j \sum\limits_{i=1}^{n_1 \cdots n_m} \sqrt{([D^1 \mathbf{f}]_i)^2 + ... + ([D^m \mathbf{f}]_i)^2}$$

$$\partial_j TV(\mathbf{f}) = \partial_j \sum\limits_{i=1}^{n_1 \cdots n_m} \sqrt{\left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right)^2 + ... + \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right)^2}$$

$$\partial_j TV(\mathbf{f}) = \sum\limits_{i=1}^{n_1 \cdots n_m} \frac{\partial_j\left(\left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right)^2 + ... + \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right)^2\right)}{2\sqrt{\left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right)^2 + ... + \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right)^2}}$$

$$\partial_j TV(\mathbf{f}) = \sum\limits_{i=1}^{n_1 \cdots n_m} \frac{\left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right) \partial_j \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right) + ... + \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right) \partial_j \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right)}{[|\nabla \mathbf{f}|]_i}$$

Since $\partial_{j}f_k = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker delta, we have

$$\partial_j TV(\mathbf{f}) = \sum\limits_{i=1}^{n_1 \cdots n_m} \frac{\left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} f_k\right) \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^1_{ik} \delta_{jk}\right) + ... + \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} f_k\right) \left(\sum\limits_{k=1}^{n_1 \cdots n_m}D^m_{ik} \delta_{jk}\right)}{[|\nabla \mathbf{f}|]_i}$$

$$\partial_j TV(\mathbf{f}) = \sum\limits_{i=1}^{n_1 \cdots n_m} \frac{[D^1 \mathbf{f}]_i}{[|\nabla \mathbf{f}|]_i} D^1_{ij} + ... + \frac{[D^m \mathbf{f}]_i}{[|\nabla \mathbf{f}|]_i} D^m_{ij}$$

$$\partial_j TV(\mathbf{f}) = \left[(D^1)^T\frac{D^1 \mathbf{f}}{|\nabla \mathbf{f}|}\right]_j + ... + \left[(D^m)^T\frac{D^m \mathbf{f}}{|\nabla \mathbf{f}|}\right]_j$$

with some abuse in notation, we obtain

$$\partial_j TV(\mathbf{f}) = \left[ ((D^1)^T, \ldots, (D^m)^T) \cdot \left(\frac{D^1 \mathbf{f}}{|\nabla \mathbf{f}|}, \ldots, \frac{D^m \mathbf{f}}{|\nabla \mathbf{f}|}\right)\right]_j$$

or even

$$\nabla TV(\mathbf{f}) = ((D^1)^T, \ldots, (D^m)^T) \cdot \left(\frac{D^1 \mathbf{f}}{|\nabla \mathbf{f}|}, \ldots, \frac{D^m \mathbf{f}}{|\nabla \mathbf{f}|}\right)$$

This is very similar to the Fréchet derivative of the continuous functional, apart from the sign:

$$\frac{\delta\, TV(m)}{\delta m} = -\nabla \cdot\left(\frac{\nabla m}{|\nabla m|}\right)$$

This sign is implicit in the transposition of the difference operators. In the continuous case the gradient and the negative divergence are adjoints of each other; likewise, the derivative is the negative of its own adjoint. In the discrete setting, $D^i_\text{CD} = -(D^i_\text{CD})^T$ for central differences and $D^i_\text{FD} = -(D^i_\text{BD})^T$ for forward and backward differences (with symmetric boundary conditions).
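To tie this together, here is a compact NumPy/SciPy sketch of the final expression with a directional-derivative check (my illustration, not the Julia code mentioned below); the small eps that smooths $|\nabla \mathbf{f}|$ away from zero is an addition to the formulas above:

```python
import numpy as np
import scipy.sparse as sp

def forward_diff(n):
    """1-D forward-difference matrix with a zero last row (no neighbour past the end)."""
    D = sp.diags([-np.ones(n), np.ones(n - 1)], [0, 1], format="lil")
    D[-1, :] = 0
    return D.tocsr()

def tv_and_grad(f2d, eps=1e-8):
    """Smoothed TV(f) and its gradient (D^x)^T(D^x f/|∇f|) + (D^y)^T(D^y f/|∇f|)."""
    ny, nx = f2d.shape
    Dx = sp.kron(sp.identity(ny), forward_diff(nx)).tocsr()
    Dy = sp.kron(forward_diff(ny), sp.identity(nx)).tocsr()
    f = f2d.ravel()                        # row-major flattening of the image
    gx, gy = Dx @ f, Dy @ f
    mag = np.sqrt(gx**2 + gy**2 + eps)     # smoothed |∇f|, one entry per pixel
    grad = Dx.T @ (gx / mag) + Dy.T @ (gy / mag)
    return mag.sum(), grad.reshape(f2d.shape)

# Check: TV(f + df) - TV(f) should match <grad TV(f), df> for a tiny perturbation df.
rng = np.random.default_rng(0)
f = rng.random((6, 7))
df = 1e-6 * rng.standard_normal(f.shape)
tv0, g = tv_and_grad(f)
tv1, _ = tv_and_grad(f + df)
print(tv1 - tv0, np.sum(g * df))           # the two numbers should agree closely
```

The same pattern extends to 3-D (or general $m$) by adding further Kronecker-product difference operators.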

I have written some code in Julia implementing this in a geophysical inversion setting, if you are interested!