The derivative of the Cholesky Factor

The derivative can be found via implicit differentiation, that is, by inverting the derivative of the inverse map: $$ \frac{\mathrm{d}\operatorname{vec}\left(Y\right)}{\mathrm{d}\operatorname{vec}\left(X\right)} = \left(\frac{\mathrm{d} \operatorname{vec}\left(X\right)}{\mathrm{d}\operatorname{vec}\left(Y\right)}\right)^{-1}.$$ It is relatively easy to compute the derivative of $A$ with respect to $f(A)$, since $A = f(A)f(A)^{\top}$. The only tricky part is restricting $f(A)$ to be lower triangular.
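
As a quick sanity check of the idea, consider the $1\times 1$ case, where $f(a) = \sqrt{a}$ and $a = f(a)^2$. Differentiating the easy direction and inverting gives $$ \frac{\mathrm{d}a}{\mathrm{d}f} = 2f \quad\Longrightarrow\quad \frac{\mathrm{d}f}{\mathrm{d}a} = \frac{1}{2f} = \frac{1}{2\sqrt{a}},$$ which is the familiar derivative of the square root.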

For general square $X$, we have $$ \frac{\mathrm{d} \operatorname{vec}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vec}\left(X\right)} = \left(I + K\right)\left(X\otimes I\right),$$ where $K$ is the commutation matrix.
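
A sketch of where this comes from, using the standard identities $\operatorname{vec}\left(ABC\right) = \left(C^{\top}\otimes A\right)\operatorname{vec}\left(B\right)$ and $K\operatorname{vec}\left(M\right) = \operatorname{vec}\left(M^{\top}\right)$: by the product rule, $$ \mathrm{d}\left(XX^{\top}\right) = \mathrm{d}X\,X^{\top} + X\,\mathrm{d}X^{\top} = \mathrm{d}X\,X^{\top} + \left(\mathrm{d}X\,X^{\top}\right)^{\top},$$ and vectorizing each term gives $$ \operatorname{vec}\left(\mathrm{d}X\,X^{\top}\right) = \left(X\otimes I\right)\operatorname{vec}\left(\mathrm{d}X\right), \qquad \operatorname{vec}\left(\left(\mathrm{d}X\,X^{\top}\right)^{\top}\right) = K\left(X\otimes I\right)\operatorname{vec}\left(\mathrm{d}X\right).$$ Adding the two yields the factor $\left(I + K\right)\left(X\otimes I\right)$.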

To get the derivative with respect to the $\operatorname{vech}$, apply the chain rule. This gives $$ \frac{\mathrm{d} \operatorname{vech}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vech}_{\Delta}\left(X\right)} = L \left(I + K\right)\left(X\otimes I\right) D,$$ where $L$ is the elimination matrix, $\operatorname{vech}_{\Delta}$ is the half-vectorization of a lower triangular matrix, and $D$ is the "lower triangular duplication matrix", which has the property that $D \operatorname{vech}\left(M\right) = \operatorname{vec}\left(M\right)$ for lower triangular matrices $M$ (one can take $D = L^{\top}$). The sought derivative is the matrix inverse of the above expression, evaluated at $X = f(A)$.
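
A small worked case may help. Take the $2\times 2$ lower triangular matrix with $\operatorname{vech}_{\Delta}\left(X\right) = (a, b, c)^{\top}$ (the symbols $a$, $b$, $c$ are just labels for this illustration), so that $$ X = \begin{pmatrix} a & 0 \\ b & c \end{pmatrix}, \qquad \operatorname{vech}\left(XX^{\top}\right) = \begin{pmatrix} a^2 \\ ab \\ b^2 + c^2 \end{pmatrix}.$$ Differentiating directly, and then inverting, gives $$ \frac{\mathrm{d} \operatorname{vech}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vech}_{\Delta}\left(X\right)} = \begin{pmatrix} 2a & 0 & 0 \\ b & a & 0 \\ 0 & 2b & 2c \end{pmatrix}, \qquad \left(\frac{\mathrm{d} \operatorname{vech}\left(XX^{\top}\right)}{\mathrm{d} \operatorname{vech}_{\Delta}\left(X\right)}\right)^{-1} = \begin{pmatrix} \frac{1}{2a} & 0 & 0 \\ -\frac{b}{2a^2} & \frac{1}{a} & 0 \\ \frac{b^2}{2a^2c} & -\frac{b}{ac} & \frac{1}{2c} \end{pmatrix}.$$ The entries of the inverse agree with differentiating the explicit $2\times 2$ Cholesky factor $a = \sqrt{p}$, $b = q/\sqrt{p}$, $c = \sqrt{r - q^2/p}$ of $\begin{pmatrix} p & q \\ q & r\end{pmatrix}$ with respect to $(p, q, r)$.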

Here is a numerical confirmation in R. Note that R's chol function returns the upper triangular factor and reads only the upper triangle of its argument, hence some mucking about with transposes:

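```r
library(matrixcalc)  # commutation.matrix(), elimination.matrix()
set.seed(2349024)
n <- 6
X <- cov(matrix(rnorm(1000 * n), ncol = n))
# chol() returns the upper triangular factor; transpose to get the lower one
fnc <- function(X) t(chol(X))

Y <- fnc(X)
# (I + K)(Y %x% I): derivative of vec(Y Y') with respect to vec(Y)
d0 <- (diag(1, nrow = n^2) + commutation.matrix(r = n)) %*% (Y %x% diag(1, nrow = n))
L <- elimination.matrix(n)
# restrict both sides to vech; t(L) acts as the lower triangular duplication matrix
d1 <- L %*% d0 %*% t(L)
# invert to get the derivative of vech(Y) with respect to vech(X)
dfin <- solve(d1)

# now compute the approximate derivative by forward differences
apx.d <- matrix(NA_real_, nrow = nrow(dfin), ncol = ncol(dfin))
my.eps <- 1e-6
low.idx <- which(lower.tri(diag(1, n), diag = TRUE))
for (iii in seq_along(low.idx)) {
    Xalt <- X
    tweak <- low.idx[iii]
    Xalt[tweak] <- Xalt[tweak] + my.eps
    # "Note that only the upper triangular part of 'x' is used..."
    # so transpose to make chol() see the perturbed lower triangle
    Yalt <- fnc(t(Xalt))
    dY <- (Yalt - Y) / my.eps
    apx.d[, iii] <- dY[low.idx]
}
apx.error <- apx.d - dfin
max(abs(apx.error))
apx.error
```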

The maximum absolute error I get is 5.606e-07, on the order of the delta in the input variable, 1e-06.


I've written a relevant note on arXiv: http://arxiv.org/abs/1602.07527

I included the neat closed form solution pete gives in a comment, and also a messy expression (converted to the notation f = chol(A)): $$ \frac{\partial f_{ij}}{\partial A_{kl}} = \bigg(\sum_{m>j} f_{im}f_{mk}^{-1} + \tfrac{1}{2}f_{ij}f_{jk}^{-1}\bigg)f_{jl}^{-1} + (1-\delta_{kl})\bigg(\sum_{m>j} f_{im}f_{ml}^{-1} + \tfrac{1}{2}f_{ij}f_{jl}^{-1}\bigg)f_{jk}^{-1}. $$ Here $f_{mk}^{-1}$ denotes the $(m,k)$ entry of the matrix inverse $f^{-1}$, not the reciprocal of $f_{mk}$.
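
To make the index conventions concrete, here is a minimal R sketch of the element-wise expression above, checked against a finite difference. The helper name `chol_elem_deriv` is made up for this illustration. The sketch assumes that $f$ is the lower triangular factor, that the superscript $-1$ refers to entries of the matrix inverse of $f$, and that a change in $A_{kl}$ with $k \neq l$ also changes $A_{lk}$ (which is what the $(1-\delta_{kl})$ term accounts for).

```r
# Hypothetical helper: d f[i, j] / d A[k, l] from the element-wise expression,
# where f is the lower triangular Cholesky factor of A.
chol_elem_deriv <- function(f, i, j, k, l) {
    n <- nrow(f)
    finv <- solve(f)                           # matrix inverse of f
    m <- if (j < n) (j + 1):n else integer(0)  # indices with m > j
    term <- function(k, l) {
        (sum(f[i, m] * finv[m, k]) + 0.5 * f[i, j] * finv[j, k]) * finv[j, l]
    }
    # the second term only contributes for off-diagonal A[k, l]
    term(k, l) + (k != l) * term(l, k)
}

# finite-difference check on a small random covariance matrix
set.seed(1)
n <- 4
A <- cov(matrix(rnorm(100 * n), ncol = n))
f <- t(chol(A))
i <- 3; j <- 2; k <- 3; l <- 1
eps <- 1e-6
Aalt <- A
Aalt[k, l] <- Aalt[k, l] + eps
Aalt[l, k] <- Aalt[k, l]  # keep the perturbed matrix symmetric
fd <- (t(chol(Aalt))[i, j] - f[i, j]) / eps
c(formula = chol_elem_deriv(f, i, j, k, l), finite.diff = fd)
```

The two printed numbers should agree to roughly the size of `eps`. Calling the helper for every $(i,j,k,l)$ is only sensible for small matrices.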

However, if you're interested in differentiating a larger expression, you can do that in $O(N^3)$, without computing all $O(N^4)$ derivatives in $\frac{\partial \mathrm{vech}(f)}{\partial \mathrm{vech}(A)}$. The note explains different ways to do that.
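
As one illustration of the $O(N^3)$ idea, here is a minimal R sketch of a directional (forward-mode) derivative. It relies on the standard identity $\mathrm{d}f = f\,\Phi\left(f^{-1}\,\mathrm{d}A\,f^{-\top}\right)$, where $\Phi$ keeps the lower triangle and halves the diagonal. The function names `chol_lower`, `phi`, and `chol_fwd` are made up for this sketch, and it is not meant to reproduce the note's own implementations.

```r
# Hypothetical helpers, base R only.
chol_lower <- function(A) t(chol(A))  # lower triangular factor

# Phi: keep the lower triangle and halve the diagonal
phi <- function(M) {
    M[upper.tri(M)] <- 0
    diag(M) <- diag(M) / 2
    M
}

# Directional derivative of the lower factor f in the symmetric direction dA,
# using two triangular solves and one matrix product.
chol_fwd <- function(f, dA) {
    S <- forwardsolve(f, t(forwardsolve(f, dA)))  # f^{-1} dA f^{-T}
    f %*% phi(S)
}

# finite-difference check along a random symmetric direction
set.seed(1)
n <- 5
A <- crossprod(matrix(rnorm(50 * n), ncol = n)) / 50
dA <- matrix(rnorm(n * n), n)
dA <- dA + t(dA)
f <- chol_lower(A)
h <- 1e-6
fd <- (chol_lower(A + h * dA) - f) / h
max(abs(fd - chol_fwd(f, dA)))
```

The point of the design is that the full Jacobian is never formed: each directional derivative costs a couple of triangular solves and a matrix product.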

(pete: if you tell me who you are, I'll add a proper acknowledgement to my note in any future revision.)