CS231n: How to calculate gradient for Softmax loss function?

I know this is late but here's my answer:

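I'm assuming you are familiar with the cs231n Softmax loss function. For a single example $x_i$ with correct class $y_i$ and class scores $f_j = w_j^T x_i$, we know that:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$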

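So, just as we did with the SVM loss function, the gradients with respect to the rows of $W$ are as follows, where $p_k = e^{f_k} / \sum_j e^{f_j}$ is the softmax probability of class $k$:

$$\frac{\partial L_i}{\partial w_{y_i}} = (p_{y_i} - 1)\, x_i, \qquad \frac{\partial L_i}{\partial w_j} = p_j\, x_i \quad (j \neq y_i)$$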
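One way to convince yourself of these gradients is to compare them with a numerical gradient on a single example. The snippet below is only a sketch: the function names, the shapes (`W` is classes x features, `x` is one feature vector) and the random test data are my own assumptions, not the assignment's code.

```python
import numpy as np

def softmax_loss_single(W, x, y):
    """Softmax loss and analytic gradient for one example.

    Assumed shapes: W is (C, D), x is (D,), y is the correct class index.
    """
    f = W @ x
    f = f - f.max()                  # shift scores for numerical stability
    p = np.exp(f) / np.exp(f).sum()  # softmax probabilities p_k
    loss = -np.log(p[y])
    dscores = p.copy()
    dscores[y] -= 1.0                # p_k - 1 for the correct class, p_k otherwise
    dW = np.outer(dscores, x)        # row k of dW is (p_k - 1[k == y]) * x
    return loss, dW

def numerical_grad(W, x, y, h=1e-5):
    """Centered finite-difference gradient of the loss with respect to W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        loss_plus, _ = softmax_loss_single(W, x, y)
        W[idx] = old - h
        loss_minus, _ = softmax_loss_single(W, x, y)
        W[idx] = old                 # restore the original weight
        grad[idx] = (loss_plus - loss_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))          # 3 classes, 4 features (arbitrary)
x = rng.normal(size=4)
loss, dW = softmax_loss_single(W, x, y=1)
max_diff = np.abs(dW - numerical_grad(W, x, 1)).max()
print("max abs difference:", max_diff)
```

If the analytic formula is right, the printed difference should be tiny compared with the gradient entries themselves.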

Hope that helped.


Not sure if this helps, but:

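The $\mathbb{1}$ in the gradient expression is really the indicator function $\mathbb{1}(j = y_i)$, which is 1 when class $j$ is the correct class $y_i$ and 0 otherwise, as described here. This forms the expression (j == y[i]) in the code.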

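Also, the gradient of the loss with respect to the weights follows from the chain rule:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j}\,\frac{\partial f_j}{\partial w_j} = \left(p_j - \mathbb{1}(j = y_i)\right) x_i$$

where $p_j$ is the softmax probability of class $j$ and, because $f_j = w_j^T x_i$,

$$\frac{\partial f_j}{\partial w_j} = x_i$$

which is the origin of the X[:,i] in the code.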
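
To make the connection to the code concrete, here is a rough sketch of a naive loop that uses both `(j == y[i])` and `X[:,i]`. It assumes the layout where `X` holds one example per column (shape D x N) and `W` has one row per class (shape C x D). The function name is my own, and regularization is left out for brevity.

```python
import numpy as np

def softmax_loss_naive(W, X, y):
    """Average softmax loss and gradient, computed with explicit loops.

    Assumed shapes: W is (C, D), X is (D, N) with one example per column,
    y is (N,) with integer class labels. No regularization term.
    """
    num_classes, num_train = W.shape[0], X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        f = W @ X[:, i]                    # scores for example i
        f -= f.max()                       # shift for numerical stability
        p = np.exp(f) / np.sum(np.exp(f))  # softmax probabilities
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # (p_j - 1(j == y_i)) * x_i, accumulated into row j of dW
            dW[j, :] += (p[j] - (j == y[i])) * X[:, i]
    return loss / num_train, dW / num_train
```

With all-zero weights every class gets probability 1/C, so the returned loss should be close to log(C), which is a quick sanity check.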