Unaggregated gradients / gradients per example in tensorflow

To partly answer my own question: after tinkering with this for a while, it appears that it is possible to manipulate per-example gradients while still working in batches, by doing the following:

  • Create a copy of tf.gradients() that accepts an extra tensor/placeholder with example-specific factors
  • Create a copy of _AggregatedGrads() and add a custom aggregation method that uses the example-specific factors
  • Call your custom tf.gradients function and pass your loss as a list of per-example slices:

custagg_gradients(
    ys=[cross_entropy[i] for i in xrange(batch_size)],
    xs=variables.trainable_variables(),
    aggregation_method=CUSTOM,
    gradient_factors=gradient_factors,
)

But this will probably have the same complexity as doing individual passes per example, and I need to check if the gradients are correct :-).
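One way to check the gradients is to compare the custom aggregation against the slow but unambiguous baseline of calling tf.gradients once per loss slice. A minimal sketch, assuming TensorFlow 1.x-style graph mode, that cross_entropy is an unreduced per-example loss vector, and that batch_size and gradient_factors are the same objects used in the call above (all names come from that snippet, not from TensorFlow itself):

import tensorflow as tf  # TF 1.x; use tensorflow.compat.v1 on TF 2

params = tf.trainable_variables()

# Baseline: one tf.gradients call per example slice (slow, but unambiguous).
# Assumes every variable is reachable from every example's loss.
per_example_grads = [tf.gradients(cross_entropy[i], params)
                     for i in range(batch_size)]

# Reference aggregation with explicit per-example factors:
# sum_i factor_i * grad_i, for every trainable variable.
expected = [
    tf.add_n([gradient_factors[i] * per_example_grads[i][v]
              for i in range(batch_size)])
    for v in range(len(params))
]
# The fetched values of `expected` can then be compared (e.g. with
# np.allclose) against the output of custagg_gradients(...).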


tf.gradients returns the gradient of your loss with respect to the variables. This means that if your loss is a sum of per-example losses, then its gradient is also the sum of the per-example loss gradients.

The summing up is implicit. For instance, if you want to minimize the sum of squared norms of Wx-y errors, the gradient with respect to W is 2(WX-Y)X', where X is the batch of observations (one example per column) and Y is the batch of labels. You never explicitly form "per-example" gradients that are later summed up, so it's not a simple matter of removing some stage in the gradient pipeline.
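To make the implicit summing concrete, here is a small numpy check (illustrative only, with arbitrary made-up shapes): the batch gradient 2(WX-Y)X' is numerically identical to the sum of the per-example gradients 2(Wx_i-y_i)x_i', so there is no intermediate per-example stage to intercept.

import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 3, 5, 8            # output dim, input dim, batch size (arbitrary)
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, k))  # one example per column
Y = rng.standard_normal((d_out, k))

# Gradient of sum_i ||W x_i - y_i||^2 with respect to W, written on the whole batch.
batch_grad = 2 * (W @ X - Y) @ X.T

# The same gradient as an explicit sum of per-example gradients.
summed_grad = sum(
    2 * np.outer(W @ X[:, i] - Y[:, i], X[:, i]) for i in range(k)
)

print(np.allclose(batch_grad, summed_grad))  # True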

A simple way to get k per-example loss gradients is to use batches of size 1 and do k passes. Ian Goodfellow wrote up how to get all k gradients in a single pass; for that you would need to specify the gradients explicitly rather than rely on the tf.gradients method.
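A minimal sketch of the batch-of-1 baseline (not Goodfellow's single-pass trick), assuming TensorFlow 1.x-style graph execution with a feed-dict input pipeline; loss, sess, x, y_, X_batch, Y_batch and k are placeholders for whatever your model actually defines:

import tensorflow as tf  # TF 1.x; use tensorflow.compat.v1 on TF 2

# Build the gradient ops once; they are reused for every pass.
grad_ops = tf.gradients(loss, tf.trainable_variables())

per_example_grads = []
for i in range(k):
    # Feed a single example (a "batch" of size 1) per run call.
    grads_i = sess.run(
        grad_ops,
        feed_dict={x: X_batch[i:i + 1], y_: Y_batch[i:i + 1]},
    )
    per_example_grads.append(grads_i)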
