Getting the current learning rate from a tf.train.AdamOptimizer

All the optimizers have a private variable that holds the value of the learning rate.

In AdagradOptimizer and GradientDescentOptimizer it is called self._learning_rate. In AdamOptimizer it is self._lr.

So you just need to print sess.run(optimizer._lr) to get this value. sess.run is needed because the learning rate is stored as a tensor.
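
For example, here is a minimal sketch in TF1 graph mode. The setup below is my own (hypothetical names), using a decaying learning-rate tensor; note that _lr is a private attribute that simply holds whatever was passed to the constructor, so with a plain Python float you could print optimizer._lr directly instead of using sess.run:

import tensorflow as tf

# Hypothetical setup: a decaying learning-rate tensor passed to Adam.
global_step = tf.train.get_or_create_global_step()
lr = tf.train.exponential_decay(0.001, global_step, 1000, 0.96)
optimizer = tf.train.AdamOptimizer(learning_rate=lr)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(optimizer._lr))  # current value of the learning-rate tensor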


In Tensorflow 2:

optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.1)  # or any other optimizer
print(optimizer.learning_rate.numpy())  # or print(optimizer.lr.numpy())

Note: This gives you the base learning rate. Refer to this answer for more details on adaptive learning rates.
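
If you pass a LearningRateSchedule instead of a constant, optimizer.learning_rate returns the schedule object rather than a number. A minimal sketch of reading the current decayed value in that case, evaluating the schedule at the optimizer's own step counter (treat this as one possible pattern, not the only way):

import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.95, staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# ... train for a while ...

current_lr = optimizer.learning_rate(optimizer.iterations)  # schedule evaluated at the current step
print(float(current_lr))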


I think the easiest thing you can do is subclass the optimizer.

It has several methods that, I guess, get dispatched to based on variable type. Regular dense variables seem to go through _apply_dense, so this solution won't work for sparse variables or other cases.

If you look at the implementation you can see that it stores the m and v EMAs in these "slots". So something like this seems to do it:

import tensorflow as tf

class MyAdam(tf.train.AdamOptimizer):
    def _apply_dense(self, grad, var):
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")

        # Bias-corrected first and second moment estimates.
        m_hat = m / (1 - self._beta1_power)
        v_hat = v / (1 - self._beta2_power)

        # The (unscaled) update direction Adam will apply to this variable.
        step = m_hat / (v_hat ** 0.5 + self._epsilon_t)

        # Use a histogram summary to monitor it during training.
        tf.summary.histogram("hist", step)

        return super(MyAdam, self)._apply_dense(grad, var)

step here will be approximately in the interval [-1, 1]; it is what gets multiplied by the learning rate to determine the actual step applied to the parameters.

There's often no node in the graph for it because there is one big training_ops.apply_adam that does everything.

Here I'm just creating a histogram summary from it. But you could stick it in a dictionary attached to the object and read it later or do whatever you want with it.
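
A minimal sketch of that dictionary variant (my own names, mirroring the subclass above):

class MyAdamWithSteps(tf.train.AdamOptimizer):
    def _apply_dense(self, grad, var):
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")
        m_hat = m / (1 - self._beta1_power)
        v_hat = v / (1 - self._beta2_power)
        step = m_hat / (v_hat ** 0.5 + self._epsilon_t)

        # Keep the tensor around, keyed by variable name, so it can be fetched later.
        if not hasattr(self, "step_tensors"):
            self.step_tensors = {}
        self.step_tensors[var.op.name] = step

        return super(MyAdamWithSteps, self)._apply_dense(grad, var)

# Later, inside a session (after minimize()/apply_gradients() has built the graph):
# steps = sess.run(optimizer.step_tensors)  # dict of numpy arrays, one per variable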

Dropping that into mnist_deep.py and adding some summaries to the training loop:

all_summaries = tf.summary.merge_all()  
file_writer = tf.summary.FileWriter("/tmp/Adam")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy,summaries = sess.run(
                [accuracy,all_summaries],
                feed_dict={x: batch[0], y_: batch[1], 
                           keep_prob: 1.0})
            file_writer.add_summary(summaries, i)
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

Produces the following figure in TensorBoard:

[Image: TensorBoard Histograms tab, showing 8 histogram plots]


Sung Kim's suggestion worked for me; my exact steps were:

lr = 0.1
step_rate = 1000
decay = 0.95

global_step = tf.Variable(0, trainable=False)
increment_global_step = tf.assign(global_step, global_step + 1)
learning_rate = tf.train.exponential_decay(lr, global_step, step_rate, decay, staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.01)
trainer = optimizer.minimize(loss_function)

# Some code here

print('Learning rate: %f' % sess.run(optimizer._lr))  # the learning rate lives on the optimizer, not the minimize op
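
Since learning_rate comes from tf.train.exponential_decay, the printed value only changes when global_step actually advances, so increment_global_step (or a global_step passed to minimize()) has to be run as part of each training step. A rough sketch of the elided part, with my own loop and assuming loss_function needs no feed_dict:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        # Run the training op and advance the counter that drives the decay.
        sess.run([trainer, increment_global_step])
        if i % step_rate == 0:
            print('Learning rate: %f' % sess.run(learning_rate))  # same tensor as optimizer._lr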
