Is the tf.contrib.layers.fully_connected() behavior change between Tensorflow 1.3 and 1.4 an issue?

So here's the breakdown. The problem, somewhat surprisingly, is caused by tf.contrib.layers.flatten(), because it changes the random seed differently in the two versions. There are two ways to seed the random number generator in Tensorflow: either you seed the whole graph with tf.set_random_seed(), or you specify a seed argument on the individual operations that support it. As per the docs on tf.set_random_seed(), note point 2:

Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed.

Its interactions with operation-level seeds are as follows:

  1. If neither the graph-level nor the operation seed is set: A random seed is used for this op.
  2. If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
  3. If the graph-level seed is not set, but the operation seed is set: A default graph-level seed and the specified operation seed are used to determine the random sequence.
  4. If both the graph-level and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.

In our case the seed is set at the graph level, and Tensorflow performs a deterministic calculation to derive the actual seed used by the operation. This calculation depends on the number of operations created in the graph before the random operation in question.
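
You can watch this mechanism directly with the helper tensorflow.python.framework.random_seed.get_seed(), which is what random ops call to combine the two seeds. Note this is only a sketch relying on internal, version-specific behavior, not public API:

import tensorflow as tf
from tensorflow.python.framework import random_seed  # internal helper

tf.reset_default_graph()
tf.set_random_seed(1)

# get_seed(None) mimics a random op with no op-level seed: with a
# graph-level seed set, the op-level seed is derived from the graph's
# internal operation counter.
print(random_seed.get_seed(None))

_ = tf.constant(0)  # create one more, completely unrelated op
print(random_seed.get_seed(None))  # the derived op-level seed has changed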

In addition, the implementation of tf.contrib.layers.flatten() changed exactly between versions 1.3 and 1.4. You can look it up in the repository: the code was simplified and moved from tensorflow/contrib/layers/python/layers/layers.py into tensorflow/python/layers/core.py. The important part for us is that the number of operations it performs changed, thereby changing the random seed applied in the Xavier initializer on your fully connected layer.
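
If you want to check the op-count difference yourself, a quick sketch is to count the operations flatten() adds to the default graph; the exact numbers do not matter, only that they differ between 1.3 and 1.4:

import tensorflow as tf

tf.reset_default_graph()
X = tf.constant([[1, 2, 3, 4, 5, 6]], tf.float32)
n_before = len(tf.get_default_graph().get_operations())
Xf = tf.contrib.layers.flatten(X)
n_after = len(tf.get_default_graph().get_operations())
print(n_after - n_before)  # number of ops created by flatten()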

A possible workaround is specifying the seed for each weight tensor separately, but that requires either generating the fully connected layer manually (or passing your own seeded initializer into it, as sketched below) or touching the Tensorflow code. If you only wanted to know this to be sure there's no issue with your code, then rest assured: there isn't.
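
For illustration, here is a minimal sketch of that workaround for a single layer, assuming you can call tf.contrib.layers.fully_connected() directly: its weights_initializer argument accepts a seeded Xavier initializer (the seed value 42 and num_outputs=4 are arbitrary here):

import tensorflow as tf

tf.reset_default_graph()
X = tf.constant([[1, 2, 3, 4, 5, 6]], tf.float32)
# Seeding the initializer itself makes the initial weights independent
# of the graph-level seed derivation, and hence of the op count.
fc = tf.contrib.layers.fully_connected(
    X, num_outputs=4,
    weights_initializer=tf.contrib.layers.xavier_initializer(seed=42))

With this, the initial weights come out the same in 1.3 and 1.4 regardless of how many operations precede the layer.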

Here is a minimal example to reproduce the behavior; note the commented-out line starting with Xf:

import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(1)  # graph-level seed only; no op-level seeds are set
with tf.Session() as sess:
    X = tf.constant([[1, 2, 3, 4, 5, 6]], tf.float32)
    # Merely creating this op shifts the op-level seed derived for R:
    #Xf = tf.contrib.layers.flatten(X)
    R = tf.random_uniform(shape=())
    R_V = sess.run(R)
print(R_V)

If you run this code as above, you get a printout of:

0.38538742

for both versions. If you uncomment the Xf line, you get

0.013653636

and

0.6033112

for versions 1.3 and 1.4 respectively. It is interesting to note that Xf is never even executed; simply creating it is enough to cause the issue.

Two final notes: the four warnings you get with 1.3 are not related to this; they only indicate that the binary was built without some compilation options that could speed up certain calculations.

The other is that this should not affect the training behavior of your code; this issue changes the random seed only. So there must be some other difference causing the slower learning you observe.