What is the relationship between steps and epochs in TensorFlow?

Epoch: one full pass through the entire data set.

Batch size: the number of examples processed in one batch.

If there are 1000 examples and the batch size is 100, then there will be 10 steps per epoch.

The number of epochs and the batch size completely determine the number of steps.

steps_cal = (num_examples / batch_size) * num_epochs
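The formula can be written as a small Python helper. The function name `steps_cal` is taken from the formula above; this is just an illustration, not part of the TensorFlow API:

```python
def steps_cal(num_examples, batch_size, num_epochs):
    """Total training steps implied by data size, batch size, and epochs."""
    steps_per_epoch = num_examples // batch_size  # batches in one full pass
    return steps_per_epoch * num_epochs

# 1000 examples, batch size 100:
print(steps_cal(1000, 100, 1))   # 10 steps for one epoch
print(steps_cal(1000, 100, 25))  # 250 steps for 25 epochs
```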

estimator.fit(input_fn=input_fn)

If you just write the above code, then the number of steps executed is the value of 'steps_cal' given by the formula above.

estimator.fit(input_fn=input_fn, steps=steps_less)

If you give a value (say 'steps_less') smaller than 'steps_cal', then only 'steps_less' steps will be executed. In this case, the training will not cover the full number of epochs that was specified.

estimator.fit(input_fn=input_fn, steps=steps_more)

If you give a value (say 'steps_more') larger than 'steps_cal', then still only 'steps_cal' steps will be executed, because the input function runs out of data at that point.
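The three cases above can be sketched as a single rule: the number of executed steps is bounded by both the `steps` argument and the amount of data the input function can supply. The function below is an illustration of that rule, not TensorFlow's actual implementation:

```python
def executed_steps(num_examples, batch_size, num_epochs, steps=None):
    """How many steps fit() actually runs, under the capping rule above."""
    available = (num_examples // batch_size) * num_epochs  # steps_cal
    if steps is None:
        return available            # fit() runs until the input is exhausted
    return min(steps, available)    # whichever limit is hit first wins

print(executed_steps(1000, 100, 1))            # 10 (steps not given)
print(executed_steps(1000, 100, 1, steps=5))   # 5  (steps_less)
print(executed_steps(1000, 100, 1, steps=50))  # 10 (steps_more is capped)
```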


TL;DR: An epoch is when your model goes through your whole training data once. A step is when your model trains on a single batch (or a single sample if you send samples one by one). Training for 5 epochs on 1000 samples, with 10 samples per batch, will take 500 steps.

The contrib.learn.io module is not documented very well, but it seems that the numpy_input_fn() function takes some numpy arrays and batches them together as input for a classifier. So the number of epochs probably means "how many times to go through the input data I have before stopping". In this case, they feed two arrays of length 4 in 4-element batches, so it will just mean that the input function will do this at most 1000 times before raising an "out of data" exception. The steps argument in the estimator's fit() function is how many times the estimator should run the training loop. This particular example is somewhat perverse, so let me make up another one to make things a bit clearer (hopefully).

Let's say you have two numpy arrays (samples and labels) that you want to train on. They are 100 elements each. You want your training to take batches of 10 samples each. So after 10 batches you will have gone through all of your training data. That is one epoch. If you set your input generator to 10 epochs, it will go through your training set 10 times before stopping; that is, it will generate at most 100 batches.
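A toy stand-in for that batching behavior looks like the generator below. The name `batches` and the plain-list inputs are made up for this sketch; the real contrib.learn code works on numpy arrays and queues, but the epoch/batch accounting is the same:

```python
def batches(samples, labels, batch_size, num_epochs):
    """Yield (sample_batch, label_batch) pairs, one epoch per outer loop."""
    for _ in range(num_epochs):                     # one iteration = one epoch
        for i in range(0, len(samples), batch_size):
            yield samples[i:i + batch_size], labels[i:i + batch_size]

samples = list(range(100))
labels = list(range(100))
# 100 samples, batch size 10, 10 epochs -> at most 100 batches
print(sum(1 for _ in batches(samples, labels, 10, 10)))  # 100
```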

Again, the io module is not documented, but considering how other input-related APIs in TensorFlow work, it should be possible to make it generate data for an unlimited number of epochs, so the only thing controlling the length of training will be the steps. This gives you some extra flexibility in how you want your training to progress. You can go a number of epochs at a time, or a number of steps at a time, or both.
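In that "unlimited epochs" setup, the input generator just cycles over the data forever and the step budget alone decides when training stops. A sketch of the idea (the names `endless_batches` and `steps` are made up for illustration):

```python
import itertools

def endless_batches(samples, batch_size):
    """Cycle over the data indefinitely; epochs no longer bound training."""
    for start in itertools.cycle(range(0, len(samples), batch_size)):
        yield samples[start:start + batch_size]

samples = list(range(100))
steps = 37  # the only stopping condition
consumed = list(itertools.islice(endless_batches(samples, 10), steps))
print(len(consumed))  # 37 batches, i.e. 3.7 passes over the 100 samples
```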


Let's start in the opposite order:

1) Steps - number of times the training loop in your learning algorithm will run to update the parameters in the model. In each loop iteration, it will process a chunk of data, which is basically a batch. Usually, this loop is based on the Gradient Descent algorithm.

2) Batch size - the size of the chunk of data you feed in each loop of the learning algorithm. You can feed the whole data set, in which case the batch size is equal to the data set size. You can also feed one example at a time. Or you can feed some number N of examples.

3) Epoch - the number of times you run over the data set extracting batches to feed the learning algorithm.

Say you have 1000 examples. With batch size = 100, one pass (one epoch) over the entire data set feeds the algorithm 10 batches of 100 examples each, i.e. 10 steps. If you set epoch = 1, training ends after those 10 steps even if you request steps = 200, because the input runs out of data. If you change the epoch to 25, the input can supply 25 x 10 = 250 batches, so a request of steps = 200 stops training after 200 steps, i.e. 20 passes over the data.
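The arithmetic for this scenario can be checked in a few lines (variable names are just for this example):

```python
num_examples, batch_size = 1000, 100

steps_per_epoch = num_examples // batch_size
print(steps_per_epoch)        # 10 batches per pass over the data
print(25 * steps_per_epoch)   # 250 batches available over 25 epochs
print(min(200, 25 * steps_per_epoch))  # 200: the steps argument stops first
```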

Why do we need this? There are many variations of gradient descent (batch, stochastic, mini-batch) as well as other algorithms for optimizing the learning parameters (e.g., L-BFGS). Some of them need to see the data in batches, while others see one datum at a time. Also, some of them include random factors/steps, so you might need multiple passes over the data to get good convergence.
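To see where all three terms appear in a training loop, here is a minimal mini-batch gradient descent sketch on a made-up 1-D least-squares problem (the function `fit_slope` and its data are invented for illustration, not any TensorFlow routine):

```python
def fit_slope(xs, ys, batch_size=10, epochs=5, lr=0.01):
    """Fit y = w*x by mini-batch gradient descent; count steps taken."""
    w, step = 0.0, 0
    for _ in range(epochs):                       # one pass over the data
        for i in range(0, len(xs), batch_size):   # one batch = one step
            xb, yb = xs[i:i + batch_size], ys[i:i + batch_size]
            # Gradient of mean squared error on this batch w.r.t. w
            grad = sum(2 * x * (w * x - y) for x, y in zip(xb, yb)) / len(xb)
            w -= lr * grad                        # one parameter update
            step += 1
    return w, step

xs = [x / 10 for x in range(100)]
ys = [3.0 * x for x in xs]        # true slope is 3
w, steps = fit_slope(xs, ys)
print(steps)                      # 50 steps = (100 / 10) batches * 5 epochs
```

Here the inner loop body is one step, the inner loop is one epoch, and `batch_size` controls how much data each step sees.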

Tags:

Tensorflow