GradientTape convergence much slower than

Dataset.shuffle() only shuffles each minibatch, so each epoch has the same order. Keras .fit() uses some magic to shuffle the whole dataset before each epoch. To do this in TF, you need to use the Dataset methods .repeat(epochs_number) and .shuffle(..., reshuffle_each_iteration=True):

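import numpy as np
from tensorflow import data
# shuffle -> batch -> repeat, so that every repetition (epoch) is reshuffled
train_ds = data.Dataset.from_tensor_slices(
    (np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1)]), index_data)
    ).shuffle(100000, reshuffle_each_iteration=True
    ).batch(batch_size, drop_remainder=True
    ).repeat(epochs_number)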

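# enumerate() yields a running batch index across all repeated epochs
for ix, (examples, labels) in train_ds.enumerate():
    train_step(examples, labels)
    # dividing by the number of full batches per epoch gives the epoch index
    current_epoch = ix // (len(index_data) // batch_size)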
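
The epoch index in the loop above is just integer division of the running batch index by the number of full batches per epoch. Here is a minimal sketch of that arithmetic in plain Python; the helper name epoch_of_step and the sizes (10 examples, batches of 4) are made up for illustration and are not from the original code:

```python
def epoch_of_step(step, num_examples, batch_size):
    """Map a running batch index to a zero-based epoch index.

    Assumes drop_remainder=True, so every epoch contains exactly
    num_examples // batch_size full batches.
    """
    steps_per_epoch = num_examples // batch_size
    return step // steps_per_epoch


# Hypothetical sizes: 10 examples in batches of 4 -> 2 full batches per epoch.
epochs = [epoch_of_step(step, num_examples=10, batch_size=4) for step in range(6)]
print(epochs)  # [0, 0, 1, 1, 2, 2]
```

Note that with drop_remainder=True the leftover examples (2 per epoch in this toy case) are skipped in each epoch, which is why the divisor uses floor division.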

This workaround is neither elegant nor natural, but for the moment you can use it to shuffle each epoch. It's a known issue and will be fixed; in the future you can use for epoch in range(epochs_number) instead of .repeat().
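
For reference, the for epoch in range(epochs_number) form might look roughly like the sketch below. It assumes a TensorFlow 2.x release in which iterating the dataset again triggers a reshuffle; the toy dataset (Dataset.range), the helper name run_epochs and the sizes are assumptions for illustration only, not the original training data:

```python
import tensorflow as tf


def run_epochs(num_examples=10, batch_size=5, epochs_number=3):
    """Return, for each epoch, the example ids in the order they were seen."""
    ds = (
        tf.data.Dataset.range(num_examples)
        # buffer covers the whole toy dataset, so the shuffle is a full permutation
        .shuffle(num_examples, reshuffle_each_iteration=True)
        .batch(batch_size, drop_remainder=True)
    )
    orders = []
    for epoch in range(epochs_number):
        seen = []
        for batch in ds:  # each pass over ds starts a new iteration
            seen.extend(int(x) for x in batch.numpy())
        orders.append(seen)
    return orders


orders = run_epochs()
for epoch, order in enumerate(orders):
    print(epoch, order)
```

Each printed epoch should contain every id exactly once. Whether the order actually changes between epochs depends on the installed TensorFlow version, so it is worth checking the printout before relying on it.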