CUDNN ERROR: Failed to get convolution algorithm

I've seen this error message for three different reasons, with different solutions:

1. You have cache issues

I regularly work around this error by shutting down my Python process, removing the ~/.nv directory (on Linux, rm -rf ~/.nv), and restarting the Python process. I don't know exactly why this works, but it's probably at least partly related to the second option:
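
If you end up doing this often, the cleanup can be scripted. Here is a minimal sketch, assuming the cache lives in ~/.nv as described above and that the Python process using the GPU has already been shut down. The function name clear_nv_cache is just an illustrative choice, not part of any library.

```python
import shutil
from pathlib import Path


def clear_nv_cache(cache_dir=None):
    """Delete the NVIDIA cache directory; return True if something was removed."""
    # Default to ~/.nv, the location mentioned above (assumed Linux layout).
    path = Path(cache_dir) if cache_dir is not None else Path.home() / ".nv"
    if not path.is_dir():
        return False  # nothing to clear
    shutil.rmtree(path)
    return True
```

Calling clear_nv_cache() with no arguments targets ~/.nv; passing a path lets you try it on a throwaway directory first.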

2. You're out of memory

The error can also show up if you run out of graphics card RAM. With an NVIDIA GPU you can check graphics card memory usage with nvidia-smi. This gives you not only a readout of how much GPU RAM is in use (something like 6025MiB / 6086MiB if you're almost at the limit) but also a list of the processes that are using GPU RAM.
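
The same check can be done from Python. This is a rough sketch that calls nvidia-smi with its CSV query options and parses the result. The helper names are my own, and it assumes the memory fields come back as plain integers in MiB (some devices may report a non-numeric value instead, which this does not handle).

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_usage():
    """Return [(used_mib, total_mib), ...] per GPU, or [] without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

If used is close to total for the GPU you train on, you are most likely in the out-of-memory case.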

If you've run out of RAM, you'll need to restart the process (which should free up the RAM) and then take a less memory-intensive approach. A few options are:

  • reducing your batch size
  • using a simpler model
  • using less data
  • limiting TensorFlow's GPU memory fraction. For example, the following makes sure TensorFlow uses at most 90% of your GPU RAM (this is the TensorFlow 1.x API with standalone Keras):
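import keras
import tensorflow as tf

# TensorFlow 1.x session configuration
config = tf.ConfigProto()
# Let this process allocate at most 90% of each visible GPU's memory
config.gpu_options.per_process_gpu_memory_fraction = 0.9
# Make Keras use a session created with this configuration
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))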

Capping the memory fraction will likely slow down your model evaluation if it isn't combined with the other items above.
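
The snippet above relies on tf.ConfigProto and tf.Session, which are no longer top-level names in TensorFlow 2.x. A rough equivalent for newer versions might look like the sketch below. It assumes TensorFlow 2.4 or later (where tf.config.set_logical_device_configuration is available), and it takes the card's total memory in MiB as an argument, since the limit has to be given as an absolute value rather than a fraction. Both function names are hypothetical.

```python
def memory_limit_mib(total_mib, fraction=0.9):
    """Memory cap in MiB for a GPU with total_mib of RAM."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mib * fraction)


def limit_gpu_memory(total_mib, fraction=0.9):
    """Cap TensorFlow's allocation on every visible GPU (TF 2.4+ assumed)."""
    import tensorflow as tf  # imported lazily so the helper above works without TF

    limit = memory_limit_mib(total_mib, fraction)
    for gpu in tf.config.list_physical_devices("GPU"):
        # Must run before the GPU is first used, or TensorFlow raises RuntimeError.
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=limit)]
        )
    return limit
```

For the 6086MiB card in the example above, limit_gpu_memory(6086) would cap TensorFlow at 5477 MiB. Call it at the very start of the program, before any model or tensor is created.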

3. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.

If you've never had similar models working, you're not running out of VRAM, and your cache is clean, I'd go back and set up CUDA + TensorFlow using the best available installation guide. I have had the most success following the instructions at https://www.tensorflow.org/install/gpu rather than those on the NVIDIA / CUDA site.
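
Before reinstalling, it can help to see which versions you actually have. The sketch below collects them in one place. It assumes tf.sysconfig.get_build_info() is available (TensorFlow 2.3 or later) and that nvidia-smi is on the PATH; anything it can't find is reported as None. The function name collect_versions is made up for this example.

```python
import shutil
import subprocess


def collect_versions():
    """Gather TensorFlow, CUDA, cuDNN and driver versions where available."""
    info = {"tensorflow": None, "cuda": None, "cudnn": None, "driver": None}
    try:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        # get_build_info() is assumed available (TF 2.3+); the CUDA and cuDNN
        # keys are only present on GPU builds, hence .get().
        build = tf.sysconfig.get_build_info()
        info["cuda"] = build.get("cuda_version")
        info["cudnn"] = build.get("cudnn_version")
    except (ImportError, AttributeError):
        pass  # TensorFlow missing or too old to report build info
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            lines = result.stdout.strip().splitlines()
            info["driver"] = lines[0].strip() if lines else None
    return info
```

Note that the CUDA and cuDNN values here are the versions TensorFlow was built against, not necessarily the ones installed on your machine, so a mismatch between the two is exactly what you are looking for.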