Failed to get convolution algorithm. This is probably because cuDNN failed to initialize,

I've seen this error message for three different reasons, with different solutions:

1. You have cache issues

I regularly work around this error by shutting down my python process, removing the ~/.nv directory (on linux, rm -rf ~/.nv), and restarting the Python process. I don't exactly know why this works. It's probably at least partly related to the second option:

2. You're out of memory

The error can also show up if you run out of graphics card RAM. With an nvidia GPU you can check graphics card memory usage with nvidia-smi. This will give you a readout of how much GPU RAM you have in use (something like 6025MiB / 6086MiB if you're almost at the limit) as well as a list of what processes are using GPU RAM.

If you've run out of RAM, you'll need to restart the process (which should free up the RAM) and then take a less memory-intensive approach. A few options are:

reducing your batch size
using a simpler model
using less data
limit TensorFlow GPU memory fraction: For example, the following will make sure TensorFlow uses <= 90% of your RAM:

import keras
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # 0.6 sometimes works better for folks
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

This can slow down your model evaluation if not used together with the items above, presumably since the large data set will have to be swapped in and out to fit into the small amount of memory you've allocated.

A second option is to have TensorFlow start out using only a minimum amount of memory and then allocate more as needed (documented here):

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

3. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.

If you've never had similar models working, you're not running out of VRAM and your cache is clean, I'd go back and set up CUDA + TensorFlow using the best available installation guide - I have had the most success with following the instructions at https://www.tensorflow.org/install/gpu rather than those on the NVIDIA / CUDA site. Lambda Stack is also a good way to go.

I had the same issue, I solved it thanks to that :

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
   tf.config.experimental.set_memory_growth(physical_devices[0], True)

I had this error and I fixed it by uninstalling all CUDA and cuDNN versions from my system. Then I installed CUDA Toolkit 9.0 (without any patches) and cuDNN v7.4.1 for CUDA 9.0.

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize,

1. You have cache issues

2. You're out of memory

3. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.

Tags:

Python

Tensorflow

Keras

Related

Recent Posts