How to programmatically determine available GPU memory with TensorFlow?

If you're using tensorflow-gpu==2.5, you can use

tf.config.experimental.get_memory_info('GPU:0')

to get the GPU memory actually consumed by TF. nvidia-smi is not helpful here, because by default TF pre-allocates (almost) all GPU memory for itself, so nvidia-smi has no way to tell you how much of that pre-allocated block is actually in use.
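A minimal sketch of how this can be used (in TF 2.5, get_memory_info returns a dict with 'current' and 'peak' byte counts; the set_memory_growth call is optional and must run before the GPU is first used):

import tensorflow as tf

# Optional: allocate GPU memory on demand instead of pre-allocating it all,
# so that the numbers below reflect what is really in use.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# ... build and run your model here ...

# Returns a dict with 'current' and 'peak' usage in bytes.
info = tf.config.experimental.get_memory_info('GPU:0')
print("Current TF GPU usage (bytes):", info['current'])
print("Peak TF GPU usage (bytes):", info['peak'])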


I actually found an answer in this old question of mine. To bring some additional benefit to readers, I tested the mentioned program

import nvidia_smi

nvidia_smi.nvmlInit()

# Card index 0 is hard-coded here; there is also a call to get all
# available card ids, so we could iterate over every GPU.
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)

info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)

# All values are reported in bytes.
print("Total memory:", info.total)
print("Free memory:", info.free)
print("Used memory:", info.used)

nvidia_smi.nvmlShutdown()

on Colab, with the following result:

Total memory: 17071734784
Free memory: 17071734784
Used memory: 0

The actual GPU there was a Tesla P100, as can be seen by executing

!nvidia-smi

and observing the output

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
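Since the snippet above hard-codes card index 0, here is a minimal sketch that iterates over every detected GPU instead (assuming the same nvidia_smi bindings also expose the NVML call nvmlDeviceGetCount, presumably the call the comment about "all available card ids" refers to):

import nvidia_smi

nvidia_smi.nvmlInit()
for i in range(nvidia_smi.nvmlDeviceGetCount()):
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(i)
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    # Values are in bytes
    print(f"GPU {i}: {info.free} free / {info.total} total")
nvidia_smi.nvmlShutdown()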

This code returns the free memory for each GPU in MiB (the unit nvidia-smi reports):

import subprocess as sp

def get_gpu_memory():
    command = "nvidia-smi --query-gpu=memory.free --format=csv"
    # Drop the trailing empty line and the CSV header line
    memory_free_info = sp.check_output(command.split()).decode('ascii').split('\n')[:-1][1:]
    # Each remaining line looks like "16280 MiB"; keep only the number
    memory_free_values = [int(x.split()[0]) for x in memory_free_info]
    return memory_free_values

get_gpu_memory()

This answer relies on nvidia-smi being installed (which is almost always the case when the NVIDIA driver is present) and is therefore limited to NVIDIA GPUs.
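As a usage sketch (pick_freest_gpu is a hypothetical helper name, not part of the answer above), the list returned by get_gpu_memory() can be used to pin a process to the GPU with the most free memory, provided the environment variable is set before TensorFlow or any other CUDA library touches the GPU:

import os

def pick_freest_gpu():
    # Index of the GPU with the most free memory, according to nvidia-smi
    free = get_gpu_memory()
    return max(range(len(free)), key=lambda i: free[i])

# Must be set before TensorFlow (or any CUDA library) initializes the GPU
os.environ["CUDA_VISIBLE_DEVICES"] = str(pick_freest_gpu())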