TensorFlow: How do you monitor GPU performance during model training in real time?

  1. TensorFlow does not automatically utilize all GPUs; by default it will use only one GPU, specifically the first GPU, /gpu:0.

    You have to write multi-GPU code to utilize all available GPUs; see the CIFAR-10 multi-GPU example.

  2. To check usage every 0.1 seconds (a Python sketch of polling the same data programmatically follows below):

    watch -n0.1 nvidia-smi
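
If you want to log GPU utilization alongside your training metrics instead of just watching a terminal, one option is to poll nvidia-smi from Python. This is only a sketch: it assumes nvidia-smi is on the PATH, and the queried fields and one-second interval are arbitrary choices.

    import subprocess
    import time

    def gpu_stats():
        # Ask nvidia-smi for per-GPU utilization and memory as plain CSV.
        out = subprocess.check_output([
            'nvidia-smi',
            '--query-gpu=index,utilization.gpu,memory.used,memory.total',
            '--format=csv,noheader,nounits',
        ]).decode()
        return [line.split(', ') for line in out.strip().splitlines()]

    # Poll once per second, much like `watch -n1 nvidia-smi`, but the
    # readings can now be written to your own training log.
    while True:
        for idx, util, used, total in gpu_stats():
            print('GPU %s: %s%% utilization, %s / %s MiB' % (idx, util, used, total))
        time.sleep(1)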


  1. If no other indication is given, a GPU-enabled TensorFlow installation will default to using the first available GPU (as long as you have the Nvidia driver and CUDA 8.0 installed and the GPU has the necessary compute capability, which, according to the docs, is 3.0). If you want to use more GPUs, you need to use tf.device directives in your graph (more about it here); a minimal sketch follows this list.
  2. The easiest way to check GPU usage is the console tool nvidia-smi. However, unlike top or other similar programs, it only shows the current usage once and exits. As suggested in the comments, you can use something like watch -n1 nvidia-smi to re-run it continuously (in this case, every second).
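
For the multi-GPU point, a minimal sketch of explicit placement with tf.device (TensorFlow 1.x graph-mode API; the shapes and device strings below are just illustrative) might look like this:

    import tensorflow as tf

    # Without explicit placement, TensorFlow would put these ops on /gpu:0.
    with tf.device('/gpu:0'):
        a = tf.random_normal([1000, 1000])
    with tf.device('/gpu:1'):
        b = tf.random_normal([1000, 1000])

    c = tf.matmul(a, b)

    # allow_soft_placement falls back to another device if /gpu:1 does not
    # exist; log_device_placement prints where each op actually runs.
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(c))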