Google Colab Storage

Presently, the amount of local storage in Colab depends on the selected hardware accelerator runtime type:

# Hardware accelerator none
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

# Hardware accelerator GPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G   23G  318G   7% /

# Hardware accelerator TPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

Even if you don't need a GPU, switching to that runtime type will give you roughly an extra 310 GB of storage space.
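
If you'd rather check the free space programmatically instead of shelling out to df, here is a minimal sketch using only Python's standard library (sizes are reported in bytes):

import shutil

# Query the filesystem backing the notebook's working directory.
total, used, free = shutil.disk_usage('/')
print('Total: {:.1f} GiB, free: {:.1f} GiB'.format(total / 2**30, free / 2**30))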


Yes, the Colab notebook local storage is about 40 GiB right now. One way to see the exact value (in Python 3):

import subprocess

# Run `df -h` and print its output as text.
p = subprocess.Popen('df -h', shell=True, stdout=subprocess.PIPE)
print(str(p.communicate()[0], 'utf-8'))

However: for large amounts of data, local storage is a suboptimal way to feed the TPU, which is not directly attached to the machine running the notebook. Instead, consider storing your large dataset in GCP storage and sourcing that data from the Colab notebook. (Moreover, the amount of Colab local storage may change, and the Colab notebook itself will expire after a few hours, taking local storage with it.)
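
As a rough illustration of what sourcing data from GCS looks like, the sketch below reads a file directly from a bucket with TensorFlow's file API; gs://your-bucket/dataset.txt is a hypothetical path standing in for your own data, and it assumes you have already authenticated (as shown in the next snippet):

import tensorflow as tf

# Read the first line of a (hypothetical) object straight from Cloud Storage.
with tf.gfile.GFile('gs://your-bucket/dataset.txt', 'r') as f:
  print(f.readline())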

Take a look at the canonical TPU Colab notebook. At the bottom are some next steps, which include a link to Searching Shakespeare with TPUs. That notebook contains the following code fragment, which demonstrates how to authenticate your Colab TPU to GCP:

import json
import os

import tensorflow as tf
from google.colab import auth
auth.authenticate_user()

if 'COLAB_TPU_ADDR' in os.environ:
  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])

  # Upload credentials to TPU.
  with tf.Session(TF_MASTER) as sess:    
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
else:
  TF_MASTER = ''
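
Once the credentials are configured, any later session opened against TF_MASTER can read your GCS bucket directly. As a minimal sanity check (assuming TF_MASTER was set as above):

# Open a session against the TPU and list its devices.
with tf.Session(TF_MASTER) as sess:
  print(sess.list_devices())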