Load xlsx file from Drive in Colaboratory

First, I import io, pandas and files from google.colab

import io
import pandas as pd
from google.colab import files

Then I upload the file using an upload widget

uploaded = files.upload()

You will see an upload widget; click on Choose Files and upload the xlsx file.

Let's suppose that the name of the file is my_spreadsheet.xlsx; use that name in the following line:

df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
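If you'd rather not hard-code the file name, note that files.upload() returns a dictionary mapping each uploaded file name to its contents as bytes. Here is a minimal sketch assuming that shape. first_xlsx is a hypothetical helper name (not part of google.colab), and the sample dictionary only stands in for a real upload:

```python
import io

def first_xlsx(uploaded):
    """Return (name, BytesIO) for the first uploaded .xlsx file.

    `uploaded` is assumed to look like the dict returned by
    files.upload(): {file_name: file_contents_as_bytes}.
    """
    for name, data in uploaded.items():
        if name.lower().endswith('.xlsx'):
            return name, io.BytesIO(data)
    raise ValueError('no .xlsx file among: %s' % list(uploaded))

# Stand-in for a real upload, so the sketch runs outside Colab.
fake_uploaded = {'notes.txt': b'hello', 'my_spreadsheet.xlsx': b'PK\x03\x04'}
name, xlsx_buffer = first_xlsx(fake_uploaded)
```

In Colab you would pass the real uploaded dictionary instead and then call pd.read_excel(xlsx_buffer).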

That's all: the first sheet is now in the df dataframe. However, if the workbook has multiple sheets, you can change the code as follows.

First, assign the io.BytesIO buffer to its own variable:

xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))

And then, use the new variable to specify the sheet name, like this:

df_first_sheet = pd.read_excel(xlsx_file, sheet_name='My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, sheet_name='My Second Sheet')

You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.

Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC

What I did in more detail:

I created a new spreadsheet in Google Sheets to be exported as an .xlsx file.

Next, I exported it as an .xlsx file and uploaded it to Drive again. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM

Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
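If you prefer not to copy the ID by hand, it can be pulled out of a URL of this form with the standard library. This is only a sketch: drive_file_id is a hypothetical helper name, and it handles just the ?id=... style shown above, not other Drive link formats.

```python
from urllib.parse import urlparse, parse_qs

def drive_file_id(url):
    """Return the value of the `id` query parameter of a Drive URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get('id')
    if not ids:
        raise ValueError('no id parameter in URL: %s' % url)
    return ids[0]

file_id = drive_file_id(
    'https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM')
print(file_id)  # prints 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
```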

Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:

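# `drive` is the authenticated PyDrive client (GoogleDrive) created by the Drive download snippet.
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
# Save the Drive file to the Colab VM's local disk.
downloaded.GetContentFile('exported.xlsx')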

Finally, to create a Pandas DataFrame:

!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df

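The !pip install... line installs the xlrd library, which pandas needed to read Excel files when this was written. Note that xlrd 2.0 and later reads only .xls files; for .xlsx files, current pandas versions use openpyxl, so run !pip install -q openpyxl instead if read_excel complains about a missing engine.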


Perhaps a simpler method:

# To read/write data from Google Drive:
# Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')

# When done:
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')
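
If read_excel raises FileNotFoundError, the folder or file name probably doesn't match what is on Drive. Here is a small sketch for listing the .xlsx files under a folder so you can copy the exact path. find_xlsx is a hypothetical helper, and the temporary directory only stands in for the mounted Drive folder so the code runs anywhere:

```python
import tempfile
from pathlib import Path

def find_xlsx(root):
    """Return all .xlsx files under `root`, sorted, searching subfolders."""
    return sorted(Path(root).rglob('*.xlsx'))

# Stand-in directory tree; in Colab, pass '/content/drive/My Drive' instead.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'folder_name'
    folder.mkdir()
    (folder / 'file_name.xlsx').touch()
    (folder / 'readme.txt').touch()
    found = [p.relative_to(tmp).as_posix() for p in find_xlsx(tmp)]

print(found)  # prints ['folder_name/file_name.xlsx']
```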