Write-streaming to Google Cloud Storage in Python

smart_open now supports GCS, along with on-the-fly compression and decompression.

import lzma

from smart_open import open, register_compressor

def _handle_xz(file_obj, mode):
    # Wrap the raw byte stream in an LZMAFile so .xz payloads are
    # (de)compressed transparently.
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)

# Teach smart_open to handle the .xz extension.
register_compressor('.xz', _handle_xz)

# stream from GCS
with open('gs://my_bucket/my_file.txt.xz') as fin:
    for line in fin:
        print(line)

# stream content *into* GCS (write mode):
with open('gs://my_bucket/my_file.txt.xz', 'wb') as fout:
    fout.write(b'hello world')

I initially got confused between multipart and resumable uploads. The latter is what you need for "streaming": in practice it amounts to uploading chunks of a buffered stream.
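
To make that concrete, here is a minimal sketch of a chunked resumable upload done directly with google-resumable-media; the bucket and object names, and the use of default application credentials, are illustrative assumptions.

import io

import google.auth
from google.auth.transport.requests import AuthorizedSession
from google.resumable_media.requests import ResumableUpload

credentials, _ = google.auth.default()
transport = AuthorizedSession(credentials)

# Chunks must be multiples of 256 KiB.
chunk_size = 256 * 1024
url = ('https://storage.googleapis.com/upload/storage/v1'
       '/b/my_bucket/o?uploadType=resumable')

upload = ResumableUpload(url, chunk_size)
stream = io.BytesIO(b'hello world' * 100_000)

# One API call opens the session, then one call per chunk.
upload.initiate(transport, stream, {'name': 'my_file.txt'}, 'text/plain')
while not upload.finished:
    upload.transmit_next_chunk(transport)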

A multipart upload, by contrast, sends the data and its custom metadata at once, in a single API call.
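
For comparison, a sketch with the google-cloud-storage client, which for small payloads sends the data and metadata together as one multipart request; the names are placeholders.

from google.cloud import storage

client = storage.Client()
blob = client.bucket('my_bucket').blob('my_file.txt')

# Custom metadata travels in the same request as the payload.
blob.metadata = {'origin': 'example'}
blob.upload_from_string(b'hello world')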

While I like GCSFS very much (Martin, its main contributor, is very responsive), I recently found an alternative that uses the google-resumable-media library.
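
For reference, the equivalent streamed write through GCSFS looks roughly like this; the project and bucket names are assumptions.

import gcsfs

fs = gcsfs.GCSFileSystem(project='my-project')

# Write to the object as if it were a local file; GCSFS buffers
# the bytes and uploads them for you.
with fs.open('my_bucket/my_file.txt', 'wb') as fout:
    fout.write(b'hello world')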

GCSFS is built upon the core HTTP API, whereas Seth's solution uses a low-level library maintained by Google, which stays more in sync with API changes and includes exponential backoff. The latter is really a must for large/long streams, as connections may drop, even within GCP; we faced the issue with GCF.
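
If you ever roll your own chunked upload, the retry policy you want around each chunk looks something like this; a purely illustrative sketch, not any particular library's implementation.

import random
import time

def with_backoff(fn, max_tries=6, base=1.0, cap=60.0):
    """Retry fn with exponential backoff and full jitter."""
    for attempt in range(max_tries):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise
            # Sleep 0..min(cap, base * 2^attempt) seconds before retrying.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))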

On a closing note, I still believe that the Google Cloud client library is the right place to add stream-like functionality, with basic write and read. It already has the core code.

If you too are interested in having that feature in the core library, give the issue a thumbs-up here, assuming prioritization takes that into account.