What should I do about this gsutil "parallel composite upload" warning?

Another way is to put the setting named in the warning into a boto configuration file on your BOTO_PATH, usually $HOME/.boto:

[GSUtil]
parallel_composite_upload_threshold = 150M
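To check which threshold is actually set in that file, the value can be read back with Python's standard configparser. This is only a minimal sketch: the helper name read_composite_threshold is hypothetical, it assumes the file is plain INI as in the snippet above, and it looks at a single path only, whereas gsutil may consult other configuration locations as well.

```python
import configparser
import os


def read_composite_threshold(boto_path=None):
    """Return the configured threshold string, or None if it is not set.

    Hypothetical helper: reads one boto config file (default ~/.boto).
    """
    if boto_path is None:
        boto_path = os.path.expanduser("~/.boto")
    # interpolation=None keeps a literal '%' in other settings from raising;
    # strict=False tolerates duplicate options in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(boto_path)  # a missing file is silently ignored
    return parser.get(
        "GSUtil", "parallel_composite_upload_threshold", fallback=None
    )
```

Calling read_composite_threshold() with no argument inspects $HOME/.boto; a result of None means the option is not set there.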

For maximum speed, install the compiled (C extension) version of the crcmod library.
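One way to see whether a given Python interpreter has the compiled crcmod is sketched below. It relies on the private flag _usingExtension that crcmod sets at import time, so treat that attribute as an assumption that could change between releases; the function name is hypothetical. Also note that gsutil may run under a different interpreter than the one you test here; running gsutil version -l reports a "compiled crcmod" line for the interpreter gsutil itself uses.

```python
import importlib


def crcmod_uses_c_extension():
    """Return True/False for compiled vs. pure-Python crcmod, None if unknown."""
    try:
        crc_impl = importlib.import_module("crcmod.crcmod")
    except ImportError:
        return None  # crcmod is not installed for this interpreter
    # _usingExtension is a private flag (assumption); a missing attribute
    # is reported as "unknown" rather than guessed.
    return getattr(crc_impl, "_usingExtension", None)


if __name__ == "__main__":
    print("compiled crcmod:", crcmod_uses_c_extension())
```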


The Parallel Composite Uploads section of the documentation for gsutil describes how to resolve this (assuming, as the warning specifies, that this content will be used by clients with the crcmod module available):

gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket

Doing this safely from Python would look like:

import subprocess

filename='myfile.csv'
gs_bucket='my/bucket'
parallel_threshold='150M' # minimum size for parallel upload; 0 to disable

subprocess.check_call([
  'gsutil',
  '-o', 'GSUtil:parallel_composite_upload_threshold=%s' % (parallel_threshold,),
  'cp', filename, 'gs://%s/%s' % (gs_bucket, filename)
])

Note that here you're explicitly providing argument vector boundaries, and not relying on a shell to do this for you; this prevents a malicious or buggy filename from performing undesired operations.


If you don't know whether the clients accessing content in this bucket will have the crcmod module, consider setting parallel_threshold='0' above, which disables parallel composite uploads.
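That choice can be folded into a small helper that builds the argument vector. This is a sketch under stated assumptions: build_gsutil_cp_args and its clients_have_crcmod parameter are hypothetical names, and 150M is simply the threshold used in the example above.

```python
def build_gsutil_cp_args(filename, gs_bucket, clients_have_crcmod):
    """Build the gsutil argv as a list, so no shell parsing is involved."""
    # '150M' mirrors the threshold used above; '0' disables composite uploads.
    threshold = "150M" if clients_have_crcmod else "0"
    return [
        "gsutil",
        "-o", "GSUtil:parallel_composite_upload_threshold=%s" % threshold,
        "cp", filename, "gs://%s/%s" % (gs_bucket, filename),
    ]
```

The returned list can be passed straight to subprocess.check_call, keeping the same argument-boundary safety as the original snippet.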

Tags:

Python

Gsutil