What is the fastest way to save a large pandas DataFrame to S3?
Use multipart uploads to speed up the transfer to S3, and compress the data so there are fewer bytes to send in the first place.
```python
import boto3
from io import BytesIO

s3 = boto3.client('s3')

# Gzip the CSV in memory (pandas >= 1.2 can compress to a binary buffer)
csv_buffer = BytesIO()
df.to_csv(csv_buffer, compression='gzip')
csv_buffer.seek(0)  # rewind the buffer before uploading

# upload_fileobj automatically switches to a multipart upload for large objects;
# use boto3.s3.transfer.TransferConfig if you need to tune part size or concurrency
s3.upload_fileobj(csv_buffer, bucket, key)
```
The docs for `s3.upload_fileobj` are here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_fileobj
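To see why compression helps before tuning anything else, here is a minimal stdlib-only sketch (no pandas or boto3 needed; the sample CSV text is made up for illustration) that gzips a CSV payload in memory the same way the snippet above does:

```python
import gzip
import io

# Fabricated CSV payload, just to measure the compression effect
csv_text = "col_a,col_b\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))
raw = csv_text.encode("utf-8")

# Gzip into an in-memory buffer, mirroring the BytesIO approach above
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(raw)
compressed = buf.getvalue()

print(len(raw), len(compressed))  # the gzipped payload is much smaller
```

Repetitive, column-oriented CSV data typically compresses very well, so the upload moves far fewer bytes.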
You can try using pandas' built-in compression and streaming the output straight to S3 with `s3fs`. Buffering everything in a `BytesIO` first holds the whole file in memory, which is exactly what you want to avoid with a large DataFrame.
```python
import s3fs
import pandas as pd

s3 = s3fs.S3FileSystem(anon=False)
df = pd.read_csv("some_large_file")

# Open in binary mode so the gzip bytes are written as-is,
# streaming to S3 instead of buffering the whole file in memory
with s3.open('s3://bucket/file.csv.gz', 'wb') as f:
    df.to_csv(f, compression='gzip')
```
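The memory difference can be seen without touching S3 at all. In this stdlib-only sketch, `CountingSink` is a hypothetical stand-in for a streaming file handle like the one `s3fs` returns: chunks are handed off as they arrive rather than accumulated, while the `BytesIO` keeps every byte resident until the upload finishes (chunk sizes here are arbitrary):

```python
import io

chunk = b"x" * 1024   # 1 KiB per write
n_chunks = 1024       # 1 MiB total payload

# Buffered approach: BytesIO keeps the entire payload in memory
buf = io.BytesIO()
for _ in range(n_chunks):
    buf.write(chunk)
assert len(buf.getvalue()) == len(chunk) * n_chunks  # whole payload resident

class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a streaming S3 handle: counts bytes, stores none."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def writable(self):
        return True

    def write(self, b):
        self.total += len(b)  # with s3fs this chunk would go over the network
        return len(b)

# Streaming approach: each chunk is handed off and can be freed immediately
sink = CountingSink()
for _ in range(n_chunks):
    sink.write(chunk)
print(sink.total)  # same number of bytes delivered, nothing accumulated
```

The peak memory of the streaming path is roughly one chunk, independent of the total file size, which is what makes it suitable for large DataFrames.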