What is the fastest way to save a large pandas DataFrame to S3?

Use a multipart upload to speed up the transfer to S3; boto3's upload_fileobj handles multipart uploads automatically for large objects. Compressing the CSV also helps, since there are fewer bytes to send in the first place.

import boto3
from io import BytesIO

s3 = boto3.client('s3')

# write the gzip-compressed CSV into an in-memory buffer
# (compressing into a binary buffer like this needs pandas >= 1.2)
csv_buffer = BytesIO()
df.to_csv(csv_buffer, compression='gzip')
csv_buffer.seek(0)  # rewind the buffer, otherwise zero bytes get uploaded

# upload_fileobj is a managed transfer: it switches to a multipart upload
# for large objects; use boto3.s3.transfer.TransferConfig if you need to
# tune part size or other settings
s3.upload_fileobj(csv_buffer, bucket, key)  # bucket and key are your target bucket name and object key

The docs for s3.upload_fileobj are here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_fileobj
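
If you do want to tune the multipart behaviour, boto3 exposes TransferConfig and upload_fileobj accepts it via Config=. A minimal sketch, reusing csv_buffer, bucket, and key from the snippet above; the threshold, chunk size, and concurrency values are illustrative, not recommendations:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# switch to multipart above 64 MB, use 16 MB parts and 8 parallel threads
# (illustrative values; tune for your network and object sizes)
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=8,
    use_threads=True,
)

csv_buffer.seek(0)
s3.upload_fileobj(csv_buffer, bucket, key, Config=config)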


You can try using s3fs, which gives pandas a file-like handle directly on S3, together with pandas compression. That avoids building the whole compressed file in a StringIO or BytesIO buffer in memory first.

import s3fs
import pandas as pd

s3 = s3fs.S3FileSystem(anon=False)
df = pd.read_csv("some_large_file")

# open the S3 object in binary mode ('wb'): gzip output is bytes, and
# pandas only applies compression to binary file handles (pandas >= 1.2)
with s3.open('s3://bucket/file.csv.gzip', 'wb') as f:
    df.to_csv(f, compression='gzip')
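
With a recent pandas (1.2+) and s3fs installed you can also skip the explicit file handle and let to_csv write to the s3:// URL itself, passing credentials through storage_options. A minimal sketch, using the same placeholder bucket and file name as above:

import pandas as pd

df = pd.read_csv("some_large_file")

# pandas hands the s3:// path to s3fs/fsspec under the hood and applies
# gzip compression because of the explicit compression argument
df.to_csv(
    "s3://bucket/file.csv.gzip",
    compression="gzip",
    storage_options={"anon": False},
)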