How to find the size of a folder inside an S3 bucket?

To get the size of an S3 folder, use the objects collection of boto3.resource('s3').Bucket. Its filter(Prefix=...) method returns only the objects whose keys start with the given prefix, so S3 does the filtering and you never retrieve listings for the rest of the bucket.

import boto3

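def get_size(bucket, path):
    """Return the total size in bytes of all objects under the given prefix."""
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    total_size = 0

    # The collection pages through the listing automatically, so prefixes
    # with more than 1000 matching objects are handled too.
    for obj in my_bucket.objects.filter(Prefix=path):
        total_size += obj.size

    return total_size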

Say you want the size of the folder s3://my-bucket/my/path/. You would call the function above like this:

get_size("my-bucket", "my/path/")
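One thing worth knowing about the call above: Prefix is matched as a plain string against the start of each key, not as a directory path, so "my/path" without the trailing slash would also count keys such as "my/pathological/c.txt". A minimal sketch of a guard follows; normalize_prefix is a hypothetical helper (not part of boto3), and the key list is made-up sample data used only to show the matching behaviour.

```python
def normalize_prefix(path):
    """Strip leading/trailing slashes and add one trailing slash ('' for the root)."""
    path = path.strip("/")
    return path + "/" if path else ""


# Made-up keys, used only to illustrate how prefix matching behaves.
sample_keys = ["my/path/a.txt", "my/path/sub/b.txt", "my/pathological/c.txt"]

# Without the trailing slash, the unrelated "my/pathological/" key matches too.
loose = [k for k in sample_keys if k.startswith("my/path")]

# With the normalized prefix, only keys inside "my/path/" match.
strict = [k for k in sample_keys if k.startswith(normalize_prefix("my/path"))]

print(len(loose), len(strict))
```

With a helper like this, the call could be written as get_size("my-bucket", normalize_prefix("my/path")).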

get_size also works for top-level folders: pass the folder name followed by a slash as the path.
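To see which top-level folders exist before sizing them, one option is to list the bucket with Delimiter='/' and read the CommonPrefixes entries from each page of list_objects_v2. The sketch below assumes you pass in a client created with boto3.client('s3'); list_top_level_prefixes is a hypothetical helper name, not a boto3 function.

```python
def list_top_level_prefixes(s3_client, bucket):
    """Return the top-level "folder" prefixes of a bucket, e.g. ['logs/', 'my/'].

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    prefixes = []
    # With Delimiter='/', S3 groups keys that share everything up to the first
    # slash and reports each group once under 'CommonPrefixes'.
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        # 'CommonPrefixes' is absent from pages that contain no grouped keys.
        for entry in page.get("CommonPrefixes", []):
            prefixes.append(entry["Prefix"])
    return prefixes
```

Each returned prefix already ends with a slash, so it can be passed straight to get_size as the path argument.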


To find the size of the top-level "folders" in S3 (S3 has no real concept of folders; the console just displays keys that share a prefix as a folder structure), something like this will work:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

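# list_objects returns at most 1000 keys per call, and 'Contents' is missing
# from the response when the bucket is empty, hence .get().
for key in conn.list_objects(Bucket='kitsune-buildtest-production').get('Contents', []):
    # Everything before the first slash is the top-level "folder".
    folder = key['Key'].split('/')[0]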
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

A single list_objects_v2 call returns at most 1000 objects. To get more than that, use a paginator:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

paginator = conn.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')
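index = 1  # position of the folder name within the key; see the note on the prefix below
for page in pages:
    # 'Contents' is missing from a page that holds no objects.
    for key in page.get('Contents', []):
        parts = key['Key'].split('/')
        if len(parts) <= index:
            continue  # key has no path segment at this depth
        folder = parts[index]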
        print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

        if folder in top_level_folders:
            top_level_folders[folder] += key['Size']
        else:
            top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    size_in_gb = size/(1024*1024*1024)
    print("Folder: %s, size: %.2f GB" % (folder, size_in_gb))

If the prefix is notes/ and the delimiter is a slash (/), as in notes/summer/july, the common prefix is notes/summer/. Set index to the number of path segments in the prefix: index = 1 for the prefix "notes/", index = 2 for "notes/summer/".
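The index does not have to be hard-coded, since it equals the number of non-empty path segments in the prefix. A minimal sketch of that idea follows, written against plain dictionaries shaped like the pages the paginator yields so it can be tried without AWS access. folder_index and sizes_by_folder are hypothetical helper names, and the sample page is made-up data. Unlike the loop above, this sketch skips objects that sit directly under the prefix rather than counting them as folders.

```python
from collections import defaultdict


def folder_index(prefix):
    """Number of path segments in prefix: 'notes/' -> 1, 'notes/summer/' -> 2."""
    return len([part for part in prefix.split("/") if part])


def sizes_by_folder(pages, prefix):
    """Sum object sizes per folder found directly under prefix."""
    index = folder_index(prefix)
    totals = defaultdict(int)
    for page in pages:
        # 'Contents' is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            parts = obj["Key"].split("/")
            # Skip objects stored directly under the prefix (no sub-folder).
            if len(parts) > index + 1:
                totals[parts[index]] += obj["Size"]
    return dict(totals)


# Made-up page in the shape that list_objects_v2 returns.
sample_pages = [{"Contents": [
    {"Key": "notes/summer/july", "Size": 10},
    {"Key": "notes/summer/august", "Size": 5},
    {"Key": "notes/winter/january", "Size": 7},
    {"Key": "notes/readme.txt", "Size": 1},
]}]

print(sizes_by_folder(sample_pages, "notes/"))
```

In real use, pages would be the result of paginator.paginate(Bucket=..., Prefix=prefix) from the answer above.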