Use pathlib for S3 paths

Using the s3path package

The s3path package makes working with S3 paths a little less painful. It is installable from PyPI or conda-forge. Use the S3Path class when you need to work with actual objects in S3; otherwise use PureS3Path, which only manipulates paths and shouldn't access S3 at all.

Although the previous answer by metaperture did mention this package, it didn't include the URI syntax.

Also be aware that this package has known deficiencies, which are reported in its issue tracker.

>>> from s3path import PureS3Path

>>> PureS3Path.from_uri('s3://mybucket/foo/bar') / 'add/me'
PureS3Path('/mybucket/foo/bar/add/me')

>>> _.as_uri()
's3://mybucket/foo/bar/add/me'

Note the instance relationships to pathlib:

>>> from pathlib import Path, PurePath
>>> from s3path import S3Path, PureS3Path

>>> isinstance(S3Path('/my-bucket/some/prefix'), Path)
True
>>> isinstance(PureS3Path('/my-bucket/some/prefix'), PurePath)
True

Using pathlib.Path

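This is a lazier version of the answer by kichik that uses only pathlib. It is not necessarily recommended, but urllib.parse is not always strictly necessary. The trick relies on Path collapsing the '//' after the scheme, so 's3:' becomes the first path component and can simply be dropped. The output below is from a POSIX system.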

>>> from pathlib import Path

>>> orig_s3_path = 's3://mybucket/foo/bar'
>>> orig_path = Path(*Path(orig_s3_path).parts[1:])
>>> orig_path
PosixPath('mybucket/foo/bar')

>>> new_path = orig_path / 'add/me'
>>> new_s3_path = 's3://' + str(new_path)
>>> new_s3_path
's3://mybucket/foo/bar/add/me'
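The same steps can be wrapped in a small function. This is only a sketch: join_s3_uri is a hypothetical helper name, not part of pathlib, and it uses PurePosixPath instead of Path on the assumption that you want identical behaviour on every operating system.

```python
from pathlib import PurePosixPath


def join_s3_uri(uri: str, *segments: str) -> str:
    """Append path segments to an 's3://bucket/key' URI (hypothetical helper)."""
    # The '//' after the scheme collapses, so parts[0] is 's3:'.
    parts = PurePosixPath(uri).parts
    if len(parts) < 2 or parts[0] != "s3:":
        raise ValueError(f"not an S3 URI with a bucket: {uri!r}")
    # Strip leading slashes so a segment cannot reset the path to the root.
    cleaned = [s.lstrip("/") for s in segments]
    new_path = PurePosixPath(*parts[1:]).joinpath(*cleaned)
    return "s3://" + str(new_path)


print(join_s3_uri("s3://mybucket/foo/bar", "add/me"))
# s3://mybucket/foo/bar/add/me
```

Note that this treats the key as a path, so repeated slashes inside the key are collapsed as well.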

Using os.path.join

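For simple joins only, os.path.join may be enough. It uses the host platform's separator, so the results below hold on POSIX systems; on Windows the pieces are joined with a backslash.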

>>> import os

>>> os.path.join('s3://mybucket/foo/bar', 'add/me')
's3://mybucket/foo/bar/add/me'
>>> os.path.join('s3://mybucket/foo/bar/', 'add/me')
's3://mybucket/foo/bar/add/me'
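If the code may run on Windows, the standard library's posixpath module should give the forward-slash result on any platform, since os.path is simply posixpath on POSIX systems. A minimal sketch, reusing the bucket and key from the example above:

```python
import posixpath

base = "s3://mybucket/foo/bar"

# Always joins with '/', regardless of the host operating system.
joined = posixpath.join(base, "add/me")
print(joined)  # s3://mybucket/foo/bar/add/me

# Caveat: a segment with a leading slash discards everything before it.
reset = posixpath.join(base, "/add/me")
print(reset)  # /add/me
```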

However, os.path.normpath cannot be used naively, because it collapses the double slash after the scheme:

>>> os.path.normpath('s3://mybucket/foo/bar')  # Converts 's3://' to 's3:/'
's3:/mybucket/foo/bar'
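One possible workaround is to normalize only the key and leave the scheme and bucket alone. The sketch below assumes that treating the key as a POSIX path is acceptable for your data; normalize_s3_uri is a hypothetical name. Keep in mind that S3 keys are opaque strings, so '..', '//' and trailing slashes can be meaningful, and that a '?' or '#' in the key would be parsed as a query or fragment.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize_s3_uri(uri: str) -> str:
    """Normalize only the key part of an S3 URI (hypothetical helper)."""
    parts = urlsplit(uri)
    # Leave an empty path alone; normpath('') would return '.'.
    path = posixpath.normpath(parts.path) if parts.path else ""
    return urlunsplit(parts._replace(path=path))


print(normalize_s3_uri("s3://mybucket/foo/../bar//baz"))
# s3://mybucket/bar/baz
```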

You can try combining urllib.parse with pathlib.

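from urllib.parse import urlparse, urlunparse
from pathlib import PurePosixPath

# Split the URL so that only the key part is treated as a path.
s3_url = urlparse('s3://bucket/key')
# PurePosixPath does no filesystem access and works on any OS.
s3_path = PurePosixPath(s3_url.path)
s3_path /= 'hello'
# Reassemble the URL with the new path; the other components are unchanged.
s3_new_url = urlunparse((s3_url.scheme, s3_url.netloc, s3_path.as_posix(), s3_url.params, s3_url.query, s3_url.fragment))
print(s3_new_url)  # s3://bucket/key/hello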

It's quite cumbersome, but it's what you asked for.
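A somewhat shorter variant of the same idea is sketched below. It relies on the result of urlparse being a named tuple, so _replace and geturl can stand in for the explicit urlunparse call. PurePosixPath is used here on the assumption that no filesystem access is wanted.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

s3_url = urlparse("s3://bucket/key")
# Only the path component is handled as a path; the bucket stays in netloc.
new_path = PurePosixPath(s3_url.path) / "hello"
# _replace returns a copy with the new path; geturl reassembles the URL.
s3_new_url = s3_url._replace(path=str(new_path)).geturl()
print(s3_new_url)  # s3://bucket/key/hello
```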
