S3

Overview

The S3 class allows interaction with Amazon Web Services' object storage service (S3) to store and access data objects. It is a wrapper around boto3, the AWS SDK for Python. It provides methods to upload and download files from S3 as well as to manipulate buckets.

Authentication

Access to S3 is controlled through AWS Identity and Access Management (IAM) users in the AWS Management Console. Users can be granted granular access to AWS resources, including S3. IAM users are issued access keys, which the S3 class requires in order to authenticate.

Quickstart

S3 credentials can be passed as environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), stored in an AWS CLI credentials file (~/.aws/credentials), or passed as keyword arguments.

Pass API credentials via environmental variables or an AWS CLI file
from parsons import S3
s3 = S3()
Pass API credentials as arguments
from parsons import S3
s3 = S3(aws_access_key_id='MY_KEY', aws_secret_access_key='MY_SECRET')
Put an arbitrary file in an S3 bucket
# put_file takes the local path of the file to upload, not an open file object
s3.put_file('my_bucket', 'winning.csv', 'winning_formula.csv')
Put a Parsons Table as a CSV using convenience method
from parsons import Table
tbl = Table.from_csv('winning_formula.csv')
tbl.to_s3_csv('my_bucket', 'winning.csv')
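Download a CSV file and convert it to a Table
from parsons import Table
# get_file returns the local path of the downloaded file
path = s3.get_file('my_bucket', 'my_dir/my_file.csv')
tbl = Table.from_csv(path)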
List buckets that you have access to
buckets = s3.list_buckets()
List the keys in a bucket
s3.list_keys('my_bucket')

Temporary Credentials

AWS supports issuing temporary credentials for one-off operations, such as pushing a file to a particular key in a particular bucket. For example, the Mapbox API lets you request temporary credentials that grant access to a bucket where you can upload map data. Temporary credentials come with a session token, which must be supplied alongside the access key ID and secret access key for the credentials to be accepted. The S3 class accepts a session token either through the AWS_SESSION_TOKEN environment variable or as a keyword argument.

Pass session token via AWS_SESSION_TOKEN environmental variable
from parsons import S3
s3 = S3()
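Pass session token as an argument
from parsons import S3
# request_temporary_credentials() is a placeholder for your own code that
# obtains temporary credentials (for example, from a third-party API).
creds = request_temporary_credentials()
s3 = S3(
    aws_access_key_id=creds['id'],
    aws_secret_access_key=creds['key'],
    aws_session_token=creds['token']
)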

API

class parsons.aws.s3.S3(aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, use_env_token=True)

Instantiate the S3 class.

Parameters:
  • aws_access_key_id – str The AWS access key id. Not required if the AWS_ACCESS_KEY_ID env variable is set.

  • aws_secret_access_key – str The AWS secret access key. Not required if the AWS_SECRET_ACCESS_KEY env variable is set.

  • aws_session_token – str The AWS session token. Optional. Can also be stored in the AWS_SESSION_TOKEN env variable. Used for accessing S3 with temporary credentials.

  • use_env_token – boolean Controls use of the AWS_SESSION_TOKEN environment variable. Defaults to True. Set to False in order to ignore the AWS_SESSION_TOKEN environment variable even if the aws_session_token argument was not passed in.

Returns:

S3 class.
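
The interaction between aws_session_token and use_env_token can be easier to follow as code. The sketch below is only an illustration of the precedence as read from the parameter descriptions above; resolve_session_token is a hypothetical helper, not a Parsons function, and the real constructor may differ in detail.

```python
import os


def resolve_session_token(aws_session_token=None, use_env_token=True):
    """Hypothetical helper mirroring the documented precedence; not Parsons code."""
    # An explicitly passed token is used as-is.
    if aws_session_token is not None:
        return aws_session_token
    # Otherwise fall back to the environment, unless told to ignore it.
    if use_env_token:
        return os.environ.get("AWS_SESSION_TOKEN")
    # use_env_token=False and no argument: no session token at all.
    return None
```

Setting use_env_token=False is mainly useful when AWS_SESSION_TOKEN happens to be set in the environment for another tool but the S3 connection should use long-lived keys only.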

s3

Boto3 API Session Resource object. Use for more advanced boto3 features.

client

Boto3 API Session client object. Use for more advanced boto3 features.
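
As one possible use of the client attribute, the sketch below reads an object's size through boto3's head_object call, which the S3 class does not wrap directly. The function name object_size is an assumption made for this example; s3 is expected to be an instance of the S3 class.

```python
def object_size(s3, bucket, key):
    """Return an object's size in bytes using the underlying boto3 client.

    `s3` is assumed to be a parsons S3 instance, for example:
        from parsons import S3
        s3 = S3()
    """
    # head_object fetches metadata only; the object body is not downloaded.
    response = s3.client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]
```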

list_buckets()

List all buckets to which you have access.

Returns:

list

bucket_exists(bucket)

Determine if a bucket exists and you have access to it.

Parameters:

bucket – str The bucket name

Returns:

boolean

True if the bucket exists and False if not.

list_keys(bucket, prefix=None, suffix=None, regex=None, date_modified_before=None, date_modified_after=None, **kwargs)

List the keys in a bucket, along with extra info about each one.

Parameters:
  • bucket – str The bucket name

  • prefix – str Limits the response to keys that begin with the specified prefix.

  • suffix – str Limits the response to keys that end with the specified suffix.

  • regex – str Limits the response to keys that match a regex pattern.

  • date_modified_before – datetime.datetime Limits the response to keys last modified before this datetime.

  • date_modified_after – datetime.datetime Limits the response to keys last modified after this datetime.

  • kwargs – Additional arguments for the S3 API call. See AWS ListObjectsV2 documentation for more info.

Returns:

dict

Dict mapping the keys to info about each key. The info includes ‘LastModified’, ‘Size’, and ‘Owner’.
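
Because list_keys returns a dict of per-key info, simple reporting can be done on the result directly. The following is a minimal sketch that counts the matching keys and sums their 'Size' values; summarize_keys is a name chosen for this example, and s3 is assumed to be an S3 instance.

```python
def summarize_keys(s3, bucket, prefix=None, suffix=None):
    """Return (number_of_keys, total_bytes) for the keys matching the filters.

    `s3` is assumed to be a parsons S3 instance.
    """
    keys = s3.list_keys(bucket, prefix=prefix, suffix=suffix)
    # Each value is the info dict for one key; 'Size' is the size in bytes.
    total_bytes = sum(info["Size"] for info in keys.values())
    return len(keys), total_bytes
```

For example, summarize_keys(s3, 'my_bucket', prefix='my_dir/', suffix='.csv') would report on the CSV files under my_dir/.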

key_exists(bucket, key)

Determine if a key exists in a bucket.

Parameters:
  • bucket – str The bucket name

  • key – str The object key

Returns:

boolean

True if key exists and False if not.

create_bucket(bucket)

Create an S3 bucket.

Warning

S3 limits the number of buckets you can create in an AWS account, and that limit is fairly low (typically 100). If you are creating buckets frequently, you may be misusing S3 and should consider using the same bucket for multiple tasks. There is no limit on the number of objects in a bucket. See AWS bucket restrictions for more info.

Warning

S3 bucket names are globally unique. So when creating a new bucket, the name can’t collide with any existing bucket names. If the provided name does collide, you’ll see errors like IllegalLocationConstraintException or BucketAlreadyExists.

Parameters:

bucket – str The name of the bucket to create
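
Given the bucket limit described above, it can help to create a bucket only when it is not already available. The sketch below combines bucket_exists and create_bucket; ensure_bucket is a hypothetical helper name, and s3 is assumed to be an S3 instance.

```python
def ensure_bucket(s3, bucket):
    """Create `bucket` only if it is not already accessible.

    Returns True if the bucket was created, False if it already existed.
    `s3` is assumed to be a parsons S3 instance.
    """
    if s3.bucket_exists(bucket):
        return False
    # May still raise (e.g. BucketAlreadyExists) if another account owns the name.
    s3.create_bucket(bucket)
    return True
```

Note that bucket_exists reports False both when the bucket does not exist and when it exists but you cannot access it, so the create_bucket call can still fail with a name collision.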

put_file(bucket, key, local_path, acl='bucket-owner-full-control', **kwargs)

Upload an object to an S3 bucket.

Parameters:
  • bucket – str The bucket name

  • key – str The object key

  • local_path – str The local path of the file to upload

  • acl – str The S3 permissions on the file

  • kwargs – Additional arguments for the S3 API call. See AWS Put Object documentation for more info.
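
A common pattern is to avoid overwriting an object that is already present. The sketch below checks key_exists before calling put_file; upload_if_missing is a name invented for this example, and s3 is assumed to be an S3 instance.

```python
import os


def upload_if_missing(s3, bucket, key, local_path):
    """Upload `local_path` to `bucket`/`key` unless the key already exists.

    Returns True if the file was uploaded, False if the key was already there.
    `s3` is assumed to be a parsons S3 instance.
    """
    # Fail early with a clear error if the local file is missing.
    if not os.path.isfile(local_path):
        raise FileNotFoundError(local_path)
    if s3.key_exists(bucket, key):
        return False
    s3.put_file(bucket, key, local_path)
    return True
```

The check and the upload are two separate requests, so this does not guard against another process writing the same key in between.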

remove_file(bucket, key)

Delete an object from an S3 bucket.

Parameters:
  • bucket – str The bucket name

  • key – str The object key
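
Since remove_file deletes one key at a time, cleaning up old objects usually means pairing it with list_keys. The following is a rough sketch of that pattern; purge_old_keys and its dry_run flag are assumptions made for this example, and a timezone-aware UTC cutoff is used on the assumption that S3's last-modified timestamps are timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def purge_old_keys(s3, bucket, prefix, older_than_days, dry_run=True):
    """Delete keys under `prefix` last modified more than `older_than_days` ago.

    Returns the sorted list of matching keys. With dry_run=True (the default)
    nothing is deleted. `s3` is assumed to be a parsons S3 instance.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
    old_keys = s3.list_keys(bucket, prefix=prefix, date_modified_before=cutoff)
    if not dry_run:
        for key in old_keys:
            s3.remove_file(bucket, key)
    return sorted(old_keys)
```

Running once with the default dry_run=True and reviewing the returned keys before deleting is a cheap safeguard, since deletions are not easily undone.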

get_file(bucket, key, local_path=None, **kwargs)

Download an object from S3 to a local file.

Parameters:
  • local_path – str The local path where the file will be downloaded. If not specified, a temporary file will be created and returned, and that file will be removed automatically when the script is done running.

  • bucket – str The bucket name

  • key – str The object key

  • kwargs – Additional arguments for the S3 API call. See AWS download_file documentation for more info.

Returns:

str

The path of the new file
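
Because get_file returns a path rather than file contents, the result is opened like any other local file. The sketch below reads a downloaded CSV with the standard library; read_csv_rows is a hypothetical helper name, and s3 is assumed to be an S3 instance.

```python
import csv


def read_csv_rows(s3, bucket, key):
    """Download a CSV object and return its rows as a list of dicts.

    `s3` is assumed to be a parsons S3 instance.
    """
    # With no local_path, get_file downloads to a temporary file and returns its path.
    path = s3.get_file(bucket, key)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```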

get_url(bucket, key, expires_in=3600)

Generate a presigned URL for an S3 object.

Parameters:
  • bucket – str The bucket name

  • key – str The object key

  • expires_in – int The time, in seconds, until the URL expires

Returns:

str

A link to download the object
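
When sharing several objects at once, it can be convenient to think of the expiry in minutes and build all the links in one call. This is a small sketch of that idea; presigned_links is a name made up for this example, and s3 is assumed to be an S3 instance.

```python
def presigned_links(s3, bucket, keys, minutes=15):
    """Return a dict mapping each key to a presigned download URL.

    `s3` is assumed to be a parsons S3 instance.
    """
    # get_url expects the expiry in seconds.
    expires_in = minutes * 60
    return {key: s3.get_url(bucket, key, expires_in=expires_in) for key in keys}
```

Anyone holding a presigned URL can download the object until it expires, so shorter expiry times are the safer default.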

transfer_bucket(origin_bucket, origin_key, destination_bucket, destination_key=None, suffix=None, regex=None, date_modified_before=None, date_modified_after=None, public_read=False, remove_original=False, **kwargs)

Transfer files between S3 buckets.

Parameters:
  • origin_bucket – str The origin bucket

  • origin_key – str The origin file or prefix

  • destination_bucket – str The destination bucket

  • destination_key – str If None, each object keeps its origin key. If set to a prefix, all transferred objects are moved under the new prefix.

  • suffix – str Limits the response to keys that end with the specified suffix.

  • regex – str Limits the response to keys that match a regex pattern.

  • date_modified_before – datetime.datetime Limits the response to keys last modified before this datetime.

  • date_modified_after – datetime.datetime Limits the response to keys last modified after this datetime.

  • public_read – bool If the keys should be set to public-read

  • remove_original – bool If the original keys should be removed after transfer

  • kwargs – Additional arguments for the S3 API call. See AWS download_file docs for more info.
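
One way to use the date filter together with remove_original is to move older files into an archive bucket. The sketch below shows that combination; archive_old_files and the 30-day default are assumptions made for this example, s3 is assumed to be an S3 instance, and the cutoff is timezone-aware on the assumption that S3's last-modified timestamps are too.

```python
from datetime import datetime, timedelta, timezone


def archive_old_files(s3, origin_bucket, prefix, archive_bucket, older_than_days=30):
    """Move objects under `prefix` that are older than the cutoff to `archive_bucket`.

    Returns the cutoff datetime that was used. `s3` is assumed to be a
    parsons S3 instance.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
    s3.transfer_bucket(
        origin_bucket,
        prefix,
        archive_bucket,
        # destination_key is left as None so each object keeps its origin key.
        date_modified_before=cutoff,
        # Remove the originals after transfer, making this a move, not a copy.
        remove_original=True,
    )
    return cutoff
```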

get_buckets_with_subname(bucket_subname)

Grabs a type of bucket based on naming convention.

Parameters:

bucket_subname – str This will most commonly be a ‘vendor’

Returns:

list

list of buckets