S3
Overview
The S3 class lets you interact with Amazon Web Services' object storage
service, S3, to store and retrieve data objects.
It is a wrapper around boto3, the AWS SDK for Python.
It provides methods to upload and download files from S3 as well as manipulate buckets.
Authentication
Access to S3 is controlled through AWS Identity and Access Management (IAM) users in the AWS Management Console. Users can be granted granular access to AWS resources, including S3. IAM users are provisioned access keys, which the S3 class needs in order to authenticate.
Quickstart
S3 credentials can be passed as environment variables
(AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), stored in the AWS CLI credentials file
~/.aws/credentials, or passed as keyword arguments.
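from parsons import S3, Table

# Instantiate with credentials from environment variables or ~/.aws/credentials
s3 = S3()

# Or pass credentials as keyword arguments
s3 = S3(aws_access_key_id='MY_KEY', aws_secret_access_key='MY_SECRET')

# Upload a local file to a bucket (put_file takes the local path, not a file object)
s3.put_file('my_bucket', 'winning.csv', 'winning_formula.csv')

# Upload a Parsons Table as a CSV using the convenience method
tbl = Table.from_csv('winning_formula.csv')
tbl.to_s3_csv('my_bucket', 'winning.csv')

# Download a CSV file (get_file returns a local path) and load it into a Table
f = s3.get_file('my_bucket', 'my_dir/my_file.csv')
tbl = Table.from_csv(f)

# List the buckets you have access to
buckets = s3.list_buckets()

# List the keys in a bucket
s3.list_keys('my_bucket')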
Temporary Credentials
The S3 API supports creating temporary credentials for one-off operations, such as pushing a file to a particular key in a particular bucket.
For example, the Mapbox API allows you to request temporary credentials that grant you access to a bucket where you can upload map data.
When S3 returns a set of temporary credentials, it also returns a session token that must be included with the standard credentials for
them to be accepted. The S3 class can be passed a session token as an environment variable (AWS_SESSION_TOKEN) or as a keyword argument.
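from parsons import S3

# Instantiate using the AWS_SESSION_TOKEN environment variable
s3 = S3()

# Or pass the session token as a keyword argument.
# request_temporary_credentials() is a placeholder for whatever call
# returns your temporary credentials (for example, a third-party API).
creds = request_temporary_credentials()
s3 = S3(
    aws_access_key_id=creds['id'],
    aws_secret_access_key=creds['key'],
    aws_session_token=creds['token']
)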
API
- class parsons.aws.s3.S3(aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, use_env_token=True)
Instantiate the S3 class.
- Parameters:
aws_access_key_id – str The AWS access key id. Not required if the AWS_ACCESS_KEY_ID env variable is set.
aws_secret_access_key – str The AWS secret access key. Not required if the AWS_SECRET_ACCESS_KEY env variable is set.
aws_session_token – str The AWS session token. Optional. Can also be stored in the AWS_SESSION_TOKEN env variable. Used for accessing S3 with temporary credentials.
use_env_token – boolean Controls use of the AWS_SESSION_TOKEN environment variable. Defaults to True. Set to False in order to ignore the AWS_SESSION_TOKEN environment variable even if the aws_session_token argument was not passed in.
- Returns:
S3 class.
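The interaction between aws_session_token and use_env_token can be easier to follow as code. The function below is only a sketch of the precedence described above, not the library's implementation: resolve_session_token is a hypothetical name, and it assumes that an explicitly passed token always takes priority over the environment variable.

```python
import os


def resolve_session_token(aws_session_token=None, use_env_token=True, environ=None):
    """Illustrative only: mimic the documented session token precedence.

    `environ` is a hypothetical hook so the lookup can be exercised
    without touching the real process environment.
    """
    environ = os.environ if environ is None else environ

    # Assumption: an explicitly passed token takes priority.
    if aws_session_token is not None:
        return aws_session_token

    # Fall back to the environment variable only when allowed.
    if use_env_token:
        return environ.get("AWS_SESSION_TOKEN")

    # use_env_token=False: ignore AWS_SESSION_TOKEN entirely.
    return None
```

Under these assumptions, S3(use_env_token=False) would connect without a session token even when AWS_SESSION_TOKEN is set in the environment, which is useful when long-lived keys are passed explicitly on a machine that also has temporary credentials exported.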
- s3
Boto3 API Session Resource object. Use for more advanced boto3 features.
- client
Boto3 API Session client object. Use for more advanced boto3 features.
- bucket_exists(bucket)
Determine if a bucket exists and you have access to it.
- Parameters:
bucket – str The bucket name
- Returns:
- boolean
True if the bucket exists and False if not.
- list_keys(bucket, prefix=None, suffix=None, regex=None, date_modified_before=None, date_modified_after=None, **kwargs)
List the keys in a bucket, along with extra info about each one.
- Parameters:
bucket – str The bucket name
prefix – str Limits the response to keys that begin with the specified prefix.
suffix – str Limits the response to keys that end with the specified suffix.
regex – str Limits the response to keys that match a regex pattern.
date_modified_before – datetime.datetime Limits the response to keys last modified before this datetime.
date_modified_after – datetime.datetime Limits the response to keys last modified after this datetime.
kwargs – Additional arguments for the S3 API call. See AWS ListObjectsV2 documentation for more info.
- Returns:
- dict
Dict mapping the keys to info about each key. The info includes ‘LastModified’, ‘Size’, and ‘Owner’.
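Because list_keys returns a plain dict, its result can be post-processed with ordinary Python. The sketch below assumes only the documented shape of the return value (each key mapped to a dict that includes 'Size' and 'LastModified'); summarize_keys and the sample data are hypothetical and exist purely for illustration.

```python
from datetime import datetime, timezone


def summarize_keys(keys):
    """Return (count, total_bytes, newest_key) for a list_keys-style dict."""
    if not keys:
        return 0, 0, None
    total_bytes = sum(info["Size"] for info in keys.values())
    # The newest key is the one with the latest LastModified timestamp.
    newest_key = max(keys, key=lambda k: keys[k]["LastModified"])
    return len(keys), total_bytes, newest_key


# Hypothetical sample shaped like the documented return value.
sample = {
    "reports/a.csv": {
        "Size": 120,
        "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc),
    },
    "reports/b.csv": {
        "Size": 80,
        "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc),
    },
}
print(summarize_keys(sample))
```

With a real connection, the same helper could be applied to a filtered listing, for example summarize_keys(s3.list_keys('my_bucket', prefix='reports/', suffix='.csv')).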
- key_exists(bucket, key)
Determine if a key exists in a bucket.
- Parameters:
bucket – str The bucket name
key – str The object key
- Returns:
- boolean
True if the key exists and False if not.
- create_bucket(bucket)
Create an S3 bucket.
Warning
S3 limits the number of buckets you can create in an AWS account, and that limit is fairly low (typically 100). If you are creating buckets frequently, you may be misusing S3 and should consider using the same bucket for multiple tasks. There is no limit on the number of objects in a bucket. See AWS bucket restrictions for more info.
Warning
S3 bucket names are globally unique. So when creating a new bucket, the name can’t collide with any existing bucket names. If the provided name does collide, you’ll see errors like IllegalLocationConstraintException or BucketAlreadyExists.
- Parameters:
bucket – str The name of the bucket to create
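Given the bucket limit noted above, it is usually worth checking for a bucket before creating it. The helper below is a minimal sketch of that pattern using only bucket_exists and create_bucket as documented here; ensure_bucket is a hypothetical name, and `s3` is assumed to be an instantiated S3 object.

```python
def ensure_bucket(s3, bucket):
    """Create `bucket` only if it is not already visible to these credentials.

    Returns True if a bucket was created and False if it already existed.
    """
    if s3.bucket_exists(bucket):
        return False
    s3.create_bucket(bucket)
    return True
```

Note that bucket_exists also reports False for buckets you cannot access, so create_bucket may still fail with a name collision error when another account owns the name.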
- put_file(bucket, key, local_path, acl='bucket-owner-full-control', **kwargs)
Upload an object to an S3 bucket.
- Parameters:
bucket – str The bucket name
key – str The object key
local_path – str The local path of the file to upload
acl – str The S3 permissions on the file
kwargs – Additional arguments for the S3 API call. See AWS Put Object documentation for more info.
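Uploading to a key that already exists replaces the stored object, which is standard S3 behavior. When that is not wanted, key_exists can guard the upload. The following is a sketch under that assumption; upload_if_missing is a hypothetical helper, and `s3` is assumed to be an instantiated S3 object.

```python
def upload_if_missing(s3, bucket, key, local_path, **kwargs):
    """Upload `local_path` to `key` unless the key is already present.

    Returns True if the file was uploaded and False if it was skipped.
    Extra keyword arguments (for example acl) are passed to put_file.
    """
    if s3.key_exists(bucket, key):
        return False
    s3.put_file(bucket, key, local_path, **kwargs)
    return True
```

The check and the upload are two separate calls, so this does not protect against another process writing the same key in between.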
- remove_file(bucket, key)
Delete an object from an S3 bucket.
- Parameters:
bucket – str The bucket name
key – str The object key
- get_file(bucket, key, local_path=None, **kwargs)
Download an object from S3 to a local file.
- Parameters:
local_path – str The local path where the file will be downloaded. If not specified, a temporary file will be created and returned, and that file will be removed automatically when the script is done running.
bucket – str The bucket name
key – str The object key
kwargs – Additional arguments for the S3 API call. See AWS download_file documentation for more info.
- Returns:
- str
The path of the new file
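Since get_file returns a path rather than file contents, reading the object takes one more step. The sketch below relies only on that documented return value; read_s3_text is a hypothetical helper, and `s3` is assumed to be an instantiated S3 object.

```python
def read_s3_text(s3, bucket, key, encoding="utf-8"):
    """Download an object to a temporary file and return its text content."""
    # With no local_path, get_file creates a temporary file and returns its path.
    path = s3.get_file(bucket, key)
    with open(path, encoding=encoding) as f:
        return f.read()
```

For tabular data, passing the returned path to Table.from_csv is typically more convenient than reading the text directly.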
- get_url(bucket, key, expires_in=3600)
Generate a presigned URL for an S3 object.
- Parameters:
bucket – str The bucket name
key – str The object name
expires_in – int The time, in seconds, until the url expires
- Returns:
- str
A link to download the object
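The expires_in argument is given in seconds, which is easy to get wrong for longer durations. One option is to express the lifetime as a timedelta and convert it. This is a sketch only; presigned_link is a hypothetical helper, and `s3` is assumed to be an instantiated S3 object.

```python
from datetime import timedelta


def presigned_link(s3, bucket, key, valid_for=timedelta(minutes=15)):
    """Return a presigned URL that stays valid for the given duration."""
    expires_in = int(valid_for.total_seconds())
    if expires_in <= 0:
        raise ValueError("valid_for must be a positive duration")
    return s3.get_url(bucket, key, expires_in=expires_in)
```

For example, presigned_link(s3, 'my_bucket', 'winning.csv', valid_for=timedelta(hours=2)) would request a link with expires_in=7200.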
- transfer_bucket(origin_bucket, origin_key, destination_bucket, destination_key=None, suffix=None, regex=None, date_modified_before=None, date_modified_after=None, public_read=False, remove_original=False, **kwargs)
Transfer files between S3 buckets.
- Parameters:
origin_bucket – str The origin bucket
origin_key – str The origin file or prefix
destination_bucket – str The destination bucket
destination_key – str If None, the origin key is retained. If set to a prefix, all transferred keys are moved under the new prefix.
suffix – str Limits the transfer to keys that end with the specified suffix.
regex – str Limits the transfer to keys that match a regex pattern.
date_modified_before – datetime.datetime Limits the transfer to keys last modified before this datetime.
date_modified_after – datetime.datetime Limits the transfer to keys last modified after this datetime.
public_read – bool If the keys should be set to public-read
remove_original – bool If the original keys should be removed after transfer
kwargs – Additional arguments for the S3 API call. See AWS download_file docs for more info.
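A common use of the filters above is moving older files to an archive bucket. The sketch below combines suffix, date_modified_before, and remove_original for that purpose. archive_old_files, its parameters, and the `now` hook are hypothetical, and the cutoff is built as a timezone-aware UTC datetime on the assumption that it is compared against S3's timezone-aware LastModified values.

```python
from datetime import datetime, timedelta, timezone


def archive_old_files(s3, origin_bucket, prefix, archive_bucket,
                      older_than_days=30, now=None):
    """Move CSV files under `prefix` that are older than the cutoff.

    Returns the cutoff datetime that was used. `now` exists only so the
    cutoff can be made deterministic in tests.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    s3.transfer_bucket(
        origin_bucket,
        prefix,
        archive_bucket,
        suffix=".csv",
        date_modified_before=cutoff,
        remove_original=True,  # delete the originals after the transfer
    )
    return cutoff
```

Because remove_original=True deletes the source objects, it may be worth running the transfer once with remove_original=False and checking the destination before enabling it.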