Cloud Storage¶
Overview¶
Google Cloud Storage is a cloud file storage system. It stores arbitrary files, referred to as blobs, in buckets. You may use this connector to upload Parsons tables as blobs, download blobs to files, and list available blobs.
To use the GoogleCloudStorage class, you will need Google service account credentials.
If you are the administrator of your Google Cloud account, you can generate them on the
Service accounts - IAM & Admin page.
Once signed in, select your project, then your project’s email, then Keys,
then Add key, and finally Create new key.
Quickstart¶
To instantiate the GoogleCloudStorage class, you can pass the constructor a string containing
either the name of your Google service account credentials file or a JSON string
encoding those credentials. Alternatively, you can set the environment variable
GOOGLE_APPLICATION_CREDENTIALS to be either of those strings and
call the constructor without that argument.
import os
from parsons import GoogleCloudStorage

# Option 1: set the environment variable and call the constructor without arguments.
# May be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google_credentials_file.json'
gcs = GoogleCloudStorage()

# Option 2: pass the credentials to the constructor.
credentials_filename = 'google_credentials_file.json'
project = 'parsons-test' # Project in which we're working
gcs = GoogleCloudStorage(app_creds=credentials_filename, project=project)

# Create a bucket, then list all buckets.
gcs.create_bucket('parsons_bucket')
gcs.list_buckets()

# Upload an existing Parsons table (parsons_table) as a blob, then fetch the blob object.
gcs.upload_table(bucket_name='parsons_bucket', table=parsons_table, blob_name='parsons_blob')
gcs.get_blob(bucket_name='parsons_bucket', blob_name='parsons_blob')
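The credential options described above (a file name, a JSON string, or the environment variable as a fallback) can be sketched with the standard library alone. This is only an illustration of the decision, not the connector's actual logic; the helper name describe_app_creds is hypothetical.

```python
import json
import os


def describe_app_creds(app_creds=None):
    """Report which credential form a string represents (illustrative only)."""
    # Fall back to the environment variable when no argument is given.
    value = app_creds
    if value is None:
        value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if value is None:
        return "missing"
    try:
        parsed = json.loads(value)
    except ValueError:
        # Not valid JSON, so treat the string as a path to a credentials file.
        return "file path"
    # Service account credentials are encoded as a JSON object.
    return "json string" if isinstance(parsed, dict) else "file path"
```

Calling describe_app_creds('google_credentials_file.json') reports a file path, while passing the JSON text of the credentials reports a JSON string.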
API¶
- class parsons.google.google_cloud_storage.GoogleCloudStorage(app_creds: str | dict | Credentials | None = None, project=None)[source]¶
Google Cloud Storage connector utility
This class requires application credentials in the form of a json file or string, or a google oauth2 Credentials object. They can be passed in the following ways:
- Set an environment variable named GOOGLE_APPLICATION_CREDENTIALS with the local path to the credentials json. Example: GOOGLE_APPLICATION_CREDENTIALS='path/to/creds.json'
- Pass in the path to the credentials using the app_creds argument.
- Pass in a json string using the app_creds argument.
- Generate the google credentials object directly, and pass it in using the app_creds argument.
For example, to pass in credentials from a parent shell that is authenticated with gcloud auth:
from google.auth import default
app_creds, _ = default()
gcs = GoogleCloudStorage(app_creds=app_creds)
- Parameters:
app_creds (str | dict | Credentials | None) – str, dict, or google.oauth2.credentials.Credentials object A credentials json string or a path to a json file. Not required if the GOOGLE_APPLICATION_CREDENTIALS env variable is set. Can also pass a google oauth2 Credentials object directly.
project – str The project which the client is acting on behalf of. If not passed, the default inferred from the environment is used.
- Returns:
GoogleCloudStorage Class
- client¶
Access all methods of the google.cloud package
- bucket_exists(bucket_name)[source]¶
Verify that a bucket exists
- Parameters:
bucket_name – str The name of the bucket
- Returns:
boolean
- get_bucket(bucket_name)[source]¶
Returns a bucket object
- Parameters:
bucket_name – str The name of bucket
- Returns:
Google Cloud Storage bucket object
- create_bucket(bucket_name)[source]¶
Create a bucket.
- Parameters:
bucket_name – str A globally unique name for the bucket.
- delete_bucket(bucket_name, delete_blobs=False)[source]¶
Delete a bucket. Will fail if not empty unless the delete_blobs argument is set to True.
- Parameters:
bucket_name – str The name of the bucket
delete_blobs – boolean Delete blobs in the bucket, if it is not empty
- list_blobs(bucket_name, max_results=None, prefix=None, match_glob=None, include_file_details=False)[source]¶
List all of the blobs in a bucket
- Parameters:
bucket_name – str The name of the bucket
max_results – int Maximum number of blobs to return
prefix – str A prefix to filter files
match_glob – str Filters files based on glob string. NOTE that the match_glob parameter runs on the full blob URI; include a preceding wildcard value to account for nested files (*/ for one level, **/ for n levels)
include_file_details – bool If True, returns a list of Blob objects with accessible metadata. For documentation of attributes associated with Blob objects see https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob
- Returns:
A list of blob names (or Blob objects if include_file_details is invoked)
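To see why nested blobs need a preceding wildcard in match_glob, the following local sketch approximates the glob rules under the assumption that * does not cross a "/" while ** does. It is a stand-alone approximation for reasoning about patterns, not the matching that Google Cloud Storage performs server-side; the helper names are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified glob into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # '**' may span directory separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # '?' matches one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def glob_matches(pattern, blob_name):
    """Return True if the whole blob name matches the pattern."""
    return glob_to_regex(pattern).fullmatch(blob_name) is not None
```

Under these rules, "*.csv" matches "a.csv" but not "data/a.csv"; "*/*.csv" reaches one level down, and "**/*.csv" reaches any depth below the top level.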
- blob_exists(bucket_name, blob_name)[source]¶
Verify that a blob exists in the specified bucket
- Parameters:
bucket_name – str The bucket name
blob_name – str The name of the blob
- Returns:
boolean
- get_blob(bucket_name, blob_name)[source]¶
Get a blob object
- Parameters:
bucket_name – str A bucket name
blob_name – str A blob name
- Returns:
A Google Storage blob object
- put_blob(bucket_name, blob_name, local_path, **kwargs)[source]¶
Puts a blob (aka file) in a bucket
- Parameters:
bucket_name – str The name of the bucket to store the blob
blob_name – str The name of the blob to be stored in the bucket
local_path – str The local path of the file to upload
- download_blob(bucket_name, blob_name, local_path=None)[source]¶
Gets a blob from a bucket
- Parameters:
bucket_name – str The name of the bucket
blob_name – str The name of the blob
local_path – str The local path where the file will be downloaded. If not specified, a temporary file will be created and returned, and that file will be removed automatically when the script is done running.
- Returns:
- str
The path of the downloaded file
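A common pattern is to check for a blob before downloading it. The sketch below combines blob_exists and download_blob as documented above; the helper name download_if_present is hypothetical, and gcs is assumed to be an already constructed GoogleCloudStorage instance.

```python
def download_if_present(gcs, bucket_name, blob_name, local_path=None):
    """Download a blob and return its local path, or None if it is absent.

    `gcs` is assumed to be a GoogleCloudStorage connector instance.
    """
    if not gcs.blob_exists(bucket_name, blob_name):
        return None
    # With local_path=None the connector writes to a temporary file
    # and returns that file's path.
    return gcs.download_blob(bucket_name, blob_name, local_path=local_path)
```

Because a temporary file is removed when the script finishes, pass an explicit local_path if the file must outlive the run.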
- delete_blob(bucket_name, blob_name)[source]¶
Delete a blob
- Parameters:
bucket_name – str The bucket name
blob_name – str The blob name
- upload_table(table, bucket_name, blob_name, data_type: Literal['csv', 'json'] = 'csv', default_acl=None, timeout: int = 60)[source]¶
Load the data from a Parsons table into a blob.
- Parameters:
table – obj A Table
bucket_name – str The name of the bucket to upload the data into.
blob_name – str The name of the blob to upload the data into.
data_type (Literal['csv', 'json']) – str The file format to use when writing the data. One of: csv or json
default_acl – ACL desired for newly uploaded table
timeout (int)
- Returns:
String representation of file URI in GCS
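As a sketch of how upload_table might be used defensively, the helper below uploads a table and then confirms the blob is present before returning the URI. The helper name upload_and_verify is hypothetical; gcs is assumed to be a GoogleCloudStorage instance and table a Parsons Table.

```python
def upload_and_verify(gcs, table, bucket_name, blob_name, data_type="csv"):
    """Upload a table, confirm the blob exists, and return its URI.

    `gcs` is assumed to be a GoogleCloudStorage connector instance.
    """
    uri = gcs.upload_table(table, bucket_name, blob_name, data_type=data_type)
    if not gcs.blob_exists(bucket_name, blob_name):
        raise RuntimeError(
            "Upload reported success but %s was not found in %s"
            % (blob_name, bucket_name)
        )
    return uri
```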
- get_url(bucket_name, blob_name, expires_in=60)[source]¶
Generates a presigned url for a blob.
- Parameters:
bucket_name – str The name of the bucket
blob_name – str The name of the blob
expires_in – int Minutes until the url expires
- Returns:
- str
A link to download the object
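Since expires_in is expressed in minutes, it can be convenient to start from a timedelta. The following is a minimal sketch; the helper name signed_url_for is hypothetical, and gcs is assumed to be a GoogleCloudStorage instance.

```python
import math
from datetime import timedelta


def signed_url_for(gcs, bucket_name, blob_name, lifetime):
    """Request a presigned URL that stays valid for at least `lifetime`.

    `lifetime` is a datetime.timedelta; it is rounded up to whole minutes
    because get_url takes its expiry in minutes.
    """
    minutes = max(1, math.ceil(lifetime.total_seconds() / 60))
    return gcs.get_url(bucket_name, blob_name, expires_in=minutes)
```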
- copy_bucket_to_gcs(gcs_sink_bucket: str, source: str, source_bucket: str, destination_path: str = '', source_path: str = '', aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, delete_objects_unique_in_sink: bool = False)[source]¶
Creates a one-time transfer job from Amazon S3 or another GCS bucket (see source) to Google Cloud Storage. Copies all blobs within the source bucket unless a key or prefix is passed.
- Parameters:
gcs_sink_bucket (str) – Destination for the data transfer (located in GCS)
source (str) – File storage vendor [gcs or s3]
source_bucket (str) – Source bucket name
source_path (str) – Path in the source system pointing to the relevant keys / files to sync. Must end in a ‘/’
aws_access_key_id (str) – Access key to authenticate storage transfer
aws_secret_access_key (str) – Secret key to authenticate storage transfer
delete_objects_unique_in_sink (bool) – Whether objects that exist only in the sink (and not in the source) should be deleted. Default is False.
destination_path (str)
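Because source must be gcs or s3 and source_path must end in a "/", it can help to normalize the arguments before starting a transfer. The sketch below builds a keyword dictionary for copy_bucket_to_gcs; the helper name build_transfer_kwargs is hypothetical, and the bucket names in the usage note are placeholders.

```python
def build_transfer_kwargs(gcs_sink_bucket, source, source_bucket, source_path=""):
    """Validate and normalize arguments for copy_bucket_to_gcs."""
    if source not in ("gcs", "s3"):
        raise ValueError("source must be 'gcs' or 's3', got %r" % (source,))
    # A non-empty source_path must end in '/'.
    if source_path and not source_path.endswith("/"):
        source_path += "/"
    return {
        "gcs_sink_bucket": gcs_sink_bucket,
        "source": source,
        "source_bucket": source_bucket,
        "source_path": source_path,
    }
```

With a connector instance gcs, the result could then be passed as gcs.copy_bucket_to_gcs(**build_transfer_kwargs('sink_bucket', 'gcs', 'source_bucket', 'exports')). For an s3 source, the AWS key arguments would also need to be supplied.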
- split_uri(gcs_uri: str)[source]¶
Split a GCS URI into a bucket and blob name
- Parameters:
gcs_uri (str) – str GCS URI
- Returns:
Tuple of strings with bucket_name and blob_name
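To make the bucket and blob split concrete, here is a stand-alone sketch that assumes URIs of the form gs://bucket/path. It illustrates the idea only and is not the connector's implementation of split_uri; the function name split_gcs_uri is hypothetical.

```python
def split_gcs_uri(gcs_uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("Expected a URI starting with 'gs://', got %r" % (gcs_uri,))
    # Everything before the first '/' is the bucket; the rest is the blob name.
    bucket_name, _, blob_name = gcs_uri[len(prefix):].partition("/")
    return bucket_name, blob_name
```

Note that the blob name keeps any nested path segments, so "gs://parsons_bucket/folder/file.csv" yields the blob name "folder/file.csv".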
- unzip_blob(bucket_name: str, blob_name: str, compression_type: Literal['zip', 'gzip'] = 'gzip', new_filename: str | None = None, new_file_extension: str | None = None) → str[source]¶
Downloads and decompresses a blob. The decompressed blob is re-uploaded with the same filename if no new_filename parameter is provided.
- Parameters:
bucket_name (str) – str GCS bucket name
blob_name (str) – str Blob name in GCS bucket
compression_type (Literal['zip', 'gzip']) – str Either zip or gzip
new_filename (str | None) – str If provided, replaces the existing blob name when the decompressed file is uploaded
new_file_extension (str | None) – str If provided, replaces the file extension when the decompressed file is uploaded
- Returns:
String representation of decompressed GCS URI
- Return type: