Google

Google Cloud services allow you to upload and manipulate Tables as spreadsheets (via GoogleSheets) or query them as SQL database tables (via GoogleBigQuery). You can also upload/store/download them as binary objects (via GoogleCloudStorage). Google also offers an API for civic information using GoogleCivic and admin information using the Google Workspace Admin SDK.

For all of these services you will need to enable the APIs for your Google Cloud account and obtain authentication tokens or other credentials to access them from your scripts. If you are the administrator of your Google Cloud account, you can do both of these at Google Cloud Console APIs and Services. The connectors below have more specific information about how to authenticate.

Google Admin

Overview

The GoogleAdmin class allows you to get information about groups and members in Google Admin.

In order to instantiate the class, you must pass Google service account credentials as a dictionary, or store the credentials as a JSON string in the GOOGLE_APPLICATION_CREDENTIALS environment variable. You must also provide an email address for domain-wide delegation.

Quickstart

To instantiate the GoogleAdmin class, you can either pass the constructor a dict containing your Google service account credentials or define the environment variable GOOGLE_APPLICATION_CREDENTIALS to contain a JSON encoding of the dict.

from parsons import GoogleAdmin

# First approach: Use API credentials via environmental variables
admin = GoogleAdmin(None, 'fakeemail@fakedomain.com')

# Second approach: Pass API credentials as argument
credential_filename = 'google_application_credentials.json'
credentials = json.load(open(credential_filename))
sheets = GoogleSheets(credentials, 'fakeemail@fakedomain.com')

You can then get information about groups and members using instance methods:

members = admin.get_all_group_members('group_key')
groups = admin.get_all_groups(domain='fakedomain.com')

API

class parsons.google.google_admin.GoogleAdmin(app_creds=None, sub=None)[source]

A connector for Google Admin.

Args:

app_creds: str: A credentials json string or a path to a json file. Not required if GOOGLE_APPLICATION_CREDENTIALS env variable set.
sub: str: An email address that this service account will act on behalf of (via domain-wide delegation)

Returns:

GoogleAdmin Class

get_aliases(group_key, params=None)[source]

Get aliases for a group. Google Admin API Documentation

Args:

group_key: str: The Google group id
params: dict: A dictionary of fields for the GET request

Returns:

Table Class

get_all_group_members(group_key, params=None)[source]

Get all members in a group. Google Admin API Documentation

Args:

group_key: str: The Google group id
params: dict: A dictionary of fields for the GET request

Returns:

Table Class

get_all_groups(params=None)[source]

Get all groups in a domain or account. Google Admin API Documentation Args:

params: dict
A dictionary of fields for the GET request.

Returns:: Table Class

BigQuery

Overview

Google BigQuery is a cloud data warehouse solution. Data is stored in tables, and users can query using SQL. BigQuery uses datasets as top level containers for tables, and datasets are themselves contained within Google Cloud projects.

Quickstart

To instantiate the GoogleBigQuery class, you can pass the constructor a string containing either the name of the Google service account credentials file or a JSON string encoding those credentials. Alternatively, you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS to be either of those strings and call the constructor without that argument.

from parsons import GoogleBigQuery

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google_credentials_file.json'

bigquery = GoogleBigQuery()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

# Project in which we're working
project = 'parsons-test'
bigquery = GoogleBigQuery(
   app_creds='google_credentials_file.json',
   project=project
)

We can now upload/query data.

dataset = 'parsons_dataset'
table = 'parsons_table'

# Table name should be project.dataset.table, or dataset.table, if
# working with the default project
table_name = f"`{project}.{dataset}.{table}`"

# Must be pre-existing bucket. Create via GoogleCloudStorage() or
# at https://console.cloud.google.com/storage/create-bucket. May be
# omitted if the name of the bucket is specified in environment
# variable GCS_TEMP_BUCKET.
tmp_gcs_bucket = 'parsons_bucket'

# Create dataset if it doesn't already exist
bigquery.client.create_dataset(dataset=dataset, exists_ok=True)

parsons_table = Table([{'name':'Bob', 'party':'D'},
                       {'name':'Jane', 'party':'D'},
                       {'name':'Sue', 'party':'R'},
                       {'name':'Bill', 'party':'I'}])

# Copy table in to create new BigQuery table
bigquery.copy(
   table_obj=parsons_table,
   table_name=table_name,
   tmp_gcs_bucket=tmp_gcs_bucket
)

# Select from project.dataset.table
bigquery.query(f'select name from {table_name} where party = "D"')

# Query with parameters
bigquery.query(
   f"select name from {table_name} where party = %s",
   parameters=["D"]
)

# Delete the table when we're done
bigquery.client.delete_table(table=table_name)

API

class parsons.google.google_bigquery.GoogleBigQuery(app_creds: str | dict | Credentials | None = None, project=None, location=None, client_options: dict = None, tmp_gcs_bucket: str | None = None)[source]

Class for querying BigQuery table and returning the data as Parsons tables.

This class requires application credentials in the form of a json. It can be passed in the following ways:

Set an environmental variable named GOOGLE_APPLICATION_CREDENTIALS with the local path to the credentials json.

Example: GOOGLE_APPLICATION_CREDENTALS='path/to/creds.json'
Pass in the path to the credentials using the app_creds argument.
Pass in a json string using the app_creds argument.

Args:

app_creds: str: A credentials json string or a path to a json file. Not required if GOOGLE_APPLICATION_CREDENTIALS env variable set.
project: str: The project which the client is acting on behalf of. If not passed then will use the default inferred environment.
location: str: Default geographic location for tables
client_options: dict: A dictionary containing any requested client options. Defaults to the required scopes for making API calls against External tables stored in Google Drive. Can be set to None if these permissions are not desired
gcs_temp_bucket: str: Name of the GCS bucket that will be used for storing data during bulk transfers. Required if you intend to perform bulk data transfers (eg. the copy_from_gcs method), and env variable GCS_TEMP_BUCKET is not populated.

property client

Get the Google BigQuery client to use for making queries.

Returns:: google.cloud.bigquery.client.Client

connection()[source]

Generate a BigQuery connection. The connection is set up as a python “context manager”, so it will be closed automatically when the connection goes out of scope. Note that the BigQuery API uses jobs to run database operations and, as such, simply has a no-op for a “commit” function.

If you would like to manage transactions, please use multi-statement queries as [outlined here](https://cloud.google.com/bigquery/docs/transactions) or utilize the query_with_transaction method on this class.

When using the connection, make sure to put it in a with block (necessary for any context manager): with bq.connection() as conn:

Returns:: Google BigQuery connection object

query(sql: str, parameters: list | dict | None = None, return_values: bool = True) → Table | None[source]

Run a BigQuery query and return the results as a Parsons table.

To include python variables in your query, it is recommended to pass them as parameters, following the BigQuery style where parameters are prefixed with @`s. Using the ``parameters` argument ensures that values are escaped properly, and avoids SQL injection attacks.

Parameter Examples

name = "Beatrice O'Brady"
sql = 'SELECT * FROM my_table WHERE name = %s'
rs.query(sql, parameters=[name])

name = "Beatrice O'Brady"
sql = "SELECT * FROM my_table WHERE name = %(name)s"
rs.query(sql, parameters={'name': name})

Args:

sql: str: A valid BigTable statement
parameters: dict: A dictionary of query parameters for BigQuery.

Returns:

Parsons Table: See Parsons Table for output options.

query_with_connection(sql, connection, parameters=None, commit=True, return_values: bool = True)[source]

Execute a query against the BigQuery database, with an existing connection. Useful for batching queries together. Will return None if the query returns zero rows.

Args:

sql: str: A valid SQL statement
connection: obj: A connection object obtained from redshift.connection()
parameters: list: A list of python variables to be converted into SQL values in your query
commit: boolean: Must be true. BigQuery

Returns:

Parsons Table: See Parsons Table for output options.

get_job(job_id: str, **job_kwargs) → LoadJob | CopyJob | ExtractJob | QueryJob | UnknownJob[source]

Fetch a job

Args:

job_id: str: ID of job to fetch
location: str: Location where the job was run
**job_kwargs: kwargs: Other arguments to pass to the underlying get_job call on the BigQuery client.

copy_from_gcs(gcs_blob_uri: str, table_name: str, if_exists: str = 'fail', max_errors: int = 0, data_type: str = 'csv', csv_delimiter: str = ',', ignoreheader: int = 1, nullas: str | None = None, allow_quoted_newlines: bool = True, allow_jagged_rows: bool = True, quote: str | None = None, schema: List[dict] | None = None, job_config: LoadJobConfig | None = None, force_unzip_blobs: bool = False, compression_type: str = 'gzip', new_file_extension: str = 'csv', template_table: str | None = None, max_timeout: int = 21600, **load_kwargs)[source]

Copy a csv saved in Google Cloud Storage into Google BigQuery.

Args:

gcs_blob_uri: str: The GoogleCloudStorage URI referencing the file to be copied.
table_name: str: The table name to load the data into. Will be used to generate load schema if no custom schema or template table are supplied and the if_exists is set to “truncate” or “append”.
if_exists: str: If the table already exists, either fail, append, drop or truncate the table. This maps to write_disposition in the LoadJobConfig class.
max_errors: int: The maximum number of rows that can error and be skipped before the job fails. This maps to max_bad_records in the LoadJobConfig class.
data_type: str: Denotes whether target file is a JSON or CSV
csv_delimiter: str: Character used to separate values in the target file
ignoreheader: int: Treats the specified number_rows as a file header and doesn’t load them
nullas: str: Loads fields that match null_string as NULL, where null_string can be any string
allow_quoted_newlines: bool: If True, detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary
allow_jagged_rows: bool: Allow missing trailing optional columns (CSV only).
quote: str: The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state.
schema: list: BigQuery expects a list of dictionaries in the following format ``` schema = [

{“name”: “column_name”, “type”: STRING}, {“name”: “another_column_name”, “type”: INT}
job_config: object: A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided. Note if there are any conflicts between the job_config and other parameters, the job_config values are preferred.
force_unzip_blobs: bool: If True, target blobs will be unzipped before being loaded to BigQuery.
compression_type: str: Accepts zip or gzip values to differentially unzip a compressed blob in cloud storage.
new_file_extension: str: Provides a file extension if a blob is decompressed and rewritten to cloud storage.
template_table: str: Table name to be used as the load schema. Load operation wil use the same columns and data types as the template table.
max_timeout: int: The maximum number of seconds to wait for a request before the job fails.
**load_kwargs: kwargs: Other arguments to pass to the underlying load_table_from_uri call on the BigQuery client.

copy_large_compressed_file_from_gcs(gcs_blob_uri: str, table_name: str, if_exists: str = 'fail', max_errors: int = 0, data_type: str = 'csv', csv_delimiter: str = ',', ignoreheader: int = 1, nullas: str | None = None, allow_quoted_newlines: bool = True, allow_jagged_rows: bool = True, quote: str | None = None, schema: List[dict] | None = None, job_config: LoadJobConfig | None = None, compression_type: str = 'gzip', new_file_extension: str = 'csv', template_table: str | None = None, max_timeout: int = 21600, **load_kwargs)[source]

Copy a compressed CSV file that exceeds the maximum size in Google Cloud Storage into Google BigQuery.

Args:

gcs_blob_uri: str: The GoogleCloudStorage URI referencing the file to be copied.
table_name: str: The table name to load the data into. Will be used to generate load schema if no custom schema or template table are supplied and the if_exists is set to “truncate” or “append”.
if_exists: str: If the table already exists, either fail, append, drop or truncate the table. This maps to write_disposition in the LoadJobConfig class.
max_errors: int: The maximum number of rows that can error and be skipped before the job fails. This maps to max_bad_records in the LoadJobConfig class.
data_type: str: Denotes whether target file is a JSON or CSV
csv_delimiter: str: Character used to separate values in the target file
ignoreheader: int: Treats the specified number_rows as a file header and doesn’t load them
nullas: str: Loads fields that match null_string as NULL, where null_string can be any string
allow_quoted_newlines: bool: If True, detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary
allow_jagged_rows: bool: Allow missing trailing optional columns (CSV only).
quote: str: The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state.
schema: list: BigQuery expects a list of dictionaries in the following format ``` schema = [

{“name”: “column_name”, “type”: STRING}, {“name”: “another_column_name”, “type”: INT}
job_config: object: A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided. Note if there are any conflicts between the job_config and other parameters, the job_config values are preferred.
compression_type: str: Accepts zip or gzip values to differentially unzip a compressed blob in cloud storage.
new_file_extension: str: Provides a file extension if a blob is decompressed and rewritten to cloud storage.
template_table: str: Table name to be used as the load schema. Load operation wil use the same columns and data types as the template table.
max_timeout: int: The maximum number of seconds to wait for a request before the job fails.
**load_kwargs: kwargs: Other arguments to pass to the underlying load_table_from_uri call on the BigQuery client.

copy_s3(table_name, bucket, key, if_exists: str = 'fail', max_errors: int = 0, data_type: str = 'csv', csv_delimiter: str = ',', ignoreheader: int = 1, nullas: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, gcs_client: GoogleCloudStorage | None = None, tmp_gcs_bucket: str | None = None, template_table: str | None = None, job_config: LoadJobConfig | None = None, max_timeout: int = 21600, **load_kwargs)[source]

Copy a file from s3 to BigQuery.

Args:

table_name: str: The table name and schema (tmc.cool_table) to point the file.
bucket: str: The s3 bucket where the file or manifest is located.
key: str: The key of the file or manifest in the s3 bucket.
if_exists: str: If the table already exists, either fail, append, drop or truncate the table.
max_errors: int: The maximum number of rows that can error and be skipped before the job fails.
data_type: str: The data type of the file. Only csv supported currently.
csv_delimiter: str: The delimiter of the csv. Only relevant if data_type is csv.
ignoreheader: int: The number of header rows to skip. Ignored if data_type is json.
nullas: str: Loads fields that match string as NULL
aws_access_key_id:: An AWS access key granted to the bucket where the file is located. Not required if keys are stored as environmental variables.
aws_secret_access_key:: An AWS secret access key granted to the bucket where the file is located. Not required if keys are stored as environmental variables.
gcs_client: object: The GoogleCloudStorage Connector to use for loading data into Google Cloud Storage.
tmp_gcs_bucket: str: The name of the Google Cloud Storage bucket to use to stage the data to load into BigQuery. Required if GCS_TEMP_BUCKET is not specified or set on the class instance.
template_table: str: Table name to be used as the load schema. Load operation wil use the same columns and data types as the template table.
job_config: object: A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided. Note if there are any conflicts between the job_config and other parameters, the job_config values are preferred.
max_timeout: int: The maximum number of seconds to wait for a request before the job fails.

Returns

Parsons Table or None: See Parsons Table for output options.

copy_direct(tbl: Table, table_name: str, if_exists: str = 'fail', max_errors: int = 0, job_config: LoadJobConfig | None = None, template_table: str | None = None, ignoreheader: int = 1, nullas: str | None = None, allow_quoted_newlines: bool = True, allow_jagged_rows: bool = True, quote: str | None = None, schema: List[dict] | None = None, max_timeout: int = 21600, convert_dict_list_columns_to_json: bool = True, **load_kwargs)[source]

Copy a Parsons Table into Google BigQuery directly. This will work well for smaller data. For larger data, use the copy() method which stages the upload through CloudStorage.

Args:

tbl: obj: The Parsons Table to copy into BigQuery.
table_name: str: The table name to load the data into. Will be used to generate load schema if no custom schema or template table are supplied and if_exists is set to “truncate” or “append”.
if_exists: str: If the table already exists, either fail, append, drop or truncate the table.
max_errors: int: The maximum number of rows that can error and be skipped before the job fails.
job_config: object: A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided.
template_table: str: Table name to be used as the load schema. Load operation wil use the same columns and data types as the template table.
max_timeout: int: The maximum number of seconds to wait for a request before the job fails.
convert_dict_list_columns_to_json: bool: If set to True, will convert any dict or list columns (which cannot by default be successfully loaded to BigQuery to JSON strings)
**load_kwargs: kwargs: Arguments to pass to the underlying load_table_from_uri call on the BigQuery client.

copy(tbl: Table, table_name: str, if_exists: str = 'fail', max_errors: int = 0, tmp_gcs_bucket: str | None = None, gcs_client: GoogleCloudStorage | None = None, job_config: LoadJobConfig | None = None, template_table: str | None = None, ignoreheader: int = 1, nullas: str | None = None, allow_quoted_newlines: bool = True, allow_jagged_rows: bool = True, quote: str | None = None, schema: List[dict] | None = None, max_timeout: int = 21600, convert_dict_list_columns_to_json: bool = True, **load_kwargs)[source]

Copy a Parsons Table into Google BigQuery via Google Cloud Storage.

Args:

tbl: obj: The Parsons Table to copy into BigQuery.
table_name: str: The table name to load the data into. Will be used to generate load schema if no custom schema or template table are supplied and if_exists is set to “truncate” or “append”.
if_exists: str: If the table already exists, either fail, append, drop or truncate the table.
max_errors: int: The maximum number of rows that can error and be skipped before the job fails.
tmp_gcs_bucket: str: The name of the Google Cloud Storage bucket to use to stage the data to load into BigQuery. Required if GCS_TEMP_BUCKET is not specified or set on the class instance.
gcs_client: object: The GoogleCloudStorage Connector to use for loading data into Google Cloud Storage.
job_config: object: A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided.
template_table: str: Table name to be used as the load schema. Load operation wil use the same columns and data types as the template table.
max_timeout: int: The maximum number of seconds to wait for a request before the job fails.
convert_dict_list_columns_to_json: bool: If set to True, will convert any dict or list columns (which cannot by default be successfully loaded to BigQuery to JSON strings)
**load_kwargs: kwargs: Arguments to pass to the underlying load_table_from_uri call on the BigQuery client.

duplicate_table(source_table, destination_table, if_exists='fail', drop_source_table=False)[source]

Create a copy of an existing table (or subset of rows) in a new table.

Args:

source_table: str: Name of existing schema and table (e.g. myschema.oldtable)
destination_table: str: Name of destination schema and table (e.g. myschema.newtable)
if_exists: str: If the table already exists, either fail, replace, or ignore the operation.
drop_source_table: boolean: Drop the source table

upsert(table_obj, target_table, primary_key, distinct_check=True, cleanup_temp_table=True, from_s3=False, **copy_args)[source]

Preform an upsert on an existing table. An upsert is a function in which rows in a table are updated and inserted at the same time.

Args:

table_obj: obj: A Parsons table object
target_table: str: The schema and table name to upsert
primary_key: str or list: The primary key column(s) of the target table
distinct_check: boolean: Check if the primary key column is distinct. Raise error if not.
cleanup_temp_table: boolean: A temp table is dropped by default on cleanup. You can set to False for debugging.
from_s3: boolean: Instead of specifying a table_obj (set the first argument to None), set this to True and include copy_s3() arguments to upsert a pre-existing s3 file into the target_table
**copy_args: kwargs: See copy() for options.

delete_table(table_name)[source]

Delete a BigQuery table.

Args:

table_name: str: The name of the table to delete.

table_exists(table_name: str) → bool[source]

Check whether or not the Google BigQuery table exists in the specified dataset.

Args:

table_name: str: The name of the BigQuery table to check for

Returns:

bool: True if the table exists in the specified dataset, false otherwise

get_tables(schema, table_name: str | None = None)[source]

List the tables in a schema including metadata.

Args:

schema: str: Filter by a schema
table_name: str: Filter by a table name

Returns:

Parsons Table: See Parsons Table for output options.

get_views(schema, view: str | None = None)[source]

List views.

Args:

schema: str: Filter by a schema
view: str: Filter by a table name

Returns:

Parsons Table: See Parsons Table for output options.

get_columns(schema: str, table_name: str)[source]

Gets the column names (and other column metadata) for a table. If you need just the column names run get_columns_list(), as it is faster.

Args:

schema: str: The schema name
table_name: str: The table name

Returns:

A dictionary mapping column name to a dictionary with extra info. The keys of the dictionary are ordered just liked the columns in the table. The extra info is a dict with format

get_columns_list(schema: str, table_name: str) → list[source]

Gets the column names for a table.

Args:

schema: str: The schema name
table_name: str: The table name

Returns:

A list of column names

get_row_count(schema: str, table_name: str) → int[source]

Gets the row count for a BigQuery materialization.

Caution: This method uses SELECT COUNT(*) which can be expensive for large tables, especially those with many columns. This is because BigQuery scans all table data to perform the count, even though only the row count is returned.

Args:

schema: str: The schema name
table_name: str: The table name

Returns:

Row count of the target table

extract(dataset: str, table_name: str, gcs_bucket: str, gcs_blob_name: str, project: str | None = None, gzip: bool = False, location: str = 'US', destination_file_format: str = 'CSV', field_delimiter: str = ',', compression: str = None, job_config: ExtractJobConfig = None, wait_for_job_to_complete: bool = True, **export_kwargs) → None[source]

Extracts a BigQuery table to a Google Cloud Storage bucket.

Args:

dataset (str): The BigQuery dataset containing the table. table_name (str): The name of the table to extract. gcs_bucket (str): The GCS bucket where the table will be

exported.

gcs_blob_name (str): The name of the blob in the GCS: bucket.
project (Optional[str]): The Google Cloud project ID. If: not provided, the default project of the client is used.
gzip (bool): If True, the exported file will be compressed: using GZIP. Defaults to False.

copy_between_projects(source_project, source_dataset, source_table, destination_project, destination_dataset, destination_table, if_dataset_not_exists='fail', if_table_exists='fail')[source]

Copy a table from one project to another. Fails if the source or target project

does not exist.

If the target dataset does not exist, fhe flag if_dataset_not_exists controls behavior.

It defaults to ‘fail’; set it to ‘create’ if it’s ok to create it.

If the target table exists, the flag if_table_exists controls behavior.

It defaults to ‘fail’; set it to ‘overwrite’ if it’s ok to overwrite an existing table.

Args:

source_project: str: Name of source project
source_dataset: str: Name of source dataset
source_table: str: Name of source table
destination_project: str: Name of destination project
destination_dataset: str: Name of destination dataset
destination_table: str: Name of destination table
if_dataset_not_exists: str: Action if dataset doesn’t exist {‘fail’,’create’}
if_table_exists: str: Action if table exists {‘fail’, ‘overwrite’}

Returns:

None

Cloud Storage

Overview

Google Cloud Storage is a cloud file storage system. It uses buckets in which to store arbitrary files referred to as blobs. You may use this connector to upload Parsons tables as blobs, download them to files, and list available blobs.

To use the GoogleCloudStorage class, you will need Google service account credentials. If you are the administrator of your Google Cloud account, you can generate them in the Google Cloud Console APIs and Services.

Quickstart

To instantiate the GoogleBigQuery class, you can pass the constructor a string containing either the name of your Google service account credentials file or a JSON string encoding those credentials. Alternatively, you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS to be either of those strings and call the constructor without that argument.

from parsons import GoogleCloudStorage

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google_credentials_file.json'

gcs = GoogleCloudStorage()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

credentials_filename = 'google_credentials_file.json'
project = 'parsons-test'    # Project in which we're working
gcs = GoogleCloudStorage(app_creds=credentials_filename, project=project)

Now we can create buckets, upload blobs to them and and list/retrieve the available blobs.

gcs.create_bucket('parsons_bucket')
gcs.list_buckets()

gcs.upload_table(bucket='parsons_bucket', table=parsons_table, blob_name='parsons_blob')
gcs.get_blob(bucket_name='parsons_bucket', blob_name='parsons_blob')

API

class parsons.google.google_cloud_storage.GoogleCloudStorage(app_creds: str | dict | Credentials | None = None, project=None)[source]

Google Cloud Storage connector utility

This class requires application credentials in the form of a json or google oauth2 Credentials object. It can be passed in the following ways:

Set an environmental variable named GOOGLE_APPLICATION_CREDENTIALS with the local path to the credentials json.

Example: GOOGLE_APPLICATION_CREDENTALS='path/to/creds.json'
Pass in the path to the credentials using the app_creds argument.
Pass in a json string using the app_creds argument.
Generate the google credentials object directly, pass in using the app_creds argument.

For example, to pass in credentials from a parent shell that is authenticated with gcloud auth: ``` from google.auth import default

app_creds, _ = default()

gcs = GoogleCloudStorage(app_creds=app_creds) ```

Args:

app_creds: str, dict, or google.oauth2.credentials.Credentials object: A credentials json string or a path to a json file. Not required if GOOGLE_APPLICATION_CREDENTIALS env variable set. Can also pass a google oauth2 Credentials object directly.
project: str: The project which the client is acting on behalf of. If not passed then will use the default inferred environment.

Returns:

GoogleCloudStorage Class

client: Access all methods of google.cloud package

list_buckets()[source]

Returns a list of buckets

Returns:: List of buckets

bucket_exists(bucket_name)[source]

Verify that a bucket exists

Args:

bucket_name: str: The name of the bucket

Returns:

boolean

get_bucket(bucket_name)[source]

Returns a bucket object

Args:

bucket_name: str: The name of bucket

Returns:

GoogleCloud Storage bucket

create_bucket(bucket_name)[source]

Create a bucket.

Args:

bucket_name: str: A globally unique name for the bucket.

Returns:

None

delete_bucket(bucket_name, delete_blobs=False)[source]

Delete a bucket. Will fail if not empty unless delete_blobs argument is set to True.

Args:

bucket_name: str: The name of the bucket
delete_blobs: boolean: Delete blobs in the bucket, if it is not empty

Returns:

None

list_blobs(bucket_name, max_results=None, prefix=None, match_glob=None, include_file_details=False)[source]

List all of the blobs in a bucket

Args:

bucket_name: str: The name of the bucket
max_results: int: Maximum number of blobs to return
prefix: str: A prefix to filter files
match_glob: str: Filters files based on glob string. NOTE that the match_glob parameter runs on the full blob URI, include a preceding wildcard value to account for nested files (/ for one level, */ for n levels)
include_file_details: bool: If True, returns a list of Blob objects with accessible metadata. For documentation of attributes associated with Blob objects see https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob

Returns:

A list of blob names (or Blob objects if include_file_details is invoked)

blob_exists(bucket_name, blob_name)[source]

Verify that a blob exists in the specified bucket

Args:

bucket_name: str: The bucket name
blob_name: str: The name of the blob

Returns:

boolean

get_blob(bucket_name, blob_name)[source]

Get a blob object

Args:

bucket_name: str: A bucket name
blob_name: str: A blob name

Returns:

A Google Storage blob object

put_blob(bucket_name, blob_name, local_path, **kwargs)[source]

Puts a blob (aka file) in a bucket

Args:

bucket_name:: The name of the bucket to store the blob
blob_name:: The name of blob to be stored in the bucket
local_path: str: The local path of the file to upload

Returns:

None

download_blob(bucket_name, blob_name, local_path=None)[source]

Gets a blob from a bucket

Args:

bucket_name: str: The name of the bucket
blob_name: str: The name of the blob
local_path: str: The local path where the file will be downloaded. If not specified, a temporary file will be created and returned, and that file will be removed automatically when the script is done running.

Returns:

str: The path of the downloaded file

delete_blob(bucket_name, blob_name)[source]

Delete a blob

Args:

bucket_name: str: The bucket name
blob_name: str: The blob name

Returns:

None

upload_table(table, bucket_name, blob_name, data_type='csv', default_acl=None)[source]

Load the data from a Parsons table into a blob.

Args:

table: obj: A Parsons Table
bucket_name: str: The name of the bucket to upload the data into.
blob_name: str: The name of the blob to upload the data into.
data_type: str: The file format to use when writing the data. One of: csv or json
default_acl:: ACL desired for newly uploaded table

Returns:

String representation of file URI in GCS

get_url(bucket_name, blob_name, expires_in=60)[source]

Generates a presigned url for a blob

Args:

bucket_name: str: The name of the bucket
blob_name: str: The name of the blob
expires_in: int: Minutes until the url expires

Returns:

url:: A link to download the object

copy_bucket_to_gcs(gcs_sink_bucket: str, source: str, source_bucket: str, destination_path: str = '', source_path: str = '', aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None)[source]

Creates a one-time transfer job from Amazon S3 to Google Cloud Storage. Copies all blobs within the bucket unless a key or prefix is passed.

Args:

gcs_sink_bucket (str):: Destination for the data transfer (located in GCS)
source (str):: File storge vendor [gcs or s3]
source_bucket (str):: Source bucket name
source_path (str):: Path in the source system pointing to the relevant keys / files to sync. Must end in a ‘/’
aws_access_key_id (str):: Access key to authenticate storage transfer
aws_secret_access_key (str):: Secret key to authenticate storage transfer

format_uri(bucket: str, name: str)[source]

Represent a GCS URI as a string

Args:

bucket: str: GCS bucket name
name: str: Filename in bucket

Returns:

String represetnation of URI

split_uri(gcs_uri: str)[source]

Split a GCS URI into a bucket and blob name

Args:

gcs_uri: str: GCS URI

Returns:

Tuple of strings with bucket_name and blob_name

unzip_blob(bucket_name: str, blob_name: str, compression_type: str = 'gzip', new_filename: str | None = None, new_file_extension: str | None = None) → str[source]

Downloads and decompresses a blob. The decompressed blob is re-uploaded with the same filename if no new_filename parameter is provided.

Args:

bucket_name: str: GCS bucket name
blob_name: str: Blob name in GCS bucket
compression_type: str: Either zip or gzip
new_filename: str: If provided, replaces the existing blob name when the decompressed file is uploaded
new_file_extension: str: If provided, replaces the file extension when the decompressed file is uploaded

Returns:

String representation of decompressed GCS URI

Civic

Overview

Google Civic is an API which provides helpful information about elections. In order to access Google Civic you must create a Google Developer Key in their API console. In order to use Google Civic, you must enable this specific end point.

The Google Civic API utilizes the Voting Information Project to collect key civic information such as personalized ballots and polling location information.

Quickstart

To instantiate the GoogleCivic class, you can pass the constructor a string containing the Google Civic API key you’ve generated for your project, or set the environment variable GOOGLE_CIVIC_API_KEY to that value.

from parsons import GoogleCivic

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_CIVIC_API_KEY'] = 'AIzaSyAOVZVeL-snv3vNDUdw6QSiCvZRXk1xM'

google_civic = GoogleCivic()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

google_civic = GoogleCivic(api_key='AIzaSyAOVZVeL-snv3vNDUdw6QSiCvZRXk1xM')

Now you can retrieve election information

elections = google_civic.get_elections()

address = '1600 Pennsylvania Avenue, Washington DC'
election_id = '7000'  # General Election
google_civic.get_polling_location(election_id=election_id, address=address)

You can also retrieve represntative information such as offices, officals, etc.

address = '1600 Pennsylvania Avenue, Washington DC'
representatives = google_civic.get_representatives_by_address(address=address)

API

class parsons.google.google_civic.GoogleCivic(api_key=None)[source]

Args:

api_keystr: A valid Google api key. Not required if GOOGLE_CIVIC_API_KEY env variable set.

Returns:

class

get_elections()[source]

Get a collection of information about elections and voter information.

Returns:

Parsons Table: See Parsons Table for output options.

get_polling_location(election_id, address)[source]

Get polling location information for a given address.

Args:

election_id: int: A valid election id. Election ids can be found by running the get_elections() method.
address: str: A valid US address in a single string.

Returns:

Parsons Table: See Parsons Table for output options.

get_polling_locations(election_id, table, address_field='address')[source]

Get polling location information for a table of addresses.

Args:

election_id: int: A valid election id. Election ids can be found by running the get_elections() method.
address: str: A valid US address in a single string.
address_field: str: The name of the column where the address is stored.

Returns:

Parsons Table: See Parsons Table for output options.

get_representative_info_by_address(address: str, include_offices=True, levels=None, roles=None)[source]

Get representative information for a given address. This method returns the raw JSON response from the Google Civic API. It is a complex response that is not easily parsed into a table. Here is the information on how to parse the response: https://developers.google.com/civic-information/docs/v2/representatives/representativeInfoByAddress

Args:

address: str: A valid US address in a single string.
include_offices: bool: Whether to return information about offices and officials. If false, only the top-level district information will be returned. (Default: True)
levels: list of str: A list of office levels to filter by. Only offices that serve at least one of these levels will be returned. Divisions that don’t contain a matching office will not be returned.

Acceptable values are: “administrativeArea1” “administrativeArea2” “country” “international” “locality” “regional” “special” “subLocality1” “subLocality2”
roles: list of str: A list of office roles to filter by. Only offices fulfilling one of these roles will be returned. Divisions that don’t contain a matching office will not be returned.

Acceptable values are: “deputyHeadOfGovernment” “executiveCouncil” “governmentOfficer” “headOfGovernment” “headOfState” “highestCourtJudge” “judge” “legislatorLowerBody” “legislatorUpperBody” “schoolBoard” “specialPurposeOfficer”

Returns:

Parsons Table: See Parsons Table for output options.

Google Sheets

Overview

The GoogleSheets class allows you to interact with Google service account spreadsheets, called “Google Sheets.” You can create, modify, read, format, share and delete sheets with this connector.

In order to instantiate the class, you must pass Google service account credentials as a dictionary, or store the credentials as a JSON file locally and pass the path to the file as a string in the GOOGLE_DRIVE_CREDENTIALS environment variable. You can follow these steps:

Go to the Google Developer Console and make sure the “Google Drive API” and the “Google Sheets API” are both enabled.
Go to the credentials page via the lefthand sidebar. On the credentials page, click “create credentials”.
Choose the “Service Account” option and fill out the form provided. This should generate your credentials.
Select your newly created Service Account on the credentials main page.
select “keys”, then “add key”, then “create new key”. Pick the key type JSON. The credentials should start to automatically download.

You can now copy and paste the data from the key into your script or (recommended) save it locally as a JSON file.

Quickstart

To instantiate the GoogleSheets class, you can either pass the constructor a dict containing your Google service account credentials or define the environment variable GOOGLE_DRIVE_CREDENTIALS to contain a path to the JSON file containing the dict.

from parsons import GoogleSheets

# First approach: Use API credentials via environmental variables
sheets = GoogleSheets()

# Second approach: Pass API credentials as argument
credential_filename = 'google_drive_service_credentials.json'
credentials = json.load(open(credential_filename))
sheets = GoogleSheets(google_keyfile_dict=credentials)

You can then create/modify/retrieve documents using instance methods:

sheet_id = sheets.create_spreadsheet('Voter Cell Phones')
sheets.append_to_sheet(sheet_id, people_with_cell_phones)
parsons_table = sheets.get_worksheet(sheet_id)

You may also want to share the document with your service or user account.

API

class parsons.google.google_sheets.GoogleSheets(google_keyfile_dict=None, subject=None)[source]

A connector for Google Sheets, handling data import and export.

Args:

google_keyfile_dict: dict: A dictionary of Google Drive API credentials, parsed from JSON provided by the Google Developer Console. Required if env variable GOOGLE_DRIVE_CREDENTIALS is not populated.
subject: string: In order to use account impersonation, pass in the email address of the account to be impersonated as a string.

list_worksheets(spreadsheet_id)[source]

Return a list of worksheets in the spreadsheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)

Returns:

list: A List of worksheets order by their index

get_worksheet_index(spreadsheet_id, title)[source]

Get the first sheet in a Google spreadsheet with the given title. The title is case sensitive and the index begins with 0.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
title: str: The sheet title

Returns:

str: The sheet index

get_worksheet(spreadsheet_id, worksheet=0)[source]

Create a parsons table from a sheet in a Google spreadsheet, given the sheet index.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
worksheet: str or int: The index or the title of the worksheet. The index begins with 0.

Returns:

Parsons Table: See Parsons Table for output options.

share_spreadsheet(spreadsheet_id, sharee, share_type='user', role='reader', notify=True, notify_message=None, with_link=False)[source]

Share a spreadsheet with a user, group of users, domain and/or the public.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
sharee: str: User or group e-mail address, domain name to share the spreadsheet with. To share publicly, set sharee value to None.
share_type: str: The sharee type. Allowed values are: user, group, domain, anyone.
role: str: The primary role for this user. Allowed values are: owner, writer, reader.
notify: boolean: Whether to send an email to the target user/domain.
email_message: str: The email to be sent if notify kwarg set to True.
with_link: boolean: Whether a link is required for this permission.

get_spreadsheet_permissions(spreadsheet_id)[source]

List the permissioned users and groups for a spreadsheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)

Returns:

Parsons Table: See Parsons Table for output options.

create_spreadsheet(title, editor_email=None, folder_id=None)[source]

Creates a new Google spreadsheet. Optionally shares the new doc with the given email address. Optionally creates the sheet in a specified folder.

Args:

title: str: The human-readable title of the new spreadsheet
editor_email: str (optional): Email address which should be given permissions on this spreadsheet. Tip: You may want to share this file with the service account.
folder_id: str (optional): ID of the Google folder where the spreadsheet should be created. Tip: Get this from the folder URL. Anyone shared on the folder will have access to the spreadsheet.

Returns:

str: The spreadsheet ID

delete_spreadsheet(spreadsheet_id)[source]

Deletes a Google spreadsheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)

add_sheet(spreadsheet_id, title=None, rows=100, cols=25)[source]

Adds a sheet to a Google spreadsheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
rows: int: Number of rows
cols: Number of cols

Returns:

str: The sheet index

append_to_sheet(spreadsheet_id, table, worksheet=0, user_entered_value=False, **kwargs)[source]

Append data from a Parsons table to a Google sheet. Note that the table’s columns are ignored, as we’ll be keeping whatever header row already exists in the Google sheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
table: obj: Parsons table
worksheet: str or int: The index or the title of the worksheet. The index begins with 0.
user_entered_value: bool (optional): If True, will submit cell values as entered (required for entering formulas). Otherwise, values will be entered as strings or numbers only.

paste_data_in_sheet(spreadsheet_id, table, worksheet=0, header=True, startrow=0, startcol=0)[source]

Pastes data from a Parsons table to a Google sheet. Note that this may overwrite presently existing data. This function is useful for adding data to a subsection if an existing sheet that will have other existing data - contrast to overwrite_sheet (which will fully replace any existing data) and append_to_sheet (which sticks the data only after all other existing data).

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL).
table: obj: Parsons table
worksheet: str or int: The index or the title of the worksheet. The index begins with 0.
header: bool: Whether or not the header row gets pasted with the data.
startrow: int: Starting row position of pasted data. Counts from 0.
startcol: int: Starting column position of pasted data. Counts from 0.

overwrite_sheet(spreadsheet_id, table, worksheet=0, user_entered_value=False, **kwargs)[source]

Replace the data in a Google sheet with a Parsons table, using the table’s columns as the first row.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
table: obj: Parsons table
worksheet: str or int: The index or the title of the worksheet. The index begins with 0.
user_entered_value: bool (optional): If True, will submit cell values as entered (required for entering formulas). Otherwise, values will be entered as strings or numbers only.

format_cells(spreadsheet_id, range, cell_format, worksheet=0)[source]

Format the cells of a worksheet.

Args:

spreadsheet_id: str: The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
range: str: The cell range to format. E.g. "A2" or "A2:B100"
cell_format: dict: The formatting to apply to the range. Full options are specified in the GoogleSheets API documentation.
worksheet: str or int: The index or the title of the worksheet. The index begins with 0.

Examples

# Set 'A4' cell's text format to bold
gs.format_cells(sheet_id, "A4", {"textFormat": {"bold": True}}, worksheet=0)

# Color the background of 'A2:B2' cell range yellow,
# change horizontal alignment, text color and font size
gs.format_cells.format(sheet_id, "A2:B2", {
    "backgroundColor": {
        "red": 0.0,
        "green": 0.0,
        "blue": 0.0
        },
    "horizontalAlignment": "CENTER",
    "textFormat": {
        "foregroundColor": {
            "red": 1.0,
            "green": 1.0,
            "blue": 0.0
            },
            "fontSize": 12,
            "bold": True
            }
        }, worksheet=0)