Google

Google Cloud services allow you to upload and manipulate Tables as spreadsheets (via GoogleSheets) or query them as SQL database tables (via GoogleBigQuery). You can also upload/store/download them as binary objects (via GoogleCloudStorage). Finally, Google offers an API for civic information using GoogleCivic.

For all of these services you will need to enable the APIs for your Google Cloud account and obtain authentication tokens to access them from your scripts. If you are the administrator of your Google Cloud account, you can do both of these at Google Cloud Console APIs and Services.

BigQuery

Overview

Google BigQuery is a cloud data warehouse solution. Data is stored in tables, and users can query using SQL. BigQuery uses datasets as top level containers for tables, and datasets are themselves contained within Google Cloud projects.

Quickstart

To instantiate the GoogleBigQuery class, you can pass the constructor a string containing either the name of the Google service account credentials file or a JSON string encoding those credentials. Alternatively, you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS to be either of those strings and call the constructor without that argument.

from parsons import GoogleBigQuery

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google_credentials_file.json'

big_query = GoogleBigQuery()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

# Project in which we're working
project = 'parsons-test'
big_query = GoogleBigQuery(app_creds='google_credentials_file.json',
                           project=project)

We can now upload/query data.

dataset = 'parsons_dataset'
table = 'parsons_table'

# Table name should be project.dataset.table, or dataset.table, if
# working with the default project
table_name = project + '.' + dataset + '.' + table

# Must be pre-existing bucket. Create via GoogleCloudStorage() or
# at https://console.cloud.google.com/storage/create-bucket. May be
# omitted if the name of the bucket is specified in environment
# variable GCS_TEMP_BUCKET.
gcs_temp_bucket = 'parsons_bucket'

# Create dataset if it doesn't already exist
big_query.client.create_dataset(dataset=dataset, exists_ok=True)

parsons_table = Table([{'name':'Bob', 'party':'D'},
                       {'name':'Jane', 'party':'D'},
                       {'name':'Sue', 'party':'R'},
                       {'name':'Bill', 'party':'I'}])

# Copy table in to create new BigQuery table
big_query.copy(table_obj=parsons_table,
               table_name=table_name,
               tmp_gcs_bucket=gcs_temp_bucket)

# Select from project.dataset.table
big_query.query(f'select name from {table_name} where party = "D"')

# Delete the table when we're done
big_query.client.delete_table(table=table_name)

API

class parsons.google.google_bigquery.GoogleBigQuery(app_creds=None, project=None, location=None)[source]

Class for querying BigQuery table and returning the data as Parsons tables.

This class requires application credentials in the form of a json. It can be passed in the following ways:

  • Set an environmental variable named GOOGLE_APPLICATION_CREDENTIALS with the local path to the credentials json.

    Example: GOOGLE_APPLICATION_CREDENTALS='path/to/creds.json'

  • Pass in the path to the credentials using the app_creds argument.

  • Pass in a json string using the app_creds argument.

Args:
app_creds: str
A credentials json string or a path to a json file. Not required if GOOGLE_APPLICATION_CREDENTIALS env variable set.
project: str
The project which the client is acting on behalf of. If not passed then will use the default inferred environment.
location: str
Default geographic location for tables
copy(table_obj, table_name, if_exists='fail', tmp_gcs_bucket=None, gcs_client=None, job_config=None, **load_kwargs)[source]

Copy a Parsons Table into Google BigQuery via Google Cloud Storage.

Args:
table_obj: obj
The Parsons Table to copy into BigQuery.
table_name: str
The table name to load the data into.
if_exists: str
If the table already exists, either fail, append, drop or truncate the table.
tmp_gcs_bucket: str
The name of the Google Cloud Storage bucket to use to stage the data to load into BigQuery. Required if GCS_TEMP_BUCKET is not specified.
gcs_client: object
The GoogleCloudStorage Connector to use for loading data into Google Cloud Storage.
job_config: object
A LoadJobConfig object to provide to the underlying call to load_table_from_uri on the BigQuery client. The function will create its own if not provided.
**load_kwargs: kwargs
Arguments to pass to the underlying load_table_from_uri call on the BigQuery client.
delete_table(table_name)[source]

Delete a BigQuery table.

Args:
table_name: str
The name of the table to delete.
query(sql, parameters=None)[source]

Run a BigQuery query and return the results as a Parsons table.

To include python variables in your query, it is recommended to pass them as parameters, following the BigQuery style where parameters are prefixed with @`s. Using the ``parameters` argument ensures that values are escaped properly, and avoids SQL injection attacks.

Parameter Examples

name = "Beatrice O'Brady"
sql = 'SELECT * FROM my_table WHERE name = %s'
rs.query(sql, parameters=[name])
name = "Beatrice O'Brady"
sql = "SELECT * FROM my_table WHERE name = %(name)s"
rs.query(sql, parameters={'name': name})
Args:
sql: str
A valid BigTable statement
parameters: dict
A dictionary of query parameters for BigQuery.
Returns:
Parsons Table
See Parsons Table for output options.
table_exists(table_name)[source]

Check whether or not the Google BigQuery table exists in the specified dataset.

Args:
table_name: str
The name of the BigQuery table to check for
Returns:
bool
True if the table exists in the specified dataset, false otherwise
client

Get the Google BigQuery client to use for making queries.

Returns:
google.cloud.bigquery.client.Client

Cloud Storage

Overview

Google Cloud Storage is a cloud file storage system. It uses buckets in which to store arbitrary files referred to as blobs. You may use this connector to upload Parsons tables as blobs, download them to files, and list available blobs.

To use the GoogleCloudStorage class, you will need Google service account credentials. If you are the administrator of your Google Cloud account, you can generate them in the Google Cloud Console APIs and Services.

Quickstart

To instantiate the GoogleBigQuery class, you can pass the constructor a string containing either the name of your Google service account credentials file or a JSON string encoding those credentials. Alternatively, you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS to be either of those strings and call the constructor without that argument.

from parsons import GoogleCloudStorage

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'google_credentials_file.json'

gcs = GoogleCloudStorage()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

credentials_filename = 'google_credentials_file.json'
project = 'parsons-test'    # Project in which we're working
gcs = GoogleCloudStorage(app_creds=credentials_filename, project=project)

Now we can create buckets, upload blobs to them and and list/retrieve the available blobs.

gcs.create_bucket('parsons_bucket')
gcs.list_buckets()

gcs.upload_table(bucket='parsons_bucket', table=parsons_table, blob_name='parsons_blob')
gcs.get_blob(bucket_name='parsons_bucket', blob_name='parsons_blob')

API

class parsons.google.google_cloud_storage.GoogleCloudStorage(app_creds=None, project=None)[source]

This class requires application credentials in the form of a json. It can be passed in the following ways:

  • Set an environmental variable named GOOGLE_APPLICATION_CREDENTIALS with the local path to the credentials json.

    Example: GOOGLE_APPLICATION_CREDENTALS='path/to/creds.json'

  • Pass in the path to the credentials using the app_creds argument.

  • Pass in a json string using the app_creds argument.

Args:
app_creds: str
A credentials json string or a path to a json file. Not required if GOOGLE_APPLICATION_CREDENTIALS env variable set.
project: str
The project which the client is acting on behalf of. If not passed then will use the default inferred environment.
Returns:
GoogleCloudStorage Class
client = None

Access all methods of google.cloud package

list_buckets()[source]

Returns a list of buckets

Returns:
List of buckets
bucket_exists(bucket_name)[source]

Verify that a bucket exists

Args:
bucket_name: str
The name of the bucket
Returns:
boolean
get_bucket(bucket_name)[source]

Returns a bucket object

Args:
bucket_name: str
The name of bucket
Returns:
GoogleCloud Storage bucket
create_bucket(bucket_name)[source]

Create a bucket.

Args:
bucket_name: str
A globally unique name for the bucket.
Returns:
None
delete_bucket(bucket_name, delete_blobs=False)[source]

Delete a bucket. Will fail if not empty unless delete_blobs argument is set to True.

Args:
bucket_name: str
The name of the bucket
delete_blobs: boolean
Delete blobs in the bucket, if it is not empty
Returns:
None
list_blobs(bucket_name, max_results=None, prefix=None)[source]

List all of the blobs in a bucket

Args:
bucket_name: str
The name of the bucket
max_results: int
TBD
prefix_filter: str
A prefix to filter files
Returns:
A list of blob names
blob_exists(bucket_name, blob_name)[source]

Verify that a blob exists in the specified bucket

Args:
bucket_name: str
The bucket name
blob_name: str
The name of the blob
Returns:
boolean
get_blob(bucket_name, blob_name)[source]

Get a blob object

Args:
bucket_name: str
A bucket name
blob_name: str
A blob name
Returns:
A Google Storage blob object
put_blob(bucket_name, blob_name, local_path)[source]

Puts a blob (aka file) in a bucket

Args:
blob_name:
The name of blob to be stored in the bucket
bucket_name:
The name of the bucket to store the blob
local_path: str
The local path of the file to upload
Returns:
None
download_blob(bucket_name, blob_name, local_path=None)[source]

Gets a blob from a bucket

Args:
bucket_name: str
The name of the bucket
blob_name: str
The name of the blob
local_path: str
The local path where the file will be downloaded. If not specified, a temporary file will be created and returned, and that file will be removed automatically when the script is done running.
Returns:
str
The path of the downloaded file
delete_blob(bucket_name, blob_name)[source]

Delete a blob

Args:
bucket_name: str
The bucket name
blob_name: str
The blob name
Returns:
None
upload_table(table, bucket_name, blob_name, data_type='csv', default_acl=None)[source]

Load the data from a Parsons table into a blob.

Args:
table: obj
A Parsons Table
bucket_name: str
The name of the bucket to upload the data into.
blob_name: str
The name of the blob to upload the data into.
data_type: str
The file format to use when writing the data. One of: csv or json
get_url(bucket_name, blob_name, expires_in=60)[source]

Generates a presigned url for a blob

Args:
bucket_name: str
The name of the bucket
blob_name: str
The name of the blob
expires_in: int
Minutes until the url expires
Returns:
url:
A link to download the object

Civic

Overview

Google Civic is an API which provides helpful information about elections. In order to access Google Civic you must create a Google Developer Key in their API console. In order to use Google Civic, you must enable this specific end point.

The Google Civic API utilizes the Voting Information Project to collect key civic information such as personalized ballots and polling location information.

Quickstart

To instantiate the GoogleCivic class, you can pass the constructor a string containing the Google Civic API key you’ve generated for your project, or set the environment variable GOOGLE_CIVIC_API_KEY to that value.

from parsons import GoogleCivic

# Set as environment variable so we don't have to pass it in. May either
# be the file name or a JSON encoding of the credentials.
os.environ['GOOGLE_CIVIC_API_KEY'] = 'AIzaSyAOVZVeL-snv3vNDUdw6QSiCvZRXk1xM'

google_civic = GoogleCivic()

Alternatively, you can pass the credentials in as an argument. In the example below, we also specify the project.

google_civic = GoogleCivic(api_key='AIzaSyAOVZVeL-snv3vNDUdw6QSiCvZRXk1xM')

Now you can retrieve election information

elections = google_civic.get_elections()

address = '1600 Pennsylvania Avenue, Washington DC'
election_id = '7000'  # General Election
google_civic.get_polling_location(election_id=election_id, address=address)

API

class parsons.google.google_civic.GoogleCivic(api_key=None)[source]
Args:
api_key : str
A valid Google api key. Not required if GOOGLE_CIVIC_API_KEY env variable set.
Returns:
class
get_elections()[source]

Get a collection of information about elections and voter information.

Returns:
Parsons Table
See Parsons Table for output options.
get_polling_location(election_id, address)[source]

Get polling location information for a given address.

Args:
election_id: int
A valid election id. Election ids can be found by running the get_elections() method.
address: str
A valid US address in a single string.
Returns:
Parsons Table
See Parsons Table for output options.
get_polling_locations(election_id, table, address_field='address')[source]

Get polling location information for a table of addresses.

Args:
election_id: int
A valid election id. Election ids can be found by running the get_elections() method.
address: str
A valid US address in a single string.
address_field: str
The name of the column where the address is stored.
Returns:
Parsons Table
See Parsons Table for output options.

Google Sheets

Overview

The GoogleSheets class allows you to interact with Google service account spreadsheets, called “Google Sheets.” You can create, modify, read, format, share and delete sheets with this connector.

In order to instantiate the class, you must pass Google service account credentials as a dictionary, or store the credentials as a JSON string in the GOOGLE_DRIVE_CREDENTIALS environment variable. Typically you’ll get the credentials from the Google Developer Console (look for the “Google Drive API”).

Quickstart

To instantiate the GoogleSheets class, you can either pass the constructor a dict containing your Google service account credentials or define the environment variable GOOGLE_DRIVE_CREDENTIALS to contain a JSON encoding of the dict.

from parsons import GoogleSheets

# First approach: Use API credentials via environmental variables
sheets = GoogleSheets()

# Second approach: Pass API credentials as argument
credential_filename = 'google_drive_service_credentials.json'
credentials = json.load(open(credential_filename))
sheets = GoogleSheets(google_keyfile_dict=credentials)

You can then create/modify/retrieve documents using instance methods:

sheet_id = sheets.create_spreadsheet('Voter Cell Phones')
sheets.append_to_sheet(sheet_id, people_with_cell_phones)
parsons_table = sheets.get_worksheet(sheet_id)

API

class parsons.google.google_sheets.GoogleSheets(google_keyfile_dict=None)[source]

A connector for Google Sheets, handling data import and export.

Args:
google_keyfile_dict: dict
A dictionary of Google Drive API credentials, parsed from JSON provided by the Google Developer Console. Required if env variable GOOGLE_DRIVE_CREDENTIALS is not populated.
list_worksheets(spreadsheet_id)[source]

Return a list of worksheets in the spreadsheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
Returns:
list
A List of worksheets order by their index
get_worksheet_index(spreadsheet_id, title)[source]

Get the first sheet in a Google spreadsheet with the given title. The title is case sensitive and the index begins with 0.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
title: str
The sheet title
Returns:
str
The sheet index
get_worksheet(spreadsheet_id, worksheet=0)[source]

Create a parsons table from a sheet in a Google spreadsheet, given the sheet index.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
worksheet: str or int
The index or the title of the worksheet. The index begins with 0.
Returns:
Parsons Table
See Parsons Table for output options.
share_spreadsheet(spreadsheet_id, sharee, share_type='user', role='reader', notify=True, notify_message=None, with_link=False)[source]

Share a spreadsheet with a user, group of users, domain and/or the public.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
sharee: str
User or group e-mail address, domain name to share the spreadsheet with. To share publicly, set sharee value to None.
share_type: str
The sharee type. Allowed values are: user, group, domain, anyone.
role: str
The primary role for this user. Allowed values are: owner, writer, reader.
notify: boolean
Whether to send an email to the target user/domain.
email_message: str
The email to be sent if notify kwarg set to True.
with_link: boolean
Whether a link is required for this permission.
get_spreadsheet_permissions(spreadsheet_id)[source]

List the permissioned users and groups for a spreadsheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
Returns:
Parsons Table
See Parsons Table for output options.
create_spreadsheet(title, editor_email=None)[source]

Create a Google spreadsheet from a Parsons table. Optionally shares the new doc with the given email address.

Args:
title: str
The human-readable title of the new spreadsheet
editor_email: str (optional)
Email address which should be given permissions on this spreadsheet
Returns:
str
The spreadsheet ID
delete_spreadsheet(spreadsheet_id)[source]

Deletes a Google spreadsheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
add_sheet(spreadsheet_id, title=None, rows=100, cols=25)[source]

Adds a sheet to a Google spreadsheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
rows: int
Number of rows
cols
Number of cols
Returns:
str
The sheet index
append_to_sheet(spreadsheet_id, table, worksheet=0, user_entered_value=False, **kwargs)[source]

Append data from a Parsons table to a Google sheet. Note that the table’s columns are ignored, as we’ll be keeping whatever header row already exists in the Google sheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
table: obj
Parsons table
worksheet: str or int
The index or the title of the worksheet. The index begins with 0.
user_entered_value: bool (optional)
If True, will submit cell values as entered (required for entering formulas). Otherwise, values will be entered as strings or numbers only.
overwrite_sheet(spreadsheet_id, table, worksheet=0, user_entered_value=False, **kwargs)[source]

Replace the data in a Google sheet with a Parsons table, using the table’s columns as the first row.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
table: obj
Parsons table
worksheet: str or int
The index or the title of the worksheet. The index begins with 0.
user_entered_value: bool (optional)
If True, will submit cell values as entered (required for entering formulas). Otherwise, values will be entered as strings or numbers only.
format_cells(spreadsheet_id, range, cell_format, worksheet=0)[source]

Format the cells of a worksheet.

Args:
spreadsheet_id: str
The ID of the spreadsheet (Tip: Get this from the spreadsheet URL)
range: str
The cell range to format. E.g. "A2" or "A2:B100"
cell_format: dict
The formatting to apply to the range. Full options are specified in the GoogleSheets API documentation.
worksheet: str or int
The index or the title of the worksheet. The index begins with 0.

Examples

# Set 'A4' cell's text format to bold
gs.format_cells(sheet_id, "A4", {"textFormat": {"bold": True}}, worksheet=0)

# Color the background of 'A2:B2' cell range yellow,
# change horizontal alignment, text color and font size
gs.format_cells.format(sheet_id, "A2:B2", {
    "backgroundColor": {
        "red": 0.0,
        "green": 0.0,
        "blue": 0.0
        },
    "horizontalAlignment": "CENTER",
    "textFormat": {
        "foregroundColor": {
            "red": 1.0,
            "green": 1.0,
            "blue": 0.0
            },
            "fontSize": 12,
            "bold": True
            }
        }, worksheet=0)