Databases

Overview

Parsons offers support for a variety of popular SQL database dialects. The functionality is focused on the ability to query and upload data to SQL databases. Each database class also includes the ability to infer datatypes and data schemas from a Parsons table and automatically create new tables.

Similar to other classes in Parsons, the query methods for databases all return Parsons Table, which allow them to be easily converted to other data types.

There is also support for synchronization of tables between databases as part of the Database Sync framework.

Google BigQuery

See Google for documentation.

MySQL

MySQL is the world’s most popular open source database. The Parsons class leverages on the MySQLdb1 python package.

Quick Start

Authentication

from parsons import MySQL

# Instantiate MySQL from environmental variables
mysql = MySQL()

# Instantiate MySQL from passed variables
mysql = MySQL(username='me', password='secret', host='mydb.com', db='dev', port=3306)

Quick Start

# Query database
tbl = mysql.query('select * from my_schema.secret_sauce')

# Copy data to database
tbl = Table.from_csv('my_file.csv') # Load from a CSV or other source.
mysql.copy(tbl, 'my_schema.winning_formula')
class parsons.MySQL(host=None, username=None, password=None, db=None, port=3306)[source]

Connect to a MySQL database.

Args:
username: str
Required if env variable MYSQL_USERNAME not populated
password: str
Required if env variable MYSQL_PASSWORD not populated
host: str
Required if env variable MYSQL_HOST not populated
db: str
Required if env variable MYSQL_DB not populated
port: int
Can be set by env variable MYSQL_PORT or argument.
connection()[source]

Generate a MySQL connection. The connection is set up as a python “context manager”, so it will be closed automatically (and all queries committed) when the connection goes out of scope.

When using the connection, make sure to put it in a with block (necessary for any context manager): with mysql.connection() as conn:

Returns:
MySQL connection object
query(sql, parameters=None)[source]

Execute a query against the database. Will return None if the query returns zero rows.

To include python variables in your query, it is recommended to pass them as parameters, following the mysql style. Using the parameters argument ensures that values are escaped properly, and avoids SQL injection attacks.

Parameter Examples

# Note that the name contains a quote, which could break your query if not escaped
# properly.
name = "Beatrice O'Brady"
sql = "SELECT * FROM my_table WHERE name = %s"
mysql.query(sql, parameters=[name])
names = ["Allen Smith", "Beatrice O'Brady", "Cathy Thompson"]
placeholders = ', '.join('%s' for item in names)
sql = f"SELECT * FROM my_table WHERE name IN ({placeholders})"
mysql.query(sql, parameters=names)
Args:
sql: str
A valid SQL statement
parameters: list
A list of python variables to be converted into SQL values in your query
Returns:
Parsons Table
See Parsons Table for output options.
query_with_connection(sql, connection, parameters=None, commit=True)[source]

Execute a query against the database, with an existing connection. Useful for batching queries together. Will return None if the query returns zero rows.

Args:
sql: str
A valid SQL statement
connection: obj
A connection object obtained from mysql.connection()
parameters: list
A list of python variables to be converted into SQL values in your query
commit: boolean
Whether to commit the transaction immediately. If False the transaction will be committed when the connection goes out of scope and is closed (or you can commit manually with connection.commit()).
Returns:
Parsons Table
See Parsons Table for output options.
copy(tbl, table_name, if_exists='fail', chunk_size=1000)[source]

Copy a Parsons Table to the database.

Note

This method utilizes extended inserts rather LOAD DATA INFILE since many MySQL Database configurations do not allow data files to be loaded. It results in a minor performance hit compared to LOAD DATA.

tbl: parsons.Table
A Parsons table object
table_name: str
The destination schema and table (e.g. my_schema.my_table)
if_exists: str
If the table already exists, either fail, append, drop or truncate the table.
chunk_size: int
The number of rows to insert per query.
table_exists(table_name)[source]

Check if a table or view exists in the database.

Args:
table_name: str
The table name
view: boolean
Check to see if a view exists by the same name
Returns:
boolean
True if the table exists and False if it does not.

Postgres

Postgres is popular open source SQL database dialect. The Parsons class leverages the mysql python package.

Quick Start

Authentication

from parsons import Postgres

# Instantiate Postgres from environmental variables
mysql = Postgres()

# Instantiate Postgres from passed variables
Postgres = Postgres(username='me', password='secret', host='mydb.com', db='dev', port=3306)

# Instantiate Postgres from a ~/.pgpass file
Postgres = Postgres()

Quick Start

# Query database
tbl = postgres.query('select * from my_schema.secret_sauce')

# Copy data to database
tbl = Table.from_csv('my_file.csv') # Load from a CSV or other source.
postgres.copy(tbl, 'my_schema.winning_formula')
class parsons.Postgres(username=None, password=None, host=None, db=None, port=5432, timeout=10)[source]

A Postgres class to connect to database. Credentials can be passed from a .pgpass file stored in your home directory or with environmental variables.

Args:
username: str
Required if env variable PGUSER not populated
password: str
Required if env variable PGPASSWORD not populated
host: str
Required if env variable PGHOST not populated
db: str
Required if env variable PGDATABASE not populated
port: int
Required if env variable PGPORT not populated.
timeout: int
Seconds to timeout if connection not established.
copy(tbl, table_name, if_exists='fail')[source]

Copy a Parsons Table to Postgres.

tbl: parsons.Table
A Parsons table object
table_name: str
The destination schema and table (e.g. my_schema.my_table)
if_exists: str
If the table already exists, either fail, append, drop or truncate the table.
connection()

Generate a Postgres connection. The connection is set up as a python “context manager”, so it will be closed automatically (and all queries committed) when the connection goes out of scope.

When using the connection, make sure to put it in a with block (necessary for any context manager): with pg.connection() as conn:

Returns:
Psycopg2 connection object
query(sql, parameters=None)

Execute a query against the database. Will return None if the query returns zero rows.

To include python variables in your query, it is recommended to pass them as parameters, following the psycopg style. Using the parameters argument ensures that values are escaped properly, and avoids SQL injection attacks.

Parameter Examples

# Note that the name contains a quote, which could break your query if not escaped
# properly.
name = "Beatrice O'Brady"
sql = "SELECT * FROM my_table WHERE name = %s"
rs.query(sql, parameters=[name])
names = ["Allen Smith", "Beatrice O'Brady", "Cathy Thompson"]
placeholders = ', '.join('%s' for item in names)
sql = f"SELECT * FROM my_table WHERE name IN ({placeholders})"
rs.query(sql, parameters=names)
Args:
sql: str
A valid SQL statement
parameters: list
A list of python variables to be converted into SQL values in your query
Returns:
Parsons Table
See Parsons Table for output options.
query_with_connection(sql, connection, parameters=None, commit=True)

Execute a query against the database, with an existing connection. Useful for batching queries together. Will return None if the query returns zero rows.

Args:
sql: str
A valid SQL statement
connection: obj
A connection object obtained from redshift.connection()
parameters: list
A list of python variables to be converted into SQL values in your query
commit: boolean
Whether to commit the transaction immediately. If False the transaction will be committed when the connection goes out of scope and is closed (or you can commit manually with connection.commit()).
Returns:
Parsons Table
See Parsons Table for output options.
table_exists(table_name, view=True)

Check if a table or view exists in the database.

Args:
table_name: str
The table name and schema (e.g. myschema.mytable).
view: boolean
Check to see if a view exists by the same name
Returns:
boolean
True if the table exists and False if it does not.

Redshift

See Redshift for documentation.