Databases
Overview
Parsons offers support for a variety of popular SQL database dialects. The functionality is focused on the ability to query and upload data to SQL databases. Each database class also includes the ability to infer datatypes and data schemas from a Parsons table and automatically create new tables.
Similar to other classes in Parsons, the query methods for databases all return Parsons Table, which allow them to be easily converted to other data types.
There is also support for synchronization of tables between databases as part of the Database Sync framework.
Google BigQuery
See Google for documentation.
MySQL
MySQL is the world’s most popular open source database. The Parsons class leverages on the MySQLdb1 python package.
Quick Start
Authentication
from parsons import MySQL
# Instantiate MySQL from environmental variables
mysql = MySQL()
# Instantiate MySQL from passed variables
mysql = MySQL(username='me', password='secret', host='mydb.com', db='dev', port=3306)
Quick Start
# Query database
tbl = mysql.query('select * from my_schema.secret_sauce')
# Copy data to database
tbl = Table.from_csv('my_file.csv') # Load from a CSV or other source.
mysql.copy(tbl, 'my_schema.winning_formula')
- class parsons.MySQL(host=None, username=None, password=None, db=None, port=3306)[source]
Connect to a MySQL database.
- Args:
- username: str
Required if env variable
MYSQL_USERNAME
not populated- password: str
Required if env variable
MYSQL_PASSWORD
not populated- host: str
Required if env variable
MYSQL_HOST
not populated- db: str
Required if env variable
MYSQL_DB
not populated- port: int
Can be set by env variable
MYSQL_PORT
or argument.
- connection()[source]
Generate a MySQL connection. The connection is set up as a python “context manager”, so it will be closed automatically (and all queries committed) when the connection goes out of scope.
When using the connection, make sure to put it in a
with
block (necessary for any context manager):with mysql.connection() as conn:
- Returns:
MySQL connection object
- query(sql, parameters=None)[source]
Execute a query against the database. Will return
None
if the query returns zero rows.To include python variables in your query, it is recommended to pass them as parameters, following the mysql style. Using the
parameters
argument ensures that values are escaped properly, and avoids SQL injection attacks.Parameter Examples
# Note that the name contains a quote, which could break your query if not escaped # properly. name = "Beatrice O'Brady" sql = "SELECT * FROM my_table WHERE name = %s" mysql.query(sql, parameters=[name])
names = ["Allen Smith", "Beatrice O'Brady", "Cathy Thompson"] placeholders = ', '.join('%s' for item in names) sql = f"SELECT * FROM my_table WHERE name IN ({placeholders})" mysql.query(sql, parameters=names)
- Args:
- sql: str
A valid SQL statement
- parameters: list
A list of python variables to be converted into SQL values in your query
- Returns:
- Parsons Table
See Parsons Table for output options.
- query_with_connection(sql, connection, parameters=None, commit=True)[source]
Execute a query against the database, with an existing connection. Useful for batching queries together. Will return
None
if the query returns zero rows.- Args:
- sql: str
A valid SQL statement
- connection: obj
A connection object obtained from
mysql.connection()
- parameters: list
A list of python variables to be converted into SQL values in your query
- commit: boolean
Whether to commit the transaction immediately. If
False
the transaction will be committed when the connection goes out of scope and is closed (or you can commit manually withconnection.commit()
).
- Returns:
- Parsons Table
See Parsons Table for output options.
- copy(tbl: Table, table_name: str, if_exists: str = 'fail', chunk_size: int = 1000, strict_length: bool = True)[source]
Copy a Parsons Table to the database.
Note
This method utilizes extended inserts rather LOAD DATA INFILE since many MySQL Database configurations do not allow data files to be loaded. It results in a minor performance hit compared to LOAD DATA.
- Args:
- tbl: parsons.Table
A Parsons table object
- table_name: str
The destination schema and table (e.g.
my_schema.my_table
)- if_exists: str
If the table already exists, either
fail
,append
,drop
ortruncate
the table.- chunk_size: int
The number of rows to insert per query.
- strict_length: bool
If the database table needs to be created, strict_length determines whether the created table’s column sizes will be sized to exactly fit the current data, or if their size will be rounded up to account for future values being larger then the current dataset. defaults to
True
- table_exists(table_name: str) bool [source]
Check if a table or view exists in the database.
- Args:
- table_name: str
The table name
- Returns:
- boolean
True
if the table exists andFalse
if it does not.
- create_table(table_object, table_name)
Create a table based on table object data.
- detect_data_type(value, cmp_type=None)
Detect the higher of value’s type cmp_type.
check if it’s a string
check if it’s a number
check if it’s a float
check if it’s an int
- Args:
- value: str
The value to inspect.
- cmp_type: str
The string representation of a type to compare with
value
’s type.
- Returns:
- str
The string representation of the higher of the two types.
- format_column(col, index='', replace_chars=None, col_prefix='_')
Format the column to meet database contraints.
- Formats the columns as follows:
Coverts to lowercase (if case insensitive)
Strips leading and trailing whitespace
Replaces invalid characters
Renames if in reserved words
- Args:
- col: str
The column to format.
- index: int
(Optional) The index of the column. Used if the column is empty.
- replace_chars: dict
A dictionary of invalid characters and their replacements. If
None
uses {” “: “_”}- col_prefix: str
The prefix to use when the column is empty or starts with an invalid character.
- Returns:
- str
The formatted column.
- format_columns(cols, **kwargs)
Format the columns to meet database contraints.
This method relies on
format_column
to handle most changes. It only handles duplicated columns. Options toformat_column
can be passed through kwargs.- Args:
- cols: list
The columns to format.
- kwargs: dicts
Keyword arguments to pass to
format_column
.
- Returns:
- list
The formatted columns.
- generate_alchemy_url()
Generate a SQL Alchemy engine https://docs.sqlalchemy.org/en/14/core/engines.html#
- generate_engine()
Generate a SQL Alchemy engine.
- get_bigger_int(int1, int2)
Return the bigger of the two ints.
- Args:
- int1: str
The string representation if an int type.
- int2: str
The string representation if an int type.
- Returns:
- str
A string representation of the higher of the two int types.
- get_table_object(table_name)
Get a SQL Alchemy table object.
- is_valid_sql_num(val)
Check whether val is a valid sql number.
- Args:
- val: any
The values to check.
- Returns:
- bool
Whether or not the value is a valid sql number.
- static split_table_name(full_table_name)
Utility method to parse the schema and table name.
Postgres
Postgres is popular open source SQL database dialect. The Parsons class leverages the mysql python package.
Quick Start
Authentication
from parsons import Postgres
# Instantiate Postgres from environmental variables
pg = Postgres()
# Instantiate Postgres from passed variables
pg = Postgres(username='me', password='secret', host='mydb.com', db='dev', port=3306)
# Instantiate Postgres from a ~/.pgpass file
pg = Postgres()
Quick Start
# Query database
tbl = pg.query('select * from my_schema.secret_sauce')
# Copy data to database
tbl = Table.from_csv('my_file.csv') # Load from a CSV or other source.
pg.copy(tbl, 'my_schema.winning_formula')
- class parsons.Postgres(username=None, password=None, host=None, db=None, port=5432, timeout=10)[source]
A Postgres class to connect to database. Credentials can be passed from a
.pgpass
file stored in your home directory or with environmental variables.- Args:
- username: str
Required if env variable
PGUSER
not populated- password: str
Required if env variable
PGPASSWORD
not populated- host: str
Required if env variable
PGHOST
not populated- db: str
Required if env variable
PGDATABASE
not populated- port: int
Required if env variable
PGPORT
not populated.- timeout: int
Seconds to timeout if connection not established.
- copy(tbl: Table, table_name: str, if_exists: str = 'fail', strict_length: bool = False)[source]
Copy a Parsons Table to Postgres.
- Args:
- tbl: parsons.Table
A Parsons table object
- table_name: str
The destination schema and table (e.g.
my_schema.my_table
)- if_exists: str
If the table already exists, either
fail
,append
,drop
ortruncate
the table.- strict_length: bool
If the database table needs to be created, strict_length determines whether the created table’s column sizes will be sized to exactly fit the current data, or if their size will be rounded up to account for future values being larger then the current dataset. Defaults to
False
.
- connection()
Generate a Postgres connection. The connection is set up as a python “context manager”, so it will be closed automatically (and all queries committed) when the connection goes out of scope.
When using the connection, make sure to put it in a
with
block (necessary for any context manager):with pg.connection() as conn:
- Returns:
Psycopg2 connection object
- create_table(table_object, table_name)
Create a table based on table object data.
- detect_data_type(value, cmp_type=None)
Detect the higher of value’s type cmp_type.
check if it’s a string
check if it’s a number
check if it’s a float
check if it’s an int
- Args:
- value: str
The value to inspect.
- cmp_type: str
The string representation of a type to compare with
value
’s type.
- Returns:
- str
The string representation of the higher of the two types.
- format_column(col, index='', replace_chars=None, col_prefix='_')
Format the column to meet database contraints.
- Formats the columns as follows:
Coverts to lowercase (if case insensitive)
Strips leading and trailing whitespace
Replaces invalid characters
Renames if in reserved words
- Args:
- col: str
The column to format.
- index: int
(Optional) The index of the column. Used if the column is empty.
- replace_chars: dict
A dictionary of invalid characters and their replacements. If
None
uses {” “: “_”}- col_prefix: str
The prefix to use when the column is empty or starts with an invalid character.
- Returns:
- str
The formatted column.
- format_columns(cols, **kwargs)
Format the columns to meet database contraints.
This method relies on
format_column
to handle most changes. It only handles duplicated columns. Options toformat_column
can be passed through kwargs.- Args:
- cols: list
The columns to format.
- kwargs: dicts
Keyword arguments to pass to
format_column
.
- Returns:
- list
The formatted columns.
- generate_alchemy_url()
Generate a SQL Alchemy engine https://docs.sqlalchemy.org/en/14/core/engines.html#
- generate_engine()
Generate a SQL Alchemy engine.
- get_bigger_int(int1, int2)
Return the bigger of the two ints.
- Args:
- int1: str
The string representation if an int type.
- int2: str
The string representation if an int type.
- Returns:
- str
A string representation of the higher of the two int types.
- get_table_object(table_name)
Get a SQL Alchemy table object.
- is_valid_sql_num(val)
Check whether val is a valid sql number.
- Args:
- val: any
The values to check.
- Returns:
- bool
Whether or not the value is a valid sql number.
- query(sql: str, parameters: Optional[list] = None) Optional[Table]
Execute a query against the database. Will return
None
if the query returns zero rows.To include python variables in your query, it is recommended to pass them as parameters, following the psycopg style. Using the
parameters
argument ensures that values are escaped properly, and avoids SQL injection attacks.Parameter Examples
# Note that the name contains a quote, which could break your query if not escaped # properly. name = "Beatrice O'Brady" sql = "SELECT * FROM my_table WHERE name = %s" rs.query(sql, parameters=[name])
names = ["Allen Smith", "Beatrice O'Brady", "Cathy Thompson"] placeholders = ', '.join('%s' for item in names) sql = f"SELECT * FROM my_table WHERE name IN ({placeholders})" rs.query(sql, parameters=names)
- Args:
- sql: str
A valid SQL statement
- parameters: list
A list of python variables to be converted into SQL values in your query
- Returns:
- Parsons Table
See Parsons Table for output options.
- query_with_connection(sql, connection, parameters=None, commit=True)
Execute a query against the database, with an existing connection. Useful for batching queries together. Will return
None
if the query returns zero rows.- Args:
- sql: str
A valid SQL statement
- connection: obj
A connection object obtained from
redshift.connection()
- parameters: list
A list of python variables to be converted into SQL values in your query
- commit: boolean
Whether to commit the transaction immediately. If
False
the transaction will be committed when the connection goes out of scope and is closed (or you can commit manually withconnection.commit()
).
- Returns:
- Parsons Table
See Parsons Table for output options.
- static split_table_name(full_table_name)
Utility method to parse the schema and table name.
- table_exists(table_name: str, view: bool = True) bool
Check if a table or view exists in the database.
- Args:
- table_name: str
The table name and schema (e.g.
myschema.mytable
).- view: boolean
Check to see if a view exists by the same name. Defaults to
True
.
- Returns:
- boolean
True
if the table exists andFalse
if it does not.
Redshift
See Redshift for documentation.