Redshift¶

Overview¶

The Redshift class allows you to interact with an Amazon Redshift relational database. The Redshift Connector utilizes the psycopg2 python package to connect to the database.

Note

S3 Credentials: Redshift only allows data to be copied to the database via S3. As such, the the copy() and copy_s3() methods require S3 credentials and write access on an S3 Bucket, which will be used for storing data en route to Redshift.
Whitelisting: Remember to ensure that the IP address from which you are connecting has been whitelisted.

Quickstart¶

Query the Database

from parsons import Redshift
rs = Redshift()
table = rs.query('select * from tmc_scratch.test_data')

Copy a Parsons Table to the Database

from parsons import Redshift
rs = Redshift()
table = rs.copy(tbl, 'tmc_scratch.test_table', if_exists='drop')

All of the standard copy options can be passed as kwargs. See the copy() method for all options.

Core API¶

Redshift core methods focus on input, output and querying of the database.

Table and View API¶

Table and view utilities are a series of helper methods, all built off of commonly used SQL queries run against the Redshift database.

class parsons.databases.redshift.redshift.RedshiftTableUtilities[source]¶

table_exists(table_name, view=True)[source]¶

Check if a table or view exists in the database.

Args:

table_name: str: The table name and schema (e.g. myschema.mytable).
view: boolean: Check to see if a view exists by the same name

Returns:

boolean: True if the table exists and False if it does not.

get_row_count(table_name)[source]¶

Return the row count of a table.

SQL Code

SELECT COUNT(*) FROM myschema.mytable

Args:

table_name: str: The schema and name (e.g. myschema.mytable) of the table.

Returns:

int

rename_table(table_name, new_table_name)[source]¶

Rename an existing table.

Note

You cannot move schemas when renaming a table. Instead, utilize the table_duplicate(). method.

Parameters:

table_name – str Name of existing schema and table (e.g. myschema.oldtable)
new_table_name – str New name for table with the schema omitted (e.g. newtable).

move_table(source_table, new_table, drop_source_table=False)[source]¶

Move an existing table in the database.It will inherit encoding, sortkey and distkey. Once run, the source table rows will be empty. This is more efficiant than running "create newtable as select * from oldtable".

For more information see: ALTER TABLE APPEND

Parameters:

source_table – str Name of existing schema and table (e.g. my_schema.old_table)
new_table – str New name of schema and table (e.g. my_schema.newtable)
drop_original – boolean Drop the source table.

Returns:

None

populate_table_from_query(query, destination_table, if_exists='fail', distkey=None, sortkey=None)[source]¶

Populate a Redshift table with the results of a SQL query, creating the table if it doesn’t yet exist.

Args:

query: str: The SQL query
destination_table: str: Name of destination schema and table (e.g. mys_chema.new_table)
if_exists: str: If the table already exists, either fail, append, drop, or truncate the table.
distkey: str: The column to use as the distkey for the table.
sortkey: str: The column to use as the sortkey for the table.

duplicate_table(source_table, destination_table, where_clause='', if_exists='fail', drop_source_table=False)[source]¶

Create a copy of an existing table (or subset of rows) in a new table. It will inherit encoding, sortkey and distkey.

Args:

source_table: str: Name of existing schema and table (e.g. myschema.oldtable)
destination_table: str: Name of destination schema and table (e.g. myschema.newtable)
where_clause: str: An optional where clause (e.g. where org = 1).
if_exists: str: If the table already exists, either fail, append, drop, or truncate the table.
drop_source_table: boolean: Drop the source table

union_tables(new_table_name, tables, union_all=True, view=False)[source]¶

Union a series of table into a new table.

Parameters:

new_table_name – str The new table and schema (e.g. myschema.newtable)
tables – list A list of tables to union
union_all – boolean If False will deduplicate rows. If True will include duplicate rows.
view – boolean Create a view rather than a static table

Returns:

None

get_tables(schema=None, table_name=None)[source]¶

List the tables in a schema including metadata.

Parameters:

schema – str Filter by a schema
table_name – str Filter by a table name

Returns:

Parsons Table: See Parsons Table for output options.

get_table_stats(schema=None, table_name=None)[source]¶

List the tables statistics includes row count and size.

Warning

This method is only accessible by Redshift superusers.

Args:

schema: str: Filter by a schema
table_name: str: Filter by a table name

Returns:

Parsons Table: See Parsons Table for output options.

get_columns(schema, table_name)[source]¶

Gets the column names (and some other column info) for a table.

If you just need the column names, you can treat the return value like a list, eg:

for col in rs.get_columns('some_schema', 'some_table'):
    print(col)

Args:

schema: str: The schema name
table_name: str: The table name

Returns:

A dict mapping column name to a dict with extra info. The keys of the dict are ordered just like the columns in the table. The extra info is a dict with format ``{‘data_type’: str, ‘max_length’: int or None, ‘max_precision’: int or None,

‘max_scale’: int or None, ‘is_nullable’: bool}``

get_views(schema=None, view=None)[source]¶

List views.

Parameters:

schema – str Filter by a schema
view – str Filter by a table name

Returns:

Parsons Table: See Parsons Table for output options.

get_queries()[source]¶

Return the Current queries running and queueing, along with resource consumption.

Warning

Must be a Redshift superuser to run this method.

Returns:

Parsons Table: See Parsons Table for output options.

get_max_value(table_name, value_column)[source]¶

Return the max value from a table.

Args:

table_name: str: Schema and table name
value_column: str: The column containing the values

get_object_type(object_name)[source]¶

Get object type.

One of view, table, index, sequence, or TOAST table.

Args:

object_name: str: The schema.obj for which to get the object type.

Returns:

str of the object type.

is_view(object_name)[source]¶

Return true if the object is a view.

Args:

object_name: str: The schema.obj to test if it’s a view.

Returns:

bool

is_table(object_name)[source]¶

Return true if the object is a table.

Args:

object_name: str: The schema.obj to test if it’s a table.

Returns:

bool

get_table_definition(table)[source]¶

Get the table definition (i.e. the create statement).

Args:

table: str: The schema.table for which to get the table definition.

Returns:

str

get_table_definitions(schema=None, table=None)[source]¶

Get the table definition (i.e. the create statement) for multiple tables.

This works similar to get_table_def except it runs a single query to get the ddl for multiple tables. It supports SQL wildcards for schema and table. Only returns the ddl for _tables_ that match schema and table if they exist.

Args:

schema: str: The schema to filter by.
table: str: The table to filter by.

Returns:

list of dicts with matching tables.

get_view_definition(view)[source]¶

Get the view definition (i.e. the create statement).

Args:

view: str: The schema.view for which to get the view definition.

Returns:

str

get_view_definitions(schema=None, view=None)[source]¶

Get the view definition (i.e. the create statement) for multiple views.

This works similar to get_view_def except it runs a single query to get the ddl for multiple views. It supports SQL wildcards for schema and view. Only returns the ddl for _views_ that match schema and view if they exist.

Args:

schema: str: The schema to filter by.
view: str: The view to filter by.

Returns:

list of dicts with matching views.

static split_full_table_name(full_table_name)[source]¶

Split a full table name into its schema and table. If a schema isn’t present, return public for the schema. Similarly, Redshift defaults to the public schema, when one isn’t provided.

Eg: (schema, table) = Redshift.split_full_table_name("some_schema.some_table")

Args:

full_table_name: str: The table name, as “schema.table”

Returns:

tuple: A tuple containing (schema, table)

static combine_schema_and_table_name(schema, table)[source]¶

Creates a full table name by combining a schema and table.

Args:

schema: str: The schema name
table: str: The table name

Returns:

str: The combined full table name

Schema API¶

Schema utilities are a series of helper methods, all built off of commonly used SQL queries run against the Redshift database.

class parsons.databases.redshift.redshift.RedshiftSchema[source]¶

create_schema_with_permissions(schema, group=None)[source]¶

Creates a Redshift schema (if it doesn’t already exist), and grants usage permissions to a Redshift group (if specified).

Args:

schema: str: The schema name
group: str: The Redshift group name
type: str: The type of permissions to grant. Supports select, all, etc. (For full list, see the Redshift GRANT docs)

grant_schema_permissions(schema, group, permissions_type='select')[source]¶

Grants a Redshift group permissions to all tables within an existing schema.

Args:

schema: str: The schema name
group: str: The Redshift group name
type: str: The type of permissions to grant. Supports select, all, etc. (For full list, see the Redshift GRANT docs)