Redshift¶
Overview¶
The Redshift class allows you to interact with an Amazon Redshift relational database. The Redshift Connector utilizes the psycopg2 python package to connect to the database.
Note
- S3 Credentials
Redshift only allows data to be copied to the database via S3. As such, the the
copy()andcopy_s3()methods require S3 credentials and write access on an S3 Bucket, which will be used for storing data en route to Redshift.- Whitelisting
Remember to ensure that the IP address from which you are connecting has been whitelisted.
Quickstart¶
Query the Database
from parsons import Redshift
rs = Redshift()
table = rs.query('select * from tmc_scratch.test_data')
Copy a Parsons Table to the Database
from parsons import Redshift
rs = Redshift()
table = rs.copy(tbl, 'tmc_scratch.test_table', if_exists='drop')
All of the standard copy options can be passed as kwargs. See the copy() method for all
options.
Core API¶
Redshift core methods focus on input, output and querying of the database.
Table and View API¶
Table and view utilities are a series of helper methods, all built off of commonly used SQL queries run against the Redshift database.
- class parsons.databases.redshift.redshift.RedshiftTableUtilities[source]¶
- table_exists(table_name, view=True)[source]¶
Check if a table or view exists in the database.
- Args:
- table_name: str
The table name and schema (e.g.
myschema.mytable).- view: boolean
Check to see if a view exists by the same name
- Returns:
- boolean
Trueif the table exists andFalseif it does not.
- get_row_count(table_name)[source]¶
Return the row count of a table.
SQL Code
SELECT COUNT(*) FROM myschema.mytable
- Args:
- table_name: str
The schema and name (e.g.
myschema.mytable) of the table.
- Returns:
int
- rename_table(table_name, new_table_name)[source]¶
Rename an existing table.
Note
You cannot move schemas when renaming a table. Instead, utilize the
table_duplicate(). method.- Parameters:
table_name – str Name of existing schema and table (e.g.
myschema.oldtable)new_table_name – str New name for table with the schema omitted (e.g.
newtable).
- move_table(source_table, new_table, drop_source_table=False)[source]¶
Move an existing table in the database.It will inherit encoding, sortkey and distkey. Once run, the source table rows will be empty. This is more efficiant than running
"create newtable as select * from oldtable".For more information see: ALTER TABLE APPEND
- Parameters:
source_table – str Name of existing schema and table (e.g.
my_schema.old_table)new_table – str New name of schema and table (e.g.
my_schema.newtable)drop_original – boolean Drop the source table.
- Returns:
None
- populate_table_from_query(query, destination_table, if_exists='fail', distkey=None, sortkey=None)[source]¶
Populate a Redshift table with the results of a SQL query, creating the table if it doesn’t yet exist.
- Args:
- query: str
The SQL query
- destination_table: str
Name of destination schema and table (e.g.
mys_chema.new_table)- if_exists: str
If the table already exists, either
fail,append,drop, ortruncatethe table.- distkey: str
The column to use as the distkey for the table.
- sortkey: str
The column to use as the sortkey for the table.
- duplicate_table(source_table, destination_table, where_clause='', if_exists='fail', drop_source_table=False)[source]¶
Create a copy of an existing table (or subset of rows) in a new table. It will inherit encoding, sortkey and distkey.
- Args:
- source_table: str
Name of existing schema and table (e.g.
myschema.oldtable)- destination_table: str
Name of destination schema and table (e.g.
myschema.newtable)- where_clause: str
An optional where clause (e.g.
where org = 1).- if_exists: str
If the table already exists, either
fail,append,drop, ortruncatethe table.- drop_source_table: boolean
Drop the source table
- union_tables(new_table_name, tables, union_all=True, view=False)[source]¶
Union a series of table into a new table.
- Parameters:
new_table_name – str The new table and schema (e.g.
myschema.newtable)tables – list A list of tables to union
union_all – boolean If
Falsewill deduplicate rows. IfTruewill include duplicate rows.view – boolean Create a view rather than a static table
- Returns:
None
- get_tables(schema=None, table_name=None)[source]¶
List the tables in a schema including metadata.
- Parameters:
schema – str Filter by a schema
table_name – str Filter by a table name
- Returns:
- Parsons Table
See Parsons Table for output options.
- get_table_stats(schema=None, table_name=None)[source]¶
List the tables statistics includes row count and size.
Warning
This method is only accessible by Redshift superusers.
- Args:
- schema: str
Filter by a schema
- table_name: str
Filter by a table name
- Returns:
- Parsons Table
See Parsons Table for output options.
- get_columns(schema, table_name)[source]¶
Gets the column names (and some other column info) for a table.
If you just need the column names, you can treat the return value like a list, eg:
for col in rs.get_columns('some_schema', 'some_table'): print(col)
- Args:
- schema: str
The schema name
- table_name: str
The table name
- Returns:
A dict mapping column name to a dict with extra info. The keys of the dict are ordered just like the columns in the table. The extra info is a dict with format ``{‘data_type’: str, ‘max_length’: int or None, ‘max_precision’: int or None,
‘max_scale’: int or None, ‘is_nullable’: bool}``
- get_views(schema=None, view=None)[source]¶
List views.
- Parameters:
schema – str Filter by a schema
view – str Filter by a table name
- Returns:
- Parsons Table
See Parsons Table for output options.
- get_queries()[source]¶
Return the Current queries running and queueing, along with resource consumption.
Warning
Must be a Redshift superuser to run this method.
- Returns:
- Parsons Table
See Parsons Table for output options.
- get_max_value(table_name, value_column)[source]¶
Return the max value from a table.
- Args:
- table_name: str
Schema and table name
- value_column: str
The column containing the values
- get_object_type(object_name)[source]¶
Get object type.
One of view, table, index, sequence, or TOAST table.
- Args:
- object_name: str
The schema.obj for which to get the object type.
- Returns:
str of the object type.
- is_view(object_name)[source]¶
Return true if the object is a view.
- Args:
- object_name: str
The schema.obj to test if it’s a view.
- Returns:
bool
- is_table(object_name)[source]¶
Return true if the object is a table.
- Args:
- object_name: str
The schema.obj to test if it’s a table.
- Returns:
bool
- get_table_definition(table)[source]¶
Get the table definition (i.e. the create statement).
- Args:
- table: str
The schema.table for which to get the table definition.
- Returns:
str
- get_table_definitions(schema=None, table=None)[source]¶
Get the table definition (i.e. the create statement) for multiple tables.
This works similar to get_table_def except it runs a single query to get the ddl for multiple tables. It supports SQL wildcards for schema and table. Only returns the ddl for _tables_ that match schema and table if they exist.
- Args:
- schema: str
The schema to filter by.
- table: str
The table to filter by.
- Returns:
list of dicts with matching tables.
- get_view_definition(view)[source]¶
Get the view definition (i.e. the create statement).
- Args:
- view: str
The schema.view for which to get the view definition.
- Returns:
str
- get_view_definitions(schema=None, view=None)[source]¶
Get the view definition (i.e. the create statement) for multiple views.
This works similar to get_view_def except it runs a single query to get the ddl for multiple views. It supports SQL wildcards for schema and view. Only returns the ddl for _views_ that match schema and view if they exist.
- Args:
- schema: str
The schema to filter by.
- view: str
The view to filter by.
- Returns:
list of dicts with matching views.
- static split_full_table_name(full_table_name)[source]¶
Split a full table name into its schema and table. If a schema isn’t present, return public for the schema. Similarly, Redshift defaults to the public schema, when one isn’t provided.
Eg:
(schema, table) = Redshift.split_full_table_name("some_schema.some_table")- Args:
- full_table_name: str
The table name, as “schema.table”
- Returns:
- tuple
A tuple containing (schema, table)
Schema API¶
Schema utilities are a series of helper methods, all built off of commonly used SQL queries run against the Redshift database.
- class parsons.databases.redshift.redshift.RedshiftSchema[source]¶
- create_schema_with_permissions(schema, group=None)[source]¶
Creates a Redshift schema (if it doesn’t already exist), and grants usage permissions to a Redshift group (if specified).
- Args:
- schema: str
The schema name
- group: str
The Redshift group name
- type: str
The type of permissions to grant. Supports select, all, etc. (For full list, see the Redshift GRANT docs)
- grant_schema_permissions(schema, group, permissions_type='select')[source]¶
Grants a Redshift group permissions to all tables within an existing schema.
- Args:
- schema: str
The schema name
- group: str
The Redshift group name
- type: str
The type of permissions to grant. Supports select, all, etc. (For full list, see the Redshift GRANT docs)