Parsons Table

Overview

Most methods and functions in Parsons return a Table, which is a 2D list-like object similar to a Pandas Dataframe. You can call the following methods on the Table object to output it into a variety of formats or storage types. A full list of Table methods can be found in the API section.

From Parsons Table

Method

Destination Type

Description

to_csv()

CSV File

Write a table to a local csv file

to_s3_csv()

AWS s3 Bucket

Write a table to a csv stored in S3

to_gcs_csv()

Google Cloud Storage Bucket

Write a table to a csv stored in Google Cloud Storage

to_sftp_csv()

SFTP Server

Write a table to a csv stored on an SFTP server

to_redshift()

A Redshift Database

Write a table to a Redshift database

to_postgres()

A Postgres Database

Write a table to a Postgres database

to_civis()

Civis Redshift Database

Write a table to Civis platform database

to_petl()

Petl table object

Convert a table a Petl table object

to_json()

JSON file

Write a table to a local JSON file

to_html()

HTML formatted table

Write a table to a local html file

to_dataframe()

Pandas Dataframe [1]

Return a Pandas dataframe

append_csv()

CSV file

Appends table to an existing CSV

to_zip_csv()

ZIP file

Writes a table to a CSV in a zip archive

to_dicts()

Dicts

Write a table as a list of dicts

To Parsons Table

Create Parsons Table object using the following methods.

Method

Source Type

Description

from_csv()

File like object, local path, url, ftp.

Loads a csv object into a Table

from_json()

File like object, local path, url, ftp.

Loads a json object into a Table

from_columns()

List object

Loads lists organized as columns in Table

from_redshift()

Redshift table

Loads a Redshift query into a Table

from_postgres()

Postgres table

Loads a Postgres query into a Table

from_dataframe()

Pandas Dataframe [2]

Load a Parsons table from a Pandas Dataframe

from_s3_csv()

S3 CSV

Load a Parsons table from a csv file on S3

from_csv_string()

File like object, local path, url, ftp.

Load a CSV string into a Table

You can also use the Table constructor to create a Table from a python list or petl table:

# From a list of dicts
tbl = Table([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])

# From a list of lists, the first list holding the field names
tbl = Table([['a', 'b'], [1, 2], [3, 4]])

# From a petl table
tbl = Table(petl_tbl)

Parsons Table Attributes

Tables have a number of convenience attributes.

Attribute

Description

.num_rows

The number of rows in the table

.columns

A list of column names in the table

.data

The actual data (rows) of the table, as a list of tuples (without field names)

.first

The first value in the table. Use for database queries where a single value is returned.

Parsons Table Transformations

Parsons tables have many methods that allow you to easily transform tables. Below is a selection of commonly used methods. The full list can be found in the API section.

Column Transformations

Method

Description

head()

Get the first n rows of a table

tail()

Get the last n rows of a table

add_column()

Add a column

remove_column()

Remove a column

rename_column()

Rename a column

rename_columns()

Rename multiple columns

move_column()

Move a column within a table

cut()

Return a table with a subset of columns

fill_column()

Provide a fixed value to fill a column

fillna_column()

Provide a fixed value to fill all null values in a column

get_column_types()

Get the python type of values for a given column

convert_column()

Transform the values of a column via arbitrary functions

coalesce_columns()

Coalesce values from one or more source columns

map_columns()

Standardizes column names based on multiple possible values

Row Transformations

Method

Description

select_rows()

Return a table of a subset of rows based on filters

stack()

Stack a number of tables on top of one another

chunk()

Divide tables into smaller tables based on row count

remove_null_rows()

Removes rows with null values in specified columns

deduplicate()

Removes duplicate rows based on optional key(s), and optionally sorts

Extraction and Reshaping

Method

Description

unpack_dict()

Unpack dictionary values from one column to top level columns

unpack_list()

Unpack list values from one column and add to top level columns

long_table()

Take a column with nested data and create a new long table

unpack_nested_columns_as_rows()

Unpack list or dict values from one column into separate rows

Parsons Table Indexing

To access rows and columns of data within a Parsons table, you can index on them. To access a column pass in the column name as a string (e.g. tbl['a']) and to access a row, pass in the row index as an integer (e.g. tbl[1]).

tbl = Table([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])

# Return a column as a list
tbl['a']
>> [1, 3]

# Return a row as a dict
tbl[1]
>> {'a': 3, 'b': 4}

A note on indexing and iterating over a table’s data: If you need to iterate over the data, make sure to use the python iterator syntax, so any data transformations can be applied efficiently. An example:

# Some data transformations
table.add_column('newcol', 'some value')

# Efficient way to grab all the data (applying the data transformations only once)
rows_list = [row for row in table]

Warning

If you must index directly into a table’s data, you can do so, but note that data transformations will be applied each time you do so. So this code will be very inefficient on a large table…

# Inefficient way to grab all the data
rows_list = []
for i in range(0, table.num_rows):
  # Data transformations will be applied each time through this loop!
  rows_list.append(table[i])

PETL

The Parsons Table relies heavily on the petl Python package. You can always access the underlying petl table, parsons.Table.table, which will allow you to perform any petl-supported ETL operations. Additionally, you can use the helper method, use_petl(), to conveniently perform the same operations on a parsons Table. For example:

import petl
...

tbl = Table()
tbl.table = petl.skipcomments(tbl.table, '#')

or

tbl = Table()
tbl.use_petl('skipcomments', '#', update_table=True)

Lazy Loading

The Parsons Table makes use of “lazy” loading and “lazy” transformations. What this means is that it tries not to load and process your data until absolutely necessary.

An example:

# Specify where to load the data
tbl = Table.from_csv('name_data.csv')

# Specify data transformations
tbl.add_column('full_name', lambda row: row['first_name'] + ' ' + row['last_name'])
tbl.remove_column(['first_name', 'last_name'])

# Save the table elsewhere
# IMPORTANT - The CSV won't actually be loaded and transformed until this step,
# since this is the first time it's actually needed.
tbl.to_redshift('main.name_table')

This “lazy” loading can be very convenient and performant. However, it can make issues hard to debug. Eg. if your data transformations are time-consuming, you won’t actually notice that performance hit until you try to use the data, potentially much later in your code. There may also be cases where it’s possible to get faster execution by caching a table, especially in situations where a single table will be used as the base for several subsequent calculations.

For these cases Parsons provides two utility functions to materialize a Table and all of its transformations.

Method

Description

materialize()

Load all data from the Table into memory and apply any transformations

materialize_to_file()

Load all data from the Table and apply any transformations, then save to a local temp file.

Examples

Basic Pipelines

# S3 to Civis
s3 = S3()
csv = s3.get_file('tmc-bucket', 'my_ids.csv')
Table.from_csv(csv).to_civis('TMC','ids.my_ids')

#VAN Activist Codes to a Dataframe
van = VAN(db='MyVoters')
van.activist_codes().to_dataframe()

#VAN Events to an s3 bucket
van = VAN(db='MyVoters')
van.events().to_s3_csv('my-van-bucket','myevents.csv')

To & From API

Transformation API

The following methods allow you to manipulate the Parsons table data.

Materialize API