.. _parsons-table: Parsons Table ============= ******** Overview ******** Most methods and functions in Parsons return a ``Table``, which is a 2D list-like object similar to a Pandas Dataframe. You can call the following methods on the Table object to output it into a variety of formats or storage types. A full list of ``Table`` methods can be found in the API section. =================== From Parsons Table =================== .. list-table:: :widths: 25 25 50 :header-rows: 1 * - Method - Destination Type - Description * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_csv` - CSV File - Write a table to a local csv file * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_s3_csv` - AWS s3 Bucket - Write a table to a csv stored in S3 * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_gcs_csv` - Google Cloud Storage Bucket - Write a table to a csv stored in Google Cloud Storage * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_sftp_csv` - SFTP Server - Write a table to a csv stored on an SFTP server * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_csv` - A Redshift Database - Write a table to a Redshift database * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_postgres` - A Postgres Database - Write a table to a Postgres database * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_civis` - Civis Redshift Database - Write a table to Civis platform database * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_petl` - Petl table object - Convert a table a Petl table object * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_json` - JSON file - Write a table to a local JSON file * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_html` - HTML formatted table - Write a table to a local html file * - :py:meth:`~parsons.etl.tofrom.ToFrom.to_dataframe` - Pandas Dataframe [1]_ - Return a Pandas dataframe .. [1] Requires optional installation of Pandas package by running ``pip install pandas``. ================ To Parsons Table ================ Create Parsons Table object using the following methods. .. list-table:: :widths: 25 25 50 :header-rows: 1 * - Method - Source Type - Description * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_csv` - File like object, local path, url, ftp. - Loads a csv object into a Table * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_json` - File like object, local path, url, ftp. - Loads a json object into a Table * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_columns` - List object - Loads lists organized as columns in Table * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_redshift` - Redshift table - Loads a Redshift query into a Table * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_postgres` - Postgres table - Loads a Postgres query into a Table * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_dataframe` - Pandas Dataframe [2]_ - Load a Parsons table from a Pandas Dataframe * - :py:meth:`~parsons.etl.tofrom.ToFrom.from_s3_csv` - S3 CSV - Load a Parsons table from a csv file on S3 .. [2] Requires optional installation of Pandas package by running ``pip install pandas``. You can also use the Table constructor to create a Table from a python list or petl table: .. code-block:: python # From a list of dicts tbl = Table([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]) # From a list of lists, the first list holding the field names tbl = Table([['a', 'b'], [1, 2], [3, 4]]) # From a petl table tbl = Table(petl_tbl) ======================== Parsons Table Attributes ======================== Tables have a number of convenience attributes. .. list-table:: :widths: 25 50 :header-rows: 1 * - Attribute - Description * - ``.num_rows`` - The number of rows in the table * - ``.columns`` - A list of column names in the table * - ``.data`` - The actual data (rows) of the table, as a list of tuples (without field names) * - ``.first`` - The first value in the table. Use for database queries where a single value is returned. ============================= Parsons Table Transformations ============================= Parsons tables have many methods that allow you to easily transform tables. Below is a selection of commonly used methods. The full list can be found in the API section. **Column Transformations** .. list-table:: :widths: 40 200 :header-rows: 1 * - Method - Description * - :py:meth:`~parsons.etl.etl.ETL.add_column` - Add a column * - :py:meth:`~parsons.etl.etl.ETL.remove_column` - Remove a column * - :py:meth:`~parsons.etl.etl.ETL.rename_column` - Rename a column * - :py:meth:`~parsons.etl.etl.ETL.move_column` - Move a column within a table * - :py:meth:`~parsons.etl.etl.ETL.cut` - Return a table with a subset of columns * - :py:meth:`~parsons.etl.etl.ETL.fill_column` - Provide a fixed value to fill a column * - :py:meth:`~parsons.etl.etl.ETL.fillna_column` - Provide a fixed value to fill all null values in a column * - :py:meth:`~parsons.etl.etl.ETL.get_column_types` - Get the python type of values for a given column * - :py:meth:`~parsons.etl.etl.ETL.convert_column` - Transform the values of a column via arbitrary functions * - :py:meth:`~parsons.etl.etl.ETL.coalesce_columns` - Coalesce values from one or more source columns * - :py:meth:`~parsons.etl.etl.ETL.map_columns` - Standardizes column names based on multiple possible values **Row Transformations** .. list-table:: :widths: 25 50 :header-rows: 1 * - Method - Description * - :py:meth:`~parsons.etl.etl.ETL.select_rows` - Return a table of a subset of rows based on filters * - :py:meth:`~parsons.etl.etl.ETL.stack` - Stack a number of tables on top of one another * - :py:meth:`~parsons.etl.etl.ETL.chunk` - Divide tables into smaller tables based on row count * - :py:meth:`~parsons.etl.etl.ETL.remove_null_rows` - Removes rows with null values in specified columns **Extraction and Reshaping** .. list-table:: :widths: 25 50 :header-rows: 1 * - Method - Description * - :py:meth:`~parsons.etl.etl.ETL.unpack_dict` - Unpack dictionary values from one column to top level columns * - :py:meth:`~parsons.etl.etl.ETL.unpack_list` - Unpack list values from one column and add to top level columns * - :py:meth:`~parsons.etl.etl.ETL.long_table` - Take a column with nested data and create a new long table * - :py:meth:`~parsons.etl.etl.ETL.unpack_nested_columns_as_rows` - Unpack list or dict values from one column into separate rows ====================== Parsons Table Indexing ====================== To access rows and columns of data within a Parsons table, you can index on them. To access a column pass in the column name as a string (e.g. ``tbl['a']``) and to access a row, pass in the row index as an integer (e.g. ``tbl[1]``). .. code-block:: python tbl = Table([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]) # Return a column as a list tbl['a'] >> [1, 3] # Return a row as a dict tbl[1] >> {'a': 3, 'b': 4} A note on indexing and iterating over a table's data: If you need to iterate over the data, make sure to use the python iterator syntax, so any data transformations can be applied efficiently. An example: .. code-block:: python # Some data transformations table.add_column('newcol', 'some value') # Efficient way to grab all the data (applying the data transformations only once) rows_list = [row for row in table] .. warning:: If you must index directly into a table's data, you can do so, but note that data transformations will be applied **each time** you do so. So this code will be very inefficient on a large table... .. code-block:: python # Inefficient way to grab all the data rows_list = [] for i in range(0, table.num_rows): # Data transformations will be applied each time through this loop! rows_list.append(table[i]) ==== PETL ==== The Parsons ``Table`` relies heavily on the `petl `_ Python package. You can always access the underlying petl table, ``parsons.Table.table``, which will allow you to perform any petl-supported ETL operations. Additionally, you can use the helper method, :py:meth:`~parsons.etl.etl.ETL.use_petl`, to conveniently perform the same operations on a parsons ``Table``. For example: .. code-block:: python import petl ... tbl = Table() tbl.table = petl.skipcomments(tbl.table, '#') or .. code-block:: python tbl = Table() tbl.use_petl('skipcomments', '#', update_table=True) ============ Lazy Loading ============ The Parsons ``Table`` makes use of "lazy" loading and "lazy" transformations. What this means is that it tries not to load and process your data until absolutely necessary. An example: .. code-block:: python # Specify where to load the data tbl = Table.from_csv('name_data.csv') # Specify data transformations tbl.add_column('full_name', lambda row: row['first_name'] + ' ' + row['last_name']) tbl.remove_column(['first_name', 'last_name']) # Save the table elsewhere # IMPORTANT - The CSV won't actually be loaded and transformed until this step, # since this is the first time it's actually needed. tbl.to_redshift('main.name_table') This "lazy" loading can be very convenient and performant. However, it can make issues hard to debug. Eg. if your data transformations are time-consuming, you won't actually notice that performance hit until you try to use the data, potentially much later in your code. So just be aware of this behavior. ******** Examples ******** =============== Basic Pipelines =============== .. code-block:: python # S3 to Civis s3 = S3() csv = s3.get_file('tmc-bucket', 'my_ids.csv') Table.from_csv(csv).to_civis('TMC','ids.my_ids') #VAN Activist Codes to a Dataframe van = VAN(db='MyVoters') van.activist_codes().to_dataframe() #VAN Events to an s3 bucket van = VAN(db='MyVoters') van.events().to_s3_csv('my-van-bucket','myevents.csv') ************* To & From API ************* .. autoclass:: parsons.etl.tofrom.ToFrom :inherited-members: ****************** Transformation API ****************** The following methods allow you to manipulate the Parsons table data. .. autoclass:: parsons.etl.etl.ETL :inherited-members: