Catalist

Overview

The CatalistMatch class allows you to interact with the Catalist M Tool (match) API. Users of this Parsons integration can use the Parsons table format to send input files to the M Tool and receive back a matched version of that table.

Note

Authentication

To use this class, you need an OAuth Client ID and Client Secret from Catalist, as well as SFTP credentials. Catalist must also whitelist the IP address you use to access the M Tool.

Quickstart

To instantiate the CatalistMatch class, you must provide your client_id, client_secret, sftp_username and sftp_password values as arguments:

import os
from parsons import CatalistMatch, Table

match = CatalistMatch(
  client_id=os.environ['CATALIST_CLIENT_ID'],
  client_secret=os.environ['CATALIST_CLIENT_SECRET'],
  sftp_username=os.environ['CATALIST_SFTP_USERNAME'],
  sftp_password=os.environ['CATALIST_SFTP_PASSWORD']
)
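Because the constructor reads four secrets from the environment, an unset variable surfaces as a bare KeyError. A minimal sketch of a pre-flight check, assuming the environment variable names used in the example above; `missing_catalist_env` is a hypothetical helper and not part of Parsons:

```python
import os

# Variable names follow the Quickstart example; adjust them to your own setup.
REQUIRED_ENV_VARS = (
    "CATALIST_CLIENT_ID",
    "CATALIST_CLIENT_SECRET",
    "CATALIST_SFTP_USERNAME",
    "CATALIST_SFTP_PASSWORD",
)


def missing_catalist_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_catalist_env()` before instantiating the class lets you report every missing credential at once instead of failing on the first one.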

You can then load a CSV as a Parsons table and submit it for matching, then save the resulting matched Parsons table as a CSV.

source_table = Table.from_csv(source_filepath)
result_table = match.match(source_table)
result_table.to_csv(result_filepath)

API

class parsons.CatalistMatch(client_id: str, client_secret: str, sftp_username: str, sftp_password: str, client_audience: Optional[str] = None)

Connector for working with the Catalist Match API.

This API allows a trusted third party to submit new files for processing, and/or reprocess existing files. It also allows retrieval of processing status. Initial setup of template(s) via the M Tool UI will be required.

The Catalist Match tool requires OAuth2.0 client credentials for the API as well as credentials for accessing the Catalist sftp bucket. Each Catalist client is given their own bucket alias named after a tree species, used for constructing the filepath within the sftp bucket.

Access to both the Catalist SFTP bucket and the Match API requires the source IP address to be explicitly whitelisted by Catalist.

Example usage:

tbl = Table.from_csv(...)
client = CatalistMatch(...)
match_result = client.match(tbl)

Note that matching can take from 10 minutes up to 6 hours or longer to complete, so plan how you will wait for completion without tying up compute resources on an idle process.

To separate submitting the job and fetching the result:

tbl = Table.from_csv(...)
client = CatalistMatch(...)
response = client.upload(tbl)
match_result = client.await_completion(response["id"])
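Since a job can run for hours, the submitting process and the fetching process need not be the same one. A minimal sketch of handing the job id from one to the other through a small JSON file; `save_job_id` and `read_job_id` are hypothetical helpers, and the only assumption about the upload() response is that it carries the job id under the "id" key, as in the example above:

```python
import json
from pathlib import Path


def save_job_id(response, path):
    """Write the job id from an upload() response to a small JSON file."""
    # The id is stored as a string because await_completion() takes id: str.
    Path(path).write_text(json.dumps({"id": str(response["id"])}))


def read_job_id(path):
    """Read back the job id written by save_job_id()."""
    return json.loads(Path(path).read_text())["id"]
```

The submitting script would call `save_job_id(client.upload(tbl), "catalist_job.json")` and exit; a later scheduled run would call `client.await_completion(read_job_id("catalist_job.json"))`.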

load_table_to_sftp(table: Table, input_subfolder: Optional[str] = None) -> str

Load table to Catalist sftp bucket as gzipped CSV for matching.

If input_subfolder is specified, the file will be uploaded to a subfolder of the myUploads directory in the SFTP server.

Args:
table: Table

Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.

input_subfolder: str

Optional. If specified, the file will be uploaded to a subfolder of the myUploads directory in the SFTP server.
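The column requirements above can be checked locally before anything is uploaded. A minimal sketch, using only the column names listed in this section; `check_columns` is a hypothetical helper and is not how Parsons itself validates tables:

```python
REQUIRED_COLUMNS = ("first_name", "last_name")
OPTIONAL_COLUMNS = (
    "name_suffix", "addr1", "addr2", "city", "state", "zip", "phone",
    "email", "gender_tomatch", "dob", "dob_year", "matchbackid",
)


def check_columns(columns):
    """Return (missing required columns, columns not listed in the docs)."""
    present = set(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    unrecognized = [c for c in columns if c not in known]
    return missing, unrecognized
```

With a Parsons Table, this could be called as `check_columns(table.columns)`. Whether Catalist rejects unlisted columns is not stated here, so the sketch only reports them rather than treating them as errors.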

match(table: Table, export: bool = False, description: Optional[str] = None, export_filename_suffix: Optional[str] = None, input_subfolder: Optional[str] = None, copy_to_sandbox: bool = False, static_values: Optional[Dict[str, Union[str, int]]] = None) -> Table

Load table to the Catalist Match API, returns matched table.

This method blocks until the match completes, which can take from 10 minutes to 6 hours or more depending on concurrent traffic.

Args:
table: Table

Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.

export: bool

Defaults to False.

description: str

Optional. Description for the match job.

export_filename_suffix: str

Optional. Adds a suffix to the result filename in the SFTP server.

input_subfolder: str

Optional. Adds a prefix to the filepath of the uploaded input file in the SFTP server.

copy_to_sandbox: bool

Defaults to False.

static_values: dict

Optional. Any included values are mapped to every row of the input table.

upload(table: Table, template_id: str = '48827', export: bool = False, description: Optional[str] = None, export_filename_suffix: Optional[str] = None, input_subfolder: Optional[str] = None, copy_to_sandbox: bool = False, static_values: Optional[Dict[str, Union[str, int]]] = None) -> dict

Load table to the Catalist Match API, returns response with job metadata.

Args:
table: Table

Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.

template_id: str

Defaults to 48827, currently the only available template for working with the Match API.

export: bool

Defaults to False.

description: str

Optional. Description for the match job.

export_filename_suffix: str

Optional. Adds a suffix to the result filename in the SFTP server.

input_subfolder: str

Optional. Adds a prefix to the filepath of the uploaded input file in the SFTP server.

copy_to_sandbox: bool

Defaults to False.

static_values: dict

Optional. Any included values are mapped to every row of the input table.

action(file_ids: Union[str, List[str]], match: bool = False, export: bool = False, export_filename_suffix: Optional[str] = None, copy_to_sandbox: bool = False) -> List[dict]

Perform actions on existing files.

All files must be in Finished status (if the action requested is publish), and must be mapped against the same template. The request will return as soon as the action has been queued.

Args:
file_ids: str or List[str]

One or more file_ids (found in the id key of responses from the upload() or status() methods).

match: bool

Optional. Defaults to False. If True, will initiate matching.

export: bool

Optional. Defaults to False. If True, will initiate export.

export_filename_suffix: str

Optional. If included, adds a suffix to the filepath of the exported file in the SFTP server.

copy_to_sandbox: bool

Defaults to False.

status(id: str) -> dict

Check status of a match job.

await_completion(id: str, wait: int = 30) -> Table

Await completion of a match job. Return matches when ready.

This method will poll the status of a match job on a timer until the job is complete. By default, polls once every 30 seconds.

Note that match job completion can take from 10 minutes up to 6 hours or more depending on concurrent traffic. Consider your strategy for polling for completion.
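One possible polling strategy is to start at the default 30-second interval and wait longer after each check, so a six-hour job does not generate hundreds of status calls. The sketch below is an alternative loop written for illustration, not how await_completion() is implemented; `poll_with_backoff` and all of its parameters are hypothetical, and the `is_done` predicate is left to the caller because the shape of the status() response is not described here:

```python
import time


def poll_with_backoff(get_status, is_done, first_wait=30, max_wait=900,
                      factor=2.0, timeout=6 * 60 * 60, sleep=time.sleep):
    """Call get_status() until is_done(result) is true, waiting longer each time.

    Returns the final status. Raises TimeoutError when the next wait would
    push the total time slept past `timeout` seconds.
    """
    waited = 0.0
    wait = float(first_wait)
    while True:
        status = get_status()
        if is_done(status):
            return status
        if waited + wait > timeout:
            raise TimeoutError(f"job not finished after {waited:.0f}s of waiting")
        sleep(wait)
        waited += wait
        # Grow the interval geometrically, capped at max_wait seconds.
        wait = min(wait * factor, max_wait)
```

A caller might pass `lambda: client.status(job_id)` as `get_status`, supply a predicate that inspects the returned dict, and then call `client.load_matches(job_id)` once the loop returns.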

load_matches(id: str) -> Table

Take a completed job ID, download and open the match file as a Table.

Result will be a Table with all the original columns along with columns ‘DWID’, ‘CONFIDENCE’, ‘ZIP9’, and ‘STATE’. The original column headers will be prepended with ‘COL#-’.
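If you want the original header names back after loading matches, the prefix can be stripped with a small helper. This sketch assumes that the ‘#’ in the prefix stands for one or more digits (for example ‘COL1-first_name’); that format is a guess from the description above, and `original_header` and `rename_map` are hypothetical names:

```python
import re

# Assumption: '#' in 'COL#-' is one or more digits, e.g. 'COL1-first_name'.
_PREFIX = re.compile(r"^COL\d+-")


def original_header(header):
    """Drop a leading 'COL<digits>-' prefix if present, else return unchanged."""
    return _PREFIX.sub("", header)


def rename_map(headers):
    """Map each prefixed header to its original name, skipping unchanged ones."""
    return {h: original_header(h) for h in headers if original_header(h) != h}
```

The resulting mapping could be applied with the Table's `rename_column(old, new)` method, one pair at a time. Check for collisions first: an input column that was itself named ‘STATE’ would clash with the ‘STATE’ column added by the match.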

validate_table(table: Table, template_id: str = '48827') -> None

Validate table structure and contents.