Catalist
Overview
The CatalistMatch class allows you to interact with the Catalist M Tool (match) API. Users of this Parsons integration can use the Parsons table format to send input files to the M Tool and receive back a matched version of that table.
Note
- Authentication
To use this class, you need an OAuth Client ID and Client Secret from Catalist, as well as SFTP credentials. Catalist must also whitelist the IP address you use to access the M Tool.
Quickstart
To instantiate the CatalistMatch class, you must provide your client_id, client_secret, sftp_username and sftp_password values as arguments:
import os
from parsons import CatalistMatch, Table
match = CatalistMatch(
client_id=os.environ['CATALIST_CLIENT_ID'],
client_secret=os.environ['CATALIST_CLIENT_SECRET'],
sftp_username=os.environ['CATALIST_SFTP_USERNAME'],
sftp_password=os.environ['CATALIST_SFTP_PASSWORD']
)
You can then load a CSV as a Parsons table and submit it for matching, then save the resulting matched Parsons table as a CSV.
source_table = Table.from_csv(source_filepath)
result_table = match.match(source_table)
result_table.to_csv(result_filepath)
API
- class parsons.CatalistMatch(client_id: str, client_secret: str, sftp_username: str, sftp_password: str, client_audience: str | None = None)[source]
Connector for working with the Catalist Match API.
This API allows a trusted third party to submit new files for processing, and/or reprocess existing files. It also allows retrieval of processing status. Initial setup of template(s) via the M Tool UI will be required.
The Catalist Match tool requires OAuth 2.0 client credentials for the API as well as credentials for accessing the Catalist SFTP bucket. Each Catalist client is given their own bucket alias named after a tree species, which is used to construct the filepath within the SFTP bucket.
Accessing the Catalist SFTP bucket and the Match API both require the source IP address to be explicitly whitelisted by Catalist.
Example usage:
tbl = Table.from_csv(...)
client = CatalistMatch(...)
match_result = client.match(tbl)
Note that matching can take from 10 minutes up to 6 hours or longer to complete, so consider how you will wait for completion without keeping compute resources idle for the whole duration.
To separate submitting the job and fetching the result:
tbl = Table.from_csv(...)
client = CatalistMatch(...)
response = client.upload(tbl)
match_result = client.await_completion(response["id"])
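One way to avoid idling between the two steps is to write the job ID to disk after upload and let a separate, later process read it back and call await_completion. The following is a minimal sketch of that idea. The helper names save_job_id and load_job_id and the JSON file layout are hypothetical and not part of Parsons; the sketch assumes only that the upload response is a dict with an "id" key, as in the example above.

```python
import json
from pathlib import Path


def save_job_id(response: dict, path: str) -> None:
    """Write the job ID from an upload() response to a small JSON file."""
    # Store the ID as a string, since await_completion() is documented to take a str.
    Path(path).write_text(json.dumps({"id": str(response["id"])}))


def load_job_id(path: str) -> str:
    """Read back the job ID written by save_job_id()."""
    return json.loads(Path(path).read_text())["id"]
```

The submitting process would call save_job_id(client.upload(tbl), "catalist_job.json"), and a later process would call client.await_completion(load_job_id("catalist_job.json")).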
- load_table_to_sftp(table: Table, input_subfolder: str | None = None) str [source]
Load table to Catalist sftp bucket as gzipped CSV for matching.
If input_subfolder is specified, the file will be uploaded to a subfolder of the myUploads directory in the SFTP server.
- Args:
- table: Table
Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.
- input_subfolder: str
Optional. If specified, the file will be uploaded to a subfolder of the myUploads directory in the SFTP server.
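Because a match job can run for hours, it may be worth checking column names locally before uploading. The following is a minimal sketch using the required and optional column names listed above. The function check_match_columns is hypothetical and not part of Parsons, and whether Catalist rejects or ignores columns outside this list is not stated here, so the second return value is informational only.

```python
# Column names taken from the argument description above.
REQUIRED_COLUMNS = {"first_name", "last_name"}
OPTIONAL_COLUMNS = {
    "name_suffix", "addr1", "addr2", "city", "state", "zip",
    "phone", "email", "gender_tomatch", "dob", "dob_year", "matchbackid",
}


def check_match_columns(columns):
    """Return (missing_required, unrecognized) as sorted lists of column names."""
    present = set(columns)
    missing = sorted(REQUIRED_COLUMNS - present)
    unrecognized = sorted(present - REQUIRED_COLUMNS - OPTIONAL_COLUMNS)
    return missing, unrecognized
```

With a Parsons table this could be called as check_match_columns(tbl.columns) before load_table_to_sftp, upload or match.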
- match(table: Table, export: bool = False, description: str | None = None, export_filename_suffix: str | None = None, input_subfolder: str | None = None, copy_to_sandbox: bool = False, static_values: Dict[str, str | int] | None = None) Table [source]
Load a table to the Catalist Match API and return the matched table.
This method blocks until the match completes, which can take from 10 minutes to 6 hours or more depending on concurrent traffic.
- Args:
- table: Table
Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.
- export: bool
Defaults to False.
- description: str
Optional. Description for the match job.
- export_filename_suffix: str
Optional. Adds a suffix to the result filename in the SFTP server.
- input_subfolder: str
Optional. Adds a prefix to the filepath of the uploaded input file in the SFTP server.
- copy_to_sandbox: bool
Defaults to False.
- static_values: dict
Optional. Any included values are mapped to every row of the input table.
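To illustrate what "mapped to every row" means for static_values, the following sketch applies a dict of static values to plain row dicts. It only shows the intended effect and is not how Parsons or Catalist implement the mapping. The function apply_static_values is hypothetical, and letting a static value overwrite an existing column of the same name is a choice made for this sketch, not documented behavior.

```python
def apply_static_values(rows, static_values):
    """Return new row dicts with every static key/value added to each row.

    The input rows are not modified. On a key collision the static value
    wins, which is an assumption of this sketch.
    """
    return [{**row, **static_values} for row in rows]
```

For example, apply_static_values([{"first_name": "Ada"}], {"state": "OH"}) returns [{"first_name": "Ada", "state": "OH"}].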
- upload(table: Table, template_id: str = '48827', export: bool = False, description: str | None = None, export_filename_suffix: str | None = None, input_subfolder: str | None = None, copy_to_sandbox: bool = False, static_values: Dict[str, str | int] | None = None) dict [source]
Load a table to the Catalist Match API and return the response with job metadata.
- Args:
- table: Table
Parsons Table for matching. “first_name” and “last_name” columns are required. Optional columns for matching: name_suffix, addr1, addr2, city, state, zip, phone, email, gender_tomatch, dob, dob_year, matchbackid.
- template_id: str
Defaults to 48827, currently the only available template for working with the Match API.
- export: bool
Defaults to False.
- description: str
Optional. Description for the match job.
- export_filename_suffix: str
Optional. Adds a suffix to the result filename in the SFTP server.
- input_subfolder: str
Optional. Adds a prefix to the filepath of the uploaded input file in the SFTP server.
- copy_to_sandbox: bool
Defaults to False.
- static_values: dict
Optional. Any included values are mapped to every row of the input table.
- action(file_ids: str | List[str], match: bool = False, export: bool = False, export_filename_suffix: str | None = None, copy_to_sandbox: bool = False) List[dict] [source]
Perform actions on existing files.
All files must be in Finished status (if the action requested is publish) and must be mapped against the same template. The request will return as soon as the action has been queued.
- Args:
- file_ids: str or List[str]
One or more file IDs (found in the id key of responses from the upload() or status() methods).
- match: bool
Optional. Defaults to False. If True, will initiate matching.
- export: bool
Optional. Defaults to False. If True, will initiate export.
- export_filename_suffix: str
Optional. If included, adds a suffix to the filepath of the exported file in the SFTP server.
- copy_to_sandbox: bool
Defaults to False.
- await_completion(id: str, wait: int = 30) Table [source]
Await completion of a match job. Return matches when ready.
This method will poll the status of a match job on a timer until the job is complete. By default, polls once every 30 seconds.
Note that match job completion can take from 10 minutes up to 6 hours or more depending on concurrent traffic. Consider your strategy for polling for completion.
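The description above does not mention an upper bound on how long polling continues, so a caller who needs one may want to wrap their own status check in a loop with a deadline. The following is a generic sketch of that pattern and not the implementation of await_completion. The function poll_until and its check argument are hypothetical: check is any function of your own that returns the finished result, or None while the job is still running.

```python
import time


def poll_until(check, wait=30.0, timeout=6 * 60 * 60,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `wait` seconds until it returns a non-None value.

    Raises TimeoutError once `timeout` seconds have elapsed without a result.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(wait)
```

A longer wait reduces the number of status requests at the cost of noticing completion later; for jobs that routinely take hours, an interval of several minutes may be a reasonable trade-off.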