Sketching the Blueprint of Your Data Infrastructure
Section Overview
Data infrastructure refers to the systems, tools, and processes that help collect, transform, clean, store, analyze, and share data. It is the backbone that enables you to measure your base-building and leadership development efforts. This section walks you through key infrastructure steps to consider as you set up your metrics.
When thinking about your data infrastructure, we encourage you to think about it as an apparatus. It’s a coordinated set of tools, practices, and relationships. It’s not just the database or CRM. It’s also the canvassers and organizers collecting and entering data, the training that helps organizers collect consistent data, and the documentation that keeps everyone aligned even as staff and volunteers come and go. As a result, you need to consider who is collecting the data and how easy it is for them to do so; which tools they’ll use; what systems make sense for storing and analyzing the data; and how insights get shared with organizers, leadership, funders, and other stakeholders.
The decisions that you make about your data infrastructure shape how well you can implement and learn from your base-building and leadership development metrics. They determine what information gets captured, how reliably it’s tracked over time, and how easily you can analyze and share it. Building the right infrastructure requires weighing tradeoffs that enable varying levels of scale, automation, capacity, and learning.
Ideally, your metrics should guide how you build your data infrastructure. In practice, that’s not always possible. Constraints like inflexible tools, gaps in available data, limited staff capacity, and volunteers’ discomfort with new technology (among many others) often shape what’s feasible. As a result, you likely will need to negotiate between what you want to measure and what your system can realistically support. Naming these tensions early helps you design systems and metrics that are realistic, intentional, and aligned with your organizational goals.
This section takes a bird’s-eye view of data infrastructure. We don’t cover every technical decision, and we don’t get into the weeds of data engineering topics like real-time syncing, orchestration,[1] and API integrations.[2] We also don’t tackle operational questions like data security, tool costs, and tool reliability.[3] Nor does this section answer every consideration that we raise: some decisions will depend on your organizing needs and on the resources and capacity available to your organization.
Because many people across your organization will be touching the metrics data, we strongly recommend that you bring together cross-organizational stakeholders to develop these systems and processes. Include leadership, organizers, and data managers. This will result in a more thoughtful system that meets the needs of each group. Additionally, we recommend consulting other data engineers, data managers, and tool experts in our movement to help inform your work.
We break this section into four key steps:
- Step 1: Identify what data to collect and who’s collecting it
- Step 2: Select appropriate CRMs and other tools
- Step 3: Figure out intermediary data tables
- Step 4: Confirm metrics and decide who they’re shared with
While we lay them out as steps, the process is rarely linear. When you sketch out the blueprint of your data infrastructure, expect to move back and forth between the steps, revisit earlier decisions, and jump between this section and your metrics as your thinking evolves. This fluidity is normal when building systems that reflect base-building and leadership development work. We framed these as steps for clarity, but don’t take the sequence as the definitive approach.
Finally, this section is primarily geared toward organizations that have access to a data warehouse.[4] If your organization isn’t using one right now, you can still use the tools and resources below to plan how your data infrastructure can grow alongside your organizing work.
Get Started With Our Data Map Visualization Tool
Before we jump into the steps, we recommend using our Data Map Visualization Tool to help you get started on visualizing your data infrastructure. Open and make a copy of it. The visualization tool offers a high-level view of your data funnel — from data collection to creating dashboards. The tool is meant to help you think through key questions including:
- Who’s responsible for each aspect of the data process, including collecting data, managing data, analyzing data, and using data to inform decision-making
- What data’s being collected and how
- Where data’s being stored
- What are the outputs or metrics
- With whom are the different outputs or metrics shared and how are they used
- How your data benefits your organization (and team members’ individual work)
The steps in the data map visualization tool correspond to the steps laid out in this section. We recommend that you fill out the visualization tool while reading the steps below. As you engage with the tool, involve your teammates (data managers, leadership, and organizers). They may flag items that you missed and vice versa.
We think you’ll get more out of this section while using the tool, but you can also proceed without it. You can still find helpful context and considerations in this section as you plan your data infrastructure.
Step 1: Identify What Data to Collect and Who’s Collecting It
You’ve defined the roles on your ladder of engagement and selected metrics that reflect the relationships your organization is building. In this first step of sketching your data infrastructure, you’ll identify the data that supports those metrics and who’s responsible for collecting that data. In the Data Map Visualization Tool, take a look at Step 1 on the map. You’ll see that the “Front-End Data Collection” column and the green boxes next to it come pre-filled with some ideas of data to collect and who collects it. These are just examples – they’re not meant to be prescriptive. Edit the list to make it work for your organization. Below are some considerations.
What data are you collecting?
Base your data on the roles you’ve defined and the metrics you’ve selected. You’ll want to collect data that reflects the “observed actions” you listed for each role in your ladder of engagement (see “Developing Metrics” section). For example, if you define a leader as “Someone who has completed leadership training and conducted a one-on-one,” you’ll need to collect data on whether a person completed each component of your leadership training program, as well as whether they’ve conducted a one-on-one. Make sure you include this data in the “Front-End Data Collection” column of your Data Map Visualization Tool.
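To see why a role definition dictates what you collect, it can help to write the definition as a rule over observed actions. The sketch below is illustrative only: the training component names, the `PersonRecord` fields, and the `is_leader` helper are all hypothetical and would need to match your own ladder of engagement.

```python
from dataclasses import dataclass, field

# Hypothetical training components; replace with your own program's modules.
REQUIRED_TRAINING_COMPONENTS = {"orientation", "story_of_self", "one_on_one_practice"}


@dataclass
class PersonRecord:
    """Hypothetical front-end data collected for one person."""
    name: str
    training_components_completed: set = field(default_factory=set)
    one_on_ones_conducted: int = 0


def is_leader(person: PersonRecord) -> bool:
    """Example rule: finished every training component AND held a one-on-one."""
    finished_training = REQUIRED_TRAINING_COMPONENTS <= person.training_components_completed
    return finished_training and person.one_on_ones_conducted >= 1


people = [
    PersonRecord("Ana", set(REQUIRED_TRAINING_COMPONENTS), one_on_ones_conducted=2),
    PersonRecord("Ben", set(REQUIRED_TRAINING_COMPONENTS)),
    PersonRecord("Cy", {"orientation"}, one_on_ones_conducted=1),
]
leaders = [p.name for p in people if is_leader(p)]
print(leaders)
```

If either field is missing from your front-end data collection, the rule cannot be evaluated, which is a quick way to spot data you still need to collect.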
You’ll also want to identify your organization’s “evergreen data.” Evergreen data are basic, common pieces of data that base-building organizations collect every time they engage with members of their base. They include things like volunteer information (e.g., name, contact information, issue interests), event information (e.g., event name, organizer name(s), event type, estimated attendance), one-on-one information (e.g., contact’s name, contact information, organizer’s name, meeting notes), etc. In the Data Map Visualization Tool, much of the pre-filled data in the “Front-End Data Collection” column are examples of evergreen data. Start with that list of common evergreen data. Then, add any additional data that will help you do your work. Feel free to remove data that doesn’t serve your team.
Who’s collecting the data?
As you identify what data needs to be collected, be clear about who is responsible for collecting it, too. In many cases, organizers collect much, if not most, of the data, because they’re running the programs and engaging directly with the community. This means that the organizing work shapes the rest of your data infrastructure. The tools the organizers use, the primary modes of contact with the community, and the organizing structures all affect what’s feasible to track immediately and over time. Talk to your organizers and explore ways to simplify the existing systems and implement new ones, if needed. Note who will be collecting each type of data in the green boxes in your Data Map Visualization Tool.
Tradeoffs
- Flexible data collection versus streamlined data collection: A tradeoff you may face is how much flexibility to allow in your data collection. Flexible approaches — like open text fields and custom tags — may help organizers draw out rich insights, especially during their 1:1s or leadership development conversations. However, flexibility can introduce complexity and inconsistency in your data, such as inconsistent data formats or tags, and require more time to clean. On the other hand, streamlined data collection can make data management and reporting faster, but that could come at the cost of nuance that drives the organizing work. Balance structure and flexibility.
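One possible middle path is to let organizers type free-form tags and then map them to a short canonical list during cleaning. The sketch below assumes a hypothetical `CANONICAL_TAGS` mapping and `normalize_tag` helper; neither comes from any particular CRM, and the tag names are made up.

```python
# Hypothetical canonical tags and the free-text variants that map to each one.
CANONICAL_TAGS = {
    "housing": {"housing", "rent", "tenant rights"},
    "immigration": {"immigration", "immigrant rights"},
}


def normalize_tag(raw):
    """Return the canonical tag for a free-text tag, or None if it is unrecognized."""
    # Lowercase and collapse repeated whitespace before comparing.
    cleaned = " ".join(raw.strip().lower().split())
    for canonical, variants in CANONICAL_TAGS.items():
        if cleaned in variants:
            return canonical
    return None


raw_tags = ["Housing", " tenant  rights", "IMMIGRATION", "childcare"]
normalized = [normalize_tag(tag) for tag in raw_tags]
# Unrecognized tags are kept for human review instead of being silently dropped.
needs_review = [tag for tag in raw_tags if normalize_tag(tag) is None]
print(normalized, needs_review)
```

Routing unrecognized tags to a review list keeps the nuance organizers captured while still producing consistent values for reporting.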
Step 2: Select Appropriate CRMs and Other Tools
In this next step, you’ll assess how your tools might support or limit the implementation of your metrics. There’s no single tool that fits all organizing contexts. Most organizations assemble a mix of tools that fulfill their immediate needs, and build additional infrastructure to cover critical gaps.
If your organization already has a set of tools in place, take the time to evaluate the strengths and limitations of your current setup.
If you’re part of a newer and smaller organization, you’re likely starting with fewer tools. For data collection and management, platforms like Google Sheets or Airtable can serve as a lightweight database in the early stages. They can work well when your base is relatively small and workflows are simple. As your organization grows, you’ll likely need more advanced features, stronger infrastructure, and tools that can scale with your work. We’ve written this step primarily for organizations with more complex tech stacks, but even if you’re just getting started, we recommend following along. You can plan ahead so that your data infrastructure can scale with your operation.
To proceed with this step, go to Step 2a in the Data Map Visualization Tool to decide how you want the data to be collected (e.g., by paper, spreadsheet, CRM form). Drag those icons next to the appropriate boxes in the “Front-End Data Collection” column. Afterward, use Step 2b to decide how you’ll store your data. If you’re moving data around, it’s possible that you’ll need multiple layers. For instance, data may be initially stored in your “CRM tool” and then moved to your “Data Warehouse” (read below for more details).
While you’re filling out Steps 2a and 2b, think about the different types of available tools, how those tools meet organizing needs, and the data and tables that the tools provide.
What type of tools do I need to consider?
Four categories of tools are particularly central to metric implementation. They are:
- Organizing and mobilization tools: These tools support direct outreach and coordinated action. They help organizers and volunteers connect with constituents through phone calls, texts, door-to-door canvassing, events, etc.
- Constituent Relationship Management systems (CRMs): These tools help organizations manage and deepen their relationships with constituents, other groups, funders, and more. CRMs can be used to store contact information, log conversations, track leadership development, and coordinate outreach.[5]
- Data warehouses: Organizations store and organize large amounts of data here. A data warehouse brings together information from various tools, which unlocks deeper analysis and reporting.
- Sync tools: These tools help move data between platforms. They can help keep your CRM, mobilization tools, spreadsheets, and dashboards updated. They can automate data transfers, reduce duplication, and prevent errors.
In the rest of this step, we focus mainly on CRMs, with a light touch on organizing and mobilization tools and data warehouses. Your CRM is the central hub where organizers track relationships and coordinate action. It’s the tool organizers engage with most consistently, and the data collected and made available by your CRM heavily affects what you can measure. As a result, we believe getting the CRM right matters.
Do your CRM and tools meet your organizing needs?
Different organizations have different tool needs based on their organizing strategy, organizing context, and the makeup of their base. OPIN, for example, had very specific needs in a tool. They do most of their organizing in-person, their base is largely Spanish-speaking, and many members of their base don’t have email addresses or use email. OPIN needed a CRM that:
- Has a system for in-person event tracking
- Has a Spanish-language user interface
- Does not share user data with law enforcement
- Doesn’t require an email address as a unique ID.
Determine whether your current tools meet your organizing needs. Think through what your team needs out of a tool. Do your current tools meet the needs of your base? Do they work well for your organizing method(s)? Are they simple and easy to use and teach to others? Do the tools work well together and speak to each other? What are you not able to do with your current tools that you’d like to do? Create a list of what you need in a tool, starting with your top priorities.
Several of our cohort members chose Action Builder as their CRM. Data staff and organizers at People’s Action reported that Action Builder was “great,” “intuitive,” “accessible,” and “intentional” – particularly compared to a previous tool that “felt like driving your grandmother’s station wagon.” One person said, “It feels like Action Builder is trying to work with organizers. It includes a lot that organizers need for data and usage. It’s mobile friendly.” OPIN considered several options, including using CRMs built for labor organizing and abandoning CRMs altogether for Google Sheets. They ultimately selected Action Builder, which, by the end of the cohort session, had rolled out a new events feature and no longer required email addresses as unique IDs.
If you’d like to explore different tool options, consider checking out Higher Ground Labs’ annual political tech landscape report. Each report provides a detailed map of tools and vendors that political organizations currently use.
What data and tables do the tools provide?
Your metrics rely on the data and tables that the tools make available to you. First, consider how you can access the data. What data is available via the user interface (also commonly known as GUI or front-end)? Does the tool provide other ways to access the data? If so, can you automate using their API or does it require a more manual process of bulk uploads and downloads? Can you set different permissions for data access based on users and their roles?
Second, consider what data is available. Many tools provide documentation for developers, including lists of the data tables and fields they expose. Review that information and compare it to the data that you identified in Step 1. Are you able to collect everything that you need? For data that’s not immediately available, can you collect it another way? Or can you substitute it with a reliable proxy? For data that’s absolutely unavailable, think about how it will affect your metrics.
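A simple way to run this comparison is to list the fields you need from Step 1 next to the fields each tool exposes, then look at what is left uncovered. The field names, tool names, and `find_gaps` helper below are hypothetical placeholders; in practice you would fill them in from each tool's developer documentation.

```python
# Fields identified in Step 1 (hypothetical examples).
required_fields = {
    "first_name", "last_name", "phone",
    "event_name", "event_type", "one_on_one_notes",
}

# Fields each tool exposes, e.g. copied from its developer documentation (hypothetical).
available_fields = {
    "crm": {"first_name", "last_name", "phone", "email"},
    "events_tool": {"event_name", "event_date"},
}


def find_gaps(required, available_by_tool):
    """Return required fields that no tool currently provides."""
    covered = set().union(*available_by_tool.values())
    return sorted(required - covered)


gaps = find_gaps(required_fields, available_fields)
print(gaps)
```

Each field in the resulting list is a prompt for one of the questions above: collect it another way, substitute a proxy, or adjust the metric.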
Third, consider how the data is structured. Data structure affects how easy it is to clean and prepare the data for analysis and reporting. Is the data unstructured or semi-structured (e.g., open text or JSON)? Or is it structured (e.g., a data table)? If it’s structured, how do the various data tables relate to each other? Will it take a lot of effort to join tables together (also see “Step 3: Figure Out Intermediary Data Tables”)?
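To illustrate the difference in effort, the sketch below takes a small nested JSON export and flattens it into table-like rows. The export shape, the field names, and the `flatten_attendance` helper are assumptions made up for this example; real tool exports will look different.

```python
import json

# Hypothetical nested export: each person carries a list of attended events.
raw_export = """
[
  {"id": "p1", "name": "Ana", "events": [
      {"event": "Town Hall", "date": "2024-03-01"},
      {"event": "Phone Bank", "date": "2024-03-15"}
  ]},
  {"id": "p2", "name": "Ben", "events": []}
]
"""


def flatten_attendance(records):
    """Turn nested person records into one row per person per event."""
    rows = []
    for person in records:
        for attended in person.get("events", []):
            rows.append({
                "person_id": person["id"],
                "event_name": attended["event"],
                "event_date": attended["date"],
            })
    return rows


attendance_rows = flatten_attendance(json.loads(raw_export))
for row in attendance_rows:
    print(row)
```

Every nested structure needs a step like this before it can be joined to other tables, which is part of the cleaning cost to weigh when comparing tools.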
Finally, consider how data from your suite of tools will connect. If you can move the data into a data warehouse, you’ll have more flexibility to join tables and generate insights from data gathered across multiple tools. Look into existing syncs or connectors in the ecosystem. For instance, Parsons is a useful Python package that contains a growing list of connectors and integrations to move data between various tools. For smaller organizations, using a data warehouse and managing syncs might not be feasible right away. It could still be useful to plan ahead: you may be able to avoid the cost of reworking your systems down the line.
Tradeoffs
- Switching CRMs versus staying put: Migrating to a new CRM can be stressful. It may only be worth switching if you have major needs that are not being met and you’re able to carve out capacity for migration and onboarding. Before switching, first talk to CRM and tool companies. It’s possible that others want the features you need, too. OPIN struggled to find a CRM that met their needs, and spoke to several companies to explore options. Through those conversations, they learned that Action Builder was planning to roll out its new events feature, a key factor that helped seal the deal. If you decide to migrate CRMs, People’s Action has a few recommendations based on their experience transitioning to Action Builder during the cohort project. They suggest picking a clear owner to oversee any migration work, scoping organizer needs, and planning training for the new system. If your organization, like People’s Action, has affiliates, you may need multiple staff to support their migration processes. Expect the process to cost money and likely take a substantial amount of time.
Step 3: Figure Out Intermediary Data Tables
Once you have decided on the data that you want to collect and have determined how you’ll retrieve that data from your tools, you’ll need to figure out how to structure and manage it. We lay out some suggested data tables you may want to use in the “Created Data Tables” column of the Data Map Visualization Tool. In a future version of the map, we’ll fill out and link example schemas.
It may be especially useful to discuss this step with a data engineer. Depending on what your tools make available, some intermediary data tables may be challenging to assemble, automate, or maintain. Building them can be time-intensive and complex. Because these tables often serve as the bridge between the raw data and your metrics, any gaps or breakdowns can directly affect your ability to track your base-building and leadership development progress.
What’s the point of creating intermediary data tables?
As you think about these tables, consider three high-level goals for creating them. The tables are meant to:
- Unify engagement across tools and programs: Your base-building and leadership development metrics rely on your ability to create unified engagement profiles for each individual. If you’re using multiple tools or run multiple programs, you’ll need to join the data to create a holistic view of each person’s interactions within your organization. Without it, you won’t be able to build a complete picture of someone’s journey within your organization, identify deeper asks, and surface potential and current leaders.
- Manage duplicates and inconsistencies: Your data will inevitably contain duplicates. Individuals may exist across systems with different IDs, or even within the same system due to repeated entries. Intermediary tables help you de-duplicate records by matching on evergreen fields (like name, phone, email, or address).[6]
- Support flexible, scalable reporting: Intermediary tables are the layer that connects raw data to dashboards and reports. They are not the final reports themselves. Design your intermediary data tables so that they support multiple types of reporting needs. Peek back to the metrics you’ve identified and peek ahead to how you imagine the metrics will be shared. Consider how you expect the data will need to be filtered, perhaps by time period, program, organizer, or geography. Your intermediary tables can help you adapt your reports to different audiences and strategic questions without having to reshape the raw data from scratch each time.
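As a rough illustration of matching on evergreen fields, the sketch below groups records from two hypothetical tools when they share a normalized email or phone number. It is a deliberately naive, single-pass approach and not a full identity resolution method: it assumes 10-digit US phone numbers, it does not merge two existing groups that a later record happens to bridge, and the record layout and helper names are invented for the example.

```python
import re


def normalize_phone(phone):
    """Keep the last 10 digits (assumes US-style numbers); empty string if too short."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else ""


def normalize_email(email):
    return (email or "").strip().lower()


def group_duplicates(records):
    """Group source IDs whose records share a normalized email or phone."""
    seen = {}      # (field, normalized value) -> group index
    groups = []    # each group is a list of source IDs for one likely person
    for rec in records:
        candidates = [
            ("email", normalize_email(rec.get("email"))),
            ("phone", normalize_phone(rec.get("phone"))),
        ]
        keys = [key for key in candidates if key[1]]
        index = next((seen[key] for key in keys if key in seen), None)
        if index is None:
            index = len(groups)
            groups.append([])
        groups[index].append(rec["source_id"])
        for key in keys:
            seen.setdefault(key, index)
    return groups


records = [
    {"source_id": "crm:101", "email": "Ana@Example.org", "phone": "(555) 010-2000"},
    {"source_id": "events:7", "email": "", "phone": "555-010-2000"},
    {"source_id": "crm:102", "email": "ben@example.org", "phone": ""},
    {"source_id": "events:9", "email": " ana@example.org ", "phone": None},
]
groups = group_duplicates(records)
print(groups)
```

Matching on names or addresses is considerably harder than this, since those fields vary far more in spelling and format.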
For more information and a deeper dive into general data engineering principles, Viswa Challa (People’s Action) recommends Fundamentals of Data Engineering. It’s a useful resource even for non-engineers sketching their blueprint.
What tables should I consider?
We suggest creating multiple person-centered intermediary tables that help you organize information about people and their engagement over time. In some tables, each row represents a single person and each column holds a piece of information about that person. In others, each row represents something a person did, such as attending an event, completing a survey, or leading an action.
Here are some core tables to consider:
- People: One row per person, with fields for contact information, demographics, voter ID, issues of interest, etc.
- Actions: One row per action taken by the person. You’ll likely want to start with separate tables for different types of engagement (e.g. events, donations) and then strategically join them afterward. If you’ve got a ladder of engagement, make sure your action tables capture the common actions that you’ve listed. For your events table, make sure that you include and clean up any tags that you can use to help identify the various common actions.
- Outreach: One row per outreach attempt or conversation. You can keep tabs on who’s contacting whom, key themes or tags, and even organizers’ notes.
Here are some additional tables:
- Survey responses: One row per survey response per person, if you’re taking surveys of your membership or broader community. You’ll also need a complementary table that holds the survey questions.
- Ladder rungs: One row per person. You would derive this from your core tables based on the roles and metrics that you’ve already developed.
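A ladder rungs table could be derived along the following lines: collect each person's observed actions, then record the highest rung whose required actions are all present. The rung names, required actions, and `ladder_rungs` helper are hypothetical; your own ladder of engagement defines the real rules.

```python
from collections import defaultdict

# Hypothetical ladder, ordered from lowest to highest rung.
LADDER = [
    ("supporter", {"signed_petition"}),
    ("member", {"attended_event"}),
    ("leader", {"completed_leadership_training", "conducted_one_on_one"}),
]

# Rows from a (hypothetical) actions table: (person_id, action_type).
actions = [
    ("p1", "signed_petition"), ("p1", "attended_event"),
    ("p2", "completed_leadership_training"), ("p2", "conducted_one_on_one"),
    ("p3", "attended_event"), ("p3", "completed_leadership_training"),
]


def ladder_rungs(action_rows):
    """Return one entry per person: the highest rung whose actions are all observed."""
    observed = defaultdict(set)
    for person_id, action_type in action_rows:
        observed[person_id].add(action_type)
    rungs = {}
    for person_id, done in observed.items():
        rung = None
        for name, required in LADDER:
            if required <= done:
                rung = name  # higher rungs overwrite lower ones
        rungs[person_id] = rung
    return rungs


rungs = ladder_rungs(actions)
print(rungs)
```

This sketch treats each rung independently; if your ladder requires people to pass through lower rungs first, the rule would need to check that as well.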
Each person-centered table should have a unique ID column, and each person’s unique ID should be carried across all person-centered tables so that you can connect them.
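The sketch below shows one way a shared person ID can tie the core tables together, using an in-memory SQLite database purely for illustration. The table and column names are assumptions rather than a recommended schema, and a real data warehouse would use its own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    person_id TEXT PRIMARY KEY,
    full_name TEXT,
    phone TEXT
);
CREATE TABLE actions (
    action_id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES people(person_id),
    action_type TEXT,
    action_date TEXT
);
CREATE TABLE outreach (
    outreach_id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES people(person_id),
    organizer_name TEXT,
    contact_date TEXT,
    notes TEXT
);
""")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("p1", "Ana", "5550102000"), ("p2", "Ben", "5550103000")],
)
conn.executemany(
    "INSERT INTO actions (person_id, action_type, action_date) VALUES (?, ?, ?)",
    [("p1", "attended_event", "2024-03-01"),
     ("p1", "led_action", "2024-04-15"),
     ("p2", "attended_event", "2024-03-01")],
)
conn.execute(
    "INSERT INTO outreach (person_id, organizer_name, contact_date, notes) "
    "VALUES (?, ?, ?, ?)",
    ("p1", "Dee", "2024-02-20", "Interested in housing"),
)

# Because every table carries person_id, one query can summarize each person.
summary = conn.execute("""
SELECT p.person_id,
       p.full_name,
       (SELECT COUNT(*) FROM actions a WHERE a.person_id = p.person_id) AS action_count,
       (SELECT COUNT(*) FROM outreach o WHERE o.person_id = p.person_id) AS outreach_count
FROM people p
ORDER BY p.person_id
""").fetchall()
print(summary)
```

The counts use subqueries rather than joining all three tables at once, which avoids double-counting when a person has several rows in both the actions and outreach tables.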
As you build your person-centered tables, you’ll likely also need a set of lookup or reference tables to provide context. One important example is an event-level table, which holds key details about each event—such as date, location, event type, and program affiliation. At Fair Count, organizers track events across various programs by using a Google Form. The responses feed into a table that allows them to sort events and categorize whether each one reflects high-level or low-level engagement. This structure makes it easier to analyze engagement depth over time.
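To show how an event-level lookup table adds context to attendance data, the sketch below tags each event type with an engagement depth and counts attendance per person by depth. This is not Fair Count's actual setup; the event types, the depth categories assigned to them, and the `depth_counts` helper are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event-level lookup table keyed by event ID.
events = {
    "e1": {"event_type": "rally", "date": "2024-03-01", "program": "housing"},
    "e2": {"event_type": "leadership_training", "date": "2024-03-10", "program": "housing"},
}

# Hypothetical categorization of event types by engagement depth.
DEPTH_BY_EVENT_TYPE = {
    "rally": "low",
    "phone_bank": "low",
    "leadership_training": "high",
}

# Rows from a (hypothetical) attendance table: (person_id, event_id).
attendance = [("p1", "e1"), ("p1", "e2"), ("p2", "e1")]


def depth_counts(attendance_rows):
    """Count each person's attended events by engagement depth."""
    counts = defaultdict(Counter)
    for person_id, event_id in attendance_rows:
        event_type = events[event_id]["event_type"]
        depth = DEPTH_BY_EVENT_TYPE.get(event_type, "uncategorized")
        counts[person_id][depth] += 1
    return {person_id: dict(counter) for person_id, counter in counts.items()}


engagement = depth_counts(attendance)
print(engagement)
```

Keeping the depth category in the lookup table, rather than on each attendance row, means a recategorization only has to be made in one place.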
Also, think through any external data tables you might be using, such as the voter file or census data. Add them to the “External Datasets” column of the Data Map Visualization Tool.
Tradeoffs
- Front-loading versus layering as you go: You may have to decide how much of your intermediary tables to build upfront so you can start collecting data right away, versus developing them gradually over time. Front-loading the work can set you up to track organizing efforts over an extended period of time. However, be wary of letting the perfect be the enemy of the good: you might end up slowing down implementation and then struggle to build buy-in. Identify what you can layer in over time and include those layers in your design now.
Putting It All Together
The steps above are not meant to be linear. As you sketch your data infrastructure blueprint, take your time to think through how each decision in one area may affect others and the metrics that you’re working toward.
Just as important, collaborate with others. Collaborate across teams to ensure your infrastructure reflects what they can realistically contribute and what they need to get out of it. We cannot stress this enough. Without alignment and buy-in (also see section, “Building Buy-in”), even the most well-designed data infrastructure will fail when people don’t know how to engage with it or don’t see its value.
Once you’re done sketching, you’re ready to put your plan into action. As you do, remember two essential factors that can make or break your data infrastructure work: documentation and onboarding, and staying adaptable.
Documentation and onboarding
Documentation and onboarding are absolutely crucial. Without them, your carefully created systems may be used inconsistently, incorrectly, or not at all. Documentation and onboarding give people the opportunity to acclimate to the new systems, ask questions, and even provide productive feedback.
Create documentation before you begin your onboarding process. Documentation provides people with something concrete to refer back to, especially when lessons learned during training aren’t relevant until weeks or months later. You’ll want your documentation to include the roles, shared norms, and practices around data that your team developed, as well as who to ask if you have questions. You can refer back to your completed Data Map Visualization Tool to find some of this information.
Documentation should include:
- Who’s responsible for each aspect of the data process, including collecting data, managing data, analyzing data, and using data to inform decision-making
- What data’s being collected and how
- Where data’s being stored
- What are the metrics or other outputs
- With whom are the different outputs or metrics shared
Recognize that it’ll take time for your team to learn and feel comfortable with your new tools, systems, and practices. Bring awareness and graciousness to the onboarding process. Make time to regularly check in on how implementation is going and answer questions. Regular users will often encounter unanticipated situations that may require some iteration; people also just forget or miss things. Amity (ISAIAH) suggested dropping all documentation in a shared Google Drive, where folks can refer back to it when needed. Set your team up to talk to each other about this work as well. Often, your best first line of defense is peer-to-peer support!
Stay adaptable
As you and your organization build and pilot your data infrastructure, your collective understanding of your needs will deepen. Additionally, new challenges or limitations may arise. For instance, all of the organizations in our learning cohort felt capacity-strapped at the height of a presidential election year. They had to strategically decide which work they needed to proceed with or postpone.
Build in moments to revisit what’s working and identify the changes that need to be made. Create feedback loops between organizers, data managers, and leadership. The more flexible and responsive your infrastructure is, the more it can grow alongside your organizing work.
[1] Orchestration refers to the automated scheduling and management of data workflows across tools and software. It ensures that data pipelines run in the correct order at the right time, and monitors and handles failures if and when they occur.
[2] API stands for “Application Programming Interface.” It is a connection between computers or software programs. For data work, APIs serve as the bridge between systems and enable data to be moved around and processes to be automated.
[3] If your organization is a part of The Movement Cooperative (TMC), feel free to raise these questions in our various communications channels. If your organization isn’t part of TMC, we can’t promise that we’ll be able to support you directly. However, we might be able to point you in the right direction.
[4] For TMC members, TMC supports syncing your data into our data warehouse, Google BigQuery. Reach out to your contact if you’re looking for support.
[5] CRMs vary in progressive movement work. You may even need to work with multiple CRMs. In this step, we focus on CRMs that act as a centralizing hub for base-building and leadership development work.
[6] Data engineering often calls this process “Identity (ID) Resolution.” This is a challenging problem to solve. At the time of writing, ID resolution is on our product roadmap.