Field-kit: Open source tools for faster geospatial data prep

March 16, 2026

Most geospatial analysis projects in global health face the same two challenges at the start. First, you need to gather data like administrative boundaries or population rasters from a lot of different places before you can do anything useful. Second, you’ll usually discover that two sources describing the same geography rarely agree on how to name things.

These problems don't get much attention, but they eat up a good chunk of time on every project. We've run into both enough times that we decided to build tools to deal with them. We recently cleaned those up and published them as a field-kit, a small open source repository on our GitHub. 

This article covers what each tool does, why we built them, and who can use them. 

Gathering country data before you need it

When you start a new country analysis, you typically need several a bunch of foundational  pieces of data in place before you can get anywhere:

  • Administrative boundaries
  • Population rasters
  • Road networks
  • Building footprints

Those come from many different sources, each with its own format and download method. If you're heading into the field with poor internet, you want all of it on your machine before you go. Even from the office, manually pulling everything together can eat up valuable time.

Provision: Automating the heavy setup

The Provision tool automates that process. You give it a country code and an admin level, and it goes out and downloads everything into a folder. That’s admin boundaries from geoBoundaries, population rasters from WorldPop, and a handful of other standard sources.

We built it by taking the internal process we use whenever we load a new country into the Crosscut app and making it available as a standalone tool. It's not connected to what's currently in the app, so it works for any country in the world.

Provision works by typing a short command into your terminal. You tell it the country and admin level you need and it handles the rest. It's a developer tool, but more approachable than it sounds. The README walks through everything you need to get started.

➲ What Provision downloads

Data Source Format
Admin boundaries GeoBoundaries GeoJSON
Road networks OpenStreetMap (Geofabrik) Shapefile
AI-detected roads MapWithAI (RapidEditor) Shapefile
Population rasters WorldPop, Meta HRSL, Kontur GeoTIFF, VRT, CSV
Building footprints Overture Maps, OSM Parquet, CSV

Match-bot: Reconciling place names 

Once you have your data, analysts will almost always hit the same second problem. You have a spreadsheet of admin units and a GIS shapefile that are supposed to describe the same places, but the names don't match. For a case or two you can manage, but with thousands of data points it ends up turning into its own job. 

One source might name a facility “CS Gaspard Kamara,” while another names the same facility “Centre de Santé Gaspard Kamara.” These are the same place, the people who recorded the data just followed slightly different naming conventions. Merging those two data sets on a simple name match alone will fail. You usually get more mismatches as you drill down to lower admin levels, where there’s a lot of variation in how anything is spelled (or what it's called).

Match-bot handles this part with fuzzy matching. Rather than requiring an exact name match, it finds pairs that are close enough to be the same place, based on how many character-level edits separate the two strings. Match-bot packages those techniques into a complete set of Python scripts and configuration files you can set up once and reuse across projects.

How the matching process works

Match-bot uses a list of place names and a GIS shapefile. It runs the matching in multiple passes, starting with exact matches and locking those in first. It then relaxes the requirement and looks for close matches using Levenshtein distance, which is just a measure of how many character-level changes separate two strings. 

➲ Why the threshold matters

Safe Match (Distance: 2)

Springfield → Springfeild

Risky Match (Distance: 4)

Riverside → Lakeside

Set thresholds carefully to allow small, safe corrections without pulling in unrelated places.

Matching also always starts at the highest administrative level and works down. Skipping straight to sub-districts risks matching two places that share a name but sit in completely different regions (like Springfield, Massachusetts vs Springfield, Missouri). 

Working top-down means each match is checked against what's already been confirmed above it. And before locking anything in, the tool looks at all the candidates in a group together rather than grabbing the first one that looks close enough. This avoids accidentally closing off a better match that's sitting right next to it.

How loose or strict the fuzzy pass gets is a configuration setting you control. You’ll find the correct threshold varies, based mostly on how consistent local naming conventions are. Some are more consistent or standardized than others. 

Time savings: From running 30 scripts to one workflow

A year ago, I was writing a new Python script for every country we worked with. The data was different enough that I couldn't reuse much from one script to the next. I had between 30 and 40 scripts that were difficult to maintain and nearly impossible to hand off to anyone else. Finding ways to offload this grunt work has been a focus of our professional development sessions.

Match-bot replaced all of that with one set of scripts and a configuration file that handles whatever is specific to a given project. Instead of baking country-specific logic into each script, the scripts stay consistent and the config changes per project. 

A few things followed from that:

  • Matching exercises went from "check in with me next week" to two or three hours
  • Time savings on a full data cleaning project can reach as high as 75%
  • Transferability so you set it up once and hand it off without much explanation

You can run Match-bot from the terminal, but it also ships with Claude Code skills that let you run the pipeline in a chat. Commands like "run the matches" or "suggest manual matches" trigger the underlying scripts on your behalf and report back with statistics and suggestions.

➲ Why no AI chat option for Provision?

Unlike match-bot, Provision just runs and finishes. You're not making judgment calls or iterating, so the terminal command is all you need and it's pretty straightforward once you're set up.

Built for global health planning, useful beyond it

We built field-kit for the kind of pre-processing work that comes up constantly in health campaign planning. There’s a lot of gathering public data for a new country, reconciling facility lists against official boundaries, then preparing inputs for microplanning exercises. That's the context these tools were designed around and where they've been tested.

But the underlying problems aren't specific to global health. Any time you're starting a geospatial project that needs standardized country data pulled quickly, Provision saves the setup time. Any time you're trying to join two data sets that describe the same underlying entities but use different naming conventions, match-bot applies. 

If you're working through similar challenges or want to talk through how these fit into a broader workflow, feel free to reach out through Advisory Services. Both tools are open source and available at github.com/crosscutio/field-kit.

Related Posts

News

Smappen vs. The Crosscut App: Which Catchment Mapping Tool Is Right for You

Smappen and the Crosscut App both run in a browser and have free tiers. This comparison covers where each tool works, and what is genuinely free versus what requires a paid plan.
Crosscut
March 9, 2026
App Development

All Sub-Saharan African Countries Now Available in the Crosscut App

We are excited to share that All Sub-Saharan African countries are now available with their latest data in the Crosscut App.
Crosscut
March 3, 2026
Case Studies

Building health catchments from geocoded data in Liberia

Catchment area mapping for their upcoming mass drug administration campaign, which we used to pilot the ESPEN Geospatial Microplanner across Maryland, Grand Cape Mount, and Montserrado counties.
James McKinnon
February 23, 2026

Let's get started

The Crosscut App is and always will be free to use. It takes less than a minute to register and get started.

© 2024 Crosscut. All rights reserved.
Crosscut
6707 4th St NW
Washington DC 20012
USA
Social