October 13, 2025

Starbridge Engineering Labs: Building a Contact Enrichment System with 98% Accuracy 

The challenge of extracting verified contacts from public sector data
Oleksandr Arzamastsev
Backend Engineer @ Starbridge

1. The Technical Challenge

Unlike most industries, the public sector has no reliable LinkedIn equivalent.

You can’t just pull data from a centralized professional network or rely on enrichment vendors. Government and education buyers live in their own ecosystem: staff directories, board minutes, PDF reports, and legacy agency websites.

If you want accurate contact data, you have to go to the source of truth: the individual agency websites themselves.

That means building a crawler that can navigate hundreds of thousands of unique sites, each structured differently. Many run on old JavaScript frameworks that are a barrier to effective LLM-based search. Some pages render data only after client-side execution. Others use outdated CMSs with inconsistent markup or hidden iframes.

And because every city, county, school, and district manages its own site, there’s no universal format. A title might appear in a table cell on one page, a PDF roster on another, or an image with embedded text somewhere else.

So instead of a single API or data feed, we’re dealing with hundreds of thousands of agency sites, each with its own rules. That’s why horizontal contact enrichment providers produce so many hallucinations. They’re guessing based on partial signals rather than verified source connections.

To benchmark traditional methods of contact collection for public sector data, we processed roughly 1,200 contact searches using a traditional approach.
- 919 were valid
- 317 were invalid (roughly a 25% failure rate)

That 25% represents how difficult it is to accurately extract public sector contact data.

2. How We Approached the Solution

To tackle this, we split the workflow into two discrete systems:

The first system is our discovery engine, designed specifically for government and education data. Instead of pulling from LinkedIn or commercial databases, it goes directly to the source: agency directories, board minutes, PDFs, and verified .gov and .edu domains. Each record is parsed and tagged with metadata on where it came from, when it was last updated, and how confidently it maps to an individual.
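As a rough illustration of what such a tagged record might look like, here is a minimal sketch. The class name, field names, and the `is_traceable` check are assumptions made for this example, not Starbridge's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContactRecord:
    """Hypothetical shape of a discovered contact with its provenance."""
    name: str
    title: str
    agency: str
    source_url: str      # where the record came from
    last_updated: date   # when the source was last updated
    confidence: float    # 0.0-1.0: how confidently it maps to an individual

    def is_traceable(self) -> bool:
        # A record is only useful downstream if it points back to a source
        # and carries a confidence value in the expected range.
        has_source = self.source_url.startswith(("http://", "https://"))
        return has_source and 0.0 <= self.confidence <= 1.0
```

Keeping provenance on the record itself means a later validation stage can re-check the source without a separate lookup.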

This isn’t as simple as crawling a website. Every agency has its own structure. Some run on JavaScript frameworks that don’t expose static HTML; others hide rosters inside PDFs or inline scripts. To handle that complexity, we built adaptive scrapers that learn page layouts, combined them with LLM extraction to interpret semi-structured data, and layered on ranking logic to prioritize the most institutionally relevant matches. The goal: ensure that every potential contact comes with a verifiable trail before it ever reaches a human or downstream model.
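To make the ranking idea concrete, here is a minimal sketch. It assumes each candidate is a dict with a `source_url` and an extraction `confidence`, and it weights .gov and .edu hosts higher than other domains. The weights and function names are illustrative assumptions, not the production ranking logic.

```python
from urllib.parse import urlparse


def domain_weight(url: str) -> float:
    """Assumed weighting: official .gov/.edu hosts count more than others."""
    host = urlparse(url).hostname or ""
    if host.endswith(".gov") or host.endswith(".edu"):
        return 1.0
    return 0.5


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Sort candidates so the most institutionally relevant come first."""
    return sorted(
        candidates,
        key=lambda c: domain_weight(c["source_url"]) * c["confidence"],
        reverse=True,
    )
```

In this sketch, a moderately confident match from an agency's own domain can outrank a higher-confidence match from a third-party site.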

The second system is the validator. It acts like a fact-checker, cross-referencing each candidate against contextual evidence to confirm the person actually works at the target agency and that the mention isn’t stale or unrelated. For example, in one test, the model surfaced “Lashonda Jackson” as affiliated with Bellaire High School, but the underlying pages either 404’d or lacked any reference to the school. The validator caught the discrepancy and discarded it.
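The check in that example can be sketched roughly as follows. This assumes the evidence pages have already been fetched, and it uses simple substring matching. `EvidencePage` and `is_supported` are hypothetical names for illustration; a real validator would presumably weigh richer signals, such as how recent the mention is.

```python
from dataclasses import dataclass


@dataclass
class EvidencePage:
    """A previously fetched source page (hypothetical structure)."""
    url: str
    status: int  # HTTP status code returned when the page was fetched
    text: str    # extracted page text, empty if the fetch failed


def is_supported(name: str, agency: str, pages: list[EvidencePage]) -> bool:
    """Return True only if a live page mentions both the person and the agency."""
    name_l, agency_l = name.lower(), agency.lower()
    for page in pages:
        if page.status != 200:
            continue  # dead links (e.g. 404) cannot serve as evidence
        body = page.text.lower()
        if name_l in body and agency_l in body:
            return True
    return False
```

With one page that returns 404 and another that names the person but never mentions the school, `is_supported` returns False and the candidate would be discarded.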

This layered approach transformed accuracy. By separating finding from proving, we cut false positives dramatically and brought contact accuracy to 98%.

3. The Result: 98% Contact Accuracy

Our validation system now achieves 98% contact accuracy across thousands of fragmented government websites, filtering out bad data while keeping only verified, up-to-date contacts.

For our users, that accuracy translates into confidence. Every contact surfaced in Starbridge is verified, contextual, and ready to act on.

And for us, it’s proof that building the right infrastructure for public-sector data isn’t about shortcuts or “data enrichment.” It’s about respecting the complexity of the ecosystem and engineering your way through it, one agency at a time.
