Data Aggregation

TourneyHunter

Aggregating scattered tournament data into one searchable platform.

The Problem

If you're a hockey player, parent, or coach looking for tournaments, you're in for a scavenger hunt. Tournament information is spread across dozens of individual organization websites, each with different formats, different update schedules, and different levels of detail. Some post PDFs. Some bury dates in paragraph text. Some don't update at all until two weeks before the event.

There was no single place to search, filter, and compare tournaments across organizations, age groups, skill levels, and locations. People relied on word of mouth, Facebook groups, and bookmarking a dozen sites they'd check manually.

The Solution

We built a fully automated scraping pipeline with 50+ custom adapters — one for each tournament organization's website. Each adapter understands the specific structure and quirks of its source, extracting dates, locations, age groups, skill levels, and registration details.
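The adapter pattern described above can be sketched as a small interface. This is an illustrative sketch, not the project's actual code: the `Tournament` fields and the `ExampleOrgAdapter` name are assumptions about how such a system might be shaped.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Protocol


@dataclass
class Tournament:
    # Core fields each adapter extracts; names are illustrative.
    name: str
    location: str
    start_date: date
    end_date: date
    age_groups: list[str]
    skill_levels: list[str]
    source: str  # which organization's site this came from


class Adapter(Protocol):
    source: str

    def fetch(self) -> str: ...  # raw HTML/PDF text from the org's site
    def parse(self, raw: str) -> Iterable[Tournament]: ...


class ExampleOrgAdapter:
    """One adapter per organization; each encodes its source's quirks."""

    source = "exampleorg"

    def fetch(self) -> str:
        # A real adapter would download the org's listings page here.
        return "<html>...</html>"

    def parse(self, raw: str) -> Iterable[Tournament]:
        # Site-specific parsing (e.g. CSS selectors, PDF text rules) goes here.
        yield Tournament(
            name="Example Cup",
            location="Boston, MA",
            start_date=date(2025, 1, 10),
            end_date=date(2025, 1, 12),
            age_groups=["U14"],
            skill_levels=["AA"],
            source=self.source,
        )
```

Because every adapter exposes the same `fetch`/`parse` surface, each one can be unit-tested against a saved copy of its source page.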

The scrapers run daily on a cron schedule. Incoming data gets normalized into a consistent format, deduplicated, and loaded into a searchable database. The frontend lets users filter by sport, location, date range, age group, and skill level — the search experience that should have existed all along.
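The normalize-then-dedupe step can be illustrated with a minimal sketch. The dedup key shown (name, start date, location) is an assumption about what identifies a tournament, not the system's actual logic.

```python
def dedupe_key(t: dict) -> tuple:
    # Two scrapes of the same event should collapse to one record.
    return (
        t["name"].strip().lower(),
        t["start_date"],
        t["location"].strip().lower(),
    )


def run_pipeline(batches: list[list[dict]]) -> list[dict]:
    """Merge daily scrape batches, keeping one record per unique event."""
    seen: dict[tuple, dict] = {}
    for batch in batches:
        for record in batch:
            seen[dedupe_key(record)] = record  # later scrapes win
    return list(seen.values())
```

In a real run, the surviving records would then be upserted into the searchable database.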

How We Thought About It

The temptation with a project like this is to build one generic scraper and point AI at it. "Just throw GPT at the HTML and extract the data." We tried that early on. It works about 70% of the time, which sounds good until you realize that 30% failure rate means hundreds of tournaments with wrong dates, missing locations, or garbled age groups.

Instead, we built individual adapters for each source. More work upfront, but each one is testable, debuggable, and reliable. When a tournament org redesigns their site, we update one adapter — not rewrite a prompt and hope for the best.

We use AI where it actually helps: normalizing messy age group labels ("U14," "14U," "Bantam," "14 and under" all mean the same thing) and classifying skill levels from inconsistent descriptions. But the core extraction is deterministic code we can trust.

The Growth Engine

Building the product was half the job. Getting it found was the other half, and we treated SEO as a system to build, not a checklist to follow.

Every tournament, location, age group, and skill level generates its own indexable page. The site architecture was designed from day one so that each scraped data point creates a long-tail search opportunity. Someone Googling "U14 AA hockey tournaments near Boston" lands on a page that exists because the scraper found a tournament matching those criteria. The data creates the SEO.
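The "data creates the SEO" idea can be sketched as a page generator: each combination of attributes found in the scraped data yields its own indexable URL. The URL scheme here is an assumption for illustration, not TourneyHunter's actual routing.

```python
import re


def slugify(text: str) -> str:
    """Lowercase and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def landing_pages(tournaments: list[dict]) -> list[str]:
    """Derive one indexable URL per (age group, skill level, location)
    combination present in the scraped data."""
    pages = set()
    for t in tournaments:
        for age in t["age_groups"]:
            for skill in t["skill_levels"]:
                pages.add(
                    f"/tournaments/{slugify(age)}-{slugify(skill)}"
                    f"-{slugify(t['location'])}"
                )
    return sorted(pages)
```

A scraped U14 AA event in Boston would produce a page like `/tournaments/u14-aa-boston-ma`, exactly the long-tail query a searcher types.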

On top of that, we built an automated blog powered by the Anthropic API. The system generates topical content — guides on skill levels, how to pick the right tournament, what to expect at different age groups — and publishes it on a schedule. Not generic AI slop. Each post is grounded in the actual data the scrapers are collecting, so the content is specific and useful.
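Grounding generated posts in scraped data comes down to how the prompt is built. This is a hypothetical sketch of that step: the prompt would then be sent to the Anthropic Messages API on the publishing schedule, but the function name and prompt wording are assumptions, not the project's actual code.

```python
def build_post_prompt(topic: str, tournaments: list[dict]) -> str:
    """Compose a blog prompt anchored to real scraped events, so the
    generated post cites actual tournaments instead of generic filler."""
    facts = "\n".join(
        f"- {t['name']} ({', '.join(t['age_groups'])}) in {t['location']},"
        f" starting {t['start_date']}"
        for t in tournaments
    )
    return (
        f"Write a practical guide on: {topic}.\n"
        "Ground every claim in these upcoming tournaments:\n"
        f"{facts}\n"
        "Be specific. Do not invent events that are not listed above."
    )
```

Feeding the model only verified data, and telling it not to invent more, is what separates grounded content from generic AI output.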

The result: the site went from 1 indexed page to 187 in five weeks. No paid ads, no link building, no manual outreach. Just good architecture and content that answers the questions people are actually searching for.

Automated Outreach

The scrapers don't just collect tournament data — they also extract organizer contact information. That data feeds directly into an automated email cadence via Kit.com's API. Tournament organizers are contacted automatically about promoting their events on the platform, without anyone building a list or sending an email by hand.

The whole pipeline is hands-off: scraper finds a new tournament, extracts the organizer's info, adds them to the appropriate email sequence, and the cadence runs itself. Lead generation built directly on top of the data the system is already collecting.
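The routing step of that pipeline can be sketched as follows. The `send` callable stands in for the Kit.com API call, and the sequence name and dedup logic are illustrative assumptions.

```python
def route_new_tournaments(tournaments: list[dict],
                          already_contacted: set[str],
                          send) -> list[str]:
    """Enqueue each newly scraped organizer into an email sequence once.

    `send` is a stand-in for the real email-platform API call
    (e.g. adding a subscriber to a Kit.com sequence).
    """
    queued = []
    for t in tournaments:
        email = t.get("organizer_email")
        if not email or email in already_contacted:
            continue  # no contact info, or organizer already in a cadence
        send(email=email, sequence="tournament-promo")
        already_contacted.add(email)
        queued.append(email)
    return queued
```

Tracking `already_contacted` across runs is what keeps a daily cron job from re-emailing the same organizer every day.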

The Pattern

TourneyHunter was built for hockey tournaments, but the system underneath it has nothing to do with hockey. It's a framework for aggregating fragmented event data from any industry where information is scattered across dozens of small organization websites that nobody has bothered to centralize.

We've heard from people in lacrosse, soccer, baseball, wrestling, and firefighter competitions — all dealing with the same problem. The adapters change. The architecture doesn't. If there's a niche where people are manually checking ten websites to find what they're looking for, this system can solve it.

The Result

  • 50+ automated scrapers pulling 1,000+ tournaments daily with no manual intervention
  • 1 to 187 Google-indexed pages in five weeks, purely organic
  • Automated blog generating SEO content from real tournament data
  • Automated outreach pipeline from scraper to email cadence
  • Revenue from promoted tournament listings
  • First paying customer within weeks of launching monetization
  • Repeatable architecture applicable to any fragmented event market