From Weekend Experiment to Revenue: Building a Data Aggregation Platform
Most data aggregation projects start with a simple observation: the information exists, but it's scattered across dozens of sources in incompatible formats. The question is whether centralizing it creates enough value to sustain a business. Here's how we tested that with a tournament search platform.
The problem
Finding youth hockey tournaments is annoying in a very specific way. The information exists: every organizer posts their events somewhere. But "somewhere" means fifty different websites, each with its own date format, age-division terminology, and buried registration links. Parents and coaches end up with a spreadsheet and a Saturday morning they'll never get back.
How we solved it
We figured someone should scrape all of it into one place. TourneyHunter started as a weekend experiment, and the first version was running by the end of a Saturday morning. Today it runs fifty-plus custom scrapers, each purpose-built for a specific organizer's site, pulling data three times a day into one searchable directory.
Some scrapers use Cheerio for static HTML; others use Puppeteer for JavaScript-rendered sites. They run via crontab on a single $12/month VPS. A scraper is auto-disabled after five consecutive failures, and a weekly recheck brings disabled sources back online. The goal was to make discovery and data-quality checking nearly fully automated.
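The auto-disable rule is a small state machine. Below is a minimal TypeScript sketch of one way it could work, assuming each source keeps a consecutive-failure count and a disabled-at timestamp. The record shape, the function names, and the crontab line in the comment are hypothetical, not TourneyHunter's actual code, and the scrape itself (Cheerio or Puppeteer) is left out.

```typescript
// Hypothetical crontab entry (times and script name are illustrative),
// running the scrapers three times a day:
//   0 6,14,22 * * * node run-scrapers.js

// Hypothetical per-source health record; field names are illustrative.
interface SourceHealth {
  id: string;
  consecutiveFailures: number;
  disabledAt: Date | null; // null means the source is enabled
}

const MAX_CONSECUTIVE_FAILURES = 5;
const RECHECK_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000; // one week

// Should this cron tick actually scrape the source?
function shouldRun(h: SourceHealth, now: Date): boolean {
  if (h.disabledAt === null) return true;
  // A disabled source is retried only once the weekly recheck interval has elapsed.
  return now.getTime() - h.disabledAt.getTime() >= RECHECK_INTERVAL_MS;
}

// Fold one run's outcome into the health record.
function recordRun(h: SourceHealth, ok: boolean, now: Date): SourceHealth {
  if (ok) {
    // Any success clears the streak and brings the source back online.
    return { ...h, consecutiveFailures: 0, disabledAt: null };
  }
  const failures = h.consecutiveFailures + 1;
  if (h.disabledAt !== null) {
    // A failed recheck keeps the source disabled and restarts the weekly clock.
    return { ...h, consecutiveFailures: failures, disabledAt: now };
  }
  return {
    ...h,
    consecutiveFailures: failures,
    disabledAt: failures >= MAX_CONSECUTIVE_FAILURES ? now : null,
  };
}
```

Restarting the clock on a failed recheck means a broken source is retried about once a week, rather than on every cron tick after the first week has passed.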
Scraping is the easy part. Making the data consistent is the real work. Names come through in ALL CAPS from one source, Title Case from another. The worst bug: one organizer runs tournaments across 14 cities, but the scraper tagged every event with the same city because the headquarters city appeared in the body text before the actual event location. By the time the data was clean, we'd fixed 124 tournament names and 29 city records and resolved 10 duplicates. None of this is visible to users, but it is critical when someone is paying to promote their listing.
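For the casing and duplicate problems, something like the following could work. This is a sketch under stated assumptions: a hypothetical `Listing` shape with an ISO start date, an illustrative set of tokens that should stay upper-case (skill levels like AAA and AA, age divisions like 14U), and exact matching on normalized name, city, and date. It only rewrites names that arrive entirely in capitals, on the assumption that mixed-case names were cased deliberately.

```typescript
// Illustrative tokens that must stay upper-case; extend per domain.
const KEEP_UPPER = new Set(["AAA", "AA", "USA"]);
const SMALL_WORDS = new Set(["of", "the", "and", "at", "in", "on"]);
const AGE_DIVISION = /^(U\d{1,2}|\d{1,2}U)$/i; // e.g. U12, 14U

function titleCaseWord(word: string, isFirst: boolean): string {
  const upper = word.toUpperCase();
  if (KEEP_UPPER.has(upper) || AGE_DIVISION.test(word)) return upper;
  const lower = word.toLowerCase();
  // Short connecting words stay lower-case unless they start the name.
  if (!isFirst && SMALL_WORDS.has(lower)) return lower;
  return lower.charAt(0).toUpperCase() + lower.slice(1);
}

function normalizeName(raw: string): string {
  const collapsed = raw.trim().replace(/\s+/g, " ");
  // Leave names that already contain lower-case letters untouched.
  if (collapsed !== collapsed.toUpperCase()) return collapsed;
  return collapsed
    .split(" ")
    .map((w, i) => titleCaseWord(w, i === 0))
    .join(" ");
}

// Hypothetical listing shape; startDate is an ISO date string.
interface Listing {
  name: string;
  city: string;
  startDate: string;
}

// Key that ignores case and punctuation in name and city.
function dedupeKey(l: Listing): string {
  const norm = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  return [norm(l.name), norm(l.city), l.startDate].join("|");
}

// Keep the first listing seen for each key.
function dedupe(listings: Listing[]): Listing[] {
  const seen = new Map<string, Listing>();
  for (const l of listings) {
    const key = dedupeKey(l);
    if (!seen.has(key)) seen.set(key, l);
  }
  return Array.from(seen.values());
}
```

Exact-key matching will not catch near-duplicates such as a misspelled city; those need a fuzzier comparison or a manual review queue.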
Two revenue streams for different audiences
Parents and coaches can use advanced search filters and alerts for free, long enough to learn their value, and then hit a paywall. The first paying customer came through organically. No outreach. No ads. They used it, saw the value, and paid.
The organizer side is a one-time $99 promotional placement that stays active until the event date passes. The pricing is simple because three iterations taught us not to over-engineer the business model before having customers.
What the data told us
Google Analytics revealed something we didn't expect. The top page wasn't the tournament search. It was the Skill Levels Guide — a page explaining what AAA, AA, and Tier 2 actually mean. Most visitors weren't ready to search for tournaments yet. They were trying to figure out what level their kid plays at.
That one data point changed how we think about the site. The SEO strategy shifted from targeting "hockey tournaments near me" to answering the questions people ask before they're ready to search. The content creates the traffic, and the traffic discovers the product.
How we monitor quality
A daily QA job runs 56 search queries and cross-references results against the database to catch what no scraper covers. A separate agent does its own headless browser searches and emails the results for a second pass. In data aggregation, nothing is worse than a listing the system missed. We'd rather have two systems checking than one system we trust too much.
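At its core, the cross-referencing step is a set difference: anything the QA searches turn up that the database does not contain is a coverage gap. A minimal sketch, assuming hypothetical result records and a match on normalized name plus start date (the post does not say how matching is actually done):

```typescript
// Hypothetical shape of one tournament found by a QA search.
interface Found {
  name: string;
  startDate: string; // ISO date
  sourceUrl: string;
}

// Case- and punctuation-insensitive key; an assumed matching rule.
function matchKey(name: string, startDate: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() + "|" + startDate;
}

// Return search results that have no matching database record, each reported once.
function findCoverageGaps(found: Found[], dbKeys: Set<string>): Found[] {
  const gaps: Found[] = [];
  const reported = new Set<string>();
  for (const f of found) {
    const key = matchKey(f.name, f.startDate);
    if (!dbKeys.has(key) && !reported.has(key)) {
      reported.add(key);
      gaps.push(f);
    }
  }
  return gaps;
}
```

Because one missing tournament often surfaces under several queries, the sketch reports each gap only once, which keeps the daily report short enough to act on.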
The business dashboard answers one question: "what should I focus on today?" MRR by plan type, subscriber growth, coverage gaps flagged in red. Every listing has organizer contact info embedded in the scraped data — ready for outreach campaigns when the time is right.
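MRR by plan type is a straightforward aggregation. This sketch assumes hypothetical subscription records billed monthly or annually, with prices in cents; annual plans are counted at one-twelfth of their price per month. One-time $99 placements are not recurring revenue, so they would be tracked separately rather than folded into MRR.

```typescript
// Hypothetical subscription record; plan names and fields are illustrative.
interface Subscription {
  plan: string;
  priceCents: number;
  interval: "month" | "year";
  active: boolean;
}

// Sum normalized monthly revenue (in cents) per plan, active subscriptions only.
function mrrByPlan(subs: Subscription[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const s of subs) {
    if (!s.active) continue;
    const monthly = s.interval === "year" ? s.priceCents / 12 : s.priceCents;
    totals.set(s.plan, (totals.get(s.plan) ?? 0) + monthly);
  }
  return totals;
}
```

Annual prices that are not divisible by 12 produce fractional cents, so rounding is best left to the display layer.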
Where it stands
Nearly 1,000 tournaments across 44 states and provinces. 187 pages indexed by Google in five weeks. 100+ weekly visitors. A paying customer. An automated blog. An outbound pipeline to organizers. All running on a $12/month server with limited maintenance.
The architecture is repeatable — it has nothing to do with hockey. Any industry where information is scattered across dozens of small websites that nobody has centralized is a candidate for the same system. We've already heard from people in other sports and non-sports verticals with the same problem.
The lesson: build the thing, make it work, put it where people can find it. The first customer will tell you more about your product than six months of planning ever would.