Real-Time Sports Data at Scale: Lessons from the Biathlon World Cup
Biathlon is a sport where 50 athletes shoot at targets while their heart rate is near maximum, and race positions change every 30 seconds. For biathlonworld.com, we needed to push all of that drama to hundreds of thousands of concurrent fans instantly. Here's what we built.
The upstream problem: a polling feed from the 2000s
The International Biathlon Union provides an official data feed to licensed data consumers. When we started work on biathlonworld.com, that feed was HTTP polling — you hit an endpoint, you get XML or JSON, you wait a few seconds, you hit it again. The IBU feed latency was variable but generally in the 2–4 second range from the event happening on course to the data appearing in the feed.
That's the upstream reality. You don't get to change it. The IBU has their timing infrastructure, their transponders on the athletes, their scoring system at venue. We were consumers of that pipeline, not participants in it. So our architectural options were constrained from the start: we could optimize the ingest-to-frontend path, but we couldn't eliminate the 2–4 second upstream delay.
The traffic pattern, on the other hand, was entirely our problem. A biathlon World Cup race has roughly 50,000–100,000 live viewers on a typical day. Major events — World Championships, the mass start finals — push into the 400,000–500,000 range. And crucially, those viewers don't arrive at a constant rate. Race start is a spike: within 60 seconds of race start, you can see 10x the idle traffic hit the platform. If your architecture is synchronous and polling-based all the way to the browser, that spike hits your database directly. We couldn't let that happen.
Architecture: ingest → normalize → publish → subscribe
The pipeline we settled on had four stages, each decoupled from the others.
The ingest service was a small Node.js process that polled the IBU feed on a 3-second interval. It parsed the response, compared it to the last-known state stored in Redis, and emitted change events for anything that had changed. Not the full state dump — just the delta. This was important for the downstream fan-out: if 20 athletes' positions changed in a 3-second window, we emitted 20 change events, not one blob containing 50 athletes' full state.
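The diffing step can be sketched like this. It's a minimal version of the idea, not the production code, and the field names (position, lapTime) are illustrative assumptions, not the real IBU schema:

```javascript
// Sketch of the ingest diffing step. Compares a freshly polled snapshot
// against the last-known state (stored in Redis in production; a plain
// object here) and emits one change event per athlete whose tracked
// fields differ -- deltas, not a full state dump.
function diffSnapshot(previous, current) {
  const events = [];
  for (const [athleteId, state] of Object.entries(current)) {
    const before = previous[athleteId];
    // New athlete, or any tracked field changed: emit a delta event
    if (!before || before.position !== state.position || before.lapTime !== state.lapTime) {
      events.push({ type: "ATHLETE_UPDATE", athleteId, ...state });
    }
  }
  return events;
}

const last = { "bib-17": { position: 3, lapTime: 412.8 } };
const next = {
  "bib-17": { position: 2, lapTime: 412.8 }, // moved up one place
  "bib-21": { position: 3, lapTime: 415.1 }, // first time we see this athlete
};
const deltas = diffSnapshot(last, next);
// deltas contains two events: one for bib-17, one for bib-21
```

The pay-off of emitting deltas is entirely downstream: the fan-out cost scales with what changed, not with field size.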
The normalization layer sat between ingest and publish. Its job was to translate IBU's internal schema into our own event model. The IBU feed uses their own internal athlete IDs, timing codes, and status vocabulary. We maintained a mapping table that translated these into our athlete model. This normalization layer was — and I'll admit this up front — the part of the architecture most likely to cause problems, and it did. More on that shortly.
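The shape of that translation looks roughly like this. The IBU codes and our athlete IDs are not public, so every value in the mapping tables below is invented for illustration:

```javascript
// Hypothetical mapping tables: real IBU identifiers and status codes
// are assumptions here, not the actual feed vocabulary.
const ATHLETE_ID_MAP = { "IBU-00123": "bib-17" };
const STATUS_MAP = { FIN: "finished", DNF: "did_not_finish", LAP: "lapped" };

// Translate a raw feed record into our canonical event model.
// Unknown athletes or statuses fail loudly rather than passing
// through silently -- the mapping table is a contract.
function normalize(raw) {
  const athleteId = ATHLETE_ID_MAP[raw.ibuId];
  const status = STATUS_MAP[raw.statusCode];
  if (!athleteId) throw new Error(`unmapped IBU athlete id: ${raw.ibuId}`);
  if (!status) throw new Error(`unmapped status code: ${raw.statusCode}`);
  return { athleteId, status, ts: raw.timestamp };
}

const event = normalize({ ibuId: "IBU-00123", statusCode: "FIN", timestamp: 1641234567890 });
// event: { athleteId: "bib-17", status: "finished", ts: 1641234567890 }
```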
The publish layer pushed normalized events to PubNub. Each event went to one or more channels, depending on its type. The frontend subscribed to PubNub channels and applied incoming events to its local state model.
Why PubNub instead of self-hosted WebSockets
This was the most debated architectural decision on the project. Self-hosted WebSockets — a Node.js cluster running Socket.io, or a self-managed deployment of a dedicated realtime messaging server — would have given us more control and lower marginal cost at scale. The case against was simpler: operational overhead and global reach.
Biathlon fans are concentrated in Norway, Germany, Austria, and France. We did not have a global infrastructure footprint. Standing up WebSocket servers in multiple regions with proper failover, managing sticky sessions across load-balanced instances, handling the reconnection thundering herd after a brief outage — this was real work. PubNub had global points of presence and handled that operational surface for us.
The trade-offs were real. PubNub is not cheap at scale. Vendor lock-in is genuine — the PubNub client API is specific to PubNub, and migrating away would require changes on both the publish and subscribe sides. We accepted both trade-offs explicitly, documented them, and moved on. If the platform had needed to run for 15 years and serve a billion events per month, the calculus would have been different. At our scale, paying for operations-as-a-service was the right call.
One PubNub feature that proved its worth repeatedly was the History API. A fan joining a race in progress doesn't want to wait for the next event to render the leaderboard — they want the current state immediately. PubNub's history let us replay the last N messages on a channel on connect, which meant new subscribers could reconstruct current state without us maintaining a separate "current state" endpoint.
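Once the history messages arrive, reconstructing state is just a fold over them. A minimal sketch, with illustrative message and field names (the fetch itself would use PubNub's history/fetchMessages API, omitted here):

```javascript
// Sketch of state reconstruction from replayed history messages.
// A new subscriber fetches the last N messages on the race channel
// and folds them into a local state object; later messages overwrite
// earlier ones for the same athlete.
function replay(messages) {
  const state = { athletes: {} };
  for (const msg of messages) {
    if (msg.type === "ATHLETE_UPDATE") {
      state.athletes[msg.athleteId] = { ...state.athletes[msg.athleteId], ...msg };
    }
  }
  return state;
}

const history = [
  { type: "ATHLETE_UPDATE", athleteId: "bib-17", position: 5 },
  { type: "ATHLETE_UPDATE", athleteId: "bib-17", position: 2 }, // newer, wins
];
const state = replay(history);
// state.athletes["bib-17"].position === 2
```

The same fold runs on every live message after the replay, so there is one code path for "catching up" and "staying current".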
Channel design: fan-out vs. fan-in
We used one channel per active race, plus a global events channel for non-race updates (schedule changes, athlete withdrawals). Each race channel carried all events for that race: position updates, shooting results, time gaps, penalty notifications, finish events.
The alternative was per-athlete channels — each fan subscribes only to the athletes they follow. This would have been cleaner in theory and would have reduced per-subscriber message volume. In practice it was unworkable, because biathlon fans watch the whole race, not just one athlete. The race narrative is about relative position — who overtook whom at the shooting range, who missed and fell back. You need the full field to understand the story. Per-athlete channels would have required the client to subscribe to 50 channels and maintain the merge, which is exactly the kind of complexity you don't want in a browser client during a peak-traffic race.
One channel per race meant every subscriber to that race channel received every event. At 400,000 concurrent subscribers, PubNub's infrastructure was doing the fan-out. That's what we were paying for.
The shooting range data problem
Biathlon has 5 targets per shooting stage and typically 4 shooting stages per race. With 50 athletes, that's up to 1,000 individual target hits or misses to track across a race. Each shooting event is binary — hit or miss — but the timing matters: did the miss happen early in the stage (giving the athlete time to compensate by slowing down) or late (forcing a panic shot)?
We represented each athlete's shooting record as a compact bit vector. Five bits per stage, four stages, 50 athletes. The full shooting state for a race fit comfortably in a JSON payload under 2KB. Incremental updates were even smaller: a single shooting event message identified the athlete, the stage, the target index, and the result.
// Shooting event message shape
{
  "type": "SHOT",
  "raceId": "wc-2021-oberhof-sprint-m",
  "athleteId": "bib-17",
  "stage": 2,        // 1-indexed, 1–4
  "target": 3,       // 1-indexed, 1–5
  "hit": false,
  "rangeTime": 28.4, // seconds at range so far
  "ts": 1641234567890
}
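The bit-vector representation can be sketched like this. The packing order (5 bits per stage, stage-major) is an assumption for illustration, not the documented production layout:

```javascript
// Per-athlete shooting record as a bit vector: 5 bits per stage,
// 4 stages = 20 bits per athlete, so a plain integer suffices.
// Bit set = hit; bit clear = miss (or not yet shot).
function setShot(record, stage, target, hit) {
  // stage and target are 1-indexed, matching the event messages
  const bit = (stage - 1) * 5 + (target - 1);
  return hit ? record | (1 << bit) : record & ~(1 << bit);
}

function isHit(record, stage, target) {
  const bit = (stage - 1) * 5 + (target - 1);
  return (record & (1 << bit)) !== 0;
}

let record = 0;
record = setShot(record, 1, 1, true);  // stage 1, target 1: hit
record = setShot(record, 2, 3, false); // stage 2, target 3: miss
// isHit(record, 1, 1) === true; isHit(record, 2, 3) === false
```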
The delivery window for shooting events — from a shot appearing in the IBU feed to the visualization updating in the browser — is the end-to-end figure we validated in the stopwatch section below, once everything was tuned. One complication was IBU feed polling interval jitter, which meant in practice some shots appeared in the feed in a batch. Our normalization layer preserved the sequence numbers from the IBU feed so the frontend could render shots in the correct order even when they arrived batched.
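The re-ordering step itself is small. A sketch, assuming a `seq` field carried through from the feed (the field name is illustrative):

```javascript
// Sketch: order a batch of shooting events by their preserved feed
// sequence number before rendering, so batched arrivals still animate
// in the order the shots were actually fired.
function orderBatch(events) {
  return [...events].sort((a, b) => a.seq - b.seq);
}

const batch = [
  { seq: 42, target: 2, hit: true },
  { seq: 41, target: 1, hit: false }, // arrived second, fired first
];
const ordered = orderBatch(batch);
// ordered renders target 1 (seq 41) before target 2 (seq 42)
```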
Edge cases that bit us
Clock skew and retroactive updates
The IBU feed occasionally published retroactive corrections. An athlete's time for a lap would appear, then a corrected time would appear in a later poll with the same event timestamp but a different value. Our first implementation treated every incoming event as additive and append-only. The first time we saw a retroactive correction, the leaderboard showed two different times for the same athlete until the next full-state reconciliation ran.
The fix was to include an IBU internal revision counter in each event. The frontend applied events with a higher revision counter and discarded events with a lower or equal counter for the same (athlete, lap) pair. Simple upsert semantics, but we'd missed it in the initial design.
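The upsert can be sketched as follows, keyed by (athlete, lap). Field names are assumptions; the logic mirrors the rule described above:

```javascript
// Sketch of the revision-counter upsert applied on the frontend.
// An incoming event only wins if its revision is strictly higher than
// the one we already hold for the same (athleteId, lap) pair.
function applyLapTime(state, event) {
  const key = `${event.athleteId}:${event.lap}`;
  const existing = state[key];
  if (existing && existing.revision >= event.revision) {
    return state; // stale or duplicate correction: discard
  }
  return { ...state, [key]: { time: event.time, revision: event.revision } };
}

let laps = {};
laps = applyLapTime(laps, { athleteId: "bib-17", lap: 2, time: 412.8, revision: 1 });
laps = applyLapTime(laps, { athleteId: "bib-17", lap: 2, time: 411.9, revision: 2 }); // correction wins
laps = applyLapTime(laps, { athleteId: "bib-17", lap: 2, time: 499.0, revision: 1 }); // stale, discarded
// laps["bib-17:2"].time === 411.9
```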
The 2021 World Championship mass start finish
A mass start race in biathlon is chaotic by design — all athletes start simultaneously, and finishing order alone determines the result; there are no time adjustments. The 2021 World Championship mass start had six athletes finishing within a two-second window. Our finish event processing was not designed for that throughput in that small a time window.
The specific problem: our normalization layer was processing finish events sequentially in the order they came from the IBU feed, and the IBU feed itself had them arriving in an unusual order (not strictly by finish time) due to transponder timing. We ended up with a brief period where the leaderboard showed incorrect placings — second place showed as fifth, third showed as first — before the reconciliation loop corrected it about 8 seconds later.
Eight seconds of wrong placings in a World Championship mass start finish is not a good look. The fix required rethinking the finish event model to separate "athlete crossed the line at this timestamp" from "athlete's final placing is N." The placing calculation was deferred until we had all finishes from that proximity cluster, rather than being emitted immediately on each finish event. We shipped that change before the next season.
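The placing calculation over a complete proximity cluster can be sketched like this. The cluster-closing logic (deciding when the window is complete) is elided, and field names are assumptions:

```javascript
// Sketch of the two-phase finish model: "crossed the line at t"
// events are buffered, and placings are computed over the whole
// cluster, sorted by actual line-crossing time, rather than emitted
// immediately per event in arrival order.
function assignPlacings(crossings) {
  return [...crossings]
    .sort((a, b) => a.finishTs - b.finishTs) // order by crossing time
    .map((c, i) => ({ athleteId: c.athleteId, placing: i + 1 }));
}

// Events arrived out of finish order, as in the 2021 mass start:
const cluster = [
  { athleteId: "bib-5", finishTs: 1000400 },
  { athleteId: "bib-1", finishTs: 1000100 },
  { athleteId: "bib-9", finishTs: 1000250 },
];
const placings = assignPlacings(cluster);
// bib-1 is 1st, bib-9 is 2nd, bib-5 is 3rd
```

The cost of the fix is a short delay before placings appear for tightly clustered finishes; the benefit is that they never appear wrong first.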
Validating latency: the stopwatch method
We validated end-to-end latency with a method I'm slightly embarrassed to describe but which worked reliably: we sat in front of a TV broadcast with a stopwatch and timed the gap between an on-screen event (a shot hit, a finish line crossing, an athlete entering the range) and the same event appearing in the browser.
The TV broadcast itself has latency — typically 4–8 seconds for satellite, 1–3 seconds for streaming. So we calibrated against an event with a known absolute timestamp in the IBU feed (the race start countdown), correlated it with the TV clock, and used that offset to adjust our measurements. The resulting end-to-end latency — IBU feed appearance to browser render — was consistently 1.2–1.8 seconds. About half of that was the IBU feed polling window; the rest was normalization, PubNub publish, browser receive, and render.
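The calibration arithmetic is simple enough to state as code. All numbers below are illustrative, not our measurements:

```javascript
// Sketch of the stopwatch calibration. The race start has a known
// absolute timestamp in the feed; observing when it appears on the TV
// broadcast gives the broadcast's offset relative to the feed.
function broadcastOffset(feedStartTs, tvStartTs) {
  return tvStartTs - feedStartTs; // how far the TV lags the feed, seconds
}

// Raw stopwatch gap = (browser render time) - (TV event time).
// The TV is already behind the feed, so add the offset back to recover
// feed-appearance-to-browser-render latency.
function adjustedLatency(rawStopwatchGapSec, offsetSec) {
  return rawStopwatchGapSec + offsetSec;
}

const offset = broadcastOffset(0, 4.5);        // TV runs 4.5 s behind the feed
const latency = adjustedLatency(-3.0, offset); // browser beat the TV by 3 s
// latency === 1.5 s feed-to-browser, in line with the 1.2–1.8 s range
```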
The best latency monitoring is usually the simplest. Before building a dedicated observability pipeline, sit in front of the product and time it against something you trust. You'll find problems that synthetic monitoring misses.
What we'd do differently: the normalization coupling problem
The normalization layer is too tightly coupled to the IBU feed schema. Every time IBU changes their event format — which happens between seasons as they add new data fields or restructure timing categories — the normalization layer requires changes. That's expected. What's not ideal is that the normalization layer also encodes business logic about race state: what constitutes a "valid" finish, how penalties are counted, which athlete states are terminal vs. provisional.
That business logic should live in a domain model separate from the feed-parsing logic. The normalization layer should be a dumb translator: IBU schema in, canonical event format out. The business logic should be a separate component that takes canonical events and produces race state. We blurred that boundary in the original implementation to move faster, and paid for it every time the IBU changed something upstream.
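The separation we wish we'd had can be sketched in a few lines. All names and fields here are illustrative, not the production schema:

```javascript
// Anti-corruption layer: IBU schema in, canonical event out.
// Knows the feed format; knows nothing about race rules.
function translate(raw) {
  return { type: "FINISH_CROSSING", athleteId: raw.AthleteIBUId, ts: raw.TimingMs };
}

// Domain model: canonical events in, race state out.
// Knows the race rules; knows nothing about the feed format.
function reduceRaceState(state, event) {
  if (event.type === "FINISH_CROSSING") {
    return { ...state, finished: [...state.finished, event.athleteId] };
  }
  return state;
}

const canonical = translate({ AthleteIBUId: "IBU-00123", TimingMs: 1641234567890 });
const raceState = reduceRaceState({ finished: [] }, canonical);
// raceState.finished === ["IBU-00123"]
```

With this split, a season-to-season IBU schema change touches only `translate`; the domain reducer and its tests stay untouched.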
The broader lesson is familiar: the more your system depends on the stability of an upstream API you don't control, the more carefully you need to insulate your domain logic from that API's schema. An anti-corruption layer is not optional when the upstream is a third party with its own release cadence.