How we run sub-minute uptime checks from 6 regions

A look under the hood at SiteChecker's monitoring infrastructure — the trade-offs, the costs, and why we picked the stack we did.

John Smith

When we set out to build SiteChecker, we had one non-negotiable: every monitor should run from at least 6 geographically distinct regions, every 60 seconds, without bankrupting us.

This post walks through how we got there.

The architecture in one diagram

(Diagram: ./architecture.png, stored next to this MDX file.)

The high-level flow is:

  1. A scheduler (one per region) reads the list of due monitors from MongoDB.
  2. Each check runs in a small worker that posts the result back to the API.
  3. The API computes status transitions and fans out alerts via the notifications service.
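To make step 3 concrete, here is a minimal sketch of alerting on status transitions rather than on every result. It assumes a simple up/down model tracked per monitor and region; the names `CheckResult`, `apply_result`, and `notify` are hypothetical and are not taken from SiteChecker's codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class CheckResult:
    """One result posted back by a worker (hypothetical shape)."""
    monitor_id: str
    region: str
    ok: bool


def apply_result(
    last_status: Dict[str, bool],
    result: CheckResult,
    notify: Callable[[str, str], None],
) -> Optional[str]:
    """Record the result and alert only when the status actually changes.

    Status is tracked per (monitor, region) pair, keyed as "monitor:region".
    Returns the transition name ("down" or "recovered"), or None.
    """
    key = f"{result.monitor_id}:{result.region}"
    previous = last_status.get(key)
    last_status[key] = result.ok

    if previous is None:
        # First result ever seen for this pair: alert only if it starts out failing.
        if result.ok:
            return None
        transition = "down"
    elif previous == result.ok:
        return None  # no change, so no alert
    else:
        transition = "recovered" if result.ok else "down"

    notify(result.monitor_id, f"{transition} in {result.region}")
    return transition
```

In this sketch the in-memory dict stands in for whatever store holds the last known status, and `notify` stands in for the call to the notifications service.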

Why MongoDB for time-series

MongoDB’s time-series collections turned out to be a near-perfect fit:

  • Native bucketing keeps storage tight (~80% smaller than a naive schema).
  • Secondary indexes on monitorId make per-monitor queries fast.
  • TTL indexes give us automatic data retention with zero ops.
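As a rough illustration of the three points above, the sketch below builds the MongoDB `create` and `createIndexes` command documents for a time-series collection. The collection name, field names, granularity, and 30-day retention are all assumptions made for the example, and retention is expressed here through the collection-level `expireAfterSeconds` option, which may differ from how SiteChecker configures it.

```python
from typing import Any, Dict


def timeseries_create_command(
    collection: str = "check_results",  # hypothetical collection name
    retention_days: int = 30,           # assumed retention window
) -> Dict[str, Any]:
    """Build a MongoDB `create` command for a time-series collection."""
    return {
        "create": collection,  # the command name must be the first key
        "timeseries": {
            "timeField": "timestamp",   # assumed name of the time field
            "metaField": "meta",        # assumed: holds monitorId and region
            "granularity": "minutes",   # assumed: roughly one point per minute
        },
        # Documents older than this are removed automatically.
        "expireAfterSeconds": retention_days * 24 * 60 * 60,
    }


def monitor_index_command(collection: str = "check_results") -> Dict[str, Any]:
    """Build a `createIndexes` command for fast per-monitor queries."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "key": {"meta.monitorId": 1, "timestamp": -1},
                "name": "monitor_time",  # hypothetical index name
            },
        ],
    }
```

With PyMongo, each of these dicts could be passed to `Database.command(...)`; the functions themselves only build the documents, so they run without a database connection.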

We benchmarked it head-to-head against Timescale, and the operational simplicity won out.

Multi-region scheduling — the gotcha

The hard part of multi-region monitoring isn’t running the checks. It’s making sure every region runs the check at roughly the same wall-clock minute, so you don’t get a confusing flapping pattern when a site goes down in just one region.

We solved it with a deterministic schedule: each monitor’s “minute slot” is derived from hash(monitorId) % 60. As long as every region uses the same stable hash function, each one picks up that same slot independently, with no coordination needed.
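A minimal sketch of the slot calculation follows. The post does not say which hash function is used, so SHA-256 is an assumption here; the important property is that the hash is identical in every process and region, which rules out per-process salted hashes such as Python's built-in `hash()` on strings. Treating the slot as an offset in seconds within each minute is also an assumption, and `slot_for` and `is_due` are hypothetical names.

```python
import hashlib

SLOTS = 60  # matches the `% 60` in the schedule described above


def slot_for(monitor_id: str) -> int:
    """Map a monitor ID to a slot in [0, 60), identically in every region."""
    digest = hashlib.sha256(monitor_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer; any fixed slice works
    # as long as every region uses the same one.
    return int.from_bytes(digest[:8], "big") % SLOTS


def is_due(monitor_id: str, unix_time: float) -> bool:
    """True when the current second-of-minute equals the monitor's slot.

    Assumption: the slot is an offset in seconds within each minute.
    """
    return int(unix_time) % SLOTS == slot_for(monitor_id)
```

Because the result depends only on the monitor ID and the clock, two schedulers with reasonably synchronized clocks will agree on when a monitor is due without talking to each other.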

What’s next

We’re working on:

  • 30-second resolution for the Business tier.
  • Synthetic user flows — full Playwright scripts instead of just HTTP checks.
  • A public status page for SiteChecker itself.

Until next time 👋