The Cloudflare Blog · Brian Brunner · May 2026 · a field guide
The company that could not query itself — and how it built Town Lake & Skipper
A billion events a second flow through Cloudflare's network. For a decade, the company that sells the world its data infrastructure could not easily answer a simple question about itself. This is the architecture of how that changed — drawn out, mechanism by mechanism, with the decade of history that forced it.
01 — FEEL THE SCALE FIRST
One second on Cloudflare
You cannot understand the data problem without first standing inside the firehose. Every HTTP request, every Worker invocation, every R2 read, every blocked attack throws off data, and it never stops: 330+ cities across 120+ countries, all day, forever.
To keep dashboards loading, the pipeline throws away most of that data: it samples. That is the right call for a dashboard and exactly the wrong call when you are computing what to charge a customer. That single tension, fast-but-approximate versus slow-but-exact, runs underneath this entire story.
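Here is a toy illustration of that tension. It assumes uniform 1-in-N sampling, because the post does not say how the pipeline samples. Scaling a sample back up gives a good estimate for a chart, but it is still an estimate, and an invoice needs the exact count. The account names and traffic are made up.

```python
import random

def exact_count(events, account):
    """Exact: scan every event. What billing needs, and the expensive path."""
    return sum(1 for acct in events if acct == account)

def sampled_estimate(events, account, rate, seed=0):
    """Approximate: keep each event with probability `rate`, then scale the
    matching count back up by 1/rate. Right on average, rarely exact."""
    rng = random.Random(seed)
    kept = sum(1 for acct in events if rng.random() < rate and acct == account)
    return round(kept / rate)

# Hypothetical traffic: 100,000 requests spread across three accounts.
traffic_rng = random.Random(42)
events = [traffic_rng.choice(["acct_a", "acct_b", "acct_c"]) for _ in range(100_000)]

truth = exact_count(events, "acct_a")
estimate = sampled_estimate(events, "acct_a", rate=0.01)  # 1-in-100 sample
print(f"exact={truth} sampled={estimate} error={abs(truth - estimate)}")
```

Setting `rate=1.0` turns the estimate back into the exact count. That is the whole trade: fidelity costs a full scan.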
02 — THE ARCHIVE
How a labyrinth gets built, one good decision at a time
No one set out to build a mess. The sprawl is the sum of a dozen individually correct engineering decisions, made years apart, each one right for its moment.
Notice the pattern. After ClickHouse rescued the analytics pipeline, the whole company internalized a lesson — for every workload, go find the perfect specialized engine and master it. That lesson is true. It is also the trap: it scales straight into chaos. The Zero Trust team ran the same evaluation everyone ran and reached a different answer (TimescaleDB). Someone needed rollups and reached across the fence to Google's BigQuery. Each call was defensible. The sum was a maze.
03 — INSIDE THE MAZE
The analyst's archaeology dig
Put yourself inside it. You have a question. Before Town Lake, a question did not resolve into an answer; it resolved into a scavenger hunt.
And that was the good day — the day you knew where to look. The more corrosive truth: nobody could reliably find the data at all. Knowing that "Billable Workers requests by account" lived in a specific ClickHouse cluster, in a specific schema, joined to a specific Postgres dimension through an obscure customer-ID translation — that lived in one engineer's head. Data access had become a priesthood, and everyone else had to come and ask.
04 — THE FOUNDATION
Town Lake: unify at the query layer, not the storage layer
The name comes from Town Lake in Austin, Texas. The idea is the philosophical opposite of every instinct that built the maze. They did not migrate everything into one giant warehouse, which would be slow, costly, and a fresh lock-in. They built a lakehouse: a query engine that reads straight from object storage, with a metadata layer that makes the storage behave like a database. The heterogeneity stays; the unification happens at the query layer.
At a high level, operational sources flow up into the lake, Trino queries across all of it, and services around the edge handle metadata, access, PII, and transforms.
One query, three databases, no copying
This is the move that makes the whole thing work, so it's worth slowing all the way down. Apache Trino is a federated query engine: a single SQL statement can touch a Postgres table, a ClickHouse table, and an Iceberg table on R2, and join them, without ever copying the intermediate results into some other system. Take the question "top 100 paying customers by Workers requests this week," which used to mean hand-joining ClickHouse request counts to Postgres account data.
The old five-login archaeology dig collapses into a single statement. The same revenue rollup that used to be a 200–300 line SQL incantation is now about five lines.
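Trino addresses every table as `catalog.schema.table`, where each catalog is a connector to a different system. That makes a federated query ordinary SQL with three-part names. Running Trino needs a cluster, so this runnable sketch uses an analogy instead: SQLite's `ATTACH DATABASE`, which also lets one statement join tables that live in separate databases. The attached names (`pg`, `ch`, `lake`), tables, columns, and the fixed start-of-week date are all hypothetical.

```python
import sqlite3

# One connection, three separate in-memory databases. The attached names
# are hypothetical stand-ins for Trino catalogs.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS pg")    # plays Postgres
conn.execute("ATTACH DATABASE ':memory:' AS ch")    # plays ClickHouse
conn.execute("ATTACH DATABASE ':memory:' AS lake")  # plays Iceberg on R2

conn.execute("CREATE TABLE pg.accounts (account_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE ch.workers_requests "
             "(account_id INTEGER, req_day TEXT, requests INTEGER)")
conn.execute("CREATE TABLE lake.invoices (account_id INTEGER, amount_usd REAL)")

conn.executemany("INSERT INTO pg.accounts VALUES (?, ?)",
                 [(1, "acme"), (2, "globex"), (3, "initech")])
conn.executemany("INSERT INTO ch.workers_requests VALUES (?, ?, ?)",
                 [(1, "2026-05-04", 900), (1, "2026-05-05", 100),
                  (2, "2026-05-04", 5000), (3, "2026-05-05", 50)])
conn.executemany("INSERT INTO lake.invoices VALUES (?, ?)",
                 [(1, 250.0), (2, 0.0), (3, 40.0)])

# "Top 100 paying customers by Workers requests this week" as one statement
# that reads all three databases; nothing is copied between them first.
TOP_PAYING = """
SELECT a.name, SUM(r.requests) AS total_requests
FROM ch.workers_requests AS r
JOIN pg.accounts AS a ON a.account_id = r.account_id
JOIN lake.invoices AS i ON i.account_id = r.account_id
WHERE i.amount_usd > 0            -- "paying"
  AND r.req_day >= '2026-05-04'   -- hypothetical start of "this week"
GROUP BY a.name
ORDER BY total_requests DESC
LIMIT 100
"""
rows = conn.execute(TOP_PAYING).fetchall()
print(rows)   # [('acme', 1000), ('initech', 50)]
```

In Trino, the same statement would name real catalogs instead of attached databases, and the connectors can push filters down into each source. SQLite has no such layer. Read this as the shape of a single federated statement, not as a model of Trino's execution.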
06 — STORAGE THAT GETS CHEAPER AS IT AGES
R2 Data Catalog: keep everything, pay less for the past
Cold and warm data live in R2 Data Catalog, Cloudflare's managed Apache Iceberg service. Iceberg is the quiet hero: an open table format that makes a pile of Parquet files in object storage behave like a real, versioned, time-travelable database table. Crucially, data is compacted as it ages, so storage cost falls as data gets older while every byte stays queryable.
Data ages from per-minute to hourly to daily granularity.
This is what finally dissolved the old cruel choice. Because full-fidelity Parquet on R2 is so much cheaper than the same rows sitting hot in an OLAP database, Cloudflare can now keep the unsampled truth affordably — so billing and security get exact data — while dashboards still sip from the fast, downsampled rollups. You no longer have to pick.
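Below is a minimal sketch of per-minute → hourly → daily aging. It assumes "compacted" here means summing an additive counter into coarser time buckets. Iceberg also uses "compaction" for rewriting many small files into fewer large ones without changing any rows, which is a different operation. Rows shrink at each tier while totals stay exact, which is why dashboards can read the rollups while full-fidelity rows are kept for billing. Names and data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def to_hour(ts):
    return ts.replace(minute=0, second=0, microsecond=0)

def to_day(ts):
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def roll_up(rows, truncate):
    """Collapse (timestamp, account_id, count) rows onto coarser time buckets.

    Counts are summed, so totals survive exactly; only time resolution is lost.
    This only works for additive metrics such as request counts.
    """
    buckets = defaultdict(int)
    for ts, account_id, count in rows:
        buckets[(truncate(ts), account_id)] += count
    return sorted((ts, acct, n) for (ts, acct), n in buckets.items())

def total(rows):
    return sum(n for _, _, n in rows)

# Hypothetical per-minute request counts for one account over two days.
start = datetime(2026, 5, 1)
per_minute = [(start + timedelta(minutes=m), "acct_a", 1 + m % 7)
              for m in range(2 * 24 * 60)]

hourly = roll_up(per_minute, to_hour)   # older data: one row per hour
daily = roll_up(hourly, to_day)         # oldest data: one row per day

print(len(per_minute), len(hourly), len(daily))             # 2880 48 2
print(total(per_minute) == total(hourly) == total(daily))   # True
```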
07 — SAFE TO ACTUALLY USE
Default-closed: governance by construction
The instant you unify all your data, you've built one enormous sensitive-data surface. The traditional answer is open by default, lock down by exception: allow everything, then restrict the scary tables when someone notices. Town Lake inverts it. A table is un-queryable until it has been reviewed.
Two things keep that from being miserable. It's automated — the PII scanner does the heavy lifting and most reviews take seconds. And it's self-serve: query a table you can't touch and the error isn't a cold "permission denied," it's "this needs review — click here," and Skipper even names the right access group to request. There's a subtle but vital distinction underneath: schema discovery is separated from data access. You can see that a table exists, but unreviewed columns are hidden from DESCRIBE, SHOW COLUMNS, and SELECT * — so a brand-new column never silently breaks the dashboards built on the approved rest of the table.
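Here is a minimal sketch of the two rules in the paragraph above. First, an unreviewed table refuses queries with an actionable error instead of a bare denial. Second, a reviewed table exposes only its reviewed columns to schema discovery and to `SELECT *` expansion. The class, error text, and group name are hypothetical; the post does not describe how Town Lake implements this inside Trino.

```python
from dataclasses import dataclass, field

class NeedsReview(Exception):
    """Raised instead of a bare 'permission denied'; says what to do next."""

@dataclass
class Table:
    name: str
    columns: list                                   # every physical column
    reviewed: bool = False                          # default-closed
    reviewed_columns: set = field(default_factory=set)
    access_group: str = "unassigned"                # group to request (hypothetical)

    def describe(self):
        """Schema discovery: the table is visible, unreviewed columns are not."""
        return [c for c in self.columns if c in self.reviewed_columns]

    def expand_star(self):
        """What SELECT * expands to: reviewed columns only."""
        if not self.reviewed:
            raise NeedsReview(f"{self.name} needs review; request {self.access_group}")
        return self.describe()

t = Table("billing.usage", ["date", "account_id", "usage"],
          access_group="billing-readers")
try:
    t.expand_star()
except NeedsReview as err:
    print(err)   # billing.usage needs review; request billing-readers

t.reviewed = True
t.reviewed_columns = {"date", "account_id", "usage"}
t.columns.append("new_col")   # an upstream change after review
print(t.expand_star())        # ['date', 'account_id', 'usage']
```

Because `new_col` stays invisible until it is reviewed, a dashboard built on `SELECT *` keeps returning the same three columns after the upstream schema changes.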
And PII itself is opt-in per session. By default Trino redacts sensitive columns before they ever reach your screen; seeing them means flipping a per-session switch, and every flip is logged.
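This sketch of session-scoped redaction assumes three things: a set of columns already tagged as PII, a per-user permission bit, and an append-only audit log. All names are hypothetical. The post says Trino does the redaction but not how.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

PII_COLUMNS = {"email", "client_ip"}   # hypothetical: columns tagged as PII
REDACTED = "<redacted>"

@dataclass
class Session:
    user: str
    can_view_pii: bool                  # hypothetical permission bit
    show_pii: bool = False              # off by default: redact
    audit_log: list = field(default_factory=list)

    def set_show_pii(self, enabled):
        """Flip the per-session bit. Every attempt is logged, allowed or not."""
        allowed = self.can_view_pii or not enabled
        self.audit_log.append((datetime.now(timezone.utc), self.user, enabled, allowed))
        if not allowed:
            raise PermissionError(f"{self.user} may not view PII")
        self.show_pii = enabled

    def present(self, row):
        """Apply redaction before a row leaves the engine."""
        if self.show_pii:
            return dict(row)
        return {k: REDACTED if k in PII_COLUMNS else v for k, v in row.items()}

row = {"account_id": 42, "email": "a@example.com", "client_ip": "192.0.2.7"}
session = Session(user="analyst", can_view_pii=True)
print(session.present(row))   # email and client_ip come back as <redacted>
session.set_show_pii(True)
print(session.present(row))   # full row; the flip is in session.audit_log
```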
Skimmer: two passes, because the obvious PII isn't the hard part
Skimmer runs continuously, sampling rows from every column of every table and using Workers AI to decide what's sensitive. Emails and IP addresses are easy. The danger is the long tail — an opaque ID that can be traced back to a person, an API token that only betrays itself by its prefix. So it works in two passes, escalating only when it has to.
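The post doesn't spell out what each pass uses. This sketch assumes the first pass is a set of cheap pattern checks that settles the easy cases (emails, IP addresses, known token prefixes) and escalates everything else to a second pass. `classify_with_model` is a hypothetical stub standing in for a Workers AI call. The regex and token prefixes are illustrative only.

```python
import ipaddress
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
TOKEN_PREFIXES = ("sk_live_", "ghp_")   # illustrative secret-token prefixes

def first_pass(value):
    """Cheap checks for the obvious cases. Returns a label, or None if unsure."""
    if EMAIL.match(value):
        return "email"
    try:
        ipaddress.ip_address(value)
        return "ip_address"
    except ValueError:
        pass
    if value.startswith(TOKEN_PREFIXES):
        return "api_token"
    return None

def classify_with_model(samples):
    """Hypothetical stand-in for an expensive model call (e.g. Workers AI)."""
    return "needs_human_review"

def classify_column(samples):
    """Settle the column in pass one only if every sample gets the same
    confident label; otherwise escalate the whole sample to pass two."""
    labels = {first_pass(v) for v in samples}
    if len(labels) == 1 and None not in labels:
        return labels.pop(), "pass1"
    return classify_with_model(samples), "pass2"

print(classify_column(["a@example.com", "b@example.org"]))  # ('email', 'pass1')
print(classify_column(["u_8f3a91", "u_77c2e0"]))           # ('needs_human_review', 'pass2')
```

The opaque user IDs in the second call are exactly the long tail: nothing in the string says "person," so only the expensive pass gets to decide.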
Skipper: from a sentence to an auditable answer
A query engine alone isn't enough anymore. SQL is a barrier; so is knowing which of tens of thousands of tables to point it at. Skipper is the conversational agent that closes that last gap: natural-language question in, validated answer out, grounded in Cloudflare's real data, code, and institutional knowledge. The interface is a chat box; behind it runs a closed-loop reasoning cycle.
The hinge word is auditable. Skipper shows you the SQL it ran. You're never asked to trust a black box — you're handed the query and can check it. In a data system, trust isn't a feeling; it's the ability to verify.
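The cycle itself isn't spelled out here, so this is a generic draft, execute, retry loop under stated assumptions:

- `draft_sql` is a hypothetical stand-in for the model.
- SQLite stands in for Trino.
- Execution errors are fed back into the next draft.
- "Validated" is reduced to "it ran."

What carries over is the auditable part: the answer object always carries the SQL that produced it.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Answer:
    rows: list
    sql: str        # always returned so the user can check the work
    attempts: int

def draft_sql(question, feedback):
    """Hypothetical model stand-in: the first draft guesses a wrong column
    name; given the engine's error message, the second draft fixes it."""
    if feedback is None:
        return "SELECT account, SUM(requests) FROM usage GROUP BY account"
    return ("SELECT account_id, SUM(requests) FROM usage "
            "GROUP BY account_id ORDER BY account_id")

def answer(conn, question, max_attempts=3):
    feedback = None
    for attempt in range(1, max_attempts + 1):
        sql = draft_sql(question, feedback)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            feedback = str(err)     # ground the next draft in the real error
            continue
        return Answer(rows=rows, sql=sql, attempts=attempt)
    raise RuntimeError(f"no query ran after {max_attempts} attempts: {feedback}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (account_id INTEGER, requests INTEGER)")
conn.executemany("INSERT INTO usage VALUES (?, ?)", [(1, 10), (1, 5), (2, 7)])

result = answer(conn, "requests per account")
print(result.sql)                     # the exact query behind the answer
print(result.rows, result.attempts)   # [(1, 15), (2, 7)] 2
```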
10 — WHY IT DOESN'T LIE
Five layers of context
Hand an LLM a prompt and a bare list of table names and it will hallucinate a join, misuse a column, and hand you a confidently wrong number. Cloudflare learned this the hard way in early experiments. The entire engineering effort behind Skipper is grounding: feeding the model layers of real context.
Look hard at Layer 3. The tribal business logic that used to live only in the priesthood's heads, such as "alloc_amount is billed_amount/12 for annual plans, otherwise billed_amount," is now emitted as documentation by the transform pipeline itself on every successful run and fed straight to the model. The institutional memory got externalized into code. As Cloudflare put it: code, not metadata, captures meaning.
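The rule quoted above becomes a two-branch function once it lives in code. This sketch writes it that way and adds a hypothetical `emit_column_docs` helper to illustrate the pipeline publishing its own documentation. Only the rule itself comes from the post. The `"annual"` plan value is an assumption, and `Decimal` is used because the values are money.

```python
from decimal import Decimal

def alloc_amount(billed_amount, plan_term):
    """alloc_amount is billed_amount/12 for annual plans, otherwise billed_amount."""
    if plan_term == "annual":
        return billed_amount / 12
    return billed_amount

def emit_column_docs(*transforms):
    """Hypothetical: after a successful run, publish each transform's
    docstring as column documentation for the model's context."""
    return {fn.__name__: fn.__doc__ for fn in transforms}

print(alloc_amount(Decimal("1200"), "annual"))   # 100
print(alloc_amount(Decimal("100"), "monthly"))   # 100
print(emit_column_docs(alloc_amount))
```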
Code Mode: don't call tools — write code that calls tools
The standard way to give an agent tools is to define them all in the prompt and let the model call them one at a time: call, parse, execute, return, repeat. It works, but it's chatty — a five-tool workflow is five round-trips, each one re-establishing context from scratch. Cloudflare's MCP server does something else.
Instead of 30 individual tools, the server exposes two — search and execute — and the model writes a single JavaScript snippet that drives the whole toolset, which runs in a sandboxed Dynamic Worker isolate. One round-trip, in a language the model already knows cold. Faster, cheaper, and the workflow is auditable as code. It's the kind of solution only a company whose entire platform is sandboxed JavaScript would reach for.
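This toy counts round-trips; it is not a model of Cloudflare's MCP server. It assumes each model turn costs one round-trip. Classic tool-calling spends one turn per call, while in Code Mode the model writes one program that makes all the calls. The three tools are hypothetical (the real server exposes `search` and `execute`), Python stands in for the JavaScript the real server runs, and nothing here is sandboxed.

```python
from typing import Callable

# Hypothetical tools; real ones would search the catalog and run queries.
TOOLS: dict[str, Callable] = {
    "find_table": lambda question: "billing.usage",
    "describe": lambda table: ["date", "account_id", "usage"],
    "run_sql": lambda sql: [("2026-05-04", 7, 15)],
}

def traditional(question):
    """Classic tool-calling: one call per model turn, each result returned to
    the model before it picks the next call. N calls cost N round-trips."""
    round_trips = 0

    def call(name, arg):
        nonlocal round_trips
        round_trips += 1
        return TOOLS[name](arg)

    table = call("find_table", question)
    cols = call("describe", table)
    rows = call("run_sql", f"SELECT {', '.join(cols)} FROM {table}")
    return rows, round_trips

def generated_program(tools, question):
    """What the model would write in one turn under Code Mode (Python here;
    the real server runs model-written JavaScript)."""
    table = tools["find_table"](question)
    cols = tools["describe"](table)
    return tools["run_sql"](f"SELECT {', '.join(cols)} FROM {table}")

def code_mode(program, question):
    """One round-trip: the harness runs the whole program once. The real
    system runs it in a sandboxed isolate; this sketch does not sandbox."""
    return program(TOOLS, question), 1

print(traditional("usage by account"))                   # ([('2026-05-04', 7, 15)], 3)
print(code_mode(generated_program, "usage by account"))  # ([('2026-05-04', 7, 15)], 1)
```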
Everything runs as you
There is no privileged service account quietly reading everything on your behalf. Every single thing Skipper does runs as the calling user. No access to a table? Skipper can't fetch it for you. Ask for PII? Your permissions are checked. The elegant part is what happens when you share a dashboard.
A saved query shared with a teammate is checked against the underlying tables at view time, not save time — because group membership changes, and yesterday's grant shouldn't leak tomorrow's data. Dashboards embed anywhere internal with a single <div> and a script tag; a Content-Security-Policy frame-ancestors rule blocks embedding outside the corporate domain, and Cloudflare Access still gates the iframe — so an unauthenticated viewer hits the login page inside the frame rather than ever seeing the data.
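This sketch authorizes a shared query at view time. The check uses the viewer's group memberships as they are right now, and a table missing from the ACL denies by default. Group, table, and domain names are hypothetical; `frame-ancestors` is the standard CSP directive the paragraph mentions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SavedQuery:
    sql: str
    tables: frozenset    # tables the query reads
    owner: str

def can_view(query, viewer, table_acl, memberships):
    """Authorize the viewer, not the author, using group memberships as
    they are right now. A table missing from the ACL denies by default."""
    groups = memberships.get(viewer, set())
    return all(table_acl.get(t, set()) & groups for t in query.tables)

# Sent with dashboard pages so only the internal domain may frame them.
CSP_HEADER = ("Content-Security-Policy",
              "frame-ancestors 'self' https://*.internal.example")

acl = {"billing.usage": {"billing-readers"}}
members = {"alice": {"billing-readers"}, "bob": {"billing-readers"}}
shared = SavedQuery("SELECT date, usage FROM billing.usage",
                    frozenset({"billing.usage"}), owner="alice")

print(can_view(shared, "bob", acl, members))   # True
members["bob"] = set()                         # bob leaves the group
print(can_view(shared, "bob", acl, members))   # False
```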
The payoff, in numbers
Billing was the original use case, and the proof. The customer-facing Billable Usage Dashboard now pulls the same compact (date, account_id, metric_name, usage) rows from Iceberg-on-R2 that the invoicing system uses. The number on the dashboard matches the number on the bill, by construction.
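Here is what "by construction" buys, in miniature. The dashboard and the invoice are not two reports that happen to agree; they are two callers of one aggregation over the same (date, account_id, metric_name, usage) rows. The metric names and values below are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Compact rows in the (date, account_id, metric_name, usage) shape.
USAGE_ROWS = [
    (date(2026, 5, 1), 7, "workers_requests", 1_200_000),
    (date(2026, 5, 2), 7, "workers_requests", 800_000),
    (date(2026, 5, 2), 7, "r2_class_a_ops", 5_000),
    (date(2026, 5, 2), 9, "workers_requests", 10),
]

def billable_usage(rows, account_id, start, end):
    """Sum usage per metric for one account over the half-open range [start, end)."""
    totals = defaultdict(int)
    for day, acct, metric, usage in rows:
        if acct == account_id and start <= day < end:
            totals[metric] += usage
    return dict(totals)

# Two consumers, one function, one source of rows.
may = (date(2026, 5, 1), date(2026, 6, 1))
dashboard_view = billable_usage(USAGE_ROWS, 7, *may)
invoice_lines = billable_usage(USAGE_ROWS, 7, *may)
print(dashboard_view)   # {'workers_requests': 2000000, 'r2_class_a_ops': 5000}
```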
Bot Management queries ML scoring events above 0.9 confidence in the last 48 hours, sliced by ASN and geography. Threat researchers built their own toolkit on top. Trust & Safety pulls signals to police abuse. The questions that used to be projects are now just queries.
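The Bot Management question is a filter plus a group-by. SQLite stands in for Trino so the example runs. The table, columns, and events are hypothetical, and the ASNs come from the range reserved for documentation.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bot_scores (ts TEXT, asn INTEGER, country TEXT, score REAL)")

now = datetime(2026, 5, 10, 12, 0)    # fixed "now" so the example is repeatable
events = [
    (now - timedelta(hours=1), 64500, "US", 0.97),
    (now - timedelta(hours=5), 64500, "US", 0.95),
    (now - timedelta(hours=30), 64501, "DE", 0.92),
    (now - timedelta(hours=2), 64501, "DE", 0.40),   # below the threshold
    (now - timedelta(hours=72), 64500, "US", 0.99),  # outside the window
]
conn.executemany(
    "INSERT INTO bot_scores VALUES (?, ?, ?, ?)",
    [(ts.isoformat(sep=" "), asn, cc, score) for ts, asn, cc, score in events],
)

HIGH_CONFIDENCE_BY_ASN_GEO = """
SELECT asn, country, COUNT(*) AS events
FROM bot_scores
WHERE score > 0.9 AND ts >= ?
GROUP BY asn, country
ORDER BY events DESC
"""
cutoff = (now - timedelta(hours=48)).isoformat(sep=" ")
result = conn.execute(HIGH_CONFIDENCE_BY_ASN_GEO, (cutoff,)).fetchall()
print(result)   # [(64500, 'US', 2), (64501, 'DE', 1)]
```

In Trino, the window would more naturally be written against a real timestamp column as `ts >= current_timestamp - INTERVAL '48' HOUR`.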
14 — WHAT THEY LEARNED
The surprises
Elaborate, prescriptive system prompts ("first do X, then Y…") made quality worse. The model reasons about analytical workflows fine on its own. They swapped micromanagement for high-level guidance and results improved.
Three "fetch" tools, two "search" tools — the model kept calling the wrong one. They consolidated until every tool had a single reason to exist (fetch_results got a mode param instead of three twins).
The biggest accuracy wins came from ingesting the SQL that builds a table. A customer_type column looks identical everywhere — but the code reveals it defaults to paygo when Salesforce data is missing. That never lives in a column description.
Trino + Iceberg isn't new. The real work is the unglamorous stuff: per-row access control, default-closed allowlisting, query auditing, time-bound credentials, PII detection, idempotent ingestion, schema evolution. That's what makes a data platform safe to use.
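Here is the `customer_type` surprise, written as the transform that would produce it. The post says only that the column defaults to `paygo` when Salesforce data is missing. The record shape and other values are hypothetical; in SQL, the same rule would typically be a `COALESCE(..., 'paygo')`.

```python
def customer_type(salesforce_record):
    """Salesforce's customer type, or 'paygo' when there is no Salesforce data.

    A column description would say "customer type"; only this code reveals
    that 'paygo' also absorbs every account with missing data."""
    if salesforce_record is None or salesforce_record.get("type") is None:
        return "paygo"
    return salesforce_record["type"]

# Hypothetical accounts: one enterprise, one genuinely pay-as-you-go,
# one with no Salesforce record at all.
accounts = {1: {"type": "enterprise"}, 2: {"type": "paygo"}, 3: None}
print({acct: customer_type(rec) for acct, rec in accounts.items()})
# {1: 'enterprise', 2: 'paygo', 3: 'paygo'}
```

The output shows the trap: counting `paygo` customers also counts every account with missing Salesforce data, which no column description would tell a model.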
The redistribution of a kind of power
Strip away the engine names and the lake puns and this isn't, at bottom, a technical story. It's a story about power — about who gets to turn a question into a true answer. For a decade that power was held by a small priesthood who knew the map and could be petitioned. Town Lake dissolved the map's scarcity; Skipper handed the result to anyone who can type a sentence. The support engineer, the analyst, the PM — they can now ask, in plain English, and get back an accurate, auditable answer in seconds.
What's next follows the same logic: deeper integration into chat and ticketing so "ask the data" becomes the reflex when debugging an incident; self-serve data engineering with the same shape as self-serve software; and a gradual migration of Town Lake's workload onto R2 SQL, Cloudflare's serverless query engine, as it matures. The bet underneath all of it — that the next breakthrough comes from someone who looks at the data and sees what no one else saw — is the one they're still making. Town Lake is how they make sure that person can find it.