We replayed a month of Snowflake queries. Here's where the bill actually goes.
One month of real QUERY_HISTORY from a 40-person data team, run through chukei's replay simulator. No proxy installed, no account touched — just the receipts.
Everyone knows their Snowflake bill is too high. Almost nobody can tell you, line by line, why. So we asked a customer for one thing: a month of SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY, fully anonymised. Then we ran every query through chukei’s replay simulator and added up what it could conservatively have saved, before anything touched the query path.
The result was not what the team expected. They assumed their costs were dominated by a handful of monster transformations. In reality, the bill was death by a thousand dashboards — and a large share of it was the same reads, recomputed. You can run the same replay against your own history; here’s what ours turned up.
01 / THE SHAPE OF THE BILL
Most of the spend was repeat reads
The first cut is the simplest: how many queries were deterministic repeats of one already run earlier that day? On this account, the majority of query volume was repeat reads — BI tools and scheduled dashboards asking the same questions over and over, each one waking a warehouse to recompute an answer that had not changed.
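Here is a minimal Python sketch of how the "repeat read" cut could be computed from exported history. It is not chukei's classifier. `repeat_share`, the read-only prefix check and the `NON_DETERMINISTIC` denylist are all hypothetical. A query counts as a repeat only when its exact text (ignoring surrounding whitespace) already ran earlier on the same calendar day.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical denylist of Snowflake functions whose output can change between
# runs; a production classifier would work on a parsed query, not a regex.
NON_DETERMINISTIC = re.compile(
    r"\b(CURRENT_TIMESTAMP|CURRENT_DATE|CURRENT_TIME|SYSDATE|GETDATE|"
    r"RANDOM|UUID_STRING|SEQ[1248])\b",
    re.IGNORECASE,
)


def repeat_share(rows: Iterable[Tuple[str, datetime]]) -> float:
    """Fraction of deterministic reads that repeat an earlier identical read
    on the same calendar day. Each row is (query_text, start_time)."""
    seen = defaultdict(int)  # (day, exact query text) -> times seen so far
    reads = repeats = 0
    for text, start in sorted(rows, key=lambda r: r[1]):
        body = text.strip()
        if not body.upper().startswith(("SELECT", "WITH")):
            continue  # writes and DDL are never cache candidates
        if NON_DETERMINISTIC.search(body):
            continue  # the "same" query may legitimately return new data
        reads += 1
        key = (start.date(), body)  # exact text: literals are not case-folded
        if seen[key]:
            repeats += 1
        seen[key] += 1
    return repeats / reads if reads else 0.0
```

Keying on exact text is deliberately strict. Two queries that differ only in the case of a string literal never collide, at the cost of missing repeats that a smarter normalizer would catch.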
Repeat reads are the cheapest thing in the world to eliminate, because the correct answer was already computed once. The only question is whether your stack is allowed to notice — and whether you can prove the cached answer still matches Snowflake.
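Proving that the cached answer still matches is the hard part. One simple way to approximate "continuously double-checked" is shadow verification. On a random sample of cache hits, the query is re-run on the warehouse and a digest of the live result is compared with the cached one. The sketch below assumes that approach. `VerifiedCache`, `run_on_warehouse` and `sample_rate` are hypothetical names. Sampling only catches a stale entry when that entry happens to be sampled, so it would complement invalidation rather than replace it.

```python
import hashlib
import random


def result_digest(rows) -> str:
    """Order-sensitive SHA-256 over a result set's rows."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
        h.update(b"\x00")  # row separator
    return h.hexdigest()


class VerifiedCache:
    """Hypothetical result cache that re-checks a sample of hits live."""

    def __init__(self, run_on_warehouse, sample_rate=0.05, rng=None):
        self.run = run_on_warehouse  # callable(sql) -> list of row tuples
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.entries = {}  # sql -> (rows, digest)
        self.mismatches = 0

    def query(self, sql):
        hit = self.entries.get(sql)
        if hit is None:  # miss: run on the warehouse and remember the answer
            rows = self.run(sql)
            self.entries[sql] = (rows, result_digest(rows))
            return rows
        rows, digest = hit
        if self.rng.random() < self.sample_rate:  # shadow-verify this hit
            live = self.run(sql)
            live_digest = result_digest(live)
            if live_digest != digest:
                # Cached answer went stale: replace it and serve the live rows.
                self.mismatches += 1
                self.entries[sql] = (live, live_digest)
                return live
        return rows
```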
The warehouse woke up millions of times to answer a question it had already answered that morning.
— from the replay summary
02 / WHERE CHUKEI INTERVENES
Six levers, decided on the wire
chukei does not replace Snowflake — it sits in front of it as a transparent wire-protocol proxy and decides, per query, whether the warehouse even needs to run. Concretely, six levers acted on the replayed history:
- Deploy — point one workload at the proxy; SQL and credentials stay with the existing driver.
- Cache — serve deterministic repeated reads from a cache that is continuously double-checked against live Snowflake.
- Suspend — model idle windows with a Poisson process and recommend safe early auto-suspend.
- Attribute — stamp every query with the user, app, team, or dbt model that owns the cost.
- Validate — project savings from QUERY_HISTORY before routing any traffic.
- Operate — fail open: anything uncertain degrades to verbatim passthrough.
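The suspend lever's Poisson idle model can be sketched in a few lines. Assume queries arrive as a Poisson process with rate λ, so idle gaps X are exponential. With an auto-suspend timeout T, the warehouse bills min(X, T) seconds per gap. Whenever X outlasts T, the next query also pays a resume cost P, which gives an expected cost per gap of (1 − e^(−λT))/λ + e^(−λT)·P. The function names, the candidate timeouts and the 60-second default for P are assumptions for illustration, not chukei's model. The 60-second value is chosen because Snowflake bills at least 60 seconds each time a warehouse starts.

```python
import math
from typing import Sequence


def estimate_rate(arrivals: Sequence[float]) -> float:
    """Maximum-likelihood Poisson rate (queries/second) from sorted arrival
    timestamps in seconds: number of gaps divided by the observed span."""
    if len(arrivals) < 2 or arrivals[-1] <= arrivals[0]:
        raise ValueError("need at least two distinct arrival times")
    return (len(arrivals) - 1) / (arrivals[-1] - arrivals[0])


def expected_gap_cost(rate: float, timeout: float, resume_penalty: float) -> float:
    """Expected billed seconds for one idle gap X ~ Exponential(rate).

    The warehouse stays up for min(X, timeout); with probability
    exp(-rate * timeout) it suspends first and the next query pays
    resume_penalty seconds.
    """
    p_suspend = math.exp(-rate * timeout)
    expected_idle = (1.0 - p_suspend) / rate  # E[min(X, timeout)]
    return expected_idle + p_suspend * resume_penalty


def recommend_timeout(
    rate: float,
    candidates: Sequence[float] = (60, 120, 300, 600),
    resume_penalty: float = 60.0,
) -> float:
    """Pick the candidate auto-suspend timeout with the lowest expected cost."""
    return min(candidates, key=lambda t: expected_gap_cost(rate, t, resume_penalty))
```

Exponential gaps are memoryless, so the derivative of this cost in T has the sign of 1 − λ·P. A pure Poisson model therefore always picks the shortest candidate when λ·P < 1 (sparse traffic) and the longest otherwise. Real dashboard traffic is bursty and varies by hour, and a single fixed rate does not capture that.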
False-positive-intolerant by design. The cache never serves writes, non-deterministic queries, or chunked/large results. Parse errors, cache misses, and unsafe result shapes all degrade to a byte-identical passthrough to Snowflake. When in doubt, chukei misses — it never breaks a query.
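The fail-open rule reads naturally as a decision function. Every branch that is not a proven-safe cache hit returns passthrough, including the exception handler. This is a sketch under assumptions: the regex checks stand in for a real SQL parser, and `decide`, `Decision` and `MAX_CACHEABLE_BYTES` are hypothetical names and values.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for real parser checks.
READ_PREFIX = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
WRITE_OR_VOLATILE = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|CREATE|DROP|ALTER|TRUNCATE|COPY|"
    r"CURRENT_TIMESTAMP|CURRENT_DATE|RANDOM|UUID_STRING)\b",
    re.IGNORECASE,
)
MAX_CACHEABLE_BYTES = 1_000_000  # hypothetical "large result" cutoff


@dataclass
class Decision:
    action: str  # "cache" or "passthrough"
    reason: str


def decide(sql: str, cached_result_bytes: Optional[int] = None,
           chunked: bool = False) -> Decision:
    """Serve from cache only when every check passes; otherwise pass through."""
    try:
        if not READ_PREFIX.match(sql):
            return Decision("passthrough", "not a read")
        if WRITE_OR_VOLATILE.search(sql):
            return Decision("passthrough", "write or non-deterministic")
        if cached_result_bytes is None:
            return Decision("passthrough", "cache miss")
        if chunked or cached_result_bytes > MAX_CACHEABLE_BYTES:
            return Decision("passthrough", "unsafe result shape")
        return Decision("cache", "verified deterministic read")
    except Exception as exc:  # any internal failure degrades to passthrough
        return Decision("passthrough", f"error: {exc}")
```

The asymmetry is the point. A false negative costs one warehouse execution, while a false positive returns a wrong answer, so every uncertain branch falls through to Snowflake.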
03 / THE RECEIPTS
What the replay projected, conservatively
Here is the projected monthly saving for this account, by lever, holding the workload constant. The numbers are deliberately conservative — they only count queries where chukei could prove determinism or had a verified, fresh cache entry. Everything else is left on Snowflake.
| Lever | Mechanism | Share of projected saving |
|---|---|---|
| cache | Verified result reuse | largest contributor |
| suspend | Poisson idle model | second |
| rewrite | Equivalence-tested SQL rules | third |
| attribute | Wire-level cost ownership | enables the rest |
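The attribute row "enables the rest" because a saving can only be credited to someone once cost has an owner. One common, simple allocation rule splits a warehouse's metered credits across owners in proportion to execution time. The sketch below assumes that rule, and it assumes each query's owner has already been read from something like a query tag. `attribute_credits` is a hypothetical name. Proportional splitting is one defensible choice, not necessarily the one chukei uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def attribute_credits(
    queries: Iterable[Tuple[Optional[str], float]], warehouse_credits: float
) -> Dict[str, float]:
    """Split one warehouse's metered credits across owners by execution time.

    Each query is (owner, execution_ms); the owner might be a user, app,
    team or dbt model. Queries without an owner are grouped as "unattributed".
    """
    ms_by_owner = defaultdict(float)
    for owner, execution_ms in queries:
        ms_by_owner[owner or "unattributed"] += execution_ms
    total_ms = sum(ms_by_owner.values())
    if total_ms == 0:
        return {}
    return {owner: warehouse_credits * ms / total_ms for owner, ms in ms_by_owner.items()}
```

Because the full metered total is split, idle time is charged implicitly to whoever ran queries on that warehouse. Reporting idle credits as a separate line is the obvious alternative.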
Totalled and de-duplicated for overlap (a cached query is not also re-timed for suspend), the replay projected a saving within the 15–30% range we tell every team to expect. That saving is recoverable without changing a single dashboard or dbt model. The only connection-string change is pointing it at chukei. We quote a range, never a guarantee: the real number depends on your workload mix and how well your warehouses are already tuned.
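The de-duplication rule amounts to "each query is credited to at most one lever". Below is a minimal sketch. It assumes per-query projections for each lever and a hypothetical precedence order (cache, then rewrite, then suspend). It does not reflect how chukei actually orders or measures levers.

```python
from typing import Dict, Tuple

# Hypothetical precedence: the first lever with a positive saving wins.
LEVER_PRIORITY = ("cache", "rewrite", "suspend")


def total_saving(
    per_query: Dict[str, Dict[str, float]]
) -> Tuple[Dict[str, float], float]:
    """Sum projected savings, crediting each query to at most one lever.

    per_query maps query_id -> {lever: projected credits saved}.
    """
    by_lever = {lever: 0.0 for lever in LEVER_PRIORITY}
    for savings in per_query.values():
        for lever in LEVER_PRIORITY:
            if savings.get(lever, 0.0) > 0:
                by_lever[lever] += savings[lever]
                break  # a cached query is not also re-timed for suspend
    return by_lever, sum(by_lever.values())
```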
04 / HOW TO READ YOUR OWN
Run the replay yourself
The replay simulator is part of the Apache-2.0 release. It reads a CSV export of your query history, simulates each lever offline, and emits an Ed25519-signed JSON evidence file that finance and security can verify. Nothing leaves your machine, and nothing is installed in the query path.
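An Ed25519 signature covers exact bytes. A finance or security reviewer can therefore check the report only if they can reproduce the bytes that were signed. The sketch below produces reproducible bytes from a JSON object (sorted keys, compact separators, UTF-8) and a SHA-256 digest for logging. It assumes nothing about chukei's actual report layout or canonicalization scheme, and it is simpler than a full standard such as RFC 8785. The signature check itself would be an Ed25519 verify call over these bytes, for example `Ed25519PublicKey.verify(signature, data)` from the Python `cryptography` package.

```python
import hashlib
import json


def canonical_bytes(evidence: dict) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace."""
    return json.dumps(
        evidence, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def evidence_digest(evidence: dict) -> str:
    """Hex SHA-256 of the canonical bytes, handy for audit logs."""
    return hashlib.sha256(canonical_bytes(evidence)).hexdigest()
```

Key order no longer matters. Two reports with the same content serialize identically, so a signature made by one party verifies for the other.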
```console
# export a month of history from Snowflake, then:
chukei replay --query-history queries.csv --evidence report.json
✓ parsed 4,210,773 queries (31 days)
✓ cache deterministic repeats identified
✓ suspend idle windows modelled (Poisson)
→ projected savings within 15–30% target band
✓ wrote signed report.json · Ed25519
```
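Before you hand an export to the replay, check that it covers what you think it does, in the spirit of the `parsed` line in the output above. This sketch assumes the CSV keeps the `QUERY_ID` and `START_TIME` column names from the ACCOUNT_USAGE view. It also assumes `START_TIME` begins with an ISO `YYYY-MM-DD` date. Export tools format timestamps differently, so adjust the parsing to match yours. `summarize_export` is a hypothetical helper, not part of the chukei CLI.

```python
import csv
from datetime import date
from typing import Iterable


def summarize_export(lines: Iterable[str]) -> dict:
    """Count distinct queries and days in a QUERY_HISTORY CSV export.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    missing = {"QUERY_ID", "START_TIME"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"export is missing columns: {sorted(missing)}")
    query_ids, days = set(), set()
    for row in reader:
        query_ids.add(row["QUERY_ID"])  # duplicated rows collapse here
        days.add(date.fromisoformat(row["START_TIME"][:10]))
    calendar_days = (max(days) - min(days)).days + 1 if days else 0
    return {
        "queries": len(query_ids),
        "active_days": len(days),
        "calendar_days": calendar_days,
    }
```

For example, `with open("queries.csv", newline="") as fh: print(summarize_export(fh))` reports the query count and day span before any replay runs.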
Key takeaways
- On a BI-heavy account, repeat reads are the majority of spend — and the easiest class to serve from a verified cache.
- Most of the projected saving needs no query changes: it comes from caching and idle-suspend at the proxy layer.
- Always validate with replay first, and start enforcement in suggest-only mode before anything touches the path.
- The evidence is signed and reproducible — finance can audit every avoided credit, and the savings figure is a target to validate, never a promise.
Want the methodology in full, including how we anonymised the history and de-duplicated overlapping lever savings? It is in the repository, alongside the replay simulator itself. If you run it against your own account, we would genuinely like to see the shape of your bill — open an issue with the redacted summary.
Works on the cost-modelling and replay engine at OSO. Previously spent too long staring at Snowflake bills that nobody could explain.