
Open-Source vs Hosted Snowflake Cost Tools

Hosted optimizers want access to your query metadata or your data plane. A self-hosted Apache-2.0 proxy keeps SQL, credentials, and the data plane inside your own VPC. Here's the honest tradeoff.

Dan Kowalski · Systems engineering, OSO · Jun 13, 2026 · 8 min read

When enterprises compare Snowflake cost optimization tools, the deciding question is rarely “which one saves more?” It is “what do we have to give this vendor to get the saving?” Hosted optimizers need access to your query metadata, and sometimes to your data plane. A self-hosted, Apache-2.0 proxy keeps SQL, credentials, and the data plane inside your own VPC. This post lays out the honest tradeoff between the two models, including when hosted is the right call.

Both models can cut a Snowflake bill. They differ on where the work happens and what leaves your network to make it happen. For a regulated team, that difference decides the procurement, not the savings slide. Below is the access model behind each, the security and compliance fit, the total-cost picture, and a frank section on when a hosted service is genuinely the better choice. For the wider field, see the best Snowflake cost optimization tools and the cornerstone guide to Snowflake cost optimization.

01 / ACCESS MODEL
What does a hosted Snowflake cost tool actually require?

Every optimizer needs something to reason about your spend. The split is in how much, and where it goes.

Hosted optimizers fall into two access patterns. The lighter pattern is metadata-only: you grant the vendor read access to SNOWFLAKE.ACCOUNT_USAGE (or pipe QUERY_HISTORY to them), and they analyse query shapes, warehouse utilisation, and costs in their own cloud. The heavier pattern is in-path: the tool sits in the query path with credentials, so it can rewrite SQL, route warehouses, or serve cached results — which means session tokens and query text transit a system you don’t operate.

Access pattern | What the vendor receives | Typical capability
Metadata-only (hosted) | QUERY_HISTORY, warehouse stats, costs | Analysis, sizing advice, suspend tuning
In-path (hosted) | The above, plus live queries and credentials | Caching, routing, inline rewrites
Self-hosted proxy (in your VPC) | Nothing leaves your network | All of the above, run by you
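
To make the metadata-only pattern concrete, here is a minimal Python sketch of the kind of suspend-tuning analysis a vendor can run on query metadata alone. It assumes rows carrying WAREHOUSE_NAME, START_TIME, and END_TIME (column names from ACCOUNT_USAGE.QUERY_HISTORY, simplified here to epoch seconds) and an idealized billing model; billed_idle_seconds is a hypothetical name, not any vendor's API.

```python
from collections import defaultdict

# Hypothetical sketch of metadata-only suspend analysis. Rows mimic a few columns of
# SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY, with times simplified to epoch seconds.
# The billing model is idealized: it ignores the 60-second minimum on resume and
# multi-cluster warehouses.

def billed_idle_seconds(rows, auto_suspend_s):
    """Estimate idle seconds each warehouse is billed for under a given auto-suspend."""
    by_wh = defaultdict(list)
    for row in rows:
        by_wh[row["WAREHOUSE_NAME"]].append((row["START_TIME"], row["END_TIME"]))

    idle = {}
    for wh, spans in by_wh.items():
        spans.sort()
        # Merge overlapping queries into busy intervals.
        busy = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= busy[-1][1]:
                busy[-1][1] = max(busy[-1][1], end)
            else:
                busy.append([start, end])
        # Between busy intervals the warehouse idles until the next query arrives or
        # auto-suspend fires, whichever comes first.
        total = sum(min(nxt[0] - cur[1], auto_suspend_s) for cur, nxt in zip(busy, busy[1:]))
        # After the last query it idles for one full auto-suspend window.
        idle[wh] = total + auto_suspend_s
    return idle
```

Comparing the result for two auto-suspend settings (say 60 s versus 300 s) is the core of a suspend-tuning recommendation, and none of it needs query results, only timings.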

Neither hosted pattern is reckless — reputable vendors are SOC 2 audited and encrypt in transit. But both create a data-flow line item your security review has to clear, and the in-path pattern means a third party briefly holds the keys to your warehouse.

02 / THE SELF-HOSTED MODEL
How does a self-hosted, open-source Snowflake cost tool work?

The self-hosted model moves the optimizer inside your perimeter. chukei is an Apache-2.0, self-hosted cost optimization engine: a transparent wire-protocol proxy that sits in the Snowflake query path inside your own VPC. Clients (JDBC, snowflake-connector-python, dbt, BI tools) change one hostname and nothing else; the SQL they send and the credentials they use are unchanged.

Because the proxy runs on your infrastructure, the data-flow line item disappears: there is no vendor cloud to send QUERY_HISTORY to, and no third-party hop for live queries. Session tokens live in memory only and are never persisted or logged. There is no data egress to a SaaS backend.
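
The “never persisted or logged” property is easy to state and easy to break by accident, for example with a stray debug log line. One common defensive pattern, sketched here in Python as an assumption rather than a description of chukei's Rust code, is to wrap the token in a type that redacts itself when printed and refuses to be serialized. SessionToken and reveal() are hypothetical names.

```python
import logging

class SessionToken:
    """Hypothetical wrapper that keeps a session token in memory only.
    str(), repr(), f-strings, and %-style log formatting all see a redacted form."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw token, e.g. to set an outbound request header.
        return self._value

    def __repr__(self) -> str:
        return "SessionToken(<redacted>)"

    __str__ = __repr__

    def __reduce_ex__(self, protocol):
        # Refuse pickling/copying so the token is never serialized to disk.
        raise TypeError("SessionToken must not be serialized")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    token = SessionToken("example-token-value")
    logging.getLogger("proxy").info("forwarding request with %s", token)
    # message logged: forwarding request with SessionToken(<redacted>)
```

The design choice is to make the safe path the default one: code has to call reveal() explicitly to touch the raw value, which is easy to grep for in review.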

Fail open, by design. Parse errors, cache misses, non-deterministic SQL, writes, and unsafe result shapes all degrade to a byte-identical passthrough to Snowflake. The proxy never breaks a query — the worst case is that it does nothing and Snowflake answers as usual.
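
A minimal sketch of that fail-open routing decision, assuming a deliberately crude set of checks: regular expressions stand in for a real SQL parser, and route and check_balanced are hypothetical names, not chukei's classifier. The property that matters is the shape rather than the checks: every branch that is not positively proven safe, including an exception inside the checks themselves, returns passthrough.

```python
import re

# Hypothetical fail-open routing sketch, NOT chukei's classifier. Anything these
# checks cannot positively prove safe goes to Snowflake untouched.

READ_ONLY = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|CREATE|ALTER|DROP|TRUNCATE|COPY|PUT|GRANT|REVOKE|CALL)\b",
    re.IGNORECASE,
)
NON_DETERMINISTIC = re.compile(
    r"\b(CURRENT_TIMESTAMP|CURRENT_DATE|CURRENT_TIME|SYSDATE|GETDATE|LOCALTIMESTAMP|"
    r"RANDOM|UUID_STRING|SEQ[1248]|NORMAL|UNIFORM)\b",
    re.IGNORECASE,
)

def check_balanced(sql: str) -> None:
    """Stand-in for a real SQL parser: raise on unbalanced parentheses."""
    depth = 0
    for ch in sql:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")

def route(sql: str) -> str:
    """Return 'cache-eligible' only when every check passes; otherwise 'passthrough'."""
    try:
        check_balanced(sql)
        if ";" in sql.strip().rstrip(";"):
            return "passthrough"  # multi-statement batch
        if not READ_ONLY.match(sql):
            return "passthrough"  # writes, DDL, anything that is not a plain query
        if WRITE_KEYWORDS.search(sql) or NON_DETERMINISTIC.search(sql):
            return "passthrough"
        return "cache-eligible"
    except Exception:
        # Fail open: any error in the optimizer path means "do nothing".
        return "passthrough"
```

Crude checks err in the safe direction. A column named normal or a string literal containing 'delete' sends the query to passthrough, which only costs a cache opportunity; the reverse mistake would serve a wrong answer.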

The same six levers a hosted optimizer advertises run here, locally: verified result caching, warehouse auto-suspend (a Poisson idle model that captured ~94% of modelled savings in simulation), deterministic SQL rewriting with no LLM on the hot path, per-team cost attribution at the wire, Ed25519-signed savings evidence, and an offline replay simulator to project savings before anything touches the path.
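
chukei's exact idle model isn't described here, so the following is only a sketch of what a Poisson idle model can look like, under stated assumptions: queries arrive as a constant-rate Poisson process with rate λ, and every resume costs up to 60 extra billed seconds, because Snowflake bills a minimum of one minute each time a warehouse starts. Under those assumptions the expected idle wait is 1/λ, so suspending pays off when 1/λ exceeds the resume penalty. The function names are hypothetical.

```python
import math

# Hypothetical sketch of a Poisson idle model, NOT chukei's implementation.
# Assumptions: constant-rate Poisson arrivals, and each resume costs up to
# RESUME_PENALTY_S extra billed seconds (Snowflake's 60-second minimum per start).
# Cache warm-up after a resume is ignored.
RESUME_PENALTY_S = 60.0

def estimate_rate(gaps_s):
    """Maximum-likelihood arrival rate (queries/second) from inter-arrival gaps."""
    total = sum(gaps_s)
    if not gaps_s or total <= 0:
        raise ValueError("need at least one positive inter-arrival gap")
    return len(gaps_s) / total

def p_no_query_within(rate, t_s):
    """P(no query arrives in the next t_s seconds) = exp(-rate * t_s)."""
    return math.exp(-rate * t_s)

def suspend_now(rate, resume_penalty_s=RESUME_PENALTY_S):
    """Suspend when the expected idle wait (1/rate seconds) exceeds the resume penalty.
    Exponential waits are memoryless, so the answer does not depend on how long
    the warehouse has already been idle."""
    return 1.0 / rate > resume_penalty_s
```

With gaps of 10, 20, and 30 seconds the estimated rate is 0.05 queries per second, an expected wait of 20 s, so the sketch keeps the warehouse up; with gaps of 300 and 500 seconds the expected wait is 400 s and it suspends. Real traffic is bursty and varies by time of day, which is why a practical model would estimate the rate per time window rather than once.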

You aren’t choosing between optimization and control. You’re choosing whether the optimization runs in someone else’s cloud or in yours.

— the self-hosted tradeoff
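
Per-team cost attribution, one of the levers listed above, can be sketched as follows. The sketch assumes each query carries a tag such as team=growth (Snowflake's QUERY_TAG session parameter is one way to set it) and that a warehouse's credits are split in proportion to each team's execution time; that allocation rule and the function names are illustrative assumptions, not chukei's method.

```python
from collections import defaultdict

# Hypothetical per-team attribution sketch, NOT chukei's method. Assumes tags like
# "team=growth" (optionally with other ";"-separated pairs) and a proportional split
# of each warehouse's credits by execution time.

def team_of(query_tag):
    for part in (query_tag or "").split(";"):
        key, _, value = part.strip().partition("=")
        if key == "team" and value:
            return value
    return "unattributed"

def attribute_credits(queries, warehouse_credits):
    """queries: dicts with warehouse, query_tag, exec_ms; warehouse_credits: {wh: credits}."""
    ms = defaultdict(lambda: defaultdict(float))
    for q in queries:
        ms[q["warehouse"]][team_of(q["query_tag"])] += q["exec_ms"]

    out = defaultdict(float)
    for wh, credits in warehouse_credits.items():
        total = sum(ms[wh].values())
        if total == 0:
            out["unattributed"] += credits  # warehouse ran nothing: nobody to charge
            continue
        for team, t in ms[wh].items():
            out[team] += credits * t / total
    return dict(out)
```

Proportional splitting also spreads idle time across whoever used the warehouse, and credits on a warehouse that ran nothing land in an unattributed bucket rather than being silently dropped.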

03 / SECURITY & COMPLIANCE
Which model fits a regulated or air-gapped enterprise?

For most enterprises the decision is made in the security review, not the FinOps meeting. Three properties tend to settle it.

Data residency. If query text or metadata cannot leave your VPC — common in finance, healthcare, and public sector — a hosted optimizer that ingests QUERY_HISTORY into its own cloud is a non-starter regardless of its SOC 2 report. A self-hosted proxy keeps that data on your network by construction.

Credential blast radius. An in-path hosted tool holds Snowflake session tokens, however briefly. A self-hosted proxy holds them too — but inside your trust boundary, in memory, never written to disk or logs. The blast radius stays within your existing controls.

Auditable evidence. Regulated finance teams need to prove a saving, not just see a dashboard. chukei emits Ed25519-signed evidence files with a conservative methodology, so finance and audit can verify every avoided credit independently of the tool that produced it.
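
What “verify independently” means in practice is easiest to see in code. The sketch below uses a hypothetical record layout, not chukei's evidence format, and shows two checks an auditor can redo without trusting the tool: the canonical bytes hash to the recorded digest, and the claimed total equals the sum of its line items. chukei signs its evidence with Ed25519; that step needs a vetted signing library, so here a SHA-256 digest stands in for the bytes a signature would cover.

```python
import hashlib
import json

# Hypothetical evidence-record sketch, NOT chukei's file format. A SHA-256 digest
# stands in for the Ed25519 signature, which would cover the same canonical bytes.

def canonical_bytes(record):
    # Sorted keys and fixed separators give exactly one byte encoding per record.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def build_record(period, avoided):
    """avoided: list of {"query_id", "credits"} for cache hits that skipped execution."""
    body = {
        "period": period,
        "avoided": avoided,
        "total_credits": round(sum(a["credits"] for a in avoided), 6),
    }
    return {"body": body, "sha256": hashlib.sha256(canonical_bytes(body)).hexdigest()}

def audit(record):
    """Independent check: digest matches and the total equals the sum of line items."""
    body = record["body"]
    digest_ok = hashlib.sha256(canonical_bytes(body)).hexdigest() == record["sha256"]
    total_ok = abs(sum(a["credits"] for a in body["avoided"]) - body["total_credits"]) < 1e-6
    return digest_ok and total_ok
```

To add the signature itself in Python, the cryptography package's Ed25519PrivateKey.sign and Ed25519PublicKey.verify can sign and verify canonical_bytes(body) directly; the auditor then needs only the public key.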

Figure: Where the query path lives in each model. Hosted in-path optimization routes live queries and credentials through a vendor cloud; the self-hosted proxy keeps the entire path inside your VPC.

04 / TOTAL COST & LOCK-IN
What does each model cost beyond the savings?

The savings line is only half the math. The other half is what you pay to get it, and how hard it is to leave.

Hosted optimizers commonly bill either a percentage of savings or a platform fee. Percentage-of-savings pricing aligns incentives, but it means the optimization gets more expensive exactly as it works. The pricing model is also a form of lock-in: your evidence of savings lives in the vendor’s dashboard. A self-hosted, Apache-2.0 tool has no licence fee and no percentage cut; you pay only for the compute to run the proxy and the engineering time to operate it.
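
The fee comparison is simple arithmetic once you fix the inputs. The sketch assumes both models achieve the same gross savings S, with a percentage fee f for the hosted plan and monthly compute C and operating cost O for self-hosting; every number in the usage note is made up for illustration, not a vendor quote or a measured chukei cost. Setting S * (1 - f) = S - C - O gives the break-even gross savings S = (C + O) / f.

```python
# Hypothetical break-even arithmetic. All inputs are illustrative assumptions.

def hosted_net(gross_savings, fee_rate, platform_fee=0.0):
    """Savings kept under percentage-of-savings pricing plus an optional flat fee."""
    return gross_savings * (1.0 - fee_rate) - platform_fee

def self_hosted_net(gross_savings, compute_cost, ops_cost):
    """Savings kept when you pay only to run and operate the proxy."""
    return gross_savings - compute_cost - ops_cost

def break_even_savings(fee_rate, compute_cost, ops_cost):
    """Gross savings above which self-hosting keeps more than a pure percentage plan:
    S * (1 - f) = S - C - O  =>  S = (C + O) / f."""
    return (compute_cost + ops_cost) / fee_rate
```

With a 25% fee, 500 per month of compute, and 2,500 per month of engineering time, break-even sits at 12,000 of monthly savings. At 20,000 the hosted plan keeps 15,000 and self-hosting keeps 17,000; at 8,000 it flips to 6,000 versus 5,000. A flat platform fee shifts the comparison, which is why hosted_net accepts one.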

Lock-in is also architectural. A hosted in-path tool can’t be removed without a migration; a transparent proxy is removed by pointing the hostname back at Snowflake, and the drivers never know the difference. Because chukei is Apache-2.0, the source is yours to read, fork, and run indefinitely: there is no vendor whose roadmap or pricing change can strand you.

# self-hosted: the entire data path stays in your network
chukei serve --upstream account.snowflakecomputing.com --bind 0.0.0.0:443

 listening                in-VPC, TLS terminated locally
 credentials              in memory only · never logged
 egress                   none · no vendor backend
 rollback                 repoint driver hostname to Snowflake

05 / WHEN HOSTED IS RIGHT
When is a hosted Snowflake cost tool the better call?

Self-hosting is not a free lunch, and pretending otherwise would be dishonest. Hosted optimizers earn their keep in several real situations.

If your team has no appetite for operational overhead — no one to run a proxy, patch it, and own its uptime — a fully managed service removes that burden entirely. Vendors like Keebo and Espresso AI bring mature ML-driven tuning refined across many accounts, and SELECT.dev offers strong analytics out of the box; matching that breadth in-house takes effort. If your data isn’t residency-constrained and your security review is comfortable with a SOC 2 vendor holding metadata, the managed path is simply faster to value.

The honest line. Hosted tools are a good fit when operational simplicity beats data-plane control. Self-hosted is the fit when control, residency, and no-egress are non-negotiable — regulated finance, healthcare, public sector, or air-gapped environments. Many teams will land on different answers for good reasons.

Where chukei differs from the hosted ML optimizers is determinism: there is no LLM on the hot path, ~2 ms p99 overhead, and a false-positive-intolerant cache validated over a soak of ~120k queries / ~60k cache hits with zero mismatches. That predictability is the trade you make for running it yourself — and for many enterprises, it’s the trade worth making.
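
The hard part of a false-positive-intolerant cache is deciding when not to serve. A minimal sketch, assuming the key covers every input that can change a result and each entry records a version token (such as a last-altered timestamp) for every table it read; ResultCache and cache_key are hypothetical names, and chukei's actual validation is not described in this post.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical verified-cache sketch, NOT chukei's design. A hit is served only when
# the key matches exactly and every table the result read is at the same version.

def cache_key(sql, role, database, schema, session_params):
    # Only trim the ends: collapsing inner whitespace could merge two queries whose
    # string literals differ only in spacing, which would be a false-positive hit.
    material = json.dumps([sql.strip(), role, database, schema, sorted(session_params.items())])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

@dataclass
class Entry:
    result: bytes
    table_versions: dict  # table name -> version token seen when the result was cached

class ResultCache:
    def __init__(self):
        self._entries = {}

    def put(self, key, result, table_versions):
        self._entries[key] = Entry(result, dict(table_versions))

    def get(self, key, current_versions):
        """Return cached bytes only if every table the result read is unchanged."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        for table, version in entry.table_versions.items():
            if current_versions.get(table) != version:
                del self._entries[key]  # stale: drop it and fall back to Snowflake
                return None
        return entry.result
```

Every shortcut in key normalization is a potential false positive, so the sketch gives up some hit rate (queries that differ only in inner whitespace miss) in exchange for never merging two different queries.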

Key takeaways

  • Access model is the real decision. Hosted tools need your metadata or your data plane; a self-hosted proxy keeps SQL, credentials, and data inside your VPC.
  • No data egress, credentials in memory only. The self-hosted, Apache-2.0 model fits residency-constrained, regulated, and air-gapped enterprises by construction.
  • Cost and lock-in cut both ways. Self-hosted removes vendor fees and percentage-of-savings billing but adds operational ownership.
  • Hosted is still right when operational simplicity outweighs data-plane control and your data isn’t residency-constrained.
  • Determinism is the differentiator — no LLM on the hot path, ~2 ms p99, a verified cache, and signed evidence you can audit yourself.

If keeping SQL, credentials, and the data plane in your own VPC is the constraint that matters, chukei is the Apache-2.0 way to do it. The source, the proxy, and the replay simulator are all on GitHub — read it, run it in your VPC, and project your own savings before anything touches the query path.

Frequently asked questions

Is there an open-source Snowflake cost tool?
Yes. chukei is an Apache-2.0, self-hosted Snowflake cost optimization engine — a transparent wire-protocol proxy you run in your own VPC. It does verified result caching, warehouse auto-suspend, SQL rewriting, and per-team cost attribution with no client changes beyond one hostname.
What's a good Keebo alternative?
If you want optimization without sending query metadata to a vendor, a self-hosted proxy like chukei is the closest alternative: it runs in your infrastructure, keeps SQL and credentials in memory, and produces signed savings evidence. Keebo remains strong if you prefer a fully managed, hands-off service.
Do Snowflake cost tools see my data?
Hosted optimizers typically need your QUERY_HISTORY metadata, and some sit in the query path with credentials. A self-hosted proxy keeps everything in your VPC: it sees queries to make caching decisions but never persists or logs credentials and emits no data to a third party.
Can I run a Snowflake cost optimizer in my own VPC?
Yes. chukei is designed to run as a self-hosted proxy inside your VPC. Drivers change one hostname; SQL, credentials, and the data plane never leave your network, and any failure degrades to byte-identical passthrough to Snowflake.
Is self-hosted always cheaper than a hosted optimizer?
Not always. Self-hosted removes vendor fees and percentage-of-savings billing but adds the cost of running and operating the proxy. For regulated or data-residency-sensitive teams the control usually justifies it; for small teams wanting zero operational overhead, a hosted service can be the better call.
Dan Kowalski

Builds the Rust wire-protocol core of chukei. Spends his time making sure the proxy adds milliseconds, never breakage.

Snowflake · FinOps · Open Source · Security