Lesson 2.1 – Warehouses, Storage, and the Architecture That Makes Snowflake Snowflake
Snowflake's core differentiator is one idea: compute and storage are separate, elastic, and independently billed. Internalize this lesson, and the rest of the JD's Snowflake requirements (Time Travel, Zero-Copy Clone, stored procedures) become details, not concepts.
The three-layer architecture
┌─────────────────────────────────────────────────────────────┐
│ CLOUD SERVICES (metadata, auth, optimizer, txn manager)    │ ← query compilation, RBAC,
│ Runs on Snowflake's infra. You never size it.               │   transactions, result cache
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│ COMPUTE (virtual warehouses) ←── YOU size / start / stop    │ ← MPP clusters that execute
│   ┌─────┐  ┌─────┐  ┌─────┐                                 │   SQL. Many can run at once.
│   │ WH1 │  │ WH2 │  │ WH3 │   (each is an MPP cluster)     │   Independent.
│   └─────┘  └─────┘  └─────┘                                 │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│ STORAGE (columnar, compressed, immutable micro-partitions)  │ ← Your tables live here.
│ Auto-managed. You don't size it. Pay per TB.                │   Shared by all warehouses.
└─────────────────────────────────────────────────────────────┘
Three layers, three different cost & scale models:
| Layer | What you do | Billing |
|---|---|---|
| Cloud Services | Nothing – it's a managed black box. Handles auth, query parsing, metadata, the result cache. | Free up to 10% of daily compute; rarely a concern. |
| Virtual Warehouse (compute) | Create warehouses sized XS to 6X-Large. Start/stop them. Route workloads to dedicated warehouses. | Per-second while running. XS = 1 credit/hr. ~$2/credit (varies). |
| Storage | Load data. Let it grow. Snowflake manages partitioning, compression, and layout. | ~$23/TB/month. Cheap. |
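To see where the compute bill actually lands, you can query the account usage views. A minimal sketch, assuming your role can read the SNOWFLAKE database (ACCOUNTADMIN can by default) and the ~$2/credit figure from the table above:

-- Credits consumed per warehouse over the last 7 days
SELECT
    warehouse_name,
    SUM(credits_used)      AS credits_7d,
    SUM(credits_used) * 2  AS approx_usd     -- assumes ~$2/credit; adjust to your contract rate
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_7d DESC;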
Why this architecture is a big deal
In traditional on-prem MPP appliances, compute and storage were fused: a single box held both. Scaling compute meant buying more boxes, each with its own disk copy. Workload isolation required hardware.
Snowflake decoupled them. A practical consequence:
Same data, many warehouses
You can have WH_LOAD (LARGE, runs ETL every hour), WH_ANALYSTS (MEDIUM, BI tools), WH_ADHOC (XSMALL, cheap for one-offs), and WH_ML (XLARGE, feature engineering), all querying the same tables from the storage layer, with zero contention. No more ETL jobs backing up the BI dashboards.
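A minimal sketch of what that looks like in SQL. The warehouse names come from the paragraph above; the meter_readings table and raw_stage stage are hypothetical:

-- Two independent warehouses, both reading/writing the same storage layer
CREATE WAREHOUSE IF NOT EXISTS WH_LOAD     WITH WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS WH_ANALYSTS WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- ETL session:
USE WAREHOUSE WH_LOAD;
COPY INTO meter_readings FROM @raw_stage/readings/;   -- heavy load, fully isolated

-- BI session, running at the same time with zero contention:
USE WAREHOUSE WH_ANALYSTS;
SELECT read_date, COUNT(*) FROM meter_readings GROUP BY read_date;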
Virtual warehouse lifecycle
The 4 knobs you need to know
- Size – XSMALL, SMALL, MEDIUM, LARGE, XLARGE, 2XLARGE, ... 6XLARGE. Each step up doubles compute (and credits/hr).
- Auto-suspend – seconds of idle before Snowflake stops it. Default 600s. Set this to 60 in dev, or you will burn credits.
- Auto-resume – queue a query against a suspended warehouse; it starts automatically (takes a few seconds).
- Multi-cluster – MIN_CLUSTER_COUNT / MAX_CLUSTER_COUNT. For concurrency: when one cluster is busy, spin up another. Different from scale up; see Module 9.
Here's the warehouse you'll create in the next exercise:
CREATE WAREHOUSE IF NOT EXISTS LEARN_WH
WITH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 60 -- stop after 60s idle (SAVES CREDITS)
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE -- don't start until first query
COMMENT = 'learning / $400 trial';
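Once the warehouse exists, every one of those knobs can be changed at any time with ALTER WAREHOUSE. A quick sketch (the multi-cluster settings require Enterprise edition or above):

ALTER WAREHOUSE LEARN_WH SET WAREHOUSE_SIZE = 'SMALL';  -- scale up/down; running queries finish on the old size
ALTER WAREHOUSE LEARN_WH SET AUTO_SUSPEND = 120;        -- seconds of idle before suspending
ALTER WAREHOUSE LEARN_WH SUSPEND;                       -- stop paying right now
ALTER WAREHOUSE LEARN_WH RESUME;                        -- or just run a query and let AUTO_RESUME do it
-- Scale out for concurrency (Enterprise edition+), not for single-query speed:
-- ALTER WAREHOUSE LEARN_WH SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 3;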
Storage internals โ micro-partitions
Snowflake stores tables as immutable micro-partitions: compressed columnar files each holding roughly 50–500 MB of data (before compression). You never manage them directly. But two things are worth knowing:
- Pruning. Snowflake tracks min/max values and distinct-value counts per column per micro-partition. When you filter with WHERE read_date = '2026-04-10', the optimizer only opens partitions whose min/max range includes that date. On a 2-billion-row table, pruning typically takes you from hours to seconds.
- Clustering key. Optional. If you set one (e.g., CLUSTER BY (read_date, meter_id)), Snowflake reshuffles the partitions in the background so pruning works better. Almost always unnecessary on tables under 1 TB. See glossary.
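A sketch of what both look like in practice, reusing the read_date / meter_id example from the list above (the meter_readings table and its reading_kwh column are hypothetical):

-- Pruning-friendly query: filter on a column whose values are well-localized across partitions.
-- In the query profile, compare "Partitions scanned" to "Partitions total".
SELECT meter_id, reading_kwh
FROM meter_readings
WHERE read_date = '2026-04-10';

-- Only if pruning is demonstrably poor on a large table:
ALTER TABLE meter_readings CLUSTER BY (read_date, meter_id);

-- How well the table is currently clustered on those columns:
SELECT SYSTEM$CLUSTERING_INFORMATION('meter_readings', '(read_date, meter_id)');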
Interview gotcha
"Does Snowflake have indexes?" โ No traditional indexes. Pruning via micro-partition metadata replaces most index use cases. Clustering keys are the closest equivalent, and they're usage-driven, not schema-driven.
The result cache โ a free speed boost
Snowflake's Result Cache stores query results at the account level for 24 hours. If the exact same query runs against unchanged underlying data, the second execution returns instantly with no warehouse compute billed.
In dashboard-heavy workloads this can wipe out 50–80% of compute cost for free. Don't break it by appending CURRENT_TIMESTAMP() or RAND() to queries without reason.
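A quick way to see the cache in action, or to bypass it deliberately; the table name is the same hypothetical one used earlier:

-- Run an identical, deterministic query twice against unchanged data:
SELECT read_date, COUNT(*) FROM meter_readings GROUP BY read_date;  -- 1st run: warehouse executes
SELECT read_date, COUNT(*) FROM meter_readings GROUP BY read_date;  -- 2nd run: served from the result cache, no compute billed

-- When benchmarking warehouse sizes, turn the cache off for your session so timings are honest:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;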
Mapping to what you know
| Concept | On-prem MPP analog | Snowflake |
|---|---|---|
| Compute sizing | Number of SPUs on the appliance (hardware) | Warehouse size (software, elastic, per-second) |
| Workload isolation | Separate appliance or QoS rules | Separate warehouse pointing at same data |
| Storage layout | Zone maps (min/max per extent) | Micro-partition min/max metadata – conceptually identical |
| Scale out | Buy more hardware | Increase warehouse size (scale up) or multi-cluster (scale out for concurrency) |
| Workload hours | Appliance is 24/7 | Warehouse is off by default; starts on demand |
Key terms from this lesson
- Virtual warehouse – an independent MPP compute cluster you size, start, and stop; billed per second while running.
- Micro-partition – an immutable, compressed columnar file (roughly 50–500 MB of data) that Snowflake manages for you.
- Pruning – skipping micro-partitions whose min/max metadata can't match the query's filter.
- Clustering key – an optional setting that keeps partitions organized so pruning stays effective on very large tables.
- Result cache – account-level, 24-hour cache of query results; identical query plus unchanged data means no compute billed.
- Auto-suspend / auto-resume – the warehouse turns itself off when idle and back on when queried.
- Multi-cluster warehouse – adds clusters for concurrency (scale out), as opposed to resizing (scale up).
Self-check
Q1. You have an ETL warehouse (LARGE) running hourly. An analyst's dashboard is slow at 9am every day. What's the lowest-effort fix?
Q2. AUTO_SUSPEND=600, warehouse runs a query at 9:00:00, query finishes at 9:00:10, no queries for the rest of the hour. Credits billed?
Q3. Same setup but AUTO_SUSPEND=60. Credits billed?