Lesson 2.1: Warehouses, Storage, and the Architecture That Makes Snowflake Snowflake

Snowflake's entire differentiator is one idea: compute and storage are separate, elastic, and independently billed. Internalize this lesson and the rest of the JD's Snowflake requirements (Time Travel, Zero-Copy Clone, stored procs) become details, not concepts.

The three-layer architecture

 ┌───────────────────────────────────────────────────────────────┐
 │  CLOUD SERVICES (metadata, auth, optimizer, txn manager)      │  ← query compilation, RBAC,
 │  Runs on Snowflake's infra. You never size it.                │    transactions, result cache
 └───────────────────────────────────────────────────────────────┘
                                ↕
 ┌───────────────────────────────────────────────────────────────┐
 │  COMPUTE (virtual warehouses)  ←──  YOU size / start / stop   │  ← MPP clusters that execute
 │  ┌────┐  ┌────┐  ┌────┐                                       │    SQL. Many can run at once.
 │  │ WH1│  │ WH2│  │ WH3│   (each is an MPP cluster)            │    Independent.
 │  └────┘  └────┘  └────┘                                       │
 └───────────────────────────────────────────────────────────────┘
                                ↕
 ┌───────────────────────────────────────────────────────────────┐
 │  STORAGE (columnar, compressed, immutable micro-partitions)   │  ← Your tables live here.
 │  Auto-managed. You don't size it. Pay per TB.                 │    Shared by all warehouses.
 └───────────────────────────────────────────────────────────────┘

Three layers, three different cost & scale models:

| Layer | What you do | Billing |
|---|---|---|
| Cloud Services | Nothing: it's a managed black box. Handles auth, query parsing, metadata, the result cache. | Free up to 10% of daily compute; rarely a concern. |
| Virtual Warehouse (compute) | Create warehouses sized XS → 6X-Large. Start/stop them. Route workloads to dedicated warehouses. | Per-second while running. XS = 1 credit/hr. ~$2/credit (varies). |
| Storage | Load data. Let it grow. Snowflake manages partitioning, compression, and layout. | ~$23/TB/month. Cheap. |

Why this architecture is a big deal

In traditional on-prem MPP appliances, compute and storage were fused: a single box held both. Scaling compute meant buying more boxes, each with its own disk copy. Workload isolation required hardware.

Snowflake decoupled them. A practical consequence:

Same data, many warehouses

You can have WH_LOAD (LARGE, runs ETL every hour), WH_ANALYSTS (MEDIUM, BI tools), WH_ADHOC (XSMALL, cheap for one-offs), and WH_ML (XLARGE, feature engineering), all querying the same tables in the storage layer with zero contention. No more "the ETL job is slowing down the BI dashboards" drama.
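A minimal SQL sketch of this isolation pattern (the warehouse names come from above; the table and stage names are hypothetical):

```sql
-- Sketch: one warehouse per workload, all reading the same tables.
CREATE WAREHOUSE IF NOT EXISTS WH_LOAD     WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS WH_ANALYSTS WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Routing is just a session setting; each tool or user picks its warehouse:
USE WAREHOUSE WH_ANALYSTS;
SELECT COUNT(*) FROM sales.orders;        -- hypothetical table; runs on WH_ANALYSTS

USE WAREHOUSE WH_LOAD;
COPY INTO sales.orders FROM @raw_stage;   -- hypothetical stage; ETL runs on WH_LOAD
```

Both warehouses hit the same storage layer; neither ever waits on the other.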

Virtual warehouse lifecycle

The 4 knobs you need to know

  1. Size: XSMALL, SMALL, MEDIUM, LARGE, XLARGE, 2XLARGE, ... 6XLARGE. Each step up doubles compute (and credits/hr).
  2. Auto-suspend: seconds of idle before Snowflake stops it. Default 600s. Set this to 60 in dev, or you will burn credits.
  3. Auto-resume: queue a query against a suspended warehouse and it starts automatically (takes a few seconds).
  4. Multi-cluster: MIN_CLUSTER_COUNT / MAX_CLUSTER_COUNT. For concurrency: when one cluster is busy, spin up another. Different from scaling up; see Module 9.
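As a sketch, the multi-cluster knobs look like this (warehouse name and bounds are assumptions; multi-cluster warehouses require Enterprise edition or above):

```sql
-- Sketch: a warehouse that adds clusters under concurrency pressure.
-- Scale OUT (more clusters, same size) for many concurrent users;
-- scale UP (bigger size) for one heavy query.
CREATE WAREHOUSE IF NOT EXISTS WH_BI
  WITH
    WAREHOUSE_SIZE    = 'MEDIUM'
    MIN_CLUSTER_COUNT = 1           -- idle floor: one cluster
    MAX_CLUSTER_COUNT = 3           -- burst ceiling when queries queue
    SCALING_POLICY    = 'STANDARD'  -- favor starting clusters over queueing
    AUTO_SUSPEND      = 60
    AUTO_RESUME       = TRUE;
```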

Here's the warehouse you'll create in the next exercise:

```sql
CREATE WAREHOUSE IF NOT EXISTS LEARN_WH
  WITH
    WAREHOUSE_SIZE      = 'XSMALL'
    AUTO_SUSPEND        = 60      -- stop after 60s idle (SAVES CREDITS)
    AUTO_RESUME         = TRUE
    INITIALLY_SUSPENDED = TRUE    -- don't start until first query
    COMMENT             = 'learning / $400 trial';
```
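Once it exists, day-to-day control is a handful of statements (all standard Snowflake commands):

```sql
USE WAREHOUSE LEARN_WH;                                  -- route this session's queries to it
SHOW WAREHOUSES LIKE 'LEARN_WH';                         -- check state, size, auto-suspend
ALTER WAREHOUSE LEARN_WH SET WAREHOUSE_SIZE = 'SMALL';   -- resize; no data movement involved
ALTER WAREHOUSE LEARN_WH SUSPEND;                        -- stop billing immediately
ALTER WAREHOUSE LEARN_WH RESUME;                         -- or just run a query with AUTO_RESUME on
```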

Storage internals โ€” micro-partitions

Snowflake stores tables as immutable micro-partitions: compressed columnar files, each holding roughly 50–500 MB of uncompressed data. You never manage them directly. But two things are worth knowing:

  1. Each micro-partition carries min/max metadata per column; the optimizer uses it to skip (prune) partitions that cannot match your filter.
  2. Immutability is what makes Time Travel and Zero-Copy Clone cheap: old partitions are retained and referenced, never rewritten in place.

Interview gotcha

"Does Snowflake have indexes?" No traditional indexes. Pruning via micro-partition metadata replaces most index use cases. Clustering keys are the closest equivalent, and you choose them from observed query patterns, not declare them up front in the schema.
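For completeness, working with a clustering key looks roughly like this (table and column names are hypothetical):

```sql
-- Hypothetical table: cluster only large tables that are frequently
-- range-filtered on a stable column.
ALTER TABLE sales.orders CLUSTER BY (order_date);

-- Inspect how well micro-partitions line up with the key:
SELECT SYSTEM$CLUSTERING_INFORMATION('sales.orders', '(order_date)');
```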

The result cache โ€” a free speed boost

Snowflake's Result Cache stores query results at the account level for 24 hours. If the exact same query runs against unchanged underlying data, the second execution returns instantly with no warehouse compute billed.

In dashboard-heavy workloads this can wipe out 50–80% of compute cost for free. Don't break it by appending CURRENT_TIMESTAMP() or RAND() to queries without reason.
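A sketch of the anti-pattern versus the cache-friendly version (table name is hypothetical):

```sql
-- Anti-pattern: a function evaluated at execution time disqualifies the
-- query from the result cache, so every dashboard refresh bills compute.
SELECT COUNT(*), CURRENT_TIMESTAMP() AS ran_at FROM sales.orders;

-- Cache-friendly: identical query text + unchanged underlying data means
-- the second run returns instantly with no warehouse compute billed.
SELECT COUNT(*) FROM sales.orders;
```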

Mapping to what you know

| Concept | On-prem MPP analog | Snowflake |
|---|---|---|
| Compute sizing | Number of SPUs on the appliance (hardware) | Warehouse size (software, elastic, per-second) |
| Workload isolation | Separate appliance or QoS rules | Separate warehouse pointing at the same data |
| Storage layout | Zone maps (min/max per extent) | Micro-partition min/max metadata; conceptually identical |
| Scale out | Buy more hardware | Increase warehouse size (scale up) or multi-cluster (scale out for concurrency) |
| Workload hours | Appliance is 24/7 | Warehouse is off by default; starts on demand |

Key terms from this lesson

Virtual Warehouse · Auto-suspend · Micro-partition · Clustering Key · Result Cache

Self-check

Q1. You have an ETL warehouse (LARGE) running hourly. An analyst's dashboard is slow at 9am every day. What's the lowest-effort fix?


Q2. AUTO_SUSPEND=600, warehouse runs a query at 9:00:00, query finishes at 9:00:10, no queries for the rest of the hour. Credits billed?


Q3. Same setup but AUTO_SUSPEND=60. Credits billed?