Lesson 6.1 — What's Different About CI/CD for DE
You've architected GitOps pipelines. You know Jenkins, GitHub Actions, Docker runner images, pipeline-as-code. This lesson is not teaching you CI/CD — it's the delta for DE, in 20 minutes.
The four differences that matter
1 · The database is state — you can't just redeploy
In app deployment, you push a new image, kill old pods, done. In DE, the database schema and data are state. You can't "redeploy" a warehouse from scratch — you'd wipe petabytes.
Consequence: migrations. Every schema change is a versioned, idempotent SQL file applied in order. Tools: schemachange (Flyway-for-Snowflake, the most common), Liquibase, Sqitch.
snowflake-migrations/
├── V1.0.0__initial_schema.sql            # CREATE DATABASE, SCHEMA, ROLES
├── V1.0.1__create_meter_reads.sql        # CREATE TABLE
├── V1.0.2__add_consumption_kwh_idx.sql   # ALTER (though Snowflake doesn't have indexes)
├── V1.1.0__add_quality_column.sql        # ALTER TABLE ADD COLUMN
└── R__masking_policies.sql               # R__ = repeatable; re-runnable views/procs
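As a concrete sketch, the two flavors of file might contain something like this (table, column, and role names are illustrative, not from the source project):

```sql
-- V1.1.0__add_quality_column.sql  (versioned: applied exactly once, in version order)
ALTER TABLE analytics.meter_reads ADD COLUMN quality VARCHAR(20);

-- R__masking_policies.sql  (repeatable: re-applied whenever the file's checksum changes)
CREATE OR REPLACE MASKING POLICY pii_mask AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('ANALYST_PII') THEN val ELSE '***' END;
```

Note the repeatable file uses CREATE OR REPLACE precisely so re-running it is idempotent.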
The Jenkins pipeline runs schemachange -f ./snowflake-migrations -a $SF_ACCOUNT -u $SF_USER on merge to main. It applies any new files in version order and tracks applied state in a Snowflake metadata table.
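In a Jenkins shell step, that usually looks something like this (variable names and the change-history table name are placeholders; schemachange reads the password from the SNOWFLAKE_PASSWORD environment variable rather than a flag):

```
export SNOWFLAKE_PASSWORD="$SF_PASSWORD"      # injected by the credentials plugin / secret manager
schemachange \
  -f ./snowflake-migrations \
  -a "$SF_ACCOUNT" -u "$SF_USER" \
  -r "$SF_ROLE" -w "$SF_WAREHOUSE" \
  -c "ANALYTICS.SCHEMACHANGE.CHANGE_HISTORY"  # metadata table tracking which versions have run
```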
2 · "Environment" isn't a branch
In app world, feature/foo deploys to a feature-branch environment via preview deploys. In DE, the data is too large to clone per branch — except that Snowflake's zero-copy clone makes cloning effectively free.
The modern pattern:
- PROD: the real database; every DDL change goes through schemachange.
- DEV: a zero-copy clone of PROD, refreshed nightly (or on-demand per branch). See Zero-Copy Clone.
- CI on a branch:
CREATE DATABASE dev_feature_foo CLONE prod; → apply branch migrations → run data tests → tear down.
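In SQL, that whole lifecycle is two statements plus whatever runs in between (database names hypothetical):

```sql
-- 1. Zero-copy clone: copies metadata only; shares prod's underlying storage
CREATE DATABASE dev_feature_foo CLONE prod;

-- 2. ...apply the branch's migrations and run data tests against dev_feature_foo...

-- 3. Tear down: the clone never materialized its own data, so this is cheap too
DROP DATABASE IF EXISTS dev_feature_foo;
```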
3 · Data tests are first-class CI checks
App CI runs unit tests. DE CI runs data tests: the pipeline that produces daily_consumption must produce non-null, non-negative values, with no duplicates on (meter_id, date). These assertions run on real (or cloned) data, not mocks.
Common tools:
- dbt tests — YAML-declared assertions (not_null, unique, relationships, accepted_values).
- Great Expectations — Python-based, more flexible, heavier-weight.
- Snowflake native — just a SQL assertion wrapped in a CI step.
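The daily_consumption assertions above, written as dbt tests, would look roughly like this (model and column names taken from the example; the multi-column uniqueness test assumes the dbt_utils package):

```yaml
# models/schema.yml
version: 2
models:
  - name: daily_consumption
    tests:
      - dbt_utils.unique_combination_of_columns:   # no duplicates on (meter_id, date)
          combination_of_columns: [meter_id, date]
    columns:
      - name: meter_id
        tests: [not_null]
      - name: consumption_kwh
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0        # non-negative
```

`dbt test` compiles each assertion into a SQL query that fails the build if it returns any rows.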
4 · Secrets have a different shape
If you've used a secret-manager integration (External Secrets Operator, Vault, similar) — same principle here. DE pipelines need:
- Warehouse user credentials (Snowflake)
- AWS role ARNs for stages & IAM
- API keys for source systems (CIS exports, weather feeds)
- Webhook URLs for alerts
Same secret-rotation rigor as app deploys. AWS Secrets Manager is typical. Never put keys in CREATE STAGE SQL that gets committed.
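The keyless alternative in Snowflake is a storage integration: the stage references an IAM role, so nothing secret lands in version control (account ID, role, and bucket below are placeholders):

```sql
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-stage-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://meter-data-raw/');

CREATE STAGE raw_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://meter-data-raw/';
```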
A complete DE CI/CD pipeline looks like
PR opened
│
├─ 1. Lint SQL files (sqlfluff, snowsql parse)
├─ 2. Create dev clone: dev_pr_<N> CLONE prod
├─ 3. Run schemachange migrations against dev_pr_<N>
├─ 4. Run data tests (dbt test, great_expectations)
└─ 5. Report pass/fail on PR
│
PR merged
│
├─ 6. Run schemachange against PROD (from main branch)
├─ 7. Deploy Matillion job changes via Matillion API
├─ 8. Drop dev_pr_<N>
└─ 9. Notify #data-engineering channel
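The PR half of that flow, sketched as a declarative Jenkinsfile (the runner image, snowsql usage, and variable names are assumptions; CHANGE_ID is Jenkins' built-in PR number on PR builds):

```groovy
pipeline {
  agent { docker { image 'data-ci:latest' } }   // hypothetical image with sqlfluff, schemachange, dbt, snowsql
  environment { DEV_DB = "dev_pr_${env.CHANGE_ID}" }
  stages {
    stage('Lint')    { steps { sh 'sqlfluff lint snowflake-migrations/' } }
    stage('Clone')   { steps { sh 'snowsql -q "CREATE DATABASE $DEV_DB CLONE prod"' } }
    stage('Migrate') { steps { sh 'schemachange -f ./snowflake-migrations -a "$SF_ACCOUNT" -u "$SF_USER" -d "$DEV_DB"' } }
    stage('Test')    { steps { sh 'dbt test' } }
  }
  post { always { sh 'snowsql -q "DROP DATABASE IF EXISTS $DEV_DB"' } }   // tear the clone down, pass or fail
}
```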
The sentence that lands the interview
"Most of DE CI/CD is the pipeline patterns I already know โ version control, Jenkinsfile, secret management, artifact registry. What's unique is (a) schema migrations via schemachange because the database is state, (b) using zero-copy clones to get preview environments, and (c) data-quality tests that run on real data rather than mocks."