Lesson 6.1: What's Different About CI/CD for DE

You've architected GitOps pipelines. You know Jenkins, GitHub Actions, Docker runner images, pipeline-as-code. This lesson is not teaching you CI/CD; it's the delta for DE, in 20 minutes.

The four differences that matter

1 · The database is state: you can't just redeploy

In app deployment, you push a new image, kill the old pods, done. In DE, the database schema and data are state. You can't "redeploy" a warehouse from scratch; you'd wipe petabytes.

Consequence: migrations. Every schema change is a versioned, idempotent SQL file applied in order. Tools: schemachange (Flyway-for-Snowflake, the most common), Liquibase, Sqitch.

snowflake-migrations/
├─ V1.0.0__initial_schema.sql          # CREATE DATABASE, SCHEMA, ROLES
├─ V1.0.1__create_meter_reads.sql      # CREATE TABLE
├─ V1.0.2__cluster_meter_reads.sql     # ALTER TABLE ... CLUSTER BY (Snowflake has no indexes)
├─ V1.1.0__add_quality_column.sql      # ALTER TABLE ADD COLUMN
└─ R__masking_policies.sql             # R__ = repeatable; re-runnable views/procs

On merge to main, the Jenkins pipeline runs `schemachange -f ./migrations -a $SF_ACCOUNT -u $SF_USER`. It applies new versioned files in order and tracks applied state in a Snowflake metadata table.
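To make the ordering concrete, here is a minimal Python sketch of the file-selection logic a schemachange-style tool applies. It is illustrative only, not schemachange's actual implementation (which also runs repeatable `R__` scripts and keeps the applied set in a real change-history table):

```python
import re

def pending_migrations(files, applied):
    """Order V-prefixed migration files by version number and
    return the ones not yet recorded as applied."""
    versioned = []
    for name in files:
        m = re.match(r"V(\d+(?:\.\d+)*)__.+\.sql$", name)
        if m:
            key = tuple(int(p) for p in m.group(1).split("."))
            versioned.append((key, name))
    versioned.sort()  # numeric tuple sort, so V1.0.2 < V1.1.0 < V2.0.0
    return [name for _, name in versioned if name not in applied]

files = [
    "V1.1.0__add_quality_column.sql",
    "V1.0.0__initial_schema.sql",
    "R__masking_policies.sql",        # repeatable: handled separately
    "V1.0.1__create_meter_reads.sql",
]
print(pending_migrations(files, applied={"V1.0.0__initial_schema.sql"}))
# ['V1.0.1__create_meter_reads.sql', 'V1.1.0__add_quality_column.sql']
```

The numeric tuple sort matters: a plain string sort would put `V1.10.0` before `V1.2.0`.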

2 · "Environment" isn't a branch

In the app world, feature/foo deploys to a feature-branch environment via preview deploys. In DE, the data is far too large to copy per branch. Snowflake's zero-copy clone changes the math: a clone is metadata-only, so a full-size database clone is near-instant and costs almost nothing in storage.

The modern pattern: a zero-copy clone per PR.

- PR opened: `CREATE DATABASE dev_pr_<N> CLONE prod` (metadata-only, near-instant)
- Run migrations and data tests against `dev_pr_<N>`
- PR merged: apply to prod, then `DROP DATABASE dev_pr_<N>`
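The clone lifecycle reduces to two generated statements. A sketch, noting that `dev_pr_<N>` and `prod` are this lesson's naming convention, not anything Snowflake mandates:

```python
def clone_statements(pr_number, source_db="prod"):
    """Build the zero-copy clone lifecycle for one PR.
    A CLONE is metadata-only, so creation is near-instant."""
    db = f"dev_pr_{pr_number}"
    return {
        "create": f"CREATE DATABASE {db} CLONE {source_db};",  # on PR open
        "drop": f"DROP DATABASE IF EXISTS {db};",              # after merge
    }

print(clone_statements(42)["create"])
# CREATE DATABASE dev_pr_42 CLONE prod;
```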

3 · Data tests are first-class CI checks

App CI runs unit tests. DE CI runs data tests: the pipeline that produces daily_consumption must produce non-null, non-negative values, with no duplicates on (meter_id, date). These assertions run on real (or cloned) data, not mocks.
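As a sketch of what those three assertions check, in plain Python rather than dbt or Great Expectations syntax (the row shape is hypothetical):

```python
from collections import Counter

def check_daily_consumption(rows):
    """The checks from the text: non-null, non-negative consumption,
    no duplicate (meter_id, date) pairs. Returns a list of failures."""
    failures = []
    for r in rows:
        if r["consumption_kwh"] is None:
            failures.append(("null_value", r["meter_id"], r["date"]))
        elif r["consumption_kwh"] < 0:
            failures.append(("negative_value", r["meter_id"], r["date"]))
    dupes = Counter((r["meter_id"], r["date"]) for r in rows)
    failures += [("duplicate_key", m, d) for (m, d), n in dupes.items() if n > 1]
    return failures

rows = [
    {"meter_id": "M1", "date": "2024-06-01", "consumption_kwh": 12.4},
    {"meter_id": "M1", "date": "2024-06-01", "consumption_kwh": 12.4},  # duplicate
    {"meter_id": "M2", "date": "2024-06-01", "consumption_kwh": -3.0},  # negative
]
print(check_daily_consumption(rows))
# [('negative_value', 'M2', '2024-06-01'), ('duplicate_key', 'M1', '2024-06-01')]
```

In practice you declare these once (a dbt `schema.yml` or an expectation suite) and CI fails the PR when any row violates them.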

Common tools:

- dbt tests: `not_null`, `unique`, `accepted_values` declared in schema YAML, run with `dbt test`
- Great Expectations: expectation suites validated as a CI step

4 · Secrets have a different shape

If you've used a secret-manager integration (External Secrets Operator, Vault, similar), the same principle applies here. DE pipelines need:

- Snowflake credentials (account, user, key pair or password) for schemachange
- Matillion API credentials for deploying job changes
- Cloud storage keys for external stages

Apply the same secret-rotation rigor as app deploys. AWS Secrets Manager is typical. Never put keys in `CREATE STAGE` SQL that gets committed.
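A common shape is to let the CI runner inject secrets as environment variables (Jenkins credentials bindings, or a Secrets Manager lookup in an earlier stage) and fail fast when any are missing. A sketch, with illustrative variable names:

```python
import os

# Illustrative names; match them to however your CI injects secrets.
REQUIRED = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PRIVATE_KEY_PATH")

def load_connection_config(env=None):
    """Collect credentials from the environment, failing fast if any
    are missing, so a misconfigured job dies before touching Snowflake."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED if k not in env]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return {k.lower(): env[k] for k in REQUIRED}
```

The key property: credentials flow runner → process environment → connection object, and never appear in committed SQL or pipeline logs.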

A complete DE CI/CD pipeline looks like

 PR opened
    │
    ├─ 1. Lint SQL files (sqlfluff, snowsql parse)
    ├─ 2. Create dev clone: CREATE DATABASE dev_pr_<N> CLONE prod
    ├─ 3. Run schemachange migrations against dev_pr_<N>
    ├─ 4. Run data tests (dbt test, great_expectations)
    ├─ 5. Report pass/fail on PR
    │
 PR merged
    │
    ├─ 6. Run schemachange against PROD (from main branch)
    ├─ 7. Deploy Matillion job changes via Matillion API
    ├─ 8. Drop dev_pr_<N>
    └─ 9. Notify #data-engineering channel
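The PR half of that flow can be sketched as glue code. The stage callables are stand-ins for the real Jenkins steps; as a defensive variant of step 8, this sketch also drops the clone early when a check fails, so broken PRs don't leak clones:

```python
def run_pr_checks(pr_number, lint, clone, migrate, test, drop):
    """PR-side flow: lint, clone, migrate, data-test. The clone
    normally survives until merge; drop early only on failure."""
    lint()
    db = clone(pr_number)    # CREATE DATABASE dev_pr_<N> CLONE prod
    try:
        migrate(db)          # schemachange against the clone
        test(db)             # dbt test / great_expectations
    except Exception:
        drop(db)             # defensive cleanup on failure
        raise
    return db                # kept alive for the merge-side stages

# Stand-in stages that just record the call order:
calls = []
db = run_pr_checks(
    42,
    lint=lambda: calls.append("lint"),
    clone=lambda n: (calls.append(f"clone dev_pr_{n}"), f"dev_pr_{n}")[1],
    migrate=lambda db: calls.append(f"migrate {db}"),
    test=lambda db: calls.append(f"test {db}"),
    drop=lambda db: calls.append(f"drop {db}"),
)
print(calls)
# ['lint', 'clone dev_pr_42', 'migrate dev_pr_42', 'test dev_pr_42']
```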

The sentence that lands the interview

"Most of DE CI/CD is the pipeline patterns I already know โ€” version control, Jenkinsfile, secret management, artifact registry. What's unique is (a) schema migrations via schemachange because the database is state, (b) using zero-copy clones to get preview environments, and (c) data-quality tests that run on real data rather than mocks."