Lesson 5.1: What Matillion Is (and Isn't)
If you walk into an interview thinking Matillion is "an ETL tool like the one we used in 2010," you'll be off. It's a GUI-driven orchestrator plus SQL generator that pushes the transformation work into Snowflake. That's a different mental model, and it's worth getting right up front.
The one-sentence version
Matillion is a visual ELT tool that compiles drag-and-drop transformation graphs into Snowflake SQL, and runs orchestration DAGs that sequence those transformations.
Read it again. Every word matters:
- Visual / drag-and-drop: non-engineers can author jobs. That's the selling point at shops without a big DE team.
- ELT: load raw first, transform in the warehouse. Matches the modern cloud warehouse pattern.
- Compiles to Snowflake SQL: Matillion does not run compute. It generates one big SQL statement per transformation and sends it to Snowflake.
- Orchestration DAGs: job A runs, then B and C in parallel, then D only if C succeeded, etc.
The two job types (this is on the test)
Orchestration Job
The outer DAG. Responsible for movement and control flow.
Components are things like: Run Command, Stage Files from S3, Run Transformation, If Then Else, Iterator, Email Notify, Python Script.
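To make the control-flow idea concrete, here is a minimal Python sketch of the "A, then B and C, then D only if C succeeded" pattern. It is not Matillion's engine or API: the job names, the `run_dag` helper, and the `run_job` callback are all hypothetical, and the sketch runs B and C one after another rather than in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: each job maps to the set of jobs that must succeed first.
DEPENDENCIES = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"C"},
}


def run_dag(dependencies, run_job):
    """Run jobs in dependency order; skip a job if any upstream job did not succeed.

    run_job(name) is a caller-supplied function that returns True on success.
    """
    status = {}
    for job in TopologicalSorter(dependencies).static_order():
        upstream_ok = all(status.get(dep) == "success" for dep in dependencies[job])
        if not upstream_ok:
            status[job] = "skipped"
        elif run_job(job):
            status[job] = "success"
        else:
            status[job] = "failed"
    return status


# Pretend job C fails and everything else succeeds.
result = run_dag(DEPENDENCIES, lambda name: name != "C")
print(result)
```

In this run D ends up skipped because C failed, while B is unaffected. That is the kind of decision an orchestration job makes; it never touches the data itself.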
Transformation Job
The inner graph. Pure SQL generation.
Components: Table Input, Filter, Join, Aggregate, Calculator, Table Output. Saved as XML. Compiles to ONE Snowflake SQL statement executed against your chosen warehouse.
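A toy illustration of "graph in, one SQL statement out" may help. The sketch below is not Matillion's compiler, and the SQL Matillion actually generates will look different. The `compile_graph` function, the component names, and the table names are all assumptions made up for the example.

```python
def compile_graph(steps):
    """Fold a linear list of (component, config) steps into one SQL statement."""
    sql = None
    target = None
    for component, cfg in steps:
        if component == "table_input":
            sql = f"SELECT {', '.join(cfg['columns'])} FROM {cfg['table']}"
        elif component == "filter":
            # Wrap the upstream query and add a WHERE clause.
            sql = f"SELECT * FROM ({sql}) AS f WHERE {cfg['condition']}"
        elif component == "aggregate":
            group_cols = ", ".join(cfg["group_by"])
            sql = (
                f"SELECT {group_cols}, {cfg['expression']} "
                f"FROM ({sql}) AS a GROUP BY {group_cols}"
            )
        elif component == "table_output":
            target = cfg["table"]
        else:
            raise ValueError(f"unknown component: {component}")
    if sql is None or target is None:
        raise ValueError("graph needs a table_input and a table_output")
    return f"INSERT INTO {target} {sql}"


# Hypothetical graph: Table Input -> Filter -> Aggregate -> Table Output.
steps = [
    ("table_input", {"table": "raw.orders",
                     "columns": ["customer_id", "amount", "status"]}),
    ("filter", {"condition": "status = 'shipped'"}),
    ("aggregate", {"group_by": ["customer_id"],
                   "expression": "SUM(amount) AS total_amount"}),
    ("table_output", {"table": "analytics.customer_totals"}),
]
statement = compile_graph(steps)
print(statement)
```

Each component only wraps the query built so far, so the whole graph collapses into a single nested `INSERT INTO ... SELECT` that the warehouse can execute in one go.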
Why push-down matters
Most "ETL tools" (legacy Informatica, old Talend, older SSIS) did their own compute: the tool had its own engine that pulled data from the source, transformed it, and pushed it to the target. That's I/O-heavy and scales poorly.
Matillion's push-down model moves the compute to the data:
Legacy ETL tool (pull-transform-push):
    [source DB] --> [ETL engine RAM/CPU] --> [warehouse]
                    bottleneck, $$ server

Matillion (push-down):
    [source DB] --> [Snowflake stage (S3)] --> [Snowflake] <-- SQL compiled by Matillion
                                                    ^
                                                    |
                                            [Matillion VM/Hub
                                             orchestrates, no compute]
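The difference between the two models can be sketched in a few lines of Python. This is only an analogy: the standard-library `sqlite3` module stands in for Snowflake so the example runs anywhere, and the table and function names are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for the warehouse here (an assumption, for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)


def legacy_style_totals(conn):
    """Pull every row into the tool's own memory and aggregate there."""
    totals = {}
    for customer_id, amount in conn.execute(
        "SELECT customer_id, amount FROM raw_orders"
    ):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def push_down_totals(conn):
    """Send one SQL statement; the database does the aggregation itself."""
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer_id, SUM(amount) AS total "
        "FROM raw_orders GROUP BY customer_id"
    )
    # Only the small aggregated result is read back, and only to compare.
    return dict(conn.execute("SELECT customer_id, total FROM customer_totals"))


pulled = legacy_style_totals(conn)
pushed = push_down_totals(conn)
print(pulled == pushed)
```

Both paths produce the same totals. In the first, every raw row crosses the wire into the tool's memory; in the second, the tool sends only SQL text and the data never leaves the database.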
Matillion ETL (legacy) vs Matillion Hub / Data Productivity Cloud (current)
| | Matillion ETL VM | Matillion Hub (DPC) |
|---|---|---|
| Form factor | Self-hosted AMI on EC2 | SaaS, zero infrastructure |
| Licensing | BYO-instance, annual | Consumption-based credits |
| Still relevant? | Yes at many enterprises, grandfathered | Vendor's direction |
| UI | Similar concepts, older chrome | Modernized, more composable |
Ask in the interview
"Which Matillion product is this role using?" is a legitimate, senior-level question: it shows you know there are two. If they say "Matillion ETL," you understand the VM story. If they say "Data Productivity Cloud" or "Matillion Hub," you're on the cloud-native one.
Who competes with Matillion?
The name "Matillion" is in the JD for a reason, but be aware of the ecosystem, because interviews often probe it:
- dbt: SQL-in-Git, not GUI. Dominant at engineering-heavy shops. Not on this JD, but often mentioned in the same breath.
- Fivetran: pure ingest (the E and L in ELT), not transformation. Complementary.
- Informatica IICS / Talend: heavier, legacy-enterprise orientation.
- Airflow: an orchestrator, not a transformation tool; you'd still need dbt or similar for transforms.
Quick check
Q1. When Matillion runs a Transformation Job, where does the actual query execution happen?
Q2. You need to stage files from S3, then run a transformation, then email on success. Which job type(s)?