Exercise 5.1: Matillion Hub Trial + First Job
Stand up a Matillion Hub trial, connect it to your Snowflake trial, and build a transformation that reads meter_reads and writes hourly_consumption_by_zip. First hands-on feel of push-down ELT.
Prereq: Snowflake trial (Ex 2.1), meter_reads table exists. Time: 45 min. Cost: ~$0.02 Snowflake credits. Matillion trial is free.
Step 1: Sign up for Matillion Hub
- Go to hub.matillion.com. Sign up with work/personal email.
- Create an organization. Pick the closest region to us-west-2.
- You'll be offered "Data Productivity Cloud"; pick it (this is the current product).
- Trial gives you 14 days of unrestricted access + ongoing free-tier credits after.
Step 2: Connect Matillion to Snowflake
- Matillion → Connectivity → Credentials → Add Snowflake.
- Provide: account URL (xy12345.us-west-2.aws.snowflakecomputing.com), user, password.
- Test connection. Should succeed.
- Create a Project: utility-de-learning. Pick the Snowflake credential, database LEARN_DB, schema UTILITY, warehouse LEARN_WH.
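If the credential test or Project setup fails, a quick check in a Snowflake worksheet confirms that the objects the Project references actually exist (a minimal sketch, assuming the Exercise 2.1 names):
-- These should each return a row; if not, revisit Exercise 2.1 before continuing
SHOW WAREHOUSES LIKE 'LEARN_WH';
SHOW SCHEMAS LIKE 'UTILITY' IN DATABASE LEARN_DB;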
Step 3: Build a transformation
Your goal: given the synthetic meter_reads from Exercise 2.1, produce an hourly_consumption_by_zip summary.
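Before building the graph, you can confirm the source columns it will reference from a Snowflake worksheet (column names as used in the steps below; adjust if your Exercise 2.1 table differs):
-- Inspect the source table and preview the three columns the job uses
DESCRIBE TABLE LEARN_DB.UTILITY.METER_READS;
SELECT premise_zip, read_ts, consumption_kwh
FROM LEARN_DB.UTILITY.METER_READS
LIMIT 5;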
- New Transformation Job: hourly_consumption_by_zip.
- Drag Table Input. Source: LEARN_DB.UTILITY.METER_READS. Select all columns.
- Drag Calculator after it. Add a column read_hour = DATE_TRUNC('hour', read_ts).
- Drag Aggregate. Group by premise_zip, read_hour. Aggregations: SUM(consumption_kwh) AS hourly_kwh, COUNT(*) AS reads_in_hour.
- Drag Rewrite Table. Target: LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP.
- Wire them in order: Input → Calculator → Aggregate → Rewrite Table.
- Click Validate. Matillion should accept the graph.
- Click Run Job. Watch it execute.
Step 4: See the SQL Matillion generated
This is the magic moment. In the job detail panel, find the "SQL" or "Generated SQL" view. Matillion compiled your 4-node graph into a single statement like:
CREATE OR REPLACE TABLE LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP AS
SELECT
premise_zip,
DATE_TRUNC('hour', read_ts) AS read_hour,
SUM(consumption_kwh) AS hourly_kwh,
COUNT(*) AS reads_in_hour
FROM LEARN_DB.UTILITY.METER_READS
GROUP BY premise_zip, DATE_TRUNC('hour', read_ts);
The entire transformation (4 visual nodes) becomes one SQL statement. Snowflake executes it. Matillion did zero compute.
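If you want to see the push-down from the Snowflake side, the query history shows the statement arriving from Matillion and running on LEARN_WH. A minimal check (assuming your role can read query history; the ILIKE filter is just an illustrative way to find the statement):
-- Recent statements touching the target table; the Matillion-generated CTAS should appear here
SELECT start_time, warehouse_name, total_elapsed_time, query_text
FROM TABLE(LEARN_DB.INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
WHERE query_text ILIKE '%HOURLY_CONSUMPTION_BY_ZIP%'
ORDER BY start_time DESC;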
Step 5: Wrap in an Orchestration Job
- New Orchestration Job: nightly_consumption_refresh.
- Drag Run Transformation and select hourly_consumption_by_zip.
- Drag Send Email after it. Add your email on the Success path.
- Validate, Run.
That's the full Matillion pattern: Orchestration wraps Transformation wraps SQL.
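For intuition only (this is not part of the exercise, and not something Matillion creates for you): the nightly pattern boils down to re-running that one generated statement on a schedule. A hypothetical native-Snowflake analogue, just to make the "wraps SQL" point concrete:
-- Illustration only: the generated CTAS re-run on a schedule as a Snowflake task
CREATE OR REPLACE TASK NIGHTLY_CONSUMPTION_REFRESH_DEMO
  WAREHOUSE = LEARN_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CREATE OR REPLACE TABLE LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP AS
  SELECT premise_zip,
         DATE_TRUNC('hour', read_ts) AS read_hour,
         SUM(consumption_kwh) AS hourly_kwh,
         COUNT(*) AS reads_in_hour
  FROM LEARN_DB.UTILITY.METER_READS
  GROUP BY premise_zip, DATE_TRUNC('hour', read_ts);
-- Tasks are created suspended; drop the demo task if you created it.
The value of the Matillion version is that the schedule, the email notification, and the success/failure paths live in the orchestration job rather than in warehouse-side objects.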
Step 6: Verify in Snowflake & cleanup
SELECT * FROM LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP LIMIT 20;
SELECT COUNT(*) FROM LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP;
Expect roughly one row per zip per hourly bucket: about 7,200 rows for each day of meter reads (300 zips × 24 hours). Drop when done:
DROP TABLE IF EXISTS LEARN_DB.UTILITY.HOURLY_CONSUMPTION_BY_ZIP;
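To sanity-check the ~$0.02 estimate from the prereq note, you can ask Snowflake what the trial warehouse actually consumed (names assumed from Exercise 2.1; metering figures can lag by a few minutes):
-- Credits burned by LEARN_WH over the last day
SELECT start_time, end_time, credits_used
FROM TABLE(LEARN_DB.INFORMATION_SCHEMA.WAREHOUSE_METERING_HISTORY(
  DATE_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
  WAREHOUSE_NAME => 'LEARN_WH'))
ORDER BY start_time;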
Interview talking point (save to notepad)
"In Matillion I authored the transformation as a visual graph, but the generated SQL is a single CREATE OR REPLACE TABLE statement pushed down to Snowflake. So Matillion orchestrates and Snowflake computes, which means performance tuning happens at the warehouse-sizing level, not the Matillion level."