Back

How We Handle Type 2 Slowly Changing Dimensions Without Losing Our Minds

Tracking how data changes over time can be a headache — especially when you’re trying to preserve history without creating a mess. At Emerald Codeworks, we use a practical, SQL-based approach to managing Type 2 Slowly Changing Dimensions (SCDs) that keeps things clean, accurate, and easy to work with. In this post, we’ll break down how we structure our tables, detect changes, and handle the trade-offs so your data warehouse stays reliable — and your team stays sane.

data, computer, business people, monitor, package, data packets, internet, online, www, zero, one, information, electronic, electronics, digitization, digital, surfing, amount of data, word, flood of data, database, bulk data, collect, evaluate, data volume, data retention, data storage, market research, records, data processing, complex, data collection, networking, computer science, network, database, database, database, database, database, data collection

At Emerald Codeworks, we spend a lot of time helping businesses make sense of their data — especially when it comes to tracking how that data changes over time. One common challenge in data warehousing is managing Slowly Changing Dimensions (SCDs) — particularly Type 2 SCDs, where we want to keep a history of changes rather than overwriting data.

This can get complex quickly, but over the years we’ve developed a practical approach that keeps things manageable (and keeps us sane). In this post, we’ll walk through how we handle Type 2 SCDs using SQL, the trade-offs we consider, and a few tips for keeping your data pipelines maintainable.

What Is a Type 2 Slowly Changing Dimension?

Let’s say you’re tracking customer information in your warehouse, and someone updates their email or moves to a new city. A Type 1 update would just overwrite the old values. But in Type 2, you preserve the history by inserting a new row with the updated values and marking the previous record as no longer current.

This lets you analyze the state of the world at any point in time — super useful for auditing, analytics, or compliance.

A Typical Schema

We add a few extra fields to our dimension tables to support Type 2 changes:

CREATE TABLE dim_customer (
  customer_id      INT,
  customer_sk      INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name         VARCHAR,
  email         VARCHAR,
  city         VARCHAR,
  valid_from      DATE,
  valid_to       DATE,
  is_current      BOOLEAN
);

customer_sk: surrogate key (used for joins)
valid_from / valid_to: date range the record is valid for
is_current: convenience flag for querying the current record

Detecting Changes (and What Counts as a Change)

We usually run a daily or hourly sync from the source system and compare it to what’s in our warehouse. The logic is:

If there’s no existing record, insert it.
If the new version differs from the current record, expire the old one and insert a new one.

Here’s a simplified version of how we do it in SQL (or dbt for our projects):

WITH source_data AS (
  SELECT * FROM staging_customer
),

latest_existing AS (
  SELECT *
  FROM dim_customer
  WHERE is_current = TRUE
),

changed_records AS (
  SELECT src.*
  FROM source_data src
  LEFT JOIN latest_existing dim
   ON src.customer_id = dim.customer_id
  WHERE dim.customer_id IS NULL
    OR src.email <> dim.email
    OR src.city <> dim.city
)

-- expire existing records
UPDATE dim_customer
SET valid_to = CURRENT_DATE - 1,
  is_current = FALSE
WHERE customer_id IN (SELECT customer_id FROM changed_records)
 AND is_current = TRUE;

-- insert new records
INSERT INTO dim_customer (
  customer_id, name, email, city, valid_from, valid_to, is_current
)
SELECT
  customer_id, name, email, city,
  CURRENT_DATE, '9999-12-31', TRUE
FROM changed_records;

You could also use hashes for change detection if your rows have lots of fields.

Trade-Offs to Consider

Decision	Trade-Off
Date ranges vs version numbers	Dates are more readable and useful for time-based queries. Version numbers are simpler for debugging but harder to use in analysis.
Hashing rows to detect changes?	Improves performance, especially for wide tables, but introduces the risk of false positives due to precision or null handling.
Use is_current flag?	Makes querying current records faster, but adds complexity to your update logic.
dbt vs SQL scripts?	We lean toward dbt for transparency, reusability, and testing, but for quick jobs we’ll sometimes just write SQL.

Our Tips for Staying Sane

Avoid over-engineering: Only track changes to fields that actually matter for analysis.
Always test your change detection logic — especially if you’re using hashing or null fields.
Store change metadata like updated_by or source_system if available.
Build a view of “current” records for analysts who don’t want to deal with the history.

CREATE VIEW dim_customer_current AS
SELECT * FROM dim_customer WHERE is_current = TRUE;

Wrapping Up

Handling Type 2 Slowly Changing Dimensions is one of those things that sounds easy until you’re doing it at scale, with real-world messiness. But with a few guardrails and solid SQL patterns, it becomes just another part of your data stack — not something to dread.

At Emerald Codeworks, we help teams untangle these kinds of data problems every day. If you’ve got messy dimensions, multiple systems, or just need to see how your data has evolved over time — we can help.

A stressed man looks at stock market data on his computer screen in an office setting.

Data Quality or Data Chaos? Catch Issues Early With Automated Scanning

Don’t let bad data creep into your dashboards. This post breaks down how we use tools like Soda and dbt to catch data issues before they cause downstr…

Apr 4, 2025

Detailed image of a server rack with glowing lights in a modern data center.

200 Zettabytes by 2025: What That Means for Your Business Data Strategy

According to recent reports, the total amount of data created, captured, copied, and consumed globally is projected to reach 149 zettabytes in 2024, a…

Apr 2, 2025

How We Handle Type 2 Slowly Changing Dimensions Without Losing Our Minds

Data Quality or Data Chaos? Catch Issues Early With Automated Scanning

200 Zettabytes by 2025: What That Means for Your Business Data Strategy

Your Data Has More to Say—Let’s Uncover It