
How We Handle Type 2 Slowly Changing Dimensions Without Losing Our Minds

Tracking how data changes over time can be a headache — especially when you’re trying to preserve history without creating a mess. At Emerald Codeworks, we use a practical, SQL-based approach to managing Type 2 Slowly Changing Dimensions (SCDs) that keeps things clean, accurate, and easy to work with. In this post, we’ll break down how we structure our tables, detect changes, and handle the trade-offs so your data warehouse stays reliable — and your team stays sane.


At Emerald Codeworks, we spend a lot of time helping businesses make sense of their data — especially when it comes to tracking how that data changes over time. One common challenge in data warehousing is managing Slowly Changing Dimensions (SCDs) — particularly Type 2 SCDs, where we want to keep a history of changes rather than overwriting data.

This can get complex quickly, but over the years we’ve developed a practical approach that keeps things manageable (and keeps us sane). In this post, we’ll walk through how we handle Type 2 SCDs using SQL, the trade-offs we consider, and a few tips for keeping your data pipelines maintainable.

What Is a Type 2 Slowly Changing Dimension?

Let’s say you’re tracking customer information in your warehouse, and someone updates their email or moves to a new city. A Type 1 update would just overwrite the old values. But in Type 2, you preserve the history by inserting a new row with the updated values and marking the previous record as no longer current.

This lets you analyze the state of the world at any point in time — super useful for auditing, analytics, or compliance.
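To make the difference concrete, here's a toy sketch in Python (not production code; the field names mirror the schema in the next section, and the sample values are invented). Type 1 would overwrite `city` in place; Type 2 closes out the old row and appends a new one:

```python
from datetime import date

# A tiny in-memory "dimension": a list of row dicts.
dim = [{"customer_id": 1, "city": "Cork", "valid_from": date(2023, 1, 1),
        "valid_to": date(9999, 12, 31), "is_current": True}]

def type2_update(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row, then insert a new version."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date   # close out the old version
            row["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": change_date, "valid_to": date(9999, 12, 31),
                "is_current": True})

type2_update(dim, 1, "Dublin", date(2024, 6, 1))
# History preserved: two rows, only the newest flagged current.
```

Both versions of the customer survive, so a query can still answer "where did customer 1 live in 2023?"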

A Typical Schema

We add a few extra fields to our dimension tables to support Type 2 changes:

CREATE TABLE dim_customer (
  customer_sk   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id   INT,
  name          VARCHAR,
  email         VARCHAR,
  city          VARCHAR,
  valid_from    DATE,
  valid_to      DATE,
  is_current    BOOLEAN
);
  • customer_sk: surrogate key (used for joins); a fresh one is generated for every version of a customer
  • customer_id: the natural key from the source system, repeated across versions
  • valid_from / valid_to: the date range the record is valid for
  • is_current: convenience flag for querying the current record

Detecting Changes (and What Counts as a Change)

We usually run a daily or hourly sync from the source system and compare it to what’s in our warehouse. The logic is:

  1. If there’s no existing record, insert it.
  2. If the new version differs from the current record, expire the old one and insert a new one.

Here’s a simplified version of how we do it in SQL (dbt on most of our projects). One gotcha: a CTE only exists for the duration of a single statement, so we materialize the changed rows into a temp table that both the UPDATE and the INSERT can reference:

-- find source rows that are brand new or differ from the current version
CREATE TEMP TABLE changed_records AS
SELECT src.*
FROM staging_customer src
LEFT JOIN dim_customer dim
  ON src.customer_id = dim.customer_id
 AND dim.is_current = TRUE
WHERE dim.customer_id IS NULL
  -- IS DISTINCT FROM treats NULLs as comparable; src.email <> dim.email
  -- evaluates to NULL (not TRUE) when either side is NULL, silently missing changes
   OR src.email IS DISTINCT FROM dim.email
   OR src.city IS DISTINCT FROM dim.city;

-- expire existing records
UPDATE dim_customer
SET valid_to = CURRENT_DATE - 1,
    is_current = FALSE
WHERE customer_id IN (SELECT customer_id FROM changed_records)
  AND is_current = TRUE;

-- insert new records
INSERT INTO dim_customer (
  customer_id, name, email, city, valid_from, valid_to, is_current
)
SELECT
  customer_id, name, email, city,
  CURRENT_DATE, DATE '9999-12-31', TRUE
FROM changed_records;
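To sanity-check the pattern end to end, here's the same expire-and-insert flow run against an in-memory SQLite database (a trimmed sketch: only the city column is tracked, the sample data is invented, and the changed rows are staged in a temp table so both statements can see them):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (
  customer_id INT, city TEXT,
  valid_from DATE, valid_to DATE, is_current BOOLEAN
);
CREATE TABLE staging_customer (customer_id INT, city TEXT);
INSERT INTO dim_customer VALUES (1, 'Cork', '2023-01-01', '9999-12-31', 1);
INSERT INTO staging_customer VALUES (1, 'Dublin');
""")

today = "2024-06-01"

# Step 1: source rows that are brand new or differ from the current version.
con.execute("""
CREATE TEMP TABLE changed_records AS
SELECT src.customer_id, src.city
FROM staging_customer src
LEFT JOIN dim_customer dim
  ON src.customer_id = dim.customer_id AND dim.is_current = 1
WHERE dim.customer_id IS NULL OR src.city <> dim.city
""")

# Step 2: expire the superseded versions ...
con.execute("""
UPDATE dim_customer
SET valid_to = date(?, '-1 day'), is_current = 0
WHERE customer_id IN (SELECT customer_id FROM changed_records)
  AND is_current = 1
""", (today,))

# Step 3: ... and insert the new ones.
con.execute("""
INSERT INTO dim_customer
SELECT customer_id, city, ?, '9999-12-31', 1 FROM changed_records
""", (today,))

rows = con.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
print(rows)  # [('Cork', 0), ('Dublin', 1)]
```

After one run, the old Cork row is expired and the Dublin row is the single current version.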

You could also use hashes for change detection if your rows have lots of fields.
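A minimal sketch of what that hashing might look like (the NULL sentinel and field separator here are our own conventions, not a standard; the point is that NULL, empty string, and concatenation boundaries must all be unambiguous):

```python
import hashlib

def row_hash(row, cols):
    """Hash the tracked columns of a row dict.

    - "<NULL>" sentinel keeps None distinct from the empty string.
    - The \\x1f separator keeps ('ab', 'c') distinct from ('a', 'bc').
    """
    parts = ["<NULL>" if row.get(c) is None else str(row[c]) for c in cols]
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()

TRACKED = ["email", "city"]
old = {"email": "ada@old.example", "city": "Cork"}
new = {"email": "ada@old.example", "city": "Dublin"}

changed = row_hash(old, TRACKED) != row_hash(new, TRACKED)
```

One comparison per row instead of one per column; the trade-off is that any sloppiness in normalization (formatting, precision, NULLs) shows up as phantom changes.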

Trade-Offs to Consider

  • Date ranges vs. version numbers: dates are more readable and useful for time-based queries; version numbers are simpler for debugging but harder to use in analysis.
  • Hashing rows to detect changes: faster for wide tables, but risks phantom diffs from formatting, precision, or NULL handling.
  • An is_current flag: makes querying current records faster, but adds complexity to your update logic.
  • dbt vs. plain SQL scripts: we lean toward dbt for transparency, reusability, and testing, but for quick jobs we’ll sometimes just write SQL.

Our Tips for Staying Sane

  • Avoid over-engineering: only track changes to fields that actually matter for analysis.
  • Always test your change detection logic, especially if you’re using hashing or nullable fields.
  • Store change metadata like updated_by or source_system if available.
  • Build a view of “current” records for analysts who don’t want to deal with the history:

CREATE VIEW dim_customer_current AS
SELECT * FROM dim_customer WHERE is_current = TRUE;

Wrapping Up

Handling Type 2 Slowly Changing Dimensions is one of those things that sounds easy until you’re doing it at scale, with real-world messiness. But with a few guardrails and solid SQL patterns, it becomes just another part of your data stack, not something to dread.

At Emerald Codeworks, we help teams untangle these kinds of data problems every day. If you’ve got messy dimensions, multiple systems, or just need to see how your data has evolved over time, we can help.
