04 Sep 2024 Implementing Slowly Changing Dimensions Type 2 in a Data Warehouse with ODI
By default, a transactional system can alter information, but only keeps track of the most recent version of the data that has been modified. To preserve a complete history of all modifications and to analyse the specific value of a piece of data at any given time, we can implement Slowly Changing Dimension (SCD) methodologies in our data warehouse. This gives us a historical record of how the value of a field in a dimension is changing over time, either occasionally or constantly.
There are 6 types of SCD:
SCD Type 5: Add mini dimension.
In this article today we’ll focus on an SCD Type 2 strategy to store the history of changes, and see how can this be implemented in Oracle Data Integrator (ODI) using its specific built-in functionalities for DWH integration scenarios.
How SCDs Work
SCD tables are specifically for managing and maintaining the history of changes in data over time. Type 2 dimensions are commonly used to keep track of variations in the attributes of a dimension, so let’s see how they work and look at an approach to create an SCD Type 2 table.
How SCD Type 2 Tables Work
Implementing SCD Type 2 involves using techniques like versioning and effective date ranges. The objective of a SCD Type 2 is to track historical changes in dimensional data over time, while ensuring the integrity of the existing data is maintained.
Version Registration
In an SCD Type 2 table, each row represents a distinct version of a dimension member. A new row is added whenever there is a change in the dimension attributes, ensuring that the historical data is preserved.
Effective and Expiration Attributes
An SCD Type 2 table includes effective attributes and expiration information. The effective attributes specify when a dimension member becomes valid and is in use, while the expiration information indicates the date when a particular version ceases to be active.
Change Management
When a dimension attribute changes, a new row is inserted with the updated values and a corresponding timestamp. The previous row is then “closed” by setting its expiration date to the moment just before the change occurred.
Strategy for Creating an SCD Type 2 Table
Identifying Changes
This step involves determining which dimension attributes need to be tracked and defining the conditions under which changes should be recorded.
Table Design
Define the table structure, making sure to include fields for the natural key, effective attributes, expiration date, and any other attributes that need to be tracked.
Control Fields
Add control fields such as Start Date and End Date to manage the temporal validity of each version.
Active Flag
The active flag indicates whether a row version of the record is currently active. If the record is current, the flag will be set to 1 or Y; if not, it will be set to 0 or N. Whenever there’s a change in the record’s values, a new row is added, and the active flag, along with other relevant columns, must be updated to reflect the change history.
Below you can see an example of SCD functionality, where we’re considering an employee whose status has been changed to “Removed”:
Let’s assume that, currently, there is only one record for this employee in the SCD, reflecting their historical status as “Removed”.
However, during the next data load, the employee’s status changes to “Contracted”:
After the data load, the SCD will contain a new record, showing the employee’s updated status and the effective date of the change.
By default, the currently valid employee status (with Active Flag: Y) will have a validity date of 31/12/4712; this is the default setting in Oracle. When the employee’s status changes and a new record is created, the Date To field will reflect the date of the load:
To summarise, an SCD Type 2 design should be used when a Fact table includes periods or dates as part of its structure and requires accurate tracking of the historical versions of a dimension’s attributes. This approach ensures that each row in the Fact table points to the correct version of the dimension, based on the date of the fact and the dimension’s effective dates.
Implementing SCD Type 2 in ODI
The key steps for the implementation of an SCD in ODI are as follows:
- Access ODI Studio.
- Create a new Data Model to bring database table metadata into ODI.
- Configure the table as an SCD.
- Configure the SCD attributes.
- Create a new Mapping.
- Configure Logical and Physical properties in mapping.
- Test the Mapping and validate it.
- Deploy and maintain the Mapping.
Step-By-Step Guide on How To Create and Configure A Slowly Changing Dimension
1. Access ODI Studio
Open Oracle Data Integrator to begin design and development.
2. Data Model Creation
Create a Data Model that reflects the structure of the SCD table. As the table has already been created in the database, we will reverse-engineer it to load into ODI within the desired Data Model:
3. Configure the Data Model as SCD
Once the table has been loaded into the Data Model, open it to proceed with the configuration. Within the Data Model, locate your dimension and open it with a double-click; this will open a window like the one shown below. In this window, navigate to the Definition tab and set the OLAP Type to Slowly Changing Dimension:
4. Configure SCD attributes (in Data Model)
In the Attributes tab in the datastore, configure each column of the table with one of the SCD Behavior options:
– Surrogate Key (): Select this option if your dimension uses a surrogate key. A surrogate key is an artificially generated key created during the ETL process for integration purposes. We highly recommend using surrogate keys in all data warehouse scenarios due to their many benefits. For example, surrogate keys are essential when your target records lack a direct natural key that guarantees uniqueness. This can occur in situations such as a Person table (where two individuals might share the same name and birthdate) or in log records (where two events could occur simultaneously, with the same timestamp). Surrogate keys are also crucial when integrating data from multiple source systems or instances.
– Natural Key (mandatory): The natural key is a unique key in your table that uniquely identifies a record. Every table should have a natural key, necessary for accurately updating or deleting records. This option is used to designate the primary key for the table.
– Overwrite on Change (optional): Use this option for dimension attributes that should be overwritten when their data changes, without adding a new row of information.
– Add Row on Change (mandatory): Configure this option for attributes in your dimension where you want to maintain historical information. When the value of these fields changes, a new row will be added to preserve historical values. What’s more, the Start Date, End Date, and Current Record Flag fields will be updated in the previous row.
– Starting Timestamp (mandatory): This option marks the point in time when the attributes for a specific version of the dimension become effective. Typically, this field is set to the date when the record is loaded, usually using SYSDATE.
– Ending Timestamp (mandatory): This option indicates when the current version of the dimension becomes invalid and is succeeded by a new version. It specifies the time just before the dimension’s attributes change, with the next version beginning from this timestamp.
– Current Record Flag (mandatory): This option is set in the Flag field to indicate whether the record is active (containing the most recent information) or inactive (containing historical information).
Below we can see how we configured these attributes in our example SCD table:
5. Create the Mapping to load the data flow from the Source table into the SCD dimension
Now let’s create the Mapping that will include both the Source table and the Destination table where the data will be loaded:
This would be the logical configuration for each of the target fields:
As we can see, the data integration process in the Destination table is straightforward, as ODI handles the logic needed to process and incrementally load the incoming data.
Additional Information
- START_DATE: This date field indicates when a record was inserted into the table, marking the start of its validity. As long as this date is less than the END_DATE of the record, the record is considered active.
- END_DATE: This date field indicates when the record’s validity period ends. The status of the record depends on whether this date is in the past or the future. If the END_DATE is in the past, the record is no longer valid, as a more recent record exists, but if the END_DATE is in the future, the record is considered the most current.
By default, the date fields can be left without specific logic, as one of the properties of the SCD-type IKM (Integration Knowledge Module) is to automatically manage these values, so the manual configuration of these two fields isn’t necessary.
6. Configuring the Mapping with the SCD Table
Logical target Integration type
In the Logical section of the Mapping, select the SCD table, then in the Properties section, under Target, set the Integration Type to Slowly Changing Dimension:
Physical Target Integration Knowledge Module (IKM)
In the physical design of the Mapping, click on the SCD Table and, in Properties, go to the Integration Knowledge Module option and select the Slowly Changing Dimension IKM. You can use the default one provided by Oracle, or use a customised IKM specific to your project:
7. Testing and Validation
The testing and validation phase is essential to ensure the correct implementation of Slowly Changing Dimensions Type 2 in ODI. Here’s a step-by-step guide to ensure the integrity and accuracy of your historical data:
i. Initial Load Testing
Goal: Verify that the initial data load into SCD Type 2 is performed correctly.
Steps: Run the initial load of both historical and new data. Validate that the historical records are properly loaded and that the primary keys are handled properly.
ii. Simulation of Data Changes
Goal: Evaluate how the system handles changes in dimension attributes.
Steps: Perform simulations of changes to both historical and new attributes. Verify that the previous versions of the records remain intact and that a new version is created for each change.
iii. Performance Analysis
Goal: Evaluate system performance when working with SCD Type 2, especially with large data sets.
Steps: Run load tests with large data sets. Monitor and analyse execution times and resource use.
iv. Validation of Historical Queries
Goal: Confirm that historical queries return accurate results.
Steps: Create queries that retrieve data from previous versions of the dimensions and compare the results with known information to verify accuracy.
v. Incremental Change Management
Goal: Ensure that incremental changes are handled correctly.
Steps: Apply incremental changes to the data and execute the load. Verify that only relevant versions are updated and that no unnecessary duplicates are created.
vi. Validation of Reports and Dashboards
Goal: Ensure that reports and dashboards based on dimensional data are accurate and consistent.
Steps: Run reports using SCD Type 2 dimensional data. Compare the results with expectations and validate temporal consistency.
vii. Deployment and Maintenance
Deploy the SCD table to the production environment. Set up periodic maintenance tasks to purge obsolete data and to ensure continued efficiency.
With these steps, the Mapping to load our SCD table is now fully configured.
Benefits of SCD Type 2 with ODI
Implementing Slowly Changing Dimensions with Oracle Data Integrator offers many advantages for data integration and historical data management.
- Change History: Maintains a complete history of data changes over time, crucial for analysing data evolution.
- Temporal Analysis: Enables the exploration of data at specific points in time, essential for historical reporting and trend analysis.
- Data Consistency: Ensures consistent and reliable data over time, even as dimension values change.
- Change Tracking: Facilitates tracking of specific attribute changes, helping with audits and troubleshooting.
- Improved Performance: Enhances performance by efficiently structuring access to historical information.
- Flexibility in Administration: Adapts to various business needs, allowing the selection of the appropriate SCD Type for each dimension.
- Improved Decision Making: Provides a comprehensive history of changes, supporting more informed and accurate decision-making.
Conclusions
Using ODI for SCD implementation not only enhances efficiency and consistency in data management but also ensures reliable and error-free data loading processes. By standardising development practices and leveraging dedicated IKMs, which encapsulate the knowledge on how to perform data integration in an SCD table, ODI improves the reliability of results and reduces project completion time.
The absence of SCD dimensions can lead to lost change history, inconsistent reporting, and increased long-term maintenance costs. Implementing SCD tables in ODI is crucial to maintain historical data integrity, facilitate detailed historical analysis, and support better decision-making.
Ultimately, ODI’s automated change detection and management process optimises dimension management, contributing to a more effective and accurate data analysis in a data warehouse environment.
If you’re looking to implement or optimise Slowly Changing Dimensions within your data warehouse environment, or if you need expert guidance on using Oracle Data Integrator for efficient and reliable data management, we’re here to help. Our team of experienced professionals can help you ensure that your data is accurate, consistent, and ready to support your business decisions. Contact us today to learn how we can help you to achieve your data management goals!