16 Jan 2013 Analytic Functions: Using LAG for SCD type 2
In one of my previous blog articles called ´´Deduplication using Analytic Function´´, I explained a simple introduction of Oracle Analytic Functions. In this article (addressed for SQL developers), I will focus on the LAG function which from my point of view, is one of the most useful functions. Let’s see the use of this function in a common Data Warehouse scenario: Slowly Changing Dimension.
LAG analytic function
The LAG analytic function gives access to multiple rows within a table, without the need of a self-join. Basically, it returns the value of an attribute of the previous row.
LAG (value_expression [,offset] [,default]) OVER (order_by_clause)
- – value_expression: Can be a column or a built-in function.
- – offset: The number of rows preceding the current row. Optionally, the default value is 1.
- – default: The value returned if the offset is out of the partition range. Optionally, the default value is NULL.
- – order_by_clause : To indicate the order.
The LEAD function (retrieves the value of the following rows) is similar to the LAG, in case we change the DESC/ASC attribute after the ORDER BY clause.
Practical case: Implementing a SCD type 2
We consider having a table with the currency Euros-Dollars: a column with the rate and another column with the date since this rate is effective. To simplify the following sample, we will work with a typical “Slowly Changing Dimension type 2” table, that means to have 2 columns for the date range: “Valid Date From” and “Valid Date To”.
RATE | CURRENCY DATE | |
1.35 | 01/12/2012 | |
1.44 | 03/12/2012 | |
1.42 | 05/12/2012 |
If we want to have the following result,
RATE | VALID DATE FROM | VALID DATE TO |
1.35 | 01/12/2012 | 03/12/2012 |
1.44 | 03/12/2012 | 05/12/2012 |
1.42 | 05/12/2012 | NULL |
we can execute one of the two following queries.
- Using native SQL
- Using Analytic Functions
We can also have the same result using LEAD function but instead of having the order in DESC we should set it as ASC.
It is true that whatever an analytic function does, it can be done by native SQL, with self-joins and sub-queries. But the same routine done by analytic function is always faster, or at least as fast as, when compared to native SQL.
Conclusion
We have seen that by using a simple LAG function we can avoid writing a complex SQL query, as we can see in our Slowly Changing Dimension example. Sometimes it seems that the analytic functions are more difficult than the native SQL, but as soon as we are familiar with the syntax, we will be able to have a clear and easier to read code.