08 Jun 2022 Oracle Data Integrator 12c: Subquery Filter Component
When creating a mapping in Oracle Data Integrator 12c, developers can use the components available in the Components Palette depending on their specific needs:
- Expression
- Aggregate
- Distinct
- Set
- Filter
- Join
- Lookup
- Sort
- Split
- Pivot
- Unpivot
- Table Function
- Subquery Filter
Components are simply dragged and dropped onto the design canvas and their properties are set. Some of the components in the palette are commonly used, such as Filter, Join, Aggregate, etc., and everybody is clear on how to use them.
However, in this article today we’re going to run through a component that falls into another category – rarely used and obscure! As you’ll have already guessed from the title, we’re talking about the subquery filter component.
The Subquery Filter Component
This component can be used when you need to filter data based on the result of a subquery. Conditions that can be used to filter rows are EXISTS/NOT EXISTS, IN/NOT IN, or mathematical expressions (=, !=, ^=, <, >, <>, <=, >=).
This component has two input and one output connector points. The first input connector point is a datastore that contains the data that needs to be filtered – DRIVER_INPUT; the second is a datastore that represents the subquery and is used to filter the data – SUBQUERY_FILTER_INPUT.
Below you can see the steps to follow when using the subquery filter component:
- Drag the subquery filter component from the palette to the design canvas:
Figure 1: Components Palette
- Connect the datastore that contains the data to be filtered to the first input connector – DRIVER_INPUT.
- Connect the datastore that represents the subquery to the second input connector – SUBQUERY_FILTER_INPUT.
- Drag all the attributes needed later in the mapping from the DRIVER_INPUT datastore to the subquery filter component.
- Drag the attributes from the SUBQUERY_FILTER_INPUT datastore to the subquery filter component if one of the following subquery filter roles is used: IN, NOT IN, =, !=, ^=, <=, >=, <>, >, <.
- Click on the subquery filter component and go to the Properties window.
6.1. In the Attributes tab, adjust the expressions for the attributes (from the previous two steps) if necessary.
6.2. Click on the Conditions tab.
- Fill the subquery filter condition with the condition that is going to be used for subquery filtering. This field must be filled if EXISTS/NOT EXISTS is selected for the subquery filter input role, as it relates the query from the driver datastore to the EXISTS subquery from the subquery datastore!
- For the subquery filter input role, select a value from the drop-down menu:
Figure 2: Drop-down menu – Subquery Filter Input Role
- For the group comparison condition, select a value from the drop-down menu:
Figure 3: Drop-down menu – Group Comparison Condition
The following sections show how to use the subquery component in more detail in some example use cases.
Use Case 1
Our goal is to load the sales data for customers whose age is above the average age in the customer table.
Below is the mapping created for this use case:
Figure 4: Use case 1 – Mapping
The mapping has five components:
- TRG_SALES – datastore with sales data.
- TRG_CUSTOMER – datastore with customer data.
- SUBQUERYFILTER_IN – subquery filter component used to filter customers.
- AGGREGATE_SUM – aggregate component used to total up the amount sold at product level.
- TRG_PRODUCT_CUST_ABOVE_AVG_AGE – target datastore used to store the final dataset.
To be able to use the subquery filter component to filter sales data for customers above the average age in the customer table, we must first connect the two source datastores to the subquery filter input connector points as follows:
- Driver input connector point – TRG_SALES.
- Subquery filter input connector point – TRG_CUSTOMER.
After connecting the datastores to the subquery filter component, drag the CUST_ID attribute from the two source datastores (indicated by the grey arrow before the attribute) to the CUST_ID attribute of the subquery filter component. The other subquery filter attributes, needed later in the mapping, are from the TRG_SALES datastore (indicated by the yellow arrow before the attribute).
Figure 5: Use case 1 – Attributes tab
Finally, the Condition properties of the subquery filter component should be set as shown in the image below:
Figure 6: Use case 1 – Condition tab
- Subquery filter condition: contains the condition used for filtering customer age (TRG_CUSTOMER.AGE) based on a comparison (>) with the average customer age (AVG(AGE)) that is calculated in the subquery.
- Subquery filter input role: select the value In from the drop-down menu.
- Group comparison condition: use the default value No condition.
When the rest of the mapping has been finished, it can be executed and below you can see the code that is generated:
Figure 7: Use case 1 – Generated code
The highlighted part of the code above is that generated by the subquery filter component, with the following characteristics:
- It is a part of the WHERE clause of the main SELECT statement.
- The CUST_ID attribute from the driving table is checked if it satisfies the IN condition of the subquery statement.
- The subquery filter condition is a part of the WHERE clause of the subquery statement, and in that part customer age is compared to the average age in order to filter customers above the average age (TRG_CUSTOMER.AGE > (SELECT AVG(AGE) FROM ODI_DEMO.TRG_CUSTOMERS)).
Use Case 2
Our goal is to load data into a target customer table only when orders have “completed” status.
Below is the mapping created for this use case:
Figure 8: Use case 2 – Mapping
The mapping has six components:
- SRC_CUSTOMER – datastore with customer data.
- SRC_ORDERS – datastore with order data.
- SRC_AGE_GROUP – datastore with age groups.
- SUBQUERYFILTER_EXISTS – subquery filter component used to check if there are completed orders.
- JOIN_CUST_AGE_GROUP – join component used to get the age group of the customer.
- TRG_CUSTOMER – target datastore used to store the final dataset.
In this use case the subquery filter component has the SRC_CUSTOMER and SRC_ORDERS datastores as its input connector points, connected as follows:
- Driver input connector point – SRC_CUSTOMER.
- Subquery filter input connector point – SRC_ORDERS.
Since the goal here is only to check if there are completed orders, all the subquery filter attributes come from the SRC_CUSTOMER datastore (indicated by the yellow arrow before the attribute). The expression for SUBQUERY_FILTER_INPUT1(SOURCE_ORDERS) is left empty for every attribute of the subquery filter component, given that EXISTS is used as the subquery filter input role.
If another subquery filter input role is used (other than EXISTS / NOT EXISTS), you’ll have to set an expression for the subquery filter input connector point (IN, NOT_IN or a mathematical expression)!
Figure 9: Use case 2 – Attributes tab
Our other two use cases, presented in this article, show examples where the expression for the subquery filter input connector point was set.
After dragging over the required attributes, the subquery filter component properties should be set as shown in the picture below:
Figure 10: Use case 2 – Condition tab
- Subquery filter condition: contains the condition used to filter the subquery; in this case only orders with completed status (SRC_ORDERS.STATUS=’COM’) are filtered.
- Subquery filter input role: select the Exists value from the drop-down menu.
- Group comparison condition: use the default value No condition.
When the rest of the mapping has been finished, it can be executed. Below we can see the code that is generated:
Figure 11: Use case 2 – Generated code
The highlighted part of the code above is that generated by the subquery filter component, with the following characteristics:
- It is part of the WHERE clause of the main SELECT statement.
- The subquery statement is checked to see whether it satisfies the EXISTS condition, and based on the result, the main dataset (from SRC_CUSTOMER) is filtered.
- The subquery filter condition is part of the WHERE clause of the subquery statement and it filters completed orders (SRC_ORDERS.STATUS=’COM’).
Use Case 3
Our goal is to load data into a target customer table for those customers of average age.
Below you can see the mapping we created for this use case:
Figure 12: Use case 3 – Mapping
The mapping has six components:
- SRC_CUSTOMER – datastore with customer data.
- SRC_CUSTOMER_AVG_AGE – datastore with customer data used to calculate average customer age.
- SRC_AGE_GROUP – datastore with age groups.
- SUBQUERYFILTER_MATH_EXP – subquery filter component used to check if the age of customer is the average age in the SRC_CUSTOMER_AVG_AGE.
- JOIN_CUST_AGE_GROUP – join component used to get the age group of the customer.
- TRG_CUSTOMER – target datastore used to store the final dataset.
The subquery filter component uses the SRC_CUSTOMER and SRC_CUSTOMER_AVG_AGE datastores as input connector points, connected as follows:
- Driver input connector point – SRC_CUSTOMER.
- Subquery filter input connector point –
The attribute AGE from both sources (indicated by the grey arrow before the attribute) is mapped to the attribute AGE of the subquery filter component using the expressions shown in the picture below. The rest of the subquery filter attributes come from the SRC_CUSTOMER datastore.
Figure 13: Use case 3 – Attributes tab
After dragging over the required attributes, the subquery filter component properties should be set as shown in the picture below:
Figure 14: Use case 3 – Condition tab
- Subquery filter condition: should be left empty as there is no need to filter the subquery dataset.
- Subquery filter input role: select the = value from the drop-down menu.
- Group comparison condition: use the default value No condition.
When the mapping is executed, the code shown in the picture below will be generated:
Figure 15: Use case 3 – Generated code
The highlighted code is the code generated by the subquery filter component, with the following characteristics:
- It is a part of the WHERE clause of the main SELECT statement.
- The attribute SRC_CUSTOMER.AGE is checked if it is equal (satisfies = condition) to the result of the subquery.
- The subquery returns the average age of the customer (AVG(SRC_CUSTOMER_AVG_AGE.AGE)).
Conclusion
The subquery filter component is not commonly used by developers, who usually find another way to get the results they want. In this article we’ve tried to familiarise you with the usage of this component, and show how, when it’s properly set up, we can implement a mapping logic which would otherwise involve joins, filters, and aggregates – a decent amount of logic can be hidden behind this single component.
Although it is not very intuitive at first, we hope this article has shown you the necessary steps to successfully use the subquery filter when creating a mapping. In return for your efforts, you’ll benefit from simpler mapping, as multiple components are replaced by only one, and easier maintenance, as instead of having to check the properties of multiple components, you check just the one.
If you’ve got any doubts or questions about ODI 12c or your own BI setup, don’t hesitate to get in touch with our expert consultants here.