02 Oct 2024 Build and Use Repository Metadata Queries in Informatica PowerCenter and Oracle Data Integrator (ODI)
When tackling a data warehousing ETL project involving numerous tables or mappings, it’s highly beneficial to have easy access to all the information related to these components.
Luckily, most integration tools can access metadata, allowing us to consult and retrieve all the necessary details. Today we’re going to focus on two such tools: Informatica PowerCenter and Oracle Data Integrator (ODI).
In this article, we will demonstrate how to use queries in these tools to extract metadata and to return useful information, such as details about the source and target tables used in each mapping.
This information can be used for multiple purposes, such as:
- Reviewing the development cycle and troubleshooting tasks.
- Preparing load plans: For projects involving lots of mappings, understanding dependencies between them is easier, improving the planning of execution orders.
- Documenting data lineage.
- Building a data model based on these metadata tables, which can then be leveraged in reports to provide insights into the development lifecycle.
- Gauging the complexity and volume of existing ETL processes when transitioning to other tools or platforms.
To accomplish these objectives, the queries can be constructed in the following way:
Informatica Repository Metadata Query
In this query, we focus primarily on the REPL_TBL_MAPPING table (which is actually a view). It provides essential metadata such as the mapping name, the subject area where it is located, and the names of the source table and target table.
We can then join this table with others, such as OPB_SRC, OPB_TARG and OPB_SUBJECT, to obtain further information regarding the source, target, and subject area respectively.
The query has the following structure:
1. Staging
For staging mappings, the query is relatively straightforward. We can obtain the mapping name, subject area, source name and target name directly from the REPL_TBL_MAPPING table, and since these mappings are loaded directly from the source model, there are normally no dependencies on previous mappings:
SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME AS TARGET_MAPPING, CASE WHEN S.OWNERNAME IS NULL THEN ‘FILE’ ELSE ‘EBS’ END AS SOURCE_ORIGIN, ‘None’ AS SOURCE_MAPPING, M.SOURCE_NAME, M.TARGET_NAME FROM REP_TBL_MAPPING M INNER JOIN OPB_SRC S ON M.SOURCE_ID = S.SRC_ID WHERE (1=1) AND M.SUBJECT_AREA IN (‘SALES_SDE’)
The data can be joined with the OPB_SRC table using the SOURCE_ID column to retrieve the owner’s name or the source model. In this example, for all rows that meet the join condition, the owner’s name will only remain blank if the data originates from a manual file. We assign the value ‘FILE’ to these cases, indicating that the data source is a file rather than a predefined model.
Finally, we can filter the data to apply this logic exclusively to the mappings within the desired subject area (in this example, the subject area is named SALES_SDE).
The result is a table that retrieves all the necessary data, displaying one row for each source used in the mapping:
SUBJECT_AREA | TARGET_MAPPING | SOURCE_ORIGIN | SOURCE_MAPPING | SOURCE_NAME | TARGET_NAME |
---|---|---|---|---|---|
SALES_SDE | SDE_COUNTRY_M | EBS | None | S_COUNTRY | W_COUNTRY_S |
SALES_SDE | SDE_REGION_M | EBS | None | S_REGION | W_REGION_S |
SALES_SDE | SDE_PROD_CAT_M | FILE | None | PROD_CAT_FILE | W_PROD_CAT_S |
SALES_SDE | SDE_PRODUCT_M | EBS | None | S_PRODUCT | W_PRODUCT_S |
2. Dimensions and Facts
Here the query becomes more complex. Mappings will now use the target tables from other mappings as their source, so it is necessary to check these dependencies and list them in the result table:
SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME AS TARGET_MAPPING, CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC) THEN M.SUBJECT_AREA ELSE F.SUBJ_NAME END AS SOURCE_ORIGIN, LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME) AS SOURCE_MAPPING, M.SOURCE_NAME, M.TARGET_NAME FROM REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.SOURCE_NAME = T.TARGET_NAME INNER JOIN OPB_SUBJECT F ON T.SUBJ_ID = F.SUBJ_ID INNER JOIN ( SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME, M.SOURCE_NAME, T.TARGET_NAME, M.SUBJECT_ID, T.SUBJ_ID FROM REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.TARGET_NAME = T.TARGET_NAME AND M.SUBJECT_ID = T.SUBJ_ID WHERE (1=1) AND T.IS_VISIBLE = 1 ) SIDC ON M.SOURCE_NAME = SIDC.TARGET_NAME AND T.SUBJ_ID = SIDC.SUBJ_ID WHERE (1=1) AND M.SUBJECT_AREA IN (‘SALES_SIDC’)
This query block is quite different from the previous one. It still retrieves the mapping name, subject area, and source and target table names from the REPL_TBL_MAPPING table, but introduces two new expressions for the source mapping and source origin:
1. The source origin can be calculated using a CASE expression which checks if the source of the mapping is located in the same subject area or not.
At this point it gets a bit tricky, because to get the source information we need to join with the OPB_TARG table:
REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.SOURCE_NAME = T.TARGET_NAME
This is necessary because, although the table will act as a source in this mapping, it was initially saved as a target when its corresponding mapping was created. In fact, only the sources from the source model can be found in the OPB_SRC table.
This is why the CASE expression first checks if the source can be found amongst all possible targets in the desired subject area (in this example we’ll call it SALES_SIDC), and if so, it assigns that subject area:
CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC) THEN M.SUBJECT_AREA ELSE F.SUBJ_NAME END AS SOURCE_ORIGIN,
On the other hand, if the table is not found in that group, it indicates that the table belongs to a different subject area, so we join with the OPB_SUBJECT table to retrieve the subject area’s name:
INNER JOIN OPB_SUBJECT F ON T.SUBJ_ID = F.SUBJ_ID
Note that we are using the SUBJ_ID column from the OPB_TARG table, as this contains the data from the source table.
Finally, we can retrieve the subject area’s name:
CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC) THEN M.SUBJECT_AREA ELSE F.SUBJ_NAME END AS SOURCE_ORIGIN,
2. The source mapping will use the LISTAGG component, which lets us display all possible mapping dependencies for each source of our mapping, separated by commas:
LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME)
But how can we get the names of the mapping dependencies if the data for the source is actually in the OPB_TARG and we don’t have that information there? This is where the following subquery comes in:
INNER JOIN ( SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME, M.SOURCE_NAME, T.TARGET_NAME, M.SUBJECT_ID, T.SUBJ_ID FROM REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.TARGET_NAME = T.TARGET_NAME AND M.SUBJECT_ID = T.SUBJ_ID WHERE (1=1) AND T.IS_VISIBLE = 1 ) SIDC ON M.SOURCE_NAME = SIDC.TARGET_NAME AND T.SUBJ_ID = SIDC.SUBJ_ID
The joins in this subquery allow us to gather data from both the REPL_TBL_MAPPING and the OPB_TARG table for each source, including, for example, all mappings where the source acts as a target. The IS_VISIBLE filter ensures that we retrieve only the latest versions, as all versions are stored in the table.
Once we have the necessary data, we can simply join it and use this information within the LISTAGG component.
If there are multiple subject areas containing Dimension or Fact mappings, the query block can be replicated for each of them. You only need to adjust the filter to retrieve data for the desired subject area.
Finally, all query blocks can be combined using UNION clauses, resulting in a single table containing the data for all the mappings in use. The query structure will look like this (in our example the only subject areas used are SALES_SDE and SALES_SIDC):
SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME AS TARGET_MAPPING, CASE WHEN S.OWNERNAME IS NULL THEN ‘FILE’ ELSE ‘EBS’ END AS SOURCE_ORIGIN, ‘None’ AS SOURCE_MAPPING, M.SOURCE_NAME, M.TARGET_NAME FROM REP_TBL_MAPPING M INNER JOIN OPB_SRC S ON M.SOURCE_ID = S.SRC_ID WHERE (1=1) AND M.SUBJECT_AREA IN (‘SALES_SDE’) UNION SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME AS TARGET_MAPPING, CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC) THEN M.SUBJECT_AREA ELSE F.SUBJ_NAME END AS SOURCE_ORIGIN, LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME) AS SOURCE_MAPPING, M.SOURCE_NAME, M.TARGET_NAME FROM REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.SOURCE_NAME = T.TARGET_NAME INNER JOIN OPB_SUBJECT F ON T.SUBJ_ID = F.SUBJ_ID INNER JOIN ( SELECT DISTINCT M.SUBJECT_AREA, M.MAPPING_NAME, M.SOURCE_NAME, T.TARGET_NAME, M.SUBJECT_ID, T.SUBJ_ID FROM REP_TBL_MAPPING M INNER JOIN OPB_TARG T ON M.TARGET_NAME = T.TARGET_NAME AND M.SUBJECT_ID = T.SUBJ_ID WHERE (1=1) AND T.IS_VISIBLE = 1 ) SIDC ON M.SOURCE_NAME = SIDC.TARGET_NAME AND T.SUBJ_ID = SIDC.SUBJ_ID WHERE (1=1) AND M.SUBJECT_AREA IN (‘SALES_SIDC’)
The resulting table looks like this:
SUBJECT_AREA | TARGET_MAPPING | SOURCE_ORIGIN | SOURCE_MAPPING | SOURCE_NAME | TARGET_NAME |
---|---|---|---|---|---|
SALES_SDE | SDE_COUNTRY_M | EBS | None | S_COUNTRY | W_COUNTRY_S |
SALES_SDE | SDE_REGION_M | EBS | None | S_REGION | W_REGION_S |
SALES_SDE | SDE_PROD_CAT_M | FILE | None | PROD_CAT_FILE | W_PROD_CAT_S |
SALES_SDE | SDE_PRODUCT_M | EBS | None | S_PRODUCT | W_PRODUCT_S |
SALES_SIDC | SIDC_COUNTRY_M | SALES_SDE | SDE_COUNTRY_M | W_COUNTRY_S | W_COUNTRY_D |
SALES_SIDC | SIDC_REGION_M | SALES_SDE | SDE_REGION_M | W_REGION_S | W_REGION_D |
SALES_SIDC | SIDC_PROD_CAT_M | SALES_SDE | SDE_PROD_CAT_M | W_PROD_CAT_S | W_PROD_CAT_D |
SALES_SIDC | SIDC_PRODUCT_M | SALES_SDE | SDE_PRODUCT_M | W_PRODUCT_S | W_PRODUCT_D |
SALES_SIDC | SIDC_SALES_M | ESALES_SDEBS | SIDC_COUNTRY_M | W_COUNTRY_D | W_SALES_F |
SALES_SIDC | SIDC_SALES_M | SALES_SDE | SIDC_REGION_M | W_REGION_D | W_SALES_F |
SALES_SIDC | SIDC_SALES_M | SALES_SDE | SIDC_PRODUCT_M | W_PRODUCT_D | W_SALES_F |
SALES_SIDC | SIDC_REVENUE_M | SALES_SDE | SIDC_PROD_CAT_M | W_PROD_CAT_D | W_REVENUE_F |
SALES_SIDC | SIDC_REVENUE_M | SALES_SDE | SIDC_COUNTRY_M, SIDC_REGION_M, SIDC_PRODUCT_M | W_SALES_F | W_REVENUE_F |
ODI Metadata Query
When working with ODI, the process to obtain mapping data (name, sources and targets) is rather different from Informatica PowerCenter, mainly because the metadata is much more focused on mapping components in ODI.
In this example, we will also include the ODI project name, as well as the names of the different folders, which serve a similar function to subject areas in Informatica. To do so, we can start by querying the Project table (SNP_PROJECT). Then, by joining with the Folders table (SNP_FOLDER), we can obtain the names of the different folders within the project:
SNP_PROJECT PROJECT LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT
If there are subfolders inside the main project folders, we can retrieve them in the same way, simply by adding a join for each additional level of depth.
Now that we have both the project and folder names, the next step is to obtain the data from the mappings: we will use the Mapping table (SNP_MAPPING). We also need information about the various mapping components, especially the datastores that will load data from the sources into the targets. We can do this by joining the SNP_MAP_COMP table as well:
JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING
Now we have all the components in the mapping. However, as we are only interested in the source and target datastores, we need to join the Connector Point table (SNP_MAP_CP) to obtain the connection point for each component and whether it is an INPUT or OUTPUT:
INNER JOIN SNP_MAP_CP CP ON MC.I_MAP_COMP = CP.I_OWNER_MAP_COMP
Whilst this alone may not provide all the necessary details, it sets the stage for retrieving the required data. By joining the References table (SNP_MAP_REF), which contains references to various attributes about the mapping and its components, we can obtain further information:
INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD
This also allows us to access the Models table (SNP_MODEL) for the model name, and the Tables table (SNP_TABLE), for the source or target table name.
Returning to the connector points, datastores, like any other component, have both an input and output. We can filter these categories to limit the result set:
CP.DIRECTION = 'I' --INPUT CONNECTION POINT CP.DIRECTION = 'O' --OUTPUT CONNECTION POINT
Lastly, by joining the Connections table (SNP_MAP_CONN), we can restrict the query to components with either only an output (target table) or only an input (source table):
LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_END_MAP_CP CONN.I_START_MAP_CP IS NULL CONN.I_END_MAP_CP
This way, we can identify datastores with only an active input connector (target table) or only an active output connector (source table). Now we can construct two queries, one for the source table:
WITH SOURCE AS ( SELECT DISTINCT PROJECT.PROJECT_NAME PROJECT, FOLDER_LVL1.FOLDER_NAME LVL1_FOLDER, M.NAME MAPPING_NAME, MDL.COD_MOD SOURCE_MODEL_NAME, T.TABLE_NAME SOURCE_TABLE FROM SNP_PROJECT PROJECT LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_END_MAP_CP WHERE CP.DIRECTION = 'I' AND --INPUT CONNECTION POINT CONN.I_START_MAP_CP IS NULL AND PROJECT.PROJECT_NAME = ‘PROJ_1’ AND FOLDER_LVL1.FOLDER_NAME IN (‘SALES_SDE’, ‘SALES_SIDC’) ORDER BY 4 ),
And one for the target table:
TARGET AS ( SELECT DISTINCT PROJECT.PROJECT_NAME PROJECT, FOLDER_LVL1.FOLDER_NAME LVL1_FOLDER, M.NAME MAPPING_NAME, MDL.COD_MOD TARGET_MODEL_NAME, T.TABLE_NAME TARGET_TABLE FROM SNP_PROJECT PROJECT LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_START_MAP_CP WHERE CP.DIRECTION = 'O' AND --OUTPUT CONNECTION POINT CONN.I_END_MAP_CP IS NULL AND PROJECT.PROJECT_NAME = ‘PROJ_1’ AND FOLDER_LVL1.FOLDER_NAME IN (‘SALES_SDE’, ‘SALES_SIDC’) ORDER BY 4 )
In addition to filtering by connector points, we’ve applied filters for the project and folder to ensure that we are retrieving data only from our project PROJ_1 and its two folders, SALES_SDE and SALES_SIDC.
We can then combine these two queries to produce the final result:
SELECT SOURCE.PROJECT, SOURCE.LVL1_FOLDER, SOURCE.MAPPING_NAME, SOURCE.SOURCE_MODEL_NAME, SOURCE.SOURCE_TABLE, TARGET.TARGET_MODEL_NAME, TARGET.TARGET_TABLE FROM SOURCE, TARGET WHERE SOURCE.MAPPING_NAME = TARGET.MAPPING_NAME ORDER BY LVL1_FOLDER, MAPPING_NAME ;
The resulting table will look like this:
PROJECT | LVL1_FOLDER | MAPPING_NAME | SOURCE_MODEL_NAME | SOURCE_TABLE | TARGET_MODEL_NAME | TARGET_TABLE |
---|---|---|---|---|---|---|
PROJ_1 | SALES_SDE | SDE_COUNTRY_M | EBS | S_COUNTRY | SALES_SDE | W_COUNTRY_S |
PROJ_1 | SALES_SDE | SDE_REGION_M | EBS | S_REGION | SALES_SDE | W_REGION_S |
PROJ_1 | SALES_SDE | SDE_PROD_CAT_M | FILE | PROD_CAT_FILE | SALES_SDE | W_PROD_CAT_S |
PROJ_1 | SALES_SDE | SDE_PRODUCT_M | EBS | S_PRODUCT | SALES_SDE | W_PRODUCT_S |
PROJ_1 | SIDC_COUNTRY_M | SALES_SDE | SDE_COUNTRY_M | W_COUNTRY_S | SALES_SIDC | W_COUNTRY_D |
PROJ_1 | SIDC_REGION_M | SALES_SDE | SDE_REGION_M | W_REGION_S | SALES_SIDC | W_REGION_D |
PROJ_1 | SIDC_PROD_CAT_M | SALES_SDE | SDE_PROD_CAT_M | W_PROD_CAT_S | SALES_SIDC | W_PROD_CAT_D |
PROJ_1 | SIDC_PRODUCT_M | SALES_SDE | SDE_PRODUCT_M | W_PRODUCT_S | SALES_SIDC | W_PRODUCT_D |
PROJ_1 | SIDC_SALES_M | SALES_SIDC | SIDC_COUNTRY_M | W_COUNTRY_D | SALES_SIDC | W_SALES_F |
PROJ_1 | SIDC_SALES_M | SALES_SIDC | SIDC_REGION_M | W_REGION_D | SALES_SIDC | W_SALES_F |
PROJ_1 | SIDC_SALES_M | SALES_SIDC | SIDC_PRODUCT_M | W_PRODUCT_D | SALES_SIDC | W_SALES_F |
PROJ_1 | SIDC_REVENUE_M | SALES_SIDC | SIDC_PROD_CAT_M | W_PROD_CAT_D | W_REVENUE_F | W_REVENUE_F |
PROJ_1 | SIDC_REVENUE_M | SALES_SIDC | SIDC_COUNTRY | W_SALES_F | SALES_SIDC | W_REVENUE_F |
Conclusion
Access to metadata is a crucial feature of any ETL tool, enabling deeper insights and control over data flows. The query presented here is just one of many possible ways to utilise metadata effectively. It can be customised to meet the specific needs of different projects, offering flexibility and precision in managing data.
However, bear in mind that access to the schema containing these metadata tables may not be granted to developers by default. Permissions are typically controlled based on roles and responsibilities, so it is important to coordinate with administrators to obtain the necessary access.
Are you looking to enhance your metadata management and streamline your ETL processes? At ClearPeaks, our team of experts can help you unlock the full potential of your data architecture. Whether it’s setting up efficient metadata queries or optimising your data integration tools, we are here to assist. Contact us today to find out how we can help you achieve your data goals!