Build and Use Repository Metadata Queries in Informatica PowerCenter and Oracle Data Integrator (ODI)

02 Oct 2024 Build and Use Repository Metadata Queries in Informatica PowerCenter and Oracle Data Integrator (ODI)

Posted at 00:00h in Enterprise BI, ETL, Informatica, Oracle by Carlos M

When tackling a data warehousing ETL project involving numerous tables or mappings, it’s highly beneficial to have easy access to all the information related to these components.

Luckily, most integration tools can access metadata, allowing us to consult and retrieve all the necessary details. Today we’re going to focus on two such tools: Informatica PowerCenter and Oracle Data Integrator (ODI).

In this article, we will demonstrate how to use queries in these tools to extract metadata and to return useful information, such as details about the source and target tables used in each mapping.

This information can be used for multiple purposes, such as:

Reviewing the development cycle and troubleshooting tasks.
Preparing load plans: For projects involving lots of mappings, understanding dependencies between them is easier, improving the planning of execution orders.
Documenting data lineage.
Building a data model based on these metadata tables, which can then be leveraged in reports to provide insights into the development lifecycle.
Gauging the complexity and volume of existing ETL processes when transitioning to other tools or platforms.

To accomplish these objectives, the queries can be constructed in the following way:

Informatica Repository Metadata Query

In this query, we focus primarily on the REPL_TBL_MAPPING table (which is actually a view). It provides essential metadata such as the mapping name, the subject area where it is located, and the names of the source table and target table.

We can then join this table with others, such as OPB_SRC, OPB_TARG and OPB_SUBJECT, to obtain further information regarding the source, target, and subject area respectively.

The query has the following structure:

1. Staging

For staging mappings, the query is relatively straightforward. We can obtain the mapping name, subject area, source name and target name directly from the REPL_TBL_MAPPING table, and since these mappings are loaded directly from the source model, there are normally no dependencies on previous mappings:

SELECT
        DISTINCT M.SUBJECT_AREA,
        M.MAPPING_NAME AS TARGET_MAPPING,
        CASE WHEN S.OWNERNAME IS NULL THEN ‘FILE’ ELSE ‘EBS’ END AS SOURCE_ORIGIN,
        ‘None’ AS SOURCE_MAPPING,
        M.SOURCE_NAME,
        M.TARGET_NAME
FROM
	REP_TBL_MAPPING M
	INNER JOIN OPB_SRC S
	ON M.SOURCE_ID = S.SRC_ID
WHERE (1=1)
	AND M.SUBJECT_AREA IN (‘SALES_SDE’)

The data can be joined with the OPB_SRC table using the SOURCE_ID column to retrieve the owner’s name or the source model. In this example, for all rows that meet the join condition, the owner’s name will only remain blank if the data originates from a manual file. We assign the value ‘FILE’ to these cases, indicating that the data source is a file rather than a predefined model.

Finally, we can filter the data to apply this logic exclusively to the mappings within the desired subject area (in this example, the subject area is named SALES_SDE).

The result is a table that retrieves all the necessary data, displaying one row for each source used in the mapping:

SUBJECT_AREA	TARGET_MAPPING	SOURCE_ORIGIN	SOURCE_MAPPING	SOURCE_NAME	TARGET_NAME
SALES_SDE	SDE_COUNTRY_M	EBS	None	S_COUNTRY	W_COUNTRY_S
SALES_SDE	SDE_REGION_M	EBS	None	S_REGION	W_REGION_S
SALES_SDE	SDE_PROD_CAT_M	FILE	None	PROD_CAT_FILE	W_PROD_CAT_S
SALES_SDE	SDE_PRODUCT_M	EBS	None	S_PRODUCT	W_PRODUCT_S

2. Dimensions and Facts

Here the query becomes more complex. Mappings will now use the target tables from other mappings as their source, so it is necessary to check these dependencies and list them in the result table:

SELECT
        DISTINCT M.SUBJECT_AREA,
        M.MAPPING_NAME AS TARGET_MAPPING,
        CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC)
             THEN M.SUBJECT_AREA
             ELSE F.SUBJ_NAME
        END AS SOURCE_ORIGIN,
        LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) 
        WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME) AS SOURCE_MAPPING,
        M.SOURCE_NAME,
        M.TARGET_NAME
FROM
	REP_TBL_MAPPING M
        INNER JOIN OPB_TARG T
        ON M.SOURCE_NAME = T.TARGET_NAME
        INNER JOIN OPB_SUBJECT F
        ON T.SUBJ_ID = F.SUBJ_ID
        INNER JOIN (
             SELECT
                  DISTINCT M.SUBJECT_AREA,
                  M.MAPPING_NAME,
                  M.SOURCE_NAME,
                  T.TARGET_NAME,
                  M.SUBJECT_ID,
                  T.SUBJ_ID
             FROM
                  REP_TBL_MAPPING M
                  INNER JOIN OPB_TARG T
                  ON M.TARGET_NAME = T.TARGET_NAME
                  AND M.SUBJECT_ID = T.SUBJ_ID
             WHERE (1=1)
                  AND T.IS_VISIBLE = 1
        ) SIDC
        ON M.SOURCE_NAME = SIDC.TARGET_NAME
        AND T.SUBJ_ID = SIDC.SUBJ_ID
WHERE (1=1)
        AND M.SUBJECT_AREA IN (‘SALES_SIDC’)

This query block is quite different from the previous one. It still retrieves the mapping name, subject area, and source and target table names from the REPL_TBL_MAPPING table, but introduces two new expressions for the source mapping and source origin:

1. The source origin can be calculated using a CASE expression which checks if the source of the mapping is located in the same subject area or not.

At this point it gets a bit tricky, because to get the source information we need to join with the OPB_TARG table:

REP_TBL_MAPPING M
INNER JOIN OPB_TARG T
ON M.SOURCE_NAME = T.TARGET_NAME

This is necessary because, although the table will act as a source in this mapping, it was initially saved as a target when its corresponding mapping was created. In fact, only the sources from the source model can be found in the OPB_SRC table.

This is why the CASE expression first checks if the source can be found amongst all possible targets in the desired subject area (in this example we’ll call it SALES_SIDC), and if so, it assigns that subject area:

CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC)
        THEN M.SUBJECT_AREA
        ELSE F.SUBJ_NAME
END AS SOURCE_ORIGIN,

On the other hand, if the table is not found in that group, it indicates that the table belongs to a different subject area, so we join with the OPB_SUBJECT table to retrieve the subject area’s name:

INNER JOIN OPB_SUBJECT F
ON T.SUBJ_ID = F.SUBJ_ID

Note that we are using the SUBJ_ID column from the OPB_TARG table, as this contains the data from the source table.

Finally, we can retrieve the subject area’s name:

CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC)
       THEN M.SUBJECT_AREA
       ELSE F.SUBJ_NAME
END AS SOURCE_ORIGIN,

2. The source mapping will use the LISTAGG component, which lets us display all possible mapping dependencies for each source of our mapping, separated by commas:

LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) 
WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME)

But how can we get the names of the mapping dependencies if the data for the source is actually in the OPB_TARG and we don’t have that information there? This is where the following subquery comes in:

INNER JOIN (
        SELECT
             DISTINCT M.SUBJECT_AREA,
             M.MAPPING_NAME,
             M.SOURCE_NAME,
             T.TARGET_NAME,
             M.SUBJECT_ID,
             T.SUBJ_ID
        FROM
             REP_TBL_MAPPING M
             INNER JOIN OPB_TARG T
             ON M.TARGET_NAME = T.TARGET_NAME
             AND M.SUBJECT_ID = T.SUBJ_ID
        WHERE (1=1)
             AND T.IS_VISIBLE = 1
) SIDC
ON M.SOURCE_NAME = SIDC.TARGET_NAME
AND T.SUBJ_ID = SIDC.SUBJ_ID

The joins in this subquery allow us to gather data from both the REPL_TBL_MAPPING and the OPB_TARG table for each source, including, for example, all mappings where the source acts as a target. The IS_VISIBLE filter ensures that we retrieve only the latest versions, as all versions are stored in the table.

Once we have the necessary data, we can simply join it and use this information within the LISTAGG component.

If there are multiple subject areas containing Dimension or Fact mappings, the query block can be replicated for each of them. You only need to adjust the filter to retrieve data for the desired subject area.

Finally, all query blocks can be combined using UNION clauses, resulting in a single table containing the data for all the mappings in use. The query structure will look like this (in our example the only subject areas used are SALES_SDE and SALES_SIDC):

SELECT
        DISTINCT M.SUBJECT_AREA,
        M.MAPPING_NAME AS TARGET_MAPPING,
        CASE WHEN S.OWNERNAME IS NULL THEN ‘FILE’ ELSE ‘EBS’ END AS SOURCE_ORIGIN,
        ‘None’ AS SOURCE_MAPPING,
        M.SOURCE_NAME,
        M.TARGET_NAME
FROM
	REP_TBL_MAPPING M
	INNER JOIN OPB_SRC S
	ON M.SOURCE_ID = S.SRC_ID
WHERE (1=1)
	AND M.SUBJECT_AREA IN (‘SALES_SDE’)
UNION
SELECT
        DISTINCT M.SUBJECT_AREA,
        M.MAPPING_NAME AS TARGET_MAPPING,
        CASE WHEN M.SOURCE_NAME IN (SELECT TARGET_NAME FROM REP_TBL_MAPPING WHERE SUBJECT_AREA = SALES_SIDC)
             THEN M.SUBJECT_AREA
             ELSE F.SUBJ_NAME
        END AS SOURCE_ORIGIN,
        LISTAGG (DISTINCT SIDC.MAPPING_NAME, ‘, ‘) 
        WITHIN GROUP (ORDER BY SIDC.MAPPING_NAME) OVER (PARTITION BY M.SOURCE_NAME) AS SOURCE_MAPPING,
        M.SOURCE_NAME,
        M.TARGET_NAME
FROM
	REP_TBL_MAPPING M
        INNER JOIN OPB_TARG T
        ON M.SOURCE_NAME = T.TARGET_NAME
        INNER JOIN OPB_SUBJECT F
        ON T.SUBJ_ID = F.SUBJ_ID
        INNER JOIN (
              SELECT
                  DISTINCT M.SUBJECT_AREA,
                  M.MAPPING_NAME,
                  M.SOURCE_NAME,
                  T.TARGET_NAME,
                  M.SUBJECT_ID,
                  T.SUBJ_ID
              FROM
                  REP_TBL_MAPPING M
                  INNER JOIN OPB_TARG T
                  ON M.TARGET_NAME = T.TARGET_NAME
                  AND M.SUBJECT_ID = T.SUBJ_ID
              WHERE (1=1)
                  AND T.IS_VISIBLE = 1
        ) SIDC
        ON M.SOURCE_NAME = SIDC.TARGET_NAME
        AND T.SUBJ_ID = SIDC.SUBJ_ID
WHERE (1=1)
AND M.SUBJECT_AREA IN (‘SALES_SIDC’)

The resulting table looks like this:

SUBJECT_AREA	TARGET_MAPPING	SOURCE_ORIGIN	SOURCE_MAPPING	SOURCE_NAME	TARGET_NAME
SALES_SDE	SDE_COUNTRY_M	EBS	None	S_COUNTRY	W_COUNTRY_S
SALES_SDE	SDE_REGION_M	EBS	None	S_REGION	W_REGION_S
SALES_SDE	SDE_PROD_CAT_M	FILE	None	PROD_CAT_FILE	W_PROD_CAT_S
SALES_SDE	SDE_PRODUCT_M	EBS	None	S_PRODUCT	W_PRODUCT_S
SALES_SIDC	SIDC_COUNTRY_M	SALES_SDE	SDE_COUNTRY_M	W_COUNTRY_S	W_COUNTRY_D
SALES_SIDC	SIDC_REGION_M	SALES_SDE	SDE_REGION_M	W_REGION_S	W_REGION_D
SALES_SIDC	SIDC_PROD_CAT_M	SALES_SDE	SDE_PROD_CAT_M	W_PROD_CAT_S	W_PROD_CAT_D
SALES_SIDC	SIDC_PRODUCT_M	SALES_SDE	SDE_PRODUCT_M	W_PRODUCT_S	W_PRODUCT_D
SALES_SIDC	SIDC_SALES_M	ESALES_SDEBS	SIDC_COUNTRY_M	W_COUNTRY_D	W_SALES_F
SALES_SIDC	SIDC_SALES_M	SALES_SDE	SIDC_REGION_M	W_REGION_D	W_SALES_F
SALES_SIDC	SIDC_SALES_M	SALES_SDE	SIDC_PRODUCT_M	W_PRODUCT_D	W_SALES_F
SALES_SIDC	SIDC_REVENUE_M	SALES_SDE	SIDC_PROD_CAT_M	W_PROD_CAT_D	W_REVENUE_F
SALES_SIDC	SIDC_REVENUE_M	SALES_SDE	SIDC_COUNTRY_M, SIDC_REGION_M, SIDC_PRODUCT_M	W_SALES_F	W_REVENUE_F

ODI Metadata Query

When working with ODI, the process to obtain mapping data (name, sources and targets) is rather different from Informatica PowerCenter, mainly because the metadata is much more focused on mapping components in ODI.

In this example, we will also include the ODI project name, as well as the names of the different folders, which serve a similar function to subject areas in Informatica. To do so, we can start by querying the Project table (SNP_PROJECT). Then, by joining with the Folders table (SNP_FOLDER), we can obtain the names of the different folders within the project:

      SNP_PROJECT PROJECT
      LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 
ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT

If there are subfolders inside the main project folders, we can retrieve them in the same way, simply by adding a join for each additional level of depth.

Now that we have both the project and folder names, the next step is to obtain the data from the mappings: we will use the Mapping table (SNP_MAPPING). We also need information about the various mapping components, especially the datastores that will load data from the sources into the targets. We can do this by joining the SNP_MAP_COMP table as well:

      JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER
      INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING

Now we have all the components in the mapping. However, as we are only interested in the source and target datastores, we need to join the Connector Point table (SNP_MAP_CP) to obtain the connection point for each component and whether it is an INPUT or OUTPUT:

      INNER JOIN SNP_MAP_CP CP ON MC.I_MAP_COMP = CP.I_OWNER_MAP_COMP

Whilst this alone may not provide all the necessary details, it sets the stage for retrieving the required data. By joining the References table (SNP_MAP_REF), which contains references to various attributes about the mapping and its components, we can obtain further information:

     INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF
     INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE
     INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD

This also allows us to access the Models table (SNP_MODEL) for the model name, and the Tables table (SNP_TABLE), for the source or target table name.

Returning to the connector points, datastores, like any other component, have both an input and output. We can filter these categories to limit the result set:

    CP.DIRECTION = 'I' --INPUT CONNECTION POINT
    CP.DIRECTION = 'O' --OUTPUT CONNECTION POINT

Lastly, by joining the Connections table (SNP_MAP_CONN), we can restrict the query to components with either only an output (target table) or only an input (source table):

    LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_END_MAP_CP
    CONN.I_START_MAP_CP IS NULL
    CONN.I_END_MAP_CP

This way, we can identify datastores with only an active input connector (target table) or only an active output connector (source table). Now we can construct two queries, one for the source table:

WITH
SOURCE AS (
       SELECT
            DISTINCT
            PROJECT.PROJECT_NAME PROJECT,
            FOLDER_LVL1.FOLDER_NAME LVL1_FOLDER,
            M.NAME MAPPING_NAME,
            MDL.COD_MOD SOURCE_MODEL_NAME,
            T.TABLE_NAME SOURCE_TABLE
       FROM
            SNP_PROJECT PROJECT
            LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT
            JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER
            INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING
            INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF
            INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE
            INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD
            LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_END_MAP_CP
       WHERE
	    CP.DIRECTION = 'I' AND --INPUT CONNECTION POINT
            CONN.I_START_MAP_CP IS NULL
            AND PROJECT.PROJECT_NAME = ‘PROJ_1’
            AND FOLDER_LVL1.FOLDER_NAME IN (‘SALES_SDE’, ‘SALES_SIDC’)
       ORDER BY 4
),

And one for the target table:

TARGET AS (
       SELECT
            DISTINCT
            PROJECT.PROJECT_NAME PROJECT,
            FOLDER_LVL1.FOLDER_NAME LVL1_FOLDER,
            M.NAME MAPPING_NAME,
            MDL.COD_MOD TARGET_MODEL_NAME,
            T.TABLE_NAME TARGET_TABLE
       FROM
            SNP_PROJECT PROJECT
            LEFT OUTER JOIN SNP_FOLDER FOLDER_LVL1 ON FOLDER_LVL1.I_PROJECT = PROJECT.I_PROJECT
            JOIN SNP_MAPPING M ON M.I_FOLDER = FOLDER_LVL1.I_FOLDER
            INNER JOIN SNP_MAP_COMP MC ON M.I_MAPPING = MC.I_OWNER_MAPPING
            INNER JOIN SNP_MAP_REF MR ON MC.I_MAP_REF = MR.I_MAP_REF
            INNER JOIN SNP_TABLE T ON MR.I_REF_ID = T.I_TABLE
            INNER JOIN SNP_MODEL MDL ON T.I_MOD = MDL.I_MOD
            LEFT JOIN SNP_MAP_CONN CONN ON CP.I_MAP_CP = CONN.I_START_MAP_CP
       WHERE
	    CP.DIRECTION = 'O' AND --OUTPUT CONNECTION POINT
            CONN.I_END_MAP_CP IS NULL
            AND PROJECT.PROJECT_NAME = ‘PROJ_1’
            AND FOLDER_LVL1.FOLDER_NAME IN (‘SALES_SDE’, ‘SALES_SIDC’)
ORDER BY 4

)

In addition to filtering by connector points, we’ve applied filters for the project and folder to ensure that we are retrieving data only from our project PROJ_1 and its two folders, SALES_SDE and SALES_SIDC.

We can then combine these two queries to produce the final result:

SELECT
      SOURCE.PROJECT,
      SOURCE.LVL1_FOLDER,
      SOURCE.MAPPING_NAME,
      SOURCE.SOURCE_MODEL_NAME,
      SOURCE.SOURCE_TABLE,
      TARGET.TARGET_MODEL_NAME,
      TARGET.TARGET_TABLE
FROM
      SOURCE,
      TARGET
WHERE
      SOURCE.MAPPING_NAME = TARGET.MAPPING_NAME
ORDER BY 
      LVL1_FOLDER, MAPPING_NAME
;

The resulting table will look like this:

PROJECT	LVL1_FOLDER	MAPPING_NAME	SOURCE_MODEL_NAME	SOURCE_TABLE	TARGET_MODEL_NAME	TARGET_TABLE
PROJ_1	SALES_SDE	SDE_COUNTRY_M	EBS	S_COUNTRY	SALES_SDE	W_COUNTRY_S
PROJ_1	SALES_SDE	SDE_REGION_M	EBS	S_REGION	SALES_SDE	W_REGION_S
PROJ_1	SALES_SDE	SDE_PROD_CAT_M	FILE	PROD_CAT_FILE	SALES_SDE	W_PROD_CAT_S
PROJ_1	SALES_SDE	SDE_PRODUCT_M	EBS	S_PRODUCT	SALES_SDE	W_PRODUCT_S
PROJ_1	SIDC_COUNTRY_M	SALES_SDE	SDE_COUNTRY_M	W_COUNTRY_S	SALES_SIDC	W_COUNTRY_D
PROJ_1	SIDC_REGION_M	SALES_SDE	SDE_REGION_M	W_REGION_S	SALES_SIDC	W_REGION_D
PROJ_1	SIDC_PROD_CAT_M	SALES_SDE	SDE_PROD_CAT_M	W_PROD_CAT_S	SALES_SIDC	W_PROD_CAT_D
PROJ_1	SIDC_PRODUCT_M	SALES_SDE	SDE_PRODUCT_M	W_PRODUCT_S	SALES_SIDC	W_PRODUCT_D
PROJ_1	SIDC_SALES_M	SALES_SIDC	SIDC_COUNTRY_M	W_COUNTRY_D	SALES_SIDC	W_SALES_F
PROJ_1	SIDC_SALES_M	SALES_SIDC	SIDC_REGION_M	W_REGION_D	SALES_SIDC	W_SALES_F
PROJ_1	SIDC_SALES_M	SALES_SIDC	SIDC_PRODUCT_M	W_PRODUCT_D	SALES_SIDC	W_SALES_F
PROJ_1	SIDC_REVENUE_M	SALES_SIDC	SIDC_PROD_CAT_M	W_PROD_CAT_D	W_REVENUE_F	W_REVENUE_F
PROJ_1	SIDC_REVENUE_M	SALES_SIDC	SIDC_COUNTRY	W_SALES_F	SALES_SIDC	W_REVENUE_F

Conclusion

Access to metadata is a crucial feature of any ETL tool, enabling deeper insights and control over data flows. The query presented here is just one of many possible ways to utilise metadata effectively. It can be customised to meet the specific needs of different projects, offering flexibility and precision in managing data.

However, bear in mind that access to the schema containing these metadata tables may not be granted to developers by default. Permissions are typically controlled based on roles and responsibilities, so it is important to coordinate with administrators to obtain the necessary access.

Are you looking to enhance your metadata management and streamline your ETL processes? At ClearPeaks, our team of experts can help you unlock the full potential of your data architecture. Whether it’s setting up efficient metadata queries or optimising your data integration tools, we are here to assist. Contact us today to find out how we can help you achieve your data goals!

Carlos M

carlos.megia@clearpeaks.com

02 Oct 2024 Build and Use Repository Metadata Queries in Informatica PowerCenter and Oracle Data Integrator (ODI)

Informatica Repository Metadata Query

1. Staging

SUBJECT_AREA

SOURCE_ORIGIN

SOURCE_MAPPING

SOURCE_NAME

TARGET_NAME

2. Dimensions and Facts

SUBJECT_AREA

SOURCE_ORIGIN

SOURCE_MAPPING

SOURCE_NAME

TARGET_NAME

ODI Metadata Query

PROJECT

MAPPING_NAME

SOURCE_MODEL_NAME

SOURCE_TABLE

TARGET_MODEL_NAME

TARGET_TABLE

Conclusion

Carlos M

More Links

From Our Blog