17 May 2017
EXTRACTING DATA FROM TALEO
Over the past few years we have seen companies focusing more and more on Human Resources / Human Capital activities. This is no surprise, considering that nowadays a large number of businesses depend more on people skills and creativity than on machinery or capital, so hiring the right people has become a critical process. As a consequence, more and more emphasis is put on having the right software to support HR/HC activities, and this in turn leads to the necessity of building a BI solution on top of those systems for a correct evaluation of processes and resources. One of the most commonly used HR systems is Taleo, an Oracle product that resides in the cloud, so there is no direct access to its underlying data. Nevertheless, most BI systems are still on-premise, so if we want to use Taleo data, we need to extract it from the cloud first.
1. Taleo data extraction methods
As mentioned before, there is no direct access to Taleo data; nevertheless, there are several ways to extract it, and once extracted, the data can be used in the BI solution:
• Querying the Taleo API
• Using Cloud Connector
• Using Taleo Connect Client
The API is the most robust but also the most complex of these methods, since it requires a separate application to be written. Usually, depending on the configuration of the BI system, either Oracle Cloud Connector, Taleo Connect Client, or a combination of both is used.
Figure 1: Cloud connector in ODI objects tree
Oracle Cloud Connector is a component of OBI Apps, and essentially it’s Java code that replicates Taleo entities / tables. It’s also easy to use: just by creating any Load Plan in BIACM with Taleo as the source system, a series of calls to Cloud Connector is generated that effectively replicates the Taleo tables to a local schema. Although it works well, it has two significant disadvantages:
• It’s only available as a component of BI Apps
• It doesn’t extract Taleo UDFs (user-defined fields)
So even if we have BI Apps installed and we use Cloud Connector, there will be some columns (UDFs) that will not get extracted. This is why the use of Taleo Connect Client is often a must.
2. Taleo Connect Client
Taleo Connect Client (TCC) is a tool used to export data from, or import data into, Taleo. In this article we’re going to focus on extraction. It can extract any field, including UDFs, so it can be used in combination with BI Apps Cloud Connector or, if that’s not available, as the sole extraction tool. There are versions for both Windows and Linux operating systems. Let’s look at the Windows version first.
Part 1 – Installation & running:
Taleo Connect Client can be downloaded from the Oracle e-delivery website; just type Taleo Connect Client into the search box and you will see it on the list. Choose the necessary version and select both the Application Installer and the Application Data Model (required!), remembering that they must match the version of the Taleo application you will be working with; then just download and install. Important: the Data Model must be installed before the application.
Figure 2: Downloading TCC for Windows
After TCC is installed, we run it, providing the necessary credentials in the initial screen:
Figure 3: Taleo Connect Client welcome screen
And then, after clicking on ‘ping’, we connect to Taleo. The window we see is initially empty, but from it we can create or execute new extracts. Before moving on to that step, let’s find out how to see the UDFs: in the right panel, go to the ‘Product integration pack’ tab and select the correct product and model. Then, in the Entities tab, we can see a list of entities / tables, and under fields / relations we can see columns and relations with other entities / tables (through foreign keys). After the first run, you will probably find that some UDFs are missing from the list of available fields / relations. Why is this? Because what we initially see in the field list are only the Taleo out-of-the-box fields installed with the Data Model installer. But don’t worry, this can easily be fixed: use the ‘Synchronize custom fields’ icon (highlighted on the screenshot). After clicking on it you will be taken to a log-on screen where you have to provide credentials again, and after clicking on the ‘Synchronize’ button, the UDFs will be retrieved.
Figure 4: Synchronizing out-of-the-box model with User Defined Fields
Figure 5: Synchronized list of fields, including some UDFs (marked with ‘person’ icon)
Part 2 – Preparing the extract:
Once we have all the required fields available, preparing the extract is pretty straightforward. Go to File -> New -> New Export Wizard, choose the desired Entity, and click Finish. Now, in the General window, set Export Mode to ‘CSV-Entity’, and in the Projections tab, select the columns you want to extract by dragging and dropping them from the Entity -> Structure window on the right. You can also add filters or sort the result set. Finally, save the export file.

The other component necessary to actually extract the data is the so-called configuration. To create it, select File -> New -> New Configuration Wizard, point it at the export file created in the previous step and, in the subsequent step, at the endpoint (the Taleo instance that we will extract the data from). The following screen offers further extract parameters, such as request format and encoding, extract file name format and encoding, and much more. In most cases the default values will let us extract the data successfully, so unless a change is clearly required, there is no need to modify anything. Now the configuration file can be saved and the extraction can be started just by clicking the ‘Execute the configuration’ button (on the toolbar just below the main menu). If the extraction is successful, all the indicators in the Monitoring window on the right will turn green, as in the screenshot below.
Figure 6: TCC successful extraction
Using a bat file created during the installation, you can schedule TCC jobs with the Windows Task Scheduler (a hedged example follows below). However, it’s much more common to have your OBI / BI Apps (or almost any other DBMS that your organization uses as a data warehouse) installed on a Linux / Unix server, which is why we’re also going to look at how to install and set up TCC in a Linux environment.
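A minimal sketch of such a scheduled task, expressed as a schtasks command; the launcher name TaleoConnectClient.bat and the paths are assumptions, so adjust them to your installation (the same task can of course be created through the Task Scheduler GUI):

REM Create a daily task that runs the export configuration at 02:00 (hypothetical paths)
schtasks /create /tn "TCC Daily Extract" /sc daily /st 02:00 ^
  /tr "C:\TCC\TaleoConnectClient.bat C:\TCC\blog_article_cfg.xml"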
Part 3 – TCC in a Linux / Unix environment:
TCC setup in a Linux / Unix environment is a bit more complex. To simplify it, we will reuse some of the components that were already created when we worked with Windows TCC, and although the frontend of the application is totally different (to be precise, there is no frontend at all in the Linux version, as it’s strictly command-line), the way the data is extracted from Taleo is exactly the same (using extracts defined as XML files and the Taleo APIs). So, after downloading the application installer and data model from edelivery.oracle.com, we install both components. Installation is actually just extracting the files, first from zip to tgz, and then from tgz to the uncompressed content. But this time, unlike on Windows, we recommend installing (extracting) the application first, and then extracting the data model files into an application subfolder named ‘featurepacks’ (this must be created, as it doesn’t exist by default). It’s also necessary to create a subfolder ‘system’ in the application directory. Once this is done, you can move some components of your Windows TCC instance to the Linux one (of course, if you have no Windows machine available, you can create any of these components manually); a shell sketch of the whole sequence is shown after the list below:
• Copy the file default.configuration_brd.xml from the Windows TCC/system folder to the Linux TCC/system folder
• Copy the extract XML and configuration XML files, from wherever you created them, to the main Linux TCC directory
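To recap, here is a shell sketch of the whole installation and copy sequence. The archive names and the location of the Windows files (assumed to be accessible under /mnt/win_tcc) are hypothetical; the featurepacks and system folders, the default.configuration_brd.xml file, and the tcc-15A.2.0.20 version come from the installation described above:

# Assumed download directory and archive names -- adjust to your versions
cd /u01/tcc_linux
unzip tcc-application.zip                                   # hypothetical zip downloaded from e-delivery
tar -xzf tcc-15A.2.0.20.tgz                                 # extract the application itself
cd tcc-15A.2.0.20
mkdir featurepacks system                                   # required subfolders, not created by default
tar -xzf ../tcc-datamodel-15A.2.0.20.tgz -C featurepacks    # extract the data model into featurepacks
# Copy the components prepared on the Windows instance
cp /mnt/win_tcc/system/default.configuration_brd.xml system/
cp /mnt/win_tcc/exports/blog_article_sq.xml .
cp /mnt/win_tcc/exports/blog_article_cfg.xml .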
There are also some changes that need to be made in the TaleoConnectClient.sh file:
• Set the JAVA_HOME variable at the top of the file (just below the #!/bin/sh line), pointing it to your Java SDK installation (for some reason, in our installation the system JAVA_HOME variable wasn’t picked up correctly by the script)
• In the line below # Execute the client, after the TCC_PARAMETERS variable, add:
  ✓ the proxy server parameters, if a proxy is to be used:
    -Dhttp.proxyHost=ipNumber -Dhttp.proxyPort=portNumber
  ✓ the path to the Data Model:
    -Dcom.taleo.integration.client.productpacks.dir=/u01/oracle/tcc-15A.2.0.20/featurepacks
So, in the end, the TaleoConnectClient.sh file in our environment has the following content (IP addresses were masked):
#!/bin/sh
JAVA_HOME=/u01/middleware/Oracle_BI1/jdk

# Make sure that the JAVA_HOME variable is defined
if [ ! "${JAVA_HOME}" ]
then
  echo +-----------------------------------------+
  echo "+ The JAVA_HOME variable is not defined. +"
  echo +-----------------------------------------+
  exit 1
fi

# Make sure the IC_HOME variable is defined
if [ ! "${IC_HOME}" ]
then
  IC_HOME=.
fi

# Check if the IC_HOME points to a valid Taleo Connect Client folder
if [ -e "${IC_HOME}/lib/taleo-integrationclient.jar" ]
then
  # Define the class path for the client execution
  IC_CLASSPATH="${IC_HOME}/lib/taleo-integrationclient.jar":"${IC_HOME}/log"

  # Execute the client
  ${JAVA_HOME}/bin/java ${JAVA_OPTS} -Xmx256m ${TCC_PARAMETERS} \
    -Dhttp.proxyHost=10.10.10.10 -Dhttp.proxyPort=8080 \
    -Dcom.taleo.integration.client.productpacks.dir=/u01/tcc_linux/tcc-15A.2.0.20/featurepacks \
    -Dcom.taleo.integration.client.install.dir="${IC_HOME}" \
    -Djava.endorsed.dirs="${IC_HOME}/lib/endorsed" \
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
    -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl \
    -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger \
    -Djavax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom=net.sf.saxon.xpath.XPathFactoryImpl \
    -classpath ${IC_CLASSPATH} com.taleo.integration.client.Client ${@}
else
  echo +-----------------------------------------------------------------------------------------------
  echo "+ The IC_HOME variable is defined as (${IC_HOME}) but does not contain the Taleo Connect Client"
  echo "+ The library ${IC_HOME}/lib/taleo-integrationclient.jar cannot be found. "
  echo +-----------------------------------------------------------------------------------------------
  exit 2
fi
Once this is ready, we can also apply the necessary changes to the extract and configuration files, although there is actually nothing to change in the extract definition (the file blog_article_sq.xml). Let’s have a quick look at the content of this file:
<?xml version="1.0" encoding="UTF-8"?>
<quer:query productCode="RC1501" model="http://www.taleo.com/ws/tee800/2009/01"
            projectedClass="JobInformation" locale="en" mode="CSV-ENTITY"
            largegraph="true" preventDuplicates="false"
            xmlns:quer="http://www.taleo.com/ws/integration/query">
  <quer:subQueries/>
  <quer:projections>
    <quer:projection><quer:field path="BillRateMedian"/></quer:projection>
    <quer:projection><quer:field path="JobGrade"/></quer:projection>
    <quer:projection><quer:field path="NumberToHire"/></quer:projection>
    <quer:projection><quer:field path="JobInformationGroup,Description"/></quer:projection>
  </quer:projections>
  <quer:projectionFilterings/>
  <quer:filterings/>
  <quer:sortings/>
  <quer:sortingFilterings/>
  <quer:groupings/>
  <quer:joinings/>
</quer:query>
Just by looking at the file, we can figure out how to add more columns manually: we just need to add more quer:projection tags, like:
<quer:projection><quer:field path="DesiredFieldPath"/></quer:projection>
With regard to the configuration file, we need to make some small changes: the cli:SpecificFile and cli:Folder tags contain absolute Windows paths. Once we move the files to Linux, we need to replace them with Linux filesystem paths, absolute or relative; a hedged before/after fragment is shown below.
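For illustration only, here is a hypothetical fragment of such a configuration file; the two tag names come from the actual file, but the surrounding content and the paths are assumptions:

<!-- Windows version (hypothetical paths, shown out of their surrounding context) -->
<cli:SpecificFile>C:\TCC\blog_article_sq.xml</cli:SpecificFile>
<cli:Folder>C:\TCC\exports</cli:Folder>

<!-- Linux version: same tags, Linux paths (absolute or relative) -->
<cli:SpecificFile>/u01/tcc_linux/tcc-15A.2.0.20/blog_article_sq.xml</cli:SpecificFile>
<cli:Folder>/u01/tcc_linux/tcc-15A.2.0.20/exports</cli:Folder>

Once the files are ready, the only remaining task is to run the extract, which is done by running: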
./TaleoConnectClient.sh blog_article_cfg.xml
See the execution log:
[KKanicki@BIApps tcc-15A.2.0.20]$ ./TaleoConnectClient.sh blog_article_cfg.xml
2017-03-16 20:18:26,876 [INFO] Client - Using the following log file: /biapps/tcc_linux/tcc-15A.2.0.20/log/taleoconnectclient.log
2017-03-16 20:18:27,854 [INFO] Client - Taleo Connect Client invoked with configuration=blog_article_cfg.xml, request message=null, response message=null
2017-03-16 20:18:31,010 [INFO] WorkflowManager - Starting workflow execution
2017-03-16 20:18:31,076 [INFO] WorkflowManager - Starting workflow step: Prepare Export
2017-03-16 20:18:31,168 [INFO] WorkflowManager - Completed workflow step: Prepare Export
2017-03-16 20:18:31,238 [INFO] WorkflowManager - Starting workflow step: Wrap SOAP
2017-03-16 20:18:31,249 [INFO] WorkflowManager - Completed workflow step: Wrap SOAP
2017-03-16 20:18:31,307 [INFO] WorkflowManager - Starting workflow step: Send
2017-03-16 20:18:33,486 [INFO] WorkflowManager - Completed workflow step: Send
2017-03-16 20:18:33,546 [INFO] WorkflowManager - Starting workflow step: Poll
2017-03-16 20:18:34,861 [INFO] Poller - Poll results: Request Message ID=Export-JobInformation-20170316T201829;Response Message Number=123952695;State=Completed;Record Count=1;Record Index=1;
2017-03-16 20:18:34,862 [INFO] WorkflowManager - Completed workflow step: Poll
2017-03-16 20:18:34,920 [INFO] WorkflowManager - Starting workflow step: Retrieve
2017-03-16 20:18:36,153 [INFO] WorkflowManager - Completed workflow step: Retrieve
2017-03-16 20:18:36,206 [INFO] WorkflowManager - Starting workflow step: Strip SOAP
2017-03-16 20:18:36,273 [INFO] WorkflowManager - Completed workflow step: Strip SOAP
2017-03-16 20:18:36,331 [INFO] WorkflowManager - Completed workflow execution
2017-03-16 20:18:36,393 [INFO] Client - The workflow execution succeeded.
And that’s it! Assuming our files were correctly prepared, the extract will be ready in the folder declared in the cli:Folder tag of the configuration file. As for scheduling, different approaches are available, the most basic being to use the Linux crontab as the scheduler (a sample entry is shown at the end of this section), but you can also easily plug the extraction into whatever ETL tool is used in your project. See the screenshot below for an ODI example:
Figure 7: TCC extracts placed into 1 ODI package
The file extract_candidate.sh contains a simple call to the TCC extraction:
[KKanicki@BIApps tcc-15A.2.0.20]$ cat extract_candidate.sh
#!/bin/bash
cd /u01/tcc_linux/tcc-15A.2.0.20/
./TaleoConnectClient.sh extracts_definitions/candidate_cfg.xml
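If you prefer plain cron over an ETL tool, a minimal crontab entry could look like the one below; the schedule and the log file path are assumptions, while the directory and configuration file come from the example above:

# Run the candidate extract every day at 02:00
0 2 * * * cd /u01/tcc_linux/tcc-15A.2.0.20 && ./TaleoConnectClient.sh extracts_definitions/candidate_cfg.xml >> /u01/tcc_linux/tcc_cron.log 2>&1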
If your extracts fail or you have any other issues configuring Taleo Connect Client, feel free to ask us in the comments section below! In the last couple of years we have delivered several highly successful BI projects in the Human Resources / Human Capital space, so don't hesitate to contact us if you would like to receive specific information about these solutions!