05 Mar 2019
New Plugin to Refresh Data Flows and Data Sets
As you know, Oracle is one of the largest companies that have recently focused on moving to the cloud; in fact, the cloud is its priority in its bid to return to the Leaders quadrant of Gartner's Magic Quadrant for BI. One of its most widely used tools is Oracle Analytics Cloud (OAC).
OAC is a visualization tool with the same functionality as Oracle Data Visualization Desktop (Oracle DVD), but delivered as a cloud solution. As with many products, some specific features could still be improved, and others don't exist yet.
In May 2017, Oracle released a new plugin to automatically refresh the data used in DV projects in both OAC and DVD. However, it could not refresh data sets created by a data flow. At ClearPeaks we have enhanced the plugin so that it also runs the data flows that the project's data sources depend on. We are excited to share this enhancement and its benefits, which directly improve the user experience.
1. The original plugin
Originally, Oracle created a plugin to manage data source updates, either sporadically or periodically, which is useful for analysing changing or streaming data. This section presents the original plugin and its main functionality. Remember that a plugin creates a new visualization and makes it available directly in the tool's data discovery panel. This is the original visualization:
Figure 1: Refresh plugin – original visualization.
In the main dashboard panel there are two options for reloading the data manually: Refresh Data and Refresh Data Sets.
Figure 2: Options to refresh data manually.
The first option does the same as Refresh Data in the plugin, while the second does the same as Refresh Data Sources. To see the difference, imagine that a single XLS file feeds the project and we add extra rows to it: to make those rows available in the visualization panel, we only need to refresh the data (the first option). However, if we add an extra column to the XLS file, the metadata changes, so we need the second option (Refresh Data Sources). To use the plugin, we simply choose one of these two options and specify whether the refresh should happen sporadically or periodically. This makes the plugin useful for analysing real-time data, or simply for refreshing a data source after adding new rows to see their impact on the visualization panel.
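The rule in the paragraph above (new rows need only Refresh Data, a metadata change needs Refresh Data Sources) can be written down as a simple check. The Python sketch below is our own illustration, not the plugin's code: it assumes a data source's metadata can be reduced to its ordered list of column names (a fuller check would also compare data types), and the function name `choose_refresh` is hypothetical.

```python
from typing import Sequence

# Labels mirroring the two refresh options described above.
REFRESH_DATA = "Refresh Data"
REFRESH_DATA_SOURCES = "Refresh Data Sources"


def choose_refresh(old_columns: Sequence[str], new_columns: Sequence[str]) -> str:
    """Hypothetical helper: pick the lighter refresh when the metadata is unchanged.

    Assumption: metadata is approximated by the ordered list of column names.
    New rows leave it unchanged, so Refresh Data is enough; an added, removed
    or renamed column changes it, so Refresh Data Sources is needed.
    """
    if list(old_columns) == list(new_columns):
        return REFRESH_DATA
    return REFRESH_DATA_SOURCES


# Extra rows in the XLS file: same columns as before.
print(choose_refresh(["Region", "Sales"], ["Region", "Sales"]))  # Refresh Data
# An extra column in the XLS file: the metadata has changed.
print(choose_refresh(["Region", "Sales"], ["Region", "Sales", "Margin"]))  # Refresh Data Sources
```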
However, the plugin falls short in some common cases. For example, when a project is fed by a data set created by a data flow, the data flow has to be run before the data source can be refreshed, and the plugin doesn't do that!
As we needed a way to automate refreshing the data flows behind the data source, we developed a new plugin that extends Oracle's.
2. Our customized solution
In this section we present the solution that ClearPeaks developed. It focuses on updating data flows: it detects which data flows need to be updated and, when the user clicks the refresh button, automatically takes the steps needed to refresh them.
Our main objective when we started developing the plugin was to execute data flows automatically from a command line; this was a business request. Remember that to run a data flow in OAC, the user must go to the data panel, choose Data Flows, right-click the desired data flow and run it.
Figure 3: OAC – How to run a data flow.
This process is time-consuming and even a little irritating, which is why we started working on this plugin. There are other aspects to consider when refreshing, too: what if our data source is generated by several data flows, so that earlier data flows must be run first for the data to be consistent? Yes, that's a tricky one...
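Running chained data flows "in order" is a dependency-ordering (topological sort) problem. The sketch below is a hypothetical illustration, not the plugin's actual code: it uses Kahn's algorithm and assumes the dependencies are already known as a dictionary mapping each data flow to the data flows whose output it reads. The function name `run_order` and the data flow names are invented for the example.

```python
from collections import deque


def run_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return data flows in an order where every upstream flow runs first.

    depends_on maps each data flow to the data flows whose output it reads
    (an assumed input format). Uses Kahn's algorithm; raises ValueError if
    the dependencies form a cycle.
    """
    # Collect every flow, including upstream flows with no entry of their own.
    flows = set(depends_on)
    for upstream in depends_on.values():
        flows.update(upstream)

    # Count unmet upstream dependencies per flow and record reverse edges.
    pending = {flow: 0 for flow in flows}
    downstream: dict[str, list[str]] = {flow: [] for flow in flows}
    for flow, upstream in depends_on.items():
        for up in set(upstream):
            pending[flow] += 1
            downstream[up].append(flow)

    # Start with flows that depend on nothing; sorting keeps the order stable.
    ready = deque(sorted(f for f in flows if pending[f] == 0))
    order: list[str] = []
    while ready:
        flow = ready.popleft()
        order.append(flow)
        for nxt in sorted(downstream[flow]):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(flows):
        raise ValueError("circular dependency between data flows")
    return order


flows = {
    "DF_Sales_Clean": ["DF_Sales_Raw"],
    "DF_Sales_Summary": ["DF_Sales_Clean", "DF_Targets"],
}
print(run_order(flows))
# ['DF_Sales_Raw', 'DF_Targets', 'DF_Sales_Clean', 'DF_Sales_Summary']
```

Detecting cycles matters here: if two data flows read each other's output, there is no valid run order, and reporting that is safer than running them in an arbitrary sequence.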
Let's look at the types of data sources that exist and the action needed to refresh each of them:
- Data Set. Refreshing a Data Set must be done manually, because the tool requires us to provide a new file. This is the only case the plugin can't cover, as it requires a manual action.
- Connection. Once a connection is created, it doesn't need refreshing: each time we refresh a data set provided by the connection, the tool runs a query through the connection to fetch the latest information.
- Data Flow. Refreshing a data flow means running it, which, as explained above, is quite laborious.
- Sequence. A sequence concatenates multiple data flows and executes them one after another. It is executed in the same way as a data flow.
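The list above amounts to a small dispatch table from source type to refresh action. This Python sketch simply restates it in code; the `SourceType` enum and the `refresh_action` function are hypothetical names, and the current version of the plugin focuses on data flows rather than every type listed.

```python
from enum import Enum


class SourceType(Enum):
    """The four kinds of data source described in the list above."""
    DATA_SET = "Data Set"
    CONNECTION = "Connection"
    DATA_FLOW = "Data Flow"
    SEQUENCE = "Sequence"


def refresh_action(source: SourceType) -> str:
    """Hypothetical helper: what a refresh has to do for each source type."""
    if source is SourceType.DATA_SET:
        # The tool asks for a new file, so a person has to act.
        return "manual (provide a new file)"
    if source is SourceType.CONNECTION:
        # The query is re-sent through the connection on every refresh.
        return "none (the refresh queries the connection)"
    # Data flows and sequences (chains of data flows) must be executed.
    return "run"


for source in SourceType:
    print(f"{source.value}: {refresh_action(source)}")
```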
As we said before, our plugin focuses only on updating data flows, so our first step towards automatic refreshing was to look at the OAC option to schedule data flows. As Figure 3 shows, right-clicking a data flow offers the option to schedule it. But scheduling suits a regular data load, and that's not our case: what happens when the data is loaded at 2 am on Monday and at 3 pm on Tuesday? What if the pattern changes every week? This could be a nightmare!
That's why we decided to study the JavaScript calls OAC makes when it runs a data flow, so that we could replicate that logic and add an error handler that tells us when a run fails and why.
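The error handling described above could be structured as in the following Python sketch. It is hedged: we do not reproduce the JavaScript calls OAC makes, so `run_one` is a hypothetical stand-in for whatever starts a data flow and waits for it to finish, assumed to raise an exception on failure. `run_flows` and `RunReport` are likewise invented names, and stopping at the first failure is a design choice of this sketch, so that later flows never read the output of a failed run.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunReport:
    """Outcome of a refresh: which flows ran, and which one failed and why."""
    succeeded: list[str] = field(default_factory=list)
    failed: Optional[str] = None
    reason: Optional[str] = None


def run_flows(order: list[str], run_one: Callable[[str], None]) -> RunReport:
    """Run data flows in dependency order and stop at the first failure.

    run_one is a hypothetical stand-in for the call that actually runs a
    data flow; it is assumed to raise an exception when the run fails.
    Any flows after the failing one are skipped.
    """
    report = RunReport()
    for flow in order:
        try:
            run_one(flow)
        except Exception as exc:
            # Record which flow failed and the error message explaining why.
            report.failed = flow
            report.reason = str(exc)
            break
        report.succeeded.append(flow)
    return report


def fake_run(flow: str) -> None:
    """Simulated runner for the example: one flow fails."""
    if flow == "DF_Sales_Clean":
        raise RuntimeError("source file not found")


report = run_flows(["DF_Sales_Raw", "DF_Sales_Clean", "DF_Sales_Summary"], fake_run)
print(report)
# RunReport(succeeded=['DF_Sales_Raw'], failed='DF_Sales_Clean', reason='source file not found')
```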
Since the plugin is not meant only for analysing changing or streaming data, we modified the visualization so that it refreshes only when the user asks it to, leaving a single refresh button.
In addition, it doesn't matter whether a data source has changed because a new column was added or just because the data was updated: the plugin takes this into account and evaluates whether a Refresh Data or a Refresh Data Sources is required.
Our plugin visualization looks like this:
Figure 4: Our ClearPeaks plugin visualization.
What’s more, our custom plugin can be used in both OAC and Oracle DVD!
Conclusion
At ClearPeaks we have developed a plugin that refreshes a project's data sources by executing the data flows they depend on, allowing users to ensure that the analysed data is as up to date as possible.
Unfortunately, as nothing in life is ever truly perfect, there are still some features the plugin does not cover, but remember that this is only our first enhancement: in upcoming versions we will cover the whole set of data source elements, such as sequences, and the dependencies between elements.
Stay tuned to see what's new in the next versions of the plugin. We are looking forward to seeing what's next in the Oracle portfolio and will keep you updated. We'd be glad to help you with any issue related to this article, so feel free to contact us whenever you want.