
Why Big Data Transformations Fail – And What You Can Do To Succeed

The NewVantage Partners Big Data and AI Executive Survey for 2021 reported that only 29.2% of Fortune 1000 companies are achieving “transformational business outcomes”, and only 30% of firms have developed a well-articulated data strategy. Let’s talk about why this is the case.

 

Thirty years ago, data was not even a blip on the radar. About 20 years ago, it started to become a “thing”, and some organisations began to make decent use of it in that decade. Now its importance is a simple, obvious fact rather than a point of discussion or contention: just as you’d say “There is air”, you now say “There is data, and it’s important to a business”, and everyone accepts it without question. So data is undeniably here to stay, and that’s a good thing. But we still have some distance to cover in terms of how companies embrace this data and leverage it.

 

Who Drives Big Data Transformations?

 

The most common reason a transformation fails is also the most crucial one: most of these initiatives are driven by IT and not by the business.

 

A couple of decades ago, business intelligence was trendy (hence the first decent uses of data). Ten years ago, Big Data was trendy, and for the past five years or so, Machine Learning has been trendy. Many companies fall into the trap of “Let’s just buy a platform to do that trendy thing.” And that’s not a good approach.

 

There are a lot of misconceptions at play here. Storing data in a data lake is cheap, and when hardware is cheap, it can be tempting to just store everything there and think about what to do with it later. But “Let’s think about what to do with the data” is the expensive part. The expense lies not only in having a team to work with the data, but also in bringing the data to a usable state.

 

The best approach is to first understand what we are trying to achieve, what the pain points of the company are, and how data can help with them.

 

Is Your Data Usable?

 

This brings us to the second pitfall. We may now have data and computing power because we bought a very nice cluster, but the data that lands there is usually not usable right off the bat. It’s mostly garbage. So it’s “Garbage in, garbage out”, unless you massage the data and transform it into something that is really usable.
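To make “not usable right off the bat” a bit more concrete, here is a minimal sketch of the kind of check you might run on freshly landed data before anyone builds on it. It simply measures how often required fields are missing or empty. The function name `profile_records`, the field names, and the sample readings are all hypothetical, made up for illustration; a real platform would use its own schema and tooling.

```python
from typing import Any


def profile_records(records: list[dict[str, Any]], required: list[str]) -> dict[str, float]:
    """Return, for each required field, the fraction of records where it is missing or empty."""
    total = len(records)
    if total == 0:
        # Nothing landed yet, so there is nothing to flag.
        return {field: 0.0 for field in required}

    missing = {field: 0 for field in required}
    for record in records:
        for field in required:
            value = record.get(field)
            # Count absent keys, None values and blank strings as unusable.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1

    return {field: count / total for field, count in missing.items()}


# Hypothetical sensor readings as they might land in a data lake.
readings = [
    {"machine_id": "M1", "temperature": 71.5, "timestamp": "2021-03-01T10:00:00"},
    {"machine_id": "", "temperature": 70.2, "timestamp": "2021-03-01T10:01:00"},
    {"machine_id": "M2", "temperature": None, "timestamp": "2021-03-01T10:02:00"},
    {"machine_id": "M2", "timestamp": "2021-03-01T10:03:00"},
]

report = profile_records(readings, ["machine_id", "temperature", "timestamp"])
for field, fraction in report.items():
    print(f"{field}: {fraction:.0%} missing")
```

A report like this does not fix anything by itself, but it shows early how much massaging the data will need before it can feed a use case.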

 

Let’s talk a bit more about this common misconception that data can be used right away. I think the industry is learning more about this. I’ve been at many conferences, events, and webinars, and I’ve talked about this topic quite a lot.

 

One example I like to use is that in machine learning projects, on average, 70% of the time is spent just preparing the data before any model can be built. Businesses have to pay for the data scientist’s time and for the data throughout that preparation, and they may not see why it is required if all they want is a prediction. But getting to that prediction is a long process, and it can take months to get everything ready.

 

What Are Your Objectives?

 

For a Big Data or analytics transformation to work, you really need to know the end goal of that data. So it comes back to the beginning – before you go and invest in a Big Data platform, you need it to be sponsored by business. And there has to be a clear vision of what you want to achieve with this.

 

Again, this needs to be business-driven so that you arrive at the objectives for your data. These objectives have to be clear. One way to get that clarity is to identify some use cases.

 

You can start by identifying maybe one or two driver use cases, and go from there. A common driver use case is a traditional BI setup with ETL and a data warehouse, where the one-node data warehouse has become too slow and you need more horsepower. Once you’ve identified this driver use case, you can size the platform accordingly and build from there.

 

Ideally, you would also want to understand what possibilities the new technologies bring. There may be a problem you never even considered working on because there was no way to solve it, and you didn’t want to lose time trying. With new technology, problems that had no solution before can now be solved.

 

Now, all of this is a process, and it takes time. You don’t go from zero to a hundred in no time, and that’s fine. To take an example: if you look at artificial intelligence conferences and their typical use cases, one that comes up very often is predictive maintenance.

 

Do You Have The Infrastructure?

 

Consider that you want to go for a use case like predictive maintenance on machinery in your factories.

 

Firstly, why do you want to implement this use case? Do you have the sensors to collect this data? If the answer is yes, then are the sensors connected to something that can gather the data? The answer to the last question, in our experience, tends to be “no” more often than “yes”.

 

Having the data platform is not enough: you also need the means to generate the data and to collect it. That requires infrastructure.

 

People often fall into the “Let’s buy the platform, and then let’s see” trap. And this “let’s see” may take five years before you’re ready to even feed the data. Then you are stuck in this situation where you have a platform that’s failing because you cannot really use it.

 

Change The Mindset

 

It’s not just the infrastructure, though. You have to change the mindset of the organisation; and that requires an alignment between business, technology, and the organisational processes in the company. If you have the data platform but no organisational processes to feed the data, or if you have the data platform but have no idea of what insights are required, it’s all going to fold.

 

In the same NewVantage Partners survey that we referred to earlier, 92.2% of the surveyed companies said that the greatest challenge in becoming a data-driven organisation is cultural barriers. Companies seem to be struggling with this disconnect between business and IT, and with the misconception that data is for IT alone to think about, that it is their “thing”.

 

That has to change, and again, it’s a long process. Data is everyone’s “thing”, and this is a concept that organisations need to establish more firmly. The data engineers and data scientists are the ones who do the hands-on work with the data. But the business needs to know that the data is almost as important as the product or service being sold by the company.

 

So, if you’re a data scientist or data engineer looking to convince business users that they should be active stakeholders in big data transformation, my biggest tip for you is education. Talk to business users and organise sessions. Give them examples of successful implementation of use cases and show how you could implement similar things in your organisation.

 

At the same time, it’s important to set realistic expectations in these conversations. State the benefits clearly, but also clearly outline the path to reach these benefits. Mention the approximate amount of time and effort it will take.

 

Implementation Strategy Is Key

 

Implementation strategy is also one of the common places where we see organisations falter. Let me give you a quick example.

 

I’ve been in discussions where, let’s say, the company wants to put in a 20-node cluster, with 20 machines running in parallel, and a huge data lake. That isn’t wrong, per se. The real question is: do you really need that right now? Let’s be realistic. Even if you get 20 machines now, you’re not going to have the implementations to use them anytime soon, say in one or two months. Maybe you’re going to have one or two use cases in the next year.

 

The best way to implement is to start small, and then grow as your use cases expand. The great thing about the new developments in Big Data and Cloud technologies is that they allow you to grow your implementation in a way that is flexible and transparent for the application. So you can just add more horsepower as you grow.

 

Don’t Forget About Governance

 

Now let’s assume that we’ve successfully got the business to drive the initiative, deployed the platform, and chosen the right implementation strategy. We’re still not completely out of the woods: we need to consider the governance of how the platform and its data are being used.

 

For example, if we’re talking about advanced analytics, you don’t want to end up with a ton of Jupyter Notebooks spread out on the laptops of the scientists, connecting to the cluster and running things, with no central governance of all these processes. In such a case, if one of these scientists leaves, it would be very difficult to understand what they were doing, and how you can move forward smoothly. Some of these things can be pretty complicated in terms of the implementation and the underlying mathematics. You might also have a brilliant solution that is just hidden in someone’s laptop.

 

So when you create solutions as an organisation, you need governance over what you have created. That means governance not only of the data, but also of the code. You need central control over what is happening in your organisation in terms of these developments.

 

Conclusion

 

While it’s true that many big data transformations fail, it’s not all doom and gloom. We’re in a much better position than we were in even 10 years ago when the Big Data movement really took off. Today, we have more data and more horsepower than ever.

 

We’ve now established, as a foundation, that being able to leverage data is an integral component of an organisation’s success. What we can do with data keeps evolving, and buzzwords like BI, Big Data, Artificial Intelligence, or Data Science come and go.

 

Now, it’s the era of data science and artificial intelligence. Five years from now, it will be something different. But I’m sure it will still revolve around data, and that’s not going to stop. This, for me, is the big success. Data is here, it’s here to stay, and we’re not doing too badly!

 

At ClearPeaks, we’ve been helping customers on their data journeys for quite some years now, and we’ve learned a thing or two. We’re happy to jump on a call with people who want help, and we have a few conversations like this every week. We’re happy to share our knowledge and provide information that’s based on our in-depth experience and expertise. So if you’d like to have a conversation with us, don’t hesitate to reach out!

 


Oscar M
oscar.martinez@clearpeaks.com