A quick look at the evolution of the data warehousing environment reveals that a lot has changed since the 1990s. Back then, storage was expensive, so data was collected based on its usefulness and cost, and much of it was discarded to maximize the return on investment in storage. In the ’90s, ETL (extract, transform, load) workflows meant funneling all data into relational databases, which in turn became the single source of truth for future operations. Engineers were responsible for getting the data into the databases and, eventually, to the analysts. Data was neatly manicured to fit the architecture.
However, that paradigm has flipped. As storage, compute, and scale-out technologies became more affordable, data that didn’t seem immediately important no longer needed to be discarded. Today, as organizations gather more data from varied sources, any data of potential value is stored to be mined later. Big data technologies have also emerged on the basis of “scale-out”, allowing data to be stored on commodity hardware and processed efficiently in parallel. Rather than fitting data to the architecture, the architecture now fits itself to the data.
Nonetheless, despite this shift and the advances in the way data is collected and consumed, only a few organizations have succeeded in scaling their big data efforts.
Challenges with Big Data
Gartner predicts that 60 percent of big data projects over the next year will fail to go beyond the pilot stage and will be abandoned. This highlights two key weaknesses that typically plague big data initiatives, partly owing to how ‘big’ the data is:
- difficulty in identifying what data to collect, and
- inability to analyze the data that has been collected.
As we dive deeper into the problem, we can cite a number of reasons why big data is difficult. A lot of the struggle in handling big data has to do with the new systems and technologies that have emerged to address the need for big data. Since this innovation doesn’t seem to be slowing down, it has become exceedingly difficult for businesses (even those that embrace data) to have the vision and expertise to build and operate these platforms.
In addition, building a robust big data architecture requires piecing together a wide range of technologies, many of them open source, to create coherent processes for serving data optimally to analysts, data scientists and data engineers. This is a large infrastructure investment and a common hindrance to the realization of big data initiatives.
Between the lack of expertise, large investments in infrastructure, and a constantly shifting technology landscape, many businesses get caught up in the confusion and begin to see projects flounder and fail.
Addressing the People Problem
As established above, the sheer speed at which the technology landscape is evolving contributes to the shortage of qualified personnel. Because newer technologies in the big data space are still maturing, finding expertise in them is even more difficult.
Addressing that problem requires an organization-wide change – a transformation to a data-driven culture. A data-driven organization, in my opinion, should possess three things:
- A company-wide culture of using data to make business decisions
- An organizational structure that supports a data-driven culture
- Technology that supports a data-driven culture, and makes data “self-service”
Of these, I feel that creating a self-service culture is the most important, and arguably the most difficult, aspect of transitioning to a data-driven organization. This shift entails identifying and building a cultural framework that enables all the people involved in a data initiative – from the producers of the data, to those who build the models, to those who analyze it, to the employees who use it in their jobs – to collaborate on making data the heart of organizational decision-making.
Here are a few “real-world” tips for building a data-driven culture:
- Hire data visionaries – you need people who are open minded about what the data will tell them regarding the way forward, and understand all the ways that employees can use data to improve the business.
- Organize your data into a single data store accessible to everyone – always allow employees to see the data that affects their work. This means eliminating data silos and effectively democratizing data access, while still preserving data security and compliance.
- Empower all employees – build a culture that allows all employees to share opinions, as long as they are backed up by data, even if those opinions contradict senior executives’ assumptions. This is key to keeping businesses competitive in even the fastest-moving markets.
- Invest in the right self-service data tools – your data, even if readily accessible, won’t help your business much if most of your employees can’t understand it, or don’t apply it to business problems. This can be solved by a) investing in the right data tools, and very importantly, b) training your employees on how to use those tools.
- Hold employees accountable – technology will only take you so far, so you also need to put incentives in place to encourage employees to use the technology and tools. You must also find ways to measure and grade progress towards a self-service data culture. This means holding employees accountable for their actions and progress in using data effectively to drive business decisions.
Creating a data-driven culture is not always easy, but the benefits it provides are real and significant. Big data is truly transforming the ways that organizations conduct business, so it should come as little surprise that it has a big role to play in changing your culture as well.
Before co-founding Qubole, Joydeep worked at Facebook, where he bootstrapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led Facebook’s Data Infrastructure team. Joydeep was also a key contributor on the Facebook Messages architecture team and brought the power of Apache HBase to Facebook and to the transactional and reporting backends for Facebook Credits.