Addressing the People Problem in your Big Data Architecture

A quick peek into the evolution of the data warehousing environment reveals that a lot has changed since the 1990s. Back then, storage was expensive, so data was collected selectively based on its usefulness and cost, and much of it was discarded to optimize the return on storage investment. In the ‘90s, ETL (extract, transform, load) workflows meant funneling all data into relational databases, which in turn became the single source of truth for future operations. Engineers were responsible for getting the data into the databases and, eventually, to the analysts. Data was neatly manicured to fit the architecture.

However, that paradigm has flipped today. As storage, compute and scale-out technologies became more and more affordable, data that didn’t seem to be of immediate importance no longer had to be discarded. Today, as organizations gather more data from varied sources, any and all data of potential value is stored to be mined later. Big data technologies built on the “scale-out” model have also emerged, allowing data to be stored on commodity hardware and processed efficiently in parallel. Rather than fitting the data to the architecture, the architecture now fits itself to the data.

Nonetheless, despite this shift, and the advancements made in the way data is collected and consumed, only a few organizations have been successful in scaling their big data efforts.


Challenges with Big Data

Gartner predicts that 60 percent of big data projects over the next year will fail to move beyond the pilot stage and will be abandoned. This highlights two key weaknesses that plague most big data initiatives, partly owing to how ‘big’ the data is:

  1. difficulty in identifying what data to collect, and
  2. inability to analyze the data that has been collected.

As we dive deeper into the problem, we can cite a number of reasons why big data is difficult. Much of the struggle in handling big data has to do with the new systems and technologies that have emerged to address big data needs. Since this innovation shows no sign of slowing down, it has become exceedingly difficult for businesses (even those that embrace data) to maintain the vision and expertise needed to build and operate these platforms.


In addition, building a robust big data architecture requires piecing together a wide range of technologies, many of them open source, to create coherent processes for serving data optimally to analysts, data scientists and data engineers. This is a large infrastructure investment, and a common hindrance to the realization of big data initiatives.

Between the lack of expertise, the large investments in infrastructure, and a constantly shifting technology landscape, many businesses get caught up in the confusion and watch their projects flounder and fail.

Addressing the People Problem

We have already established that the sheer rapidity with which the associated technology landscape is evolving contributes to the shortage of qualified personnel. Since newer technologies in the big data space are still maturing, finding expertise in those fields is even more difficult.

Addressing that problem requires an organization-wide change – a transformation to a data-driven culture. A data-driven organization, in my opinion, should possess three things:

  • A company-wide culture of using data to make business decisions
  • An organizational structure that supports a data-driven culture
  • Technology that supports a data-driven culture, and makes data “self-service”

Of the above, I feel that creating a self-service culture is the most important, and arguably the most difficult, aspect of transitioning to a data-driven organization. This shift entails identifying and building a cultural framework that enables all the people involved in a data initiative – from the producers of the data, to those who build the models, to those who analyze it, to the employees who use it in their jobs – to collaborate on making data the heart of organizational decision-making.

To share a few real-world tips on building a data-driven culture, I suggest the following:

  • Hire data visionaries – you need people who are open-minded about what the data will tell them regarding the way forward, and who understand all the ways that employees can use data to improve the business.
  • Organize your data into a single data store accessible to everyone – always allow employees to see the data that affects their work. This means eliminating data silos and effectively democratizing data access, while still addressing data security and compliance requirements.
  • Empower all employees – build a culture that allows all employees to share opinions, as long as they are backed up by data, even if those opinions contradict senior executives’ assumptions. This is key to keeping businesses competitive in even the fastest-moving markets.
  • Invest in the right self-service data tools – your data, even if readily accessible, won’t help your business much if most of your employees can’t understand it, or don’t apply it to business problems. This can be solved by a) investing in the right data tools, and very importantly, b) training your employees on how to use those tools.
  • Hold employees accountable – technology will only take you so far, so you also need to put incentives in place that encourage employees to use the technology and tools. You must also find ways to measure and grade progress towards a self-service data culture. This means holding employees accountable for their actions and progress, and recognizing them when they effectively use data to drive business decisions.

Creating a data-driven culture is not always easy, but the benefits it provides are real and significant. Big data is truly transforming the ways that organizations conduct business, and hence, it should come as little surprise that it has a big role to play in changing your culture, as well.


Joydeep Sen Sarma
Before co-founding Qubole, Joydeep worked at Facebook where he boot-strapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led Facebook’s Data Infrastructure team. Joydeep was also a key contributor on the Facebook Messages architecture team and brought the power of Apache Hbase to Facebook and to the transactional and reporting backends for Facebook Credits.


