Rajeev Nayar (RN): Big Data is a fairly overhyped term. At the end of the day, analytics are about all forms of data. Big data just deals with data sets of extremely high volumes that cannot be handled by traditional tools. While a lot of companies look at Big Data as an exclusive area, our aim is to help our customers bring together all forms of data from within and outside the enterprise to analyze them. The tools used for analysis vary based on the data sets that are coming in. Infosys is looking to provide a common platform to process all types of data, with big data augmenting the existing BI capabilities.
AIM: What are the main tools/technologies that you employ for Big Data?
RN: While a lot of the current big data engagements in the industry are centered on Hadoop, we are looking at a three-pronged approach. This involves –
- Large unstructured data managed by technologies like Hadoop
- Large structured data managed by technologies such as Teradata and Netezza
- Real-time data managed by the likes of SAP HANA and NoSQL stores such as MongoDB and Cassandra
The ultimate aim is to provide business and IT users with a single platform that enables them to work with the variety of technologies that best suit their data analytics requirements.
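The three-pronged split described above can be read as a simple routing rule. The following is a minimal sketch of that idea only, not a description of any Infosys product: the `Dataset` fields, the category labels, and the `choose_platform` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the three prongs named in the interview.
UNSTRUCTURED_BATCH = "Hadoop-style store"
STRUCTURED_BATCH = "MPP warehouse (Teradata/Netezza-style)"
REAL_TIME = "in-memory or NoSQL store"


@dataclass
class Dataset:
    """Assumed, simplified description of an incoming data set."""
    name: str
    structured: bool
    real_time: bool


def choose_platform(ds: Dataset) -> str:
    """Pick a technology category from the data set's characteristics."""
    # Assumption: latency needs take priority over structure.
    if ds.real_time:
        return REAL_TIME
    return STRUCTURED_BATCH if ds.structured else UNSTRUCTURED_BATCH


if __name__ == "__main__":
    for ds in (
        Dataset("support_logs", structured=False, real_time=False),
        Dataset("sales_ledger", structured=True, real_time=False),
        Dataset("clickstream", structured=False, real_time=True),
    ):
        print(f"{ds.name}: {choose_platform(ds)}")
```

In a real platform this decision would depend on many more factors (volume, cost, query patterns); the point is only that users describe the data and the platform selects the technology.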
AIM: How much dependence do you have on the overall IT department of your organization?
RN: The Big Data practice here at Infosys has its own setup to help develop and test various features and industry applications, among other things. We work closely with our cloud team when it comes to hosting our applications. Most of our solutions are developed in collaboration with teams who bring in their domain expertise in technology and data modelling. Vertical-specific analysts and data scientists help us bring our big data applications to life.
AIM: How is Big Data being fed to the analytics teams?
RN: We strongly believe that there will ultimately be an amalgamation of Big Data with traditional BI/BA practices. What we are looking to achieve is a common layer that brings together all data formats and helps generate insights. Users should not have to worry about the tools or the platform, but focus instead on the insights generated.
AIM: What are the next steps/ road ahead for Big Data at your organization?
RN: We are noticing three major trends in the market changing the way we look at big data. These trends are driving a transformation in the way we look at the information management space.
- Scope of data – There is now a focus on ‘all data’. Enterprises have moved beyond structured and trusted data to encompass all forms of data from various sources to help them determine correlations and drive insights.
- Structuring of correlations – With traditional technology, it was never easy to determine the relationship between otherwise disjointed data sets. With the amount of data at our disposal, companies are now realizing the value of determining these correlations.
- Enabling self-service – The technologies in the Big Data space are changing almost constantly. It is not easy for enterprises to keep abreast of all these changes. There is a need for a single solution built as a seamless layer that can integrate all the new changes in technology. For business users, the focus should be on deriving value from the insights themselves rather than being tied down by technology.
AIM: What are a few things that organizations should be doing with their Big Data efforts that most don’t do today?
RN: Almost all organizations today are focused on using big data technologies rather than addressing the problems this technology is trying to solve. At the end of the day, it is important to understand that these are just tools. Organizations need to figure out the value of the insights being delivered and then use the tools accordingly. To give a crude example, a screwdriver by itself has no value. It is only when it is used properly that the screwdriver is of value to anyone. It is paramount that the business problem and business value are put ahead of the technology being used.
AIM: What are the most significant challenges you face being at the forefront of the Big Data space?
RN: When it comes to implementing Big Data, what enterprises are truly looking for is the ability to quickly discover, analyze and act on information to drive business decisions. The few solutions in the market have inherent limitations. Solutions either demand deep technical expertise and long implementation times or are designed for specific business scenarios and need constant investment in newer applications.
There is no single solution in the market that addresses the big data needs of business & technology decision makers alike. Technology teams need the flexibility for rapid development of industry-specific big data applications while business needs the agility for insights and actions. This is where a platform like Infosys BigDataEdge is ideal.
Another big challenge lies in keeping up with the vast Big Data landscape and its growing number of players. We are tackling this through strategic technology partners like Cloudera and MongoDB.
The largest concern for us lies in the acute shortage of skills in this space. Enterprise solutions are still not mature enough to be intuitive. Obtaining and retaining resources for big data projects is a big challenge. While solutions like Infosys BigDataEdge look to address this skill gap, the current scenario is a challenge for companies.
AIM: How did you start your career in Big Data?
RN: I’ve been working in the ‘data’ space for quite some time. I’ve seen how enterprises have evolved with regard to processing and gaining insights. We began using Hadoop four years before it became popular. We helped a customer analyze the P1 logs from their tech support team. As an early adopter, we approached the space very differently at first, focusing on how a single technology could solve specific big data problems. However, with time we have found that there is no single solution that meets all the requirements. A set of solutions has to be brought together and engineered so as to solve business problems.
AIM: What do you suggest to new graduates aspiring to get into the Big Data space?
RN: Graduates should focus on their problem-solving skills. What the big data space needs today are not people with strong coding abilities, but people with a strong analytics mindset. Aspirants need to understand that technology is just the enabler. Without the ability to understand a problem and hunt for answers, the knowledge of the technology will not add much value.
AIM: What kind of knowledge worker do you recruit and what is the selection methodology? What skill sets do you look at while recruiting in Big Data?
RN: The very first skill we look at is the candidate’s analytical mindset. While we can spend time training them on the technology, without the analytical frame of mind, the technology will be of no use. We can’t teach people how to think. We do look for candidates with a strong Java background, which serves as the starting point to help them mature in the big data space. Knowledge of areas such as Hadoop and MongoDB is always a benefit.
AIM: How do you see Big Data evolving today in the industry as a whole? What are the most important contemporary trends that you see emerging in the Big Data space across the globe?
RN: With petabytes of data being generated regularly, there is clearly tremendous value that lies in this data. It just needs to be analyzed. A clear emerging trend is that the enterprise’s approach to decision making is becoming more statistical in nature. With the variety of data coming into the enterprise, data-driven decision making will be the new normal.
Cloud applications (like Salesforce.com) are widely prevalent in today’s enterprises. These applications running on the cloud utilize tons of data. The old way of doing things was to bring all this data to a central location and then break it down to analyze the same. We see a lot of focus on how enterprises are going to analyze these vast dispersed sources of information without physically bringing them together. Infosys’ BigDataEdge is a platform which is helping organizations achieve the same through solutions such as the ‘augmented data warehouse’.
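The idea of analyzing dispersed sources without physically bringing them together can be illustrated with a small sketch. This is an assumption-laden illustration of federated aggregation in general, not a description of how Infosys BigDataEdge or the ‘augmented data warehouse’ works: each source computes a small local summary, and only those summaries travel to a coordinator. The function names and the sample values are hypothetical.

```python
from typing import Iterable, List, Tuple

Summary = Tuple[float, int]  # (sum of values, number of values)


def local_summary(values: Iterable[float]) -> Summary:
    """Runs where the data lives; only the summary leaves the source."""
    total, count = 0.0, 0
    for value in values:
        total += value
        count += 1
    return total, count


def combined_mean(summaries: List[Summary]) -> float:
    """Runs at the coordinator; never sees the raw records."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    if count == 0:
        raise ValueError("no data in any source")
    return total / count


if __name__ == "__main__":
    # Hypothetical order values held in two separate systems.
    cloud_crm_orders = [10.0, 20.0]
    on_premise_orders = [30.0]
    summaries = [local_summary(cloud_crm_orders), local_summary(on_premise_orders)]
    print(combined_mean(summaries))
```

The same pattern extends to any statistic that can be rebuilt from per-source partial results, such as counts, sums, minimums and maximums; statistics like medians need a different approach.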