Big data is the big buzzword these days. It refers to collections of data sets too large and complex to be processed by standard tools. It is the art and science of combining enterprise data, social data and machine data to derive new insights that would otherwise not be possible. It is also about combining past data with real-time data to predict or suggest outcomes for the current or a future context.
The digital footprint is progressively expanding the world over, across fragmented mediums (blogs, tweets, reviews etc.) and technologies (mobile, web, cloud/SaaS etc.).
Digital landscape in India
India’s digital landscape, too, may be evolving quickly, but overall penetration remains low, with only 1 in 5 Indians using the Internet in July 2014.
In India, enterprises and businesses have access to a veritable wealth of information. Although some of the larger organisations have made a start in harnessing it, most Indian companies are still learning how to collect and store big data.
Telecom providers, online travel agencies and online retail stores are among the industries that already use big data analytics to engage customers in some ways.
However, big data analytics is still in its infancy in India. Most companies are learning to store the data they collect, and there are several challenges in collecting the data sets themselves. Big data analytics needs both past and current data to be really useful, yet past data is scarce in both the public and private sectors in India. Some of the reasons for this lack of data are:
Yet to be fully computerised
Healthcare, economic and statistical data in both the private and public sectors in India is yet to be fully computerised. The main reason is the late adoption of IT in India. Unlike in the West, most industries in India made the transition from manual records to computerised information systems only during the last decade.
Over the years, the state and central ministries have made the move towards e-governance. Efforts to deliver public services and to make access to these services easier are being made as well. While this is still a work in progress, huge amounts of data across many government sectors are yet to be digitised.
Quality of data
In big data analytics, data sufficiency plays a critical role when samples are analysed across different dimensions: each slice needs enough data points for the analysis to be meaningful. Beyond the quantity of data, the quality of the data being crunched also influences the quality of the insights. If the signal-to-noise ratio is low, results drawn from less-than-optimal samples become less accurate. In a country like India, there is very little information available about individuals, because Indians tend not to be overly expressive, especially on public forums.
The public social media information available for most individuals in India contains little reliable information about the users themselves. Random facts and figures in individual profiles, sharing of spam content, and fake accounts created for bots are very common in India.
Social media sites are becoming increasingly vulnerable to spam attacks. The time a captive audience spends on these sites opens up windows of opportunity for online threats and spammers.
Social media spam, in turn, lowers the signal-to-noise ratio that defines the quality of big data, which makes the resulting insights less reliable.
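To make the idea concrete, here is a minimal sketch of how the signal-to-noise ratio of a batch of social profiles could be estimated. Everything in it is an assumption for illustration: the `Profile` fields, the `is_spam` flag (imagined as the output of some spam or bot detector) and the `min_fields` threshold are hypothetical, not a standard definition.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    user_id: str
    is_spam: bool       # hypothetical flag, e.g. set by a spam/bot detector
    fields_filled: int  # number of profile fields with plausible values


def signal_to_noise(profiles, min_fields=3):
    """Ratio of usable profiles (signal) to unusable ones (noise).

    A profile counts as signal if it is not flagged as spam and has at
    least `min_fields` plausible fields. The threshold is an assumption.
    """
    if not profiles:
        raise ValueError("need at least one profile")
    signal = sum(
        1 for p in profiles
        if not p.is_spam and p.fields_filled >= min_fields
    )
    noise = len(profiles) - signal
    if noise == 0:
        return float("inf")  # no noise at all
    return signal / noise


# Invented sample data, for illustration only.
sample = [
    Profile("u1", is_spam=False, fields_filled=5),
    Profile("u2", is_spam=True, fields_filled=4),   # spam account
    Profile("u3", is_spam=False, fields_filled=1),  # nearly empty profile
    Profile("u4", is_spam=False, fields_filled=6),
]
ratio = signal_to_noise(sample)
```

In this sample, two profiles are usable and two are not, so the ratio is 1.0. Every additional spam or near-empty profile pushes the ratio lower.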
Cultural and social influences
In most western markets, insights generated through big data can be applied across the whole consumer base. However, given the extensive cultural and linguistic variation across India, an insight generated for a consumer in Chandigarh, for example, will not be directly applicable to a consumer in Chennai. The problem is made worse by the fact that a lot of local data lives in regional publications, in different languages, and has very limited online visibility.
Unstructured data leads to mapping issues
Big data in India is largely unstructured. Most transactional data in the healthcare and retail segments is stored purely for bookkeeping purposes. It contains very little information that could help big data analytics map enterprise-generated transactional data to public information.
In developed countries, user data is rich enough to provide demographic or group-level markers that can be used to generate customised insights while maintaining individual privacy. The lack of such standard identifiers in Indian consumer data is one of the biggest bottlenecks when mapping transactional and social records in India.
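The mapping problem can be sketched in a few lines. The example below joins transactional records to public, group-level statistics using shared markers. The marker names (`region`, `age_band`), the record layout and all the values are hypothetical and invented for illustration. Records that lack a marker cannot be mapped at all.

```python
def map_by_group(transactions, public_stats, keys=("region", "age_band")):
    """Attach group-level public data to each transaction.

    `public_stats` is keyed by a tuple of marker values. A transaction
    is unmapped if any marker is missing or has no public counterpart.
    """
    mapped, unmapped = [], []
    for tx in transactions:
        marker = tuple(tx.get(k) for k in keys)
        if None in marker or marker not in public_stats:
            unmapped.append(tx)
        else:
            # Merge the group-level fields into a copy of the record.
            mapped.append({**tx, **public_stats[marker]})
    return mapped, unmapped


# Invented sample data, for illustration only.
transactions = [
    {"id": 1, "region": "Chennai", "age_band": "25-34", "amount": 1200},
    {"id": 2, "region": "Chandigarh", "amount": 800},  # no age_band stored
    {"id": 3, "region": "Chennai", "age_band": "65+", "amount": 300},
]
public_stats = {
    ("Chennai", "25-34"): {"segment_size": 50000},
}

mapped, unmapped = map_by_group(transactions, public_stats)
```

Only the first record carries both markers and has a matching public entry. The other two end up in `unmapped`, which is the situation described above for data kept purely for bookkeeping.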
Handsets and internet connectivity
Even though smartphones are driving the new handset market in India, feature phones still dominate everyday usage. Most connections in India are pre-paid, and fewer than 10% of users have access to 3G networks. On top of that, internet connection speeds are amongst the lowest in Asia. As a result, consumer data, especially retail enterprise data, is limited.
As more people in India move to smartphones and internet connectivity improves, the amount of usable data generated will increase. Because big data analytics is still in its infancy in India, organisations and enterprises will need to make considerable efforts to improve the quality of their data. However, the key contributors to the promise of big data analytics in India are steadily gaining ground. A growing number of social media users, together with efforts by public and private enterprises to collect and store transactional data well, will lead to better-quality data sets and better applications of big data analytics.