In the jargon-loaded field of analytics, catchphrases like Machine Learning and Data Science have emerged as some of the most confusing tech trends. We give you a one-stop shop for all the buzzing terms and phrases used in the data and analytics space. This glossary will be your guide to all queries on big data. Navigate the jargon-filled tech arena with ease.


Analytics: Analytics is the discovery, interpretation, and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.

Artificial Intelligence: It’s the catchphrase that’s on fire, and everywhere you go, AI is inescapable. Once confined to the realm of science fiction, it is now making a roaring debut in the mainstream. First up, AI is intelligence exhibited by machines, or machines behaving like humans. Where do we see AI in real life? Google’s self-driving car! Apple’s Siri! Now AI is even editing the film trailer of a thriller titled Morgan.


Business Intelligence: According to Wikipedia, the term BI dates back to the 19th century, when it was coined by Richard Millar Devens in the context of banking. Today, companies like Tableau and Qlik have built multi-million-dollar businesses on top of it. Simply put, BI is the art of visualizing data, or bringing it to life in the form of charts and interactive reports. With BI you can view your entire business on a dashboard.

Big data: It’s a term that dogs every CTO and CEO and is part of the everyday lexicon. Big data refers to huge datasets (we are talking about petabytes) that are analyzed computationally to find patterns and meanings that can be used in business settings. It is also characterized by four important Vs: volume, variety, velocity and veracity.

Blockchain: An offshoot of bitcoin, blockchain is a distributed ledger spread across the nodes of a network. Its purpose is to keep a record of all the transactions that happen.
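
To make the ledger idea concrete, here is a minimal single-machine sketch in Python. It only illustrates how each block of transactions can be chained to the one before it with a hash; a real blockchain also replicates the ledger across nodes and agrees on new blocks through a consensus protocol, neither of which is shown. The function names and the sample transactions are assumptions made up for this illustration.

```python
import hashlib
import json


def block_hash(index, transactions, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block that points back at the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical transactions, for illustration only.
ledger = []
add_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(ledger))  # True

# Tampering with an old transaction breaks the stored hash.
ledger[0]["transactions"][0]["amount"] = 500
print(is_valid(ledger))  # False
```

Because every block stores the hash of its predecessor, changing an old record is detectable by anyone who rechecks the chain.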


Computing power: Heard all the chatter about Google’s superior computing power, or data centres that crunch mind-boggling amounts of data (petabytes)? Well, it all comes down to computing power, or how fast a machine can perform a task. Today it is all about speed, and that’s where most companies are investing.

C: This general-purpose programming language is how many techies kick-start their programming journey. Here’s a bit of history trivia on C: it was created in the 70s by Dennis Ritchie at Bell Labs. Of late, it’s been making news for outliving its usefulness. According to naysayers, C’s salad days are over, with Google’s Go, Java and Python outshining it in the field of web and mobile app development.


Data Science: This interdisciplinary field combines statistics and computer science, and it is the blanket term applied to all the techniques used by data scientists in dealing with structured and unstructured data.

Data Analytics: It’s a term that has gained so much traction over the years that its real meaning is lost in the noise. Data analytics is not the same as data analysis or data mining. Analytics is all about applying a mechanical or algorithmic process to derive insights, such as going over datasets to find patterns or correlations between them. Some of the most popular data analytics tools are R, OpenRefine, RapidMiner and SAS, among others.

Deep Learning: Deep learning algorithms are a subset of machine learning algorithms and are inspired by how the human brain works. They learn and make sense of data without the need for explicitly programmed rules. Search giant Google has made massive investments in deep learning technology, and the result was on display when Google’s AlphaGo defeated one of the world’s best Go players.

Descriptive Analytics: Descriptive analytics is used for summarizing raw data and providing historical insights. Organizations use it to capture past data such as inventory stock, production and financial results.
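
As a rough illustration of what "summarizing raw data" can look like, the Python sketch below reduces a list of past sales figures to a handful of summary numbers. The figures are hypothetical and stand in for whatever historical data an organization has captured.

```python
import statistics

# Hypothetical monthly unit sales for one product.
monthly_sales = [120, 135, 128, 150, 142, 160]

# Collapse the raw figures into a few descriptive statistics.
summary = {
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here predicts or recommends anything; the output only describes what has already happened.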


ETL: Extract, Transform, Load is a database process for extracting raw data from one database and putting it into another database or data warehouse. ETL is used for data integration, where data flows in from disparate sources. It is here that the data is converted into a compatible format and stored in a data mart or a data warehouse to be further analyzed.
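
The three steps can be sketched in a few lines of Python using only the standard library. This is a toy example under stated assumptions: an inline CSV export stands in for the source system, an in-memory SQLite database stands in for the warehouse, and the table and column names are invented for illustration.

```python
import csv
import io
import sqlite3

# Extract: a CSV export stands in for the source system (hypothetical data).
raw_csv = io.StringIO(
    "order_id,amount,currency\n"
    "1, 19.99 ,usd\n"
    "2,5.00,USD\n"
    "3,,usd\n"
)
rows = list(csv.DictReader(raw_csv))

# Transform: trim whitespace, normalize currency codes, drop rows with no amount.
clean = [
    (int(r["order_id"]), float(r["amount"].strip()), r["currency"].strip().upper())
    for r in rows
    if r["amount"].strip()
]

# Load: an in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 19.99, 'USD'), (2, 5.0, 'USD')]
```

Production ETL pipelines typically add scheduling, error handling and incremental loads, but the extract, transform and load stages keep the same shape.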


Framework: In computer programming, a framework lays down the basic functionality of a working application, which you then extend with your own code. In other words, it is a set of libraries that you can build your applications around.


Gephi: This open source graph visualization tool helps data analysts work with massive datasets and graphs in real time. The tool is used for exploring, analyzing, filtering and exporting all types of graphs.

Go: Google’s Go is quickly scaling the charts and draws heavily on C. The Tiobe index for January 2017 proclaimed Go the programming language of 2016, trumping Dart and Perl. This high-level language is, like Python, dubbed the modern engineer’s C.


Hadoop: Hadoop is the big data technology that supports storage and processing of large datasets. This open-source software framework is used for storing large amounts of data and running applications on clusters of commodity hardware. Hadoop is the go-to technology for big data because of its high processing power and its ability to store vast amounts of data.

Hive: A Hadoop query layer created by Facebook’s Data Infrastructure team, Hive is a popular choice for business analysts. This open source data warehousing solution uses an SQL-like language called HiveQL (HQL) and can support terabytes and petabytes of data, unlike a traditional SQL database. The downside is that it only supports structured data.


Internet of things (IoT): No tech conversation is complete without IoT, but what is IoT? It is the network of physical devices that collect and exchange data over the internet. It goes beyond data, devices and connectivity, and now every IT bellwether has its own IoT platform that provides core services. The Internet of Things has two aspects: one that is consumer-facing and one tailored for enterprises, such as industrial settings where IoT is already in use.


Julia: A relative newcomer in the programming language space, Julia was created by Jeff Bezanson, Alan Edelman, Stefan Karpinski and Viral B. Shah, who began work on it in 2009. Largely meant for scientific applications and software, it is often pitched as a replacement for Matlab.


Kafka: Developed at LinkedIn and open-sourced in 2011, Kafka is a distributed messaging system and a framework for stream processing. The aim behind Kafka is to provide a central hub for messaging, communication and integration between data systems.


Machine Learning: Machine Learning forms the core of modern-day Artificial Intelligence. It is the process of making machines learn from data or observations, that is, picking up patterns and using them to make predictions.
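
One of the simplest ways to see "learning from observations" in action is a nearest-neighbour classifier, sketched below in Python. It is only one of many machine learning techniques, chosen here because it fits in a few lines; the training data and labels are hypothetical.

```python
import math

# Hypothetical labelled observations: (height_cm, weight_kg) -> label.
training_data = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]


def predict(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda item: math.dist(item[0], point))
    return nearest[1]


print(predict((152, 53)))  # small
print(predict((178, 80)))  # large
```

No rule such as "taller than 170 cm means large" was ever written down; the prediction comes entirely from the examples the program was given.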

MapReduce: MapReduce is the processing model of Hadoop and enables distributed processing of huge datasets. Hadoop, the big data technology, is used for a) processing and b) storing: the Hadoop Distributed File System is for storing and MapReduce is for processing. As the name implies, data is divvied up into small parts and each part is processed simultaneously (the map step), after which the partial results are combined (the reduce step).
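
The classic way to illustrate the model is a word count. The Python sketch below mimics the map, shuffle and reduce phases on a single machine; it does not use Hadoop's actual API, and in a real cluster each chunk would be handled by a different node at the same time. The function names are assumptions chosen for this illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# The input is split into small parts; each part is mapped independently.
chunks = ["big data is big", "data is everywhere"]
word_counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
print(word_counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each call to `map_phase` depends only on its own chunk, the work can be spread over as many machines as there are chunks.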


Natural language processing (NLP): NLP is a branch of AI wherein machine learning algorithms are used to detect patterns in, and more generally make sense of, speech and text.


Open source: Simply defined, open source software denotes software whose source code can be freely used and modified by anyone. Today, open source has come to define collaborative, community-oriented initiatives, leading to the democratization of technology.


Python: Undoubtedly one of the most popular programming languages, Python was created in the late 1980s as a scripting language. Today it has evolved to support applications for supercomputers and scripts for scalable web servers.

Predictive Analytics: As the name implies, predictive analytics is used to forecast future outcomes based on historical data. With advanced analytics techniques such as data mining, modelling and machine learning, among others, one can identify trends and patterns and predict results.
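
A bare-bones version of forecasting from historical data is fitting a straight trend line and extending it one step forward. The Python sketch below does this with ordinary least squares; it is just one simple technique among the many mentioned above, and the revenue figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical quarterly revenue (in thousands) for the past six quarters.
quarters = [1, 2, 3, 4, 5, 6]
revenue = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(quarters, revenue)

# Extend the historical trend to the next, not yet observed, quarter.
forecast_q7 = slope * 7 + intercept
print(f"Forecast for quarter 7: {forecast_q7:.1f}")
```

Real-world data is rarely this tidy, which is where the more advanced modelling and machine learning techniques come in.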

Prescriptive Analytics: Prescriptive analytics suggests one or more possible courses of action for a problem. It makes use of different datasets (historical data, real-time data and big data) to arrive at them.


Query: It is a request for information from a database, and this is where Structured Query Language (SQL) plays a huge role as the standard language for writing such requests.
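
For a feel of what such a request looks like, the Python sketch below runs one SQL query against a small in-memory SQLite database. The table, its columns and the customer rows are all made up for this example.

```python
import sqlite3

# Build a tiny throwaway database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Asha", "Pune", 120.0), ("Ben", "Seattle", 80.0), ("Chen", "Pune", 200.0)],
)

# The query: total spend per city, highest first.
query = (
    "SELECT city, SUM(spend) AS total "
    "FROM customers GROUP BY city ORDER BY total DESC"
)
results = conn.execute(query).fetchall()
print(results)  # [('Pune', 320.0), ('Seattle', 80.0)]
```

The query states what information is wanted, and the database works out how to fetch it.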

Qlik: This Pennsylvania-headquartered company offers BI and data visualization tools that are used for digging deeper into data and visualizing key opportunities.


R: This open source scripting language has risen to the top for its dexterity in statistical analysis on large datasets. It is used in big enterprises for advanced analytics, and top vendors are now offering exclusive R-based packages wherein users can create data models through a front-end GUI.


Scala: Scala stands for scalable language. This object-oriented language is used by Twitter and other enterprises because it is “fast, fun and runs on a great virtual machine (JVM)”. Another draw is “its flexible syntax that helps in writing readable maintainable code”.

SAS: SAS is a market leader when it comes to statistical analysis tools spanning data management, BI and visual analytics. SAS has cornered a 33% market share in advanced analytics tools.

SPOTFIRE: This data analytics and visualization software has become a one-stop shop for analytics requirements and helps clients discover unseen insights in their data. The software can be mapped to specific business challenges and can be integrated into corporate data infrastructure.

SPSS: Another popular analytics tool, SPSS is used to build predictive models and standard statistical graphs. It has an excellent visual interface that enables users to leverage statistical and data mining algorithms without much programming experience.


Text mining: Text mining, or text analytics, entails analyzing textual data with statistical tools to find patterns in natural language. In text mining, algorithms analyze individual words or groups of words in documents to detect patterns.
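
A very small taste of this is counting how often words and word pairs occur across a set of documents, as in the Python sketch below. It is a deliberately simple frequency count rather than a full text mining pipeline, and the three documents are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical documents to mine.
documents = [
    "Big data needs big storage.",
    "Machine learning needs data.",
    "Data visualization brings data to life.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Single-word frequencies across all documents.
word_counts = Counter(word for doc in documents for word in tokenize(doc))

# Frequencies of adjacent word pairs (bigrams) within each document.
bigram_counts = Counter(
    pair
    for doc in documents
    for pair in zip(tokenize(doc), tokenize(doc)[1:])
)

print(word_counts.most_common(3))
print(bigram_counts[("big", "data")])
```

Even this crude count surfaces "data" as the dominant term, which is the kind of pattern text mining tools look for at a much larger scale.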

Tableau: This Seattle-headquartered company has quickly gained ground in the BI and visualization market through its visualization tools and is often billed as the best data visualization software.


Visualization: In data analytics, visualization is the art of bringing data to life. Some of the data visualization tools used are Tableau, Highcharts, Google Charts and FusionCharts, among others.
