Four hard topics in Analytics explained in plain English

Machine Learning in plain English

If someone asks you, “What is ML?”, what will be your conceptual, non-technical answer?

Mine is . . . ML is “cluster”, “classify” and “combine”.

I use these words in their English language sense and not as techniques. What do I mean by that?

Cluster: Structure in the data is information – find the structure.

Classify: Transform structure into a Mathematical form.

Combine: Convert into insight/ action.

Do this by Learning – meaning, use the ability to generalize from experience.

This captures the essence of ML for me. From my experience, I find that –

  • Combine: best done by a “paired” Data Scientist – Domain Expert combo.
  • Classify: there is a grab bag of tools and techniques that the Data Scientist can exploit on their own. You can see my attempt at unifying this bag of tricks in “Unifying Machine Learning to create breakthrough perspectives”: http://pgmadblog.blogspot.com/2015/10/unifying-machine-learning.html
  • Cluster: I am not referring to specific clustering *algorithms* here. This step is where the Data Scientist works to sense, identify and extract structure or patterns or features in the data which are the bearers of information!

“Cluster” is the hardest part – data do not tell you where they hide their structure. Finding patterns is an “art” in which inspiration, skill, experience, knowledge of inter-related theories, etc. play a major part. In algorithm work that I am currently doing, it turned out (after months of slicing and dicing the data) that rendering the data as “phasors” (or complex variables) revealed the hidden structure “by itself”!
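To make the phasor remark concrete, here is a toy sketch. It is not the algorithm referred to above; the readings, the two groups and the helper names are all made up for illustration. It only shows how a difference that is invisible in amplitude alone can become obvious once amplitude and phase are combined into one complex number.

```python
import cmath
import math

def to_phasor(amplitude, phase_deg):
    """Combine an amplitude and a phase angle (degrees) into one complex number."""
    return cmath.rect(amplitude, math.radians(phase_deg))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical readings as (amplitude, phase in degrees); both groups are invented.
group_a = [(1.0, 10), (1.1, 15), (0.9, 5)]
group_b = [(1.0, 170), (1.1, 175), (0.9, 165)]

# Amplitude-only view: the two groups look identical on average.
amp_gap = abs(mean(a for a, _ in group_a) - mean(a for a, _ in group_b))

# Phasor view: the group centroids sit far apart in the complex plane.
centroid_a = mean(to_phasor(a, p) for a, p in group_a)
centroid_b = mean(to_phasor(a, p) for a, p in group_b)
phasor_gap = abs(centroid_a - centroid_b)

print(f"gap using amplitude only: {amp_gap:.3f}")
print(f"gap using phasors:        {phasor_gap:.3f}")
```

The representation did the work here; no clustering algorithm was needed to see the two groups.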

If you are able to get at the most descriptive and discriminatory features at the “Cluster” stage, the rest of the steps will (almost) fall into place and yield the most robust solution! If not, you may still succeed, but you will work many times harder to Classify and Combine and end up with suboptimal answers.

To be clear, my comments apply only to the first-time development of an algorithm for a new business problem; once an end-to-end algorithm is in place, of course, the Cluster-Classify-Combine steps can be automated for repeated application to similar data sets. But for first-time ML algorithm solution development, automation cannot replace art!

Why is Predictive Analytics important to business?

A prerequisite for performing at a high level in business is the ability to understand and manage complexity. Managing a complex system properly requires a ton of data at the right time. Big Data provide us the data we need; to put these data to work, so that we can operate at the high levels of complexity required while still managing it, we have to anticipate what is about to happen and react when it happens, in a closed-loop manner. Predictive Analytics allows us to push our “system” to the edge (without “falling over”) in a managed fashion. This is why businesses embrace Predictive Analytics – to run at a high level of performance at the edge of complexity overload.

Prediction – the other dismal science?

An insightful person once said, “Prediction is like driving your car forward by looking only at the rearview mirror!”. If the road is dead-straight, you are good . . . UNLESS there is a stalled vehicle ahead in the middle of the road.

We should consider short-term and long-term prediction separately. Long-term prediction is nearly a lost cause. In the 1980s and 1990s, chaos and complexity theorists showed us that things can spin out of control even when we have perfect past and present information (predicting weather beyond 3 weeks is a major challenge, if not impossible). Even earlier, stochastic process theory told us that “non-stationarity”, where the statistics of a process evolve (slowly or quickly), can render longer-term predictions unreliable.
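A small experiment shows why long-term prediction is so hard. The sketch below uses the logistic map, a standard textbook example of chaos (my choice of illustration, not something taken from the weather example above). Two runs start one part in ten billion apart; they agree for a while and then have nothing to do with each other.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a standard toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion.
run_a = logistic_trajectory(0.2, 100)
run_b = logistic_trajectory(0.2 + 1e-10, 100)

# Gap between the two runs at every step.
gap = [abs(p - q) for p, q in zip(run_a, run_b)]

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:3d}: gap = {gap[step]:.3e}")
```

The rule that generates the data is known exactly and there is no noise at all; a tiny uncertainty in the present state is enough to ruin the long-range forecast.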

If the underlying systems do not evolve quickly or suddenly, there is some hope. For causal systems (in Systems Theory, this means that no future information of any kind is available in the current state of the system), where “the car is driven forward strictly by using the rearview mirror”, outcomes are predictable in the sense that, as long as the “road is straight” or “curves only gently”, we can be somewhat confident in predicting a few steps ahead. This may be quite useful in some Data Science applications (such as in Fintech).

Another type of prediction involves not the actual path of future events (or the “state space trajectories” in the parlance) but the occurrence of a “black swan” or an “X-event” (for an elegant in-depth discussion, see John Casti, “X-Events: Complexity Overload and the Collapse of Everything”, 2013). For that matter, ANY unwanted event is good to know about in advance. Consider unwanted destructive vibrations (called “chatter”) in machine tools, as an example; early warning may be possible and very useful in saving expensive work pieces (“Instantaneous Scale of Fluctuation Using Kalman-TFD and Applications in Machine Tool Monitoring”). We find that sometimes the underlying system does undergo some pre-event changes (such as approaching “complexity overload”, “state-space volume inflation”, “increase in degrees of freedom”, etc.) which may be detectable and trackable. However, there is NO escaping False Positives (and the associated waste of resources preparing for an event that does not come) or False Negatives (and being blind-sided by an event we were told was not going to happen).
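The False Positive / False Negative trade-off can be seen with a few lines of code. In this sketch, the warning scores and outcomes are invented numbers and `confusion_counts` is a hypothetical helper; the point is only that moving the alarm threshold trades one kind of error for the other.

```python
def confusion_counts(scores, event_happened, threshold):
    """Raise an alarm when score >= threshold and tally the four possible outcomes."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, happened in zip(scores, event_happened):
        alarm = score >= threshold
        if alarm and happened:
            counts["TP"] += 1   # warned, and the event came
        elif alarm:
            counts["FP"] += 1   # warned, but nothing happened (wasted preparation)
        elif happened:
            counts["FN"] += 1   # no warning, and we were blind-sided
        else:
            counts["TN"] += 1   # no warning, no event
    return counts

# Hypothetical early-warning scores and what actually happened afterwards.
scores = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.9]
event_happened = [False, False, True, False, False, True, True, True]

for threshold in (0.2, 0.5, 0.8):
    print(threshold, confusion_counts(scores, event_happened, threshold))
```

A low threshold catches every event but cries wolf often; a high threshold rarely cries wolf but misses most events. Where to set it is a business decision about which mistake costs more.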

At Syzen Analytics, Inc., we use an explicit systems theory approach to Analytics. In our SYSTEMS Analytics formulation (“Future of Analytics – a definitive Roadmap”), the parameters of the system and their variation over time are tracked adaptively in real time, which tells us how far into the future we can predict safely – if the parameters evolve slowly or cyclically, we have higher confidence in our predictive analytics solutions.
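As a rough illustration of what “tracking parameters adaptively” can look like, here is a minimal sketch using scalar recursive least squares with a forgetting factor. This is a generic textbook technique that I am swapping in for illustration; it should not be read as the SYSTEMS Analytics formulation itself. The model `y[t] = a[t] * x[t]`, the synthetic data and the name `rls_track` are all assumptions.

```python
import math

def rls_track(x, y, forgetting=0.98, p0=1000.0):
    """Track a scalar gain `a` in y[t] ~ a[t] * x[t] by recursive least squares.

    A forgetting factor below 1 discounts old samples so the estimate can follow drift.
    """
    a_hat, p = 0.0, p0          # initial guess and its (large) uncertainty
    history = []
    for xt, yt in zip(x, y):
        k = p * xt / (forgetting + xt * p * xt)   # how much to trust the new sample
        a_hat += k * (yt - a_hat * xt)            # correct by the prediction error
        p = (p - k * xt * p) / forgetting         # update the uncertainty
        history.append(a_hat)
    return history

# Synthetic system whose parameter jumps from 0.8 to 0.5 halfway through.
n = 600
x = [1.0 + 0.5 * math.sin(0.1 * t) for t in range(n)]
true_a = [0.8 if t < 300 else 0.5 for t in range(n)]
y = [a * xt for a, xt in zip(true_a, x)]

est = rls_track(x, y)
print(f"estimate just before the change: {est[299]:.3f}")
print(f"estimate 20 steps after it:      {est[319]:.3f}")
print(f"estimate at the end:             {est[-1]:.3f}")
```

Watching how fast the estimate moves is itself useful information: a parameter that sits still suggests a longer safe prediction horizon than one that keeps jumping.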

Wanting to know the future has always been a human preoccupation. We see that we cannot truly know the future, but in some cases predictions are possible to some extent . . . surrounded by many caveats that are more “excuses” than definitive answers. Sounds a lot like a dismal science!

Future of Analytics – Spatio-temporal data:

As businesses push to higher levels of performance, higher fidelity models are going to be necessary to produce more accurate and hence valuable predictions and recommendations for business operations.

ALL data are spatio-temporal! From the simplest to more complex levels –

  • Data can be considered isolated at the simplest level – a “snap shot”.
  • Then we realize that data exist in a “social” network with mutual interactions.
  • In reality, data exist in *embedded* forms in “influence” networks of one type or the other which are distributed in time and space – a “video”!

Spatial extent of data (distance) can be folded into time if we assume a certain information diffusion speed. Graph-theoretic methods do not account for the time dimension. For accurate analysis, there is no escaping Dynamics over Time, meaning the use of differential (or difference) equations . . . and Systems Theory!
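Here is a toy sketch of both ideas: distance is turned into a time lag by dividing by an assumed diffusion speed, and each node then evolves by a simple difference equation driven by delayed values from its neighbours. The three-node network, the `decay` and `coupling` values and the function names are all hypothetical.

```python
def simulate(distances, speed, steps, decay=0.5, coupling=0.5):
    """Difference-equation dynamics on a network with distance-based delays.

    distances[i][j] is the distance from node j to node i (None if there is no link).
    """
    n = len(distances)
    # Fold distance into time: a signal from j reaches i after distance / speed steps.
    lag = [[None if d is None else max(1, round(d / speed)) for d in row]
           for row in distances]
    x = [[0.0] * n]
    x[0][0] = 1.0                                  # an impulse at node 0 at time 0
    for t in range(1, steps):
        new = []
        for i in range(n):
            value = decay * x[t - 1][i]            # the node's own memory
            for j in range(n):
                if lag[i][j] is not None and t - lag[i][j] >= 0:
                    value += coupling * x[t - lag[i][j]][j]   # delayed influence
            new.append(value)
        x.append(new)
    return x

def first_arrival(x, node):
    """First time step at which the node's state becomes non-zero."""
    return next(t for t, state in enumerate(x) if state[node] != 0.0)

# Chain 0 -> 1 -> 2 with distances 2 and 3, and a diffusion speed of 1 per step.
distances = [[None, None, None],
             [2,    None, None],
             [None, 3,    None]]
x = simulate(distances, speed=1.0, steps=10)
print([first_arrival(x, node) for node in range(3)])
```

A static graph of this network would only say who is connected to whom; the difference equation also says when each node feels the impulse and how strongly.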

[Figure: example business applications of SYSTEMS Analytics]

Systems Theory + Analytics = “SYSTEMS Analytics”! A few example business applications are shown above. As you can see, it spans most of the current Analytics use cases and many more promising ones that will open up when network graphs and the spatio-temporal nature of data are fully incorporated in the coming years – the basic theories and some algorithms are already in hand. For specific technologies, see –

From the simple explanation of ML, the power and caveats of prediction and the promising Analytics technology roadmap ahead, it is clear that Data Science is indeed a rich area to mine, one that can create an even bigger impact on business performance in the coming years.

PG Madhavan
Dr. PG Madhavan is the Founder of Syzen Analytics, Inc. He developed his expertise in Analytics as an EECS Professor, Computational Neuroscience researcher, Bell Labs MTS, Microsoft Architect and startup CEO. PG has been involved in four startups with two as Founder. PG has 12 issued US patents and over 100 publications & platform presentations to Sales, Marketing, Product, Industry Standards and Research groups.
