It is becoming imperative to understand whether Machine Learning (ML) algorithms improve the probability of an event or the predictability of an outcome. The former is simply the chance that an event x will occur in n trials of an experiment; the latter is the ability to predict when that event will occur, for example on the fifth trial when n ranges from 1 to 10.
Machine learning is widely understood as an application of artificial intelligence that enables systems to learn automatically and improve their outcomes from past experience without being explicitly programmed. We will examine the validity of this understanding shortly.
First, let us consider whether probability and predictability are related to each other. The answer is yes: statistics uses probability distributions to make predictions. Probability theory works with discrete and continuous random variables because the outcome of an event is random in nature.
The probability of an event is tied to its success rate, which sets the confidence level for the likelihood of future events, but the possibility of failure always goes hand in hand with it. To reduce this ambiguity, predictability comes into the picture: it uses statistics to analyze the frequency of past successful and unsuccessful events. Sound decision-making depends largely on the accuracy of this predictability and on the human bias that enters it, notwithstanding the inherent risk involved in any prediction. The statistical techniques used for the different types of machine learning (supervised, unsupervised and reinforcement) to address classification, association, clustering and object detection problems are widely available on the internet, but using them well requires statistical knowledge, experience and the ability to learn in order to make sustainable decisions.
Let us return to the earlier question of whether machine learning allows systems to learn automatically and improve on their own. Here it is pertinent to understand that predictability depends on the series of events that occurred in the past. Machine learning cannot help you make predictions without past data in place; it is about learning the patterns and outliers in that data. ML models also require continuous retraining to maintain their predictability, as they become less accurate over time.
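The need for retraining can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data is synthetic, the "model" is a single cut-off value learned from past data, and the names make_data, fit_threshold and accuracy are hypothetical helpers invented for this illustration. The pattern in the data shifts from a cut-off of 0.3 to 0.7, so a model fitted on the old data goes stale until it is retrained.

```python
import random

def make_data(rng, threshold, n=500):
    """Draw n points x in [0, 1); the label is True when x exceeds the threshold."""
    return [(x, x > threshold) for x in (rng.random() for _ in range(n))]

def accuracy(threshold, data):
    """Share of points whose label matches the prediction x > threshold."""
    return sum((x > threshold) == label for x, label in data) / len(data)

def fit_threshold(data):
    """'Train' by picking the grid cut-off with the highest accuracy on the data."""
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: accuracy(t, data))

rng = random.Random(0)                      # fixed seed so the run is repeatable
old_data = make_data(rng, threshold=0.3)    # the past: pattern A (assumed)
new_data = make_data(rng, threshold=0.7)    # the present: the pattern has shifted

stale_model = fit_threshold(old_data)       # learned once, never updated
retrained_model = fit_threshold(new_data)   # learned again on recent data

stale_accuracy = accuracy(stale_model, new_data)
retrained_accuracy = accuracy(retrained_model, new_data)
print(f"stale: {stale_accuracy:.2f}  retrained: {retrained_accuracy:.2f}")
```

In this toy setting the stale model misclassifies roughly the points that fall between the old and the new cut-off, while the retrained model recovers. Real drift is rarely this clean, but the mechanism is the same.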
An important point of discussion here is the bias involved in ML models. Biases that arise from the datasets humans select are passed on to the ML algorithms, which process the data accordingly. To deal with such biases, we should understand that algorithms are objective compared with humans, but that does not make them fair; it just makes them objectively discriminatory.
The objective of designing an ML model should be to solve an optimization problem, that is, to find the best solution among all the feasible solutions. If the input data covers all the feasible cases and excludes clearly unreasonable ones, the resulting model will be more robust and sustainable. Hence, it is the human-in-the-loop, working alongside the machine, who is responsible for training and retraining ML models to remove bias from the outcome. To make predictions more accurate, a proper data selection approach needs to be devised for the desired outcome of an event or process. Feature selection is a widely used process for improving the accuracy and performance of an ML model by selecting the features that contribute most to the variable being predicted. Testing and validating ML models for accuracy and deployment must be oriented towards enhancing the decision-making abilities of humans, not machines.
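One simple way to picture feature selection is a correlation filter: score each candidate feature by the absolute Pearson correlation with the target and keep the top few. The sketch below assumes synthetic data and hypothetical feature names ("signal", "minor", "noise"); it is one of many possible selection techniques, not a recommendation of a particular method.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def select_features(features, target, k):
    """Rank features by |correlation with the target| and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

rng = random.Random(42)
n = 300
signal = [rng.gauss(0, 1) for _ in range(n)]   # strongly drives the target
minor = [rng.gauss(0, 1) for _ in range(n)]    # contributes a little
noise = [rng.gauss(0, 1) for _ in range(n)]    # unrelated to the target

# Assumed relationship, for illustration only.
target = [3 * s + m + rng.gauss(0, 1) for s, m in zip(signal, minor)]

features = {"signal": signal, "minor": minor, "noise": noise}
selected, scores = select_features(features, target, k=2)
print(selected, {name: round(score, 2) for name, score in scores.items()})
```

A filter like this only captures linear, one-feature-at-a-time relationships, so in practice it is usually a first pass that a human reviews rather than a final answer.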
ML models are stochastic rather than deterministic in nature. This implies that randomness will always be present in the process, but the random variable still has to be measured in relation to a measurable function. Since there is an element of uncertainty involved, these models are better understood when the statistical analysis of past events forms a pattern. The task becomes difficult when the data collected from past events is highly sparse (mostly zeros) and the ML model is still required to learn and predict the outcome. In such cases, it is relevant to use reinforcement learning algorithms, which usually learn optimal actions through trial and error.
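Trial-and-error learning on sparse feedback can be sketched with an epsilon-greedy agent on a two-armed bandit, which is about the simplest reinforcement learning setting there is. Everything here is an assumption made for illustration: the reward probabilities (5% and 20%, so most rewards are zero), the exploration rate, and the function name run_bandit.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error; returns reward estimates and pull counts."""
    rng = random.Random(seed)
    arms = range(len(reward_probs))
    pulls = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))        # explore: try any action
        else:
            arm = max(arms, key=lambda a: estimates[a])   # exploit: best so far
        # Sparse feedback: the reward is zero most of the time.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls

estimates, pulls = run_bandit([0.05, 0.20])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimates:", [round(e, 3) for e in estimates], "pulls:", pulls)
```

The agent is never told which action is better. It finds out by occasionally trying the alternative and keeping a running average of what each action has paid out.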
The type of training data set determines the type of ML algorithm to be used, and the outcome will vary if a different algorithm is used on the same data set. It is also worth keeping in mind that ML models should be built for events whose occurrence is unlikely or uncertain. If an event occurs daily (one observation per day), which means the probability is 100%, there is no need to build an ML model to predict it. On the other hand, if past occurrences indicate some probability of a storm but there is little or no predictability as to when that storm will actually arrive, an ML model can greatly help to predict the date and time of occurrence so that preventive action can be taken to minimize the damage.
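The storm example can be made concrete with a small simulation that separates the two ideas. The overall storm frequency is the probability; knowing on which days a storm is likely is the predictability. The data below is synthetic, and the "pressure drop" indicator and all the rates are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical history: each day records a pressure-drop flag and whether a storm came.
days = []
for _ in range(2000):
    pressure_drop = rng.random() < 0.15
    storm = rng.random() < (0.60 if pressure_drop else 0.02)
    days.append((pressure_drop, storm))

def storm_rate(records):
    """Fraction of the given days on which a storm occurred."""
    return sum(storm for _, storm in records) / len(records)

# Probability: how often storms occur overall, with no idea of when.
base_rate = storm_rate(days)

# Predictability: how the chance changes once the indicator is observed.
rate_given_drop = storm_rate([d for d in days if d[0]])
rate_without_drop = storm_rate([d for d in days if not d[0]])

print(f"overall: {base_rate:.2f}  after a pressure drop: {rate_given_drop:.2f}  "
      f"otherwise: {rate_without_drop:.2f}")
```

The overall storm rate is the same whether or not a model exists. What the indicator adds is the ability to say which days deserve a warning, and that is the part a model is meant to improve.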
The predictability offered by automated ML models enables humans to derive meaningful insights from data and enhances decision-making capabilities that were earlier limited by fixed conditional rules. It is not the probability that these ML models need to improve, but the ability to predict that probability, so that action-oriented decisions can be taken. Artificially intelligent systems built on these ML models will help address various industry use cases, such as critical health-related problems, predicting and preventing cyber-attacks, performing sentiment analysis and predicting financial crimes. But continuous critique is required so that these models do not become stale over time due to insufficient data or lack of predictability.
Views expressed in this article are my own and not necessarily those of my employer.