
How Does Machine Learning Handle Ambiguity?

(Image source: @yeowatzup/Flickr)

In machine learning and artificial intelligence, every real-world problem comes with its own implications and pitfalls. Even with efficient techniques, it is hard to anticipate factors as basic as ‘uncertainty’. In image classification, for example, if the image features in the data are not captured in enough detail, the system's output will be vague even when the learning algorithm classifies the images as designed.

This is just the tip of the iceberg when it comes to ambiguity in ML. However meticulously ML systems are designed, they sometimes come across new, uncertain problems. The uncertainty may lie in any part of an ML system, be it in its goals or in the data it receives, and it leaves the problem open to interpretation. In this article, we will look at a few cases where ML has been used to handle ambiguity.

Case 1: Natural Language Processing

One of the earliest investigations of ambiguity in ML concerned natural language tasks, where learning algorithms were made to act as linear separators in the feature space. The aim was to resolve semantic as well as syntactic ambiguities in the language being processed. In one study, Dan Roth, a professor at the University of Pennsylvania, US, presents a learning approach in which linear separators are used to resolve language ambiguity.

The study focuses on linguistic tasks such as word choice in machine translation, part-of-speech tagging and word-sense disambiguation. The paper treats language learning as a disambiguation problem and applies the linear separator technique to it. It formally defines the disambiguation problem in terms of word predicates, their classifications and the features used for learning. In addition, it discusses how various existing disambiguation methods can themselves be cast as linear separators.

In the study, the linear separator method performed well compared with other methods such as Naive Bayes and Transformation-Based Learning (TBL), making it a strong alternative for resolving ambiguities in natural language.
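To make the idea of a linear separator concrete, here is a minimal sketch of word-sense disambiguation for the word "bank". It is not the algorithm from the study: it swaps in a plain perceptron, and the feature scheme (`ctx=<word>`), the toy training sentences and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def features(context_words):
    """Turn context words into sparse binary features (hypothetical scheme)."""
    return {f"ctx={w.lower()}" for w in context_words}


def train_perceptron(examples, epochs=10):
    """Learn a linear separator (weights + bias) over sparse binary features.

    examples: list of (feature_set, label) pairs with label +1 or -1.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            score = bias + sum(weights[f] for f in feats)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Mistake-driven update: move the separator towards the label.
                for f in feats:
                    weights[f] += label
                bias += label
    return weights, bias


def predict(weights, bias, feats):
    """Return +1 or -1 depending on which side of the separator feats fall."""
    score = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1 if score > 0 else -1


# Toy data (assumed): +1 = "bank" as a financial institution,
#                     -1 = "bank" as the edge of a river.
TRAIN = [
    (features(["deposit", "money", "account"]), 1),
    (features(["loan", "interest", "money"]), 1),
    (features(["river", "water", "fishing"]), -1),
    (features(["muddy", "river", "shore"]), -1),
]

weights, bias = train_perceptron(TRAIN)
print(predict(weights, bias, features(["money", "loan"])))   # financial sense
print(predict(weights, bias, features(["river", "muddy"])))  # river sense
```

Each sense of the ambiguous word ends up on one side of a hyperplane in feature space, and the context words present in a sentence decide which side a new occurrence falls on.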

Case 2: DNA Sequencing

Advances in genomics have been so swift that they have generated vast amounts of data for sequencing. Sequencing is the process of determining the order of nucleotides in a DNA molecule in order to ascertain genetic information. There are already machines that carry out sequencing quickly. A novel system called Ibis (Improved Base Identification System) was developed at the Max Planck Institute for Evolutionary Anthropology in Germany to work with the Illumina Genome Analyzer, which uses fluorescence to read DNA bases (a process called ‘base calling’).

The system utilises ML and statistical methods such as clustering and support vector machines (SVMs). It improves base calling mainly by learning from the intensities (signal strengths) of the bases across millions of DNA molecules. These intensities are labelled for the ML process. The ambiguity lies in the intensities: the whole sequencing process may be invalid if they are wrongly interpreted, or if they are not captured correctly throughout the process. Ibis tackles this by using multiclass SVMs to interpret the intensity levels as accurately as possible.
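To see where the ambiguity in the intensities comes from, consider a naive base caller that simply picks the brightest channel. The sketch below is not Ibis and does not use SVMs; it is a hypothetical baseline, with a made-up intensity scale and an assumed `min_margin` threshold, that flags a base as `N` when the two strongest channels are too close to tell apart. These are the kinds of calls a learned classifier is meant to improve on.

```python
BASES = "ACGT"


def call_base(intensities, min_margin=0.2):
    """Call one base from four channel intensities ordered A, C, G, T.

    Returns (base, margin). The margin is the normalised gap between the two
    strongest channels; the base is 'N' when that gap is below min_margin.
    Both the margin formula and the threshold are illustrative assumptions.
    """
    if len(intensities) != len(BASES):
        raise ValueError("expected one intensity per base (A, C, G, T)")
    ranked = sorted(zip(intensities, BASES), reverse=True)
    (top, base), (runner_up, _) = ranked[0], ranked[1]
    total = top + runner_up
    margin = (top - runner_up) / total if total > 0 else 0.0
    return (base if margin >= min_margin else "N"), margin


def call_read(cycles, min_margin=0.2):
    """Call every cycle of a read and join the results into a string."""
    return "".join(call_base(c, min_margin)[0] for c in cycles)


# Toy intensities (assumed values), one row per sequencing cycle.
READ = [
    [0.90, 0.10, 0.05, 0.05],  # clearly A
    [0.10, 0.85, 0.10, 0.05],  # clearly C
    [0.45, 0.40, 0.05, 0.05],  # A and C nearly tied: ambiguous
    [0.05, 0.05, 0.20, 0.80],  # clearly T
]

print(call_read(READ))  # ACNT
```

The third cycle is the interesting one: the brightest channel wins by a small gap, so a wrong call there would silently corrupt the read.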

Case 3: Image Classification To Recognise Words a.k.a Visual Words

One of the most challenging problems in ML is the use of verbal descriptions (such as a colour or a feature) for image classification, since such descriptions lend themselves to many interpretations. Words expressing visual depictions are usually not accounted for by ML techniques such as image classification, since doing so requires considering both image and textual features. This produces a large amount of data, which can further complicate classification. Although there have been studies that take both text and image into account for training ‘visual words’, these rely on the best possible definition of each word for each visual depiction.

One study that alleviated this problem was by researchers at the University of Amsterdam, who devised a ‘codebook’ containing a vocabulary of generic words mapped to image features through ML. The researchers tested it on five datasets and found that image-word matching was significantly better.
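A codebook of this kind can be sketched as a list of prototype feature vectors, with each image described by how often its local features match each prototype. The example below contrasts hard assignment (each feature goes to its single nearest word) with soft assignment (each feature is shared among words by distance), one common way of handling a feature that sits between two words. The two-dimensional vectors, the Gaussian weighting and the `sigma` value are assumptions for illustration, not details taken from the study.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(descriptors, codebook):
    """Assign each descriptor to its single nearest visual word."""
    if not descriptors:
        raise ValueError("need at least one descriptor")
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        hist[nearest] += 1.0
    return [h / len(descriptors) for h in hist]


def soft_histogram(descriptors, codebook, sigma=1.0):
    """Spread each descriptor over all visual words with Gaussian weights."""
    if not descriptors:
        raise ValueError("need at least one descriptor")
    hist = [0.0] * len(codebook)
    for d in descriptors:
        weights = [math.exp(-sq_dist(d, c) / (2 * sigma ** 2)) for c in codebook]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    return [h / len(descriptors) for h in hist]


# Toy codebook with two visual words, and two image descriptors (assumed).
CODEBOOK = [[0.0, 0.0], [4.0, 0.0]]
DESCRIPTORS = [
    [0.0, 0.0],  # sits exactly on the first word
    [2.0, 0.0],  # equidistant from both words: ambiguous
]

print(hard_histogram(DESCRIPTORS, CODEBOOK))  # [1.0, 0.0]
print(soft_histogram(DESCRIPTORS, CODEBOOK))  # roughly [0.75, 0.25]
```

Hard assignment hides the ambiguous descriptor by forcing it onto the first word, while soft assignment keeps the information that it matched both words equally well.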

Comments

The few cases mentioned above cover only a small part of ML. ML encompasses a host of different data types such as images, videos and code. Ambiguity will reduce only if more quality data is incorporated. In addition, the goal set for the ML system should be precise and in tandem with the requirements of the project at hand.

Abhishek Sharma

I research and cover latest happenings in data science. My fervent interests are in latest technology and humor/comedy (an odd combination!). When I'm not busy reading on these subjects, you'll find me watching movies or playing badminton.