Top 5 Papers Presented at MLDS 2023

The Machine Learning Developers Summit (MLDS) 2023 concluded last week. Alongside keynote sessions by industry experts, the conference featured research papers authored by academics and professionals in the field, who presented their work and key findings to industry experts and attendees. The presented papers have been published in Lattice – The Machine Learning Journal, hosted and managed by the Association of Data Scientists (ADaSci). Here are the top five papers presented during MLDS.

1. Application of Clustering for Computationally Light Short-Term Demand Forecasting

By Rohan Kumar and Parimesh Panda, Data Scientists at Genpact

This research, presented by a team of data scientists at Genpact, aims to reduce the number of training cycles for demand forecasting models by leveraging unsupervised techniques. It addresses a problem faced by retail manufacturers: predicting customer demand for each product with high forecast accuracy is computationally expensive.

They used a clustering-based demand forecasting framework to identify clusters of products with similar customer purchasing behaviour. In their experiments, the framework was used to predict customer demand for more than 500 dairy products over the next eight weeks. A comparison of computational time between product-level and cluster-level model training illustrates the reduction in computational cost.
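The idea of training one model per cluster instead of one per product can be sketched roughly as follows. This is a minimal illustration and not the authors' framework: the toy demand numbers, the plain k-means routine, the mean-based normalisation and the simple drift forecast standing in for a real forecasting model are all assumptions made for the example.

```python
from statistics import mean

def normalise(series):
    """Divide a demand series by its mean so clustering compares shape, not volume."""
    level = mean(series)
    return [x / level for x in series]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids

def drift_forecast(shape, horizon):
    """Stand-in cluster model (assumption): extend the centroid by its average weekly change."""
    slope = (shape[-1] - shape[0]) / (len(shape) - 1)
    return [shape[-1] + slope * h for h in range(1, horizon + 1)]

def forecast_by_cluster(history, k, horizon):
    names = list(history)
    labels, centroids = kmeans([normalise(history[n]) for n in names], k)
    # Only k models are fitted, however many products there are.
    cluster_forecasts = [drift_forecast(c, horizon) for c in centroids]
    forecasts = {}
    for name, lab in zip(names, labels):
        level = mean(history[name])  # rescale the shared shape to this product's volume
        forecasts[name] = [max(0.0, level * s) for s in cluster_forecasts[lab]]
    return forecasts, dict(zip(names, labels))

# Hypothetical weekly demand for four products (six weeks of history).
history = {
    "milk":   [10, 12, 14, 16, 18, 20],
    "curd":   [20, 24, 28, 32, 36, 40],
    "ghee":   [40, 36, 32, 28, 24, 20],
    "butter": [20, 18, 16, 14, 12, 10],
}
forecasts, labels = forecast_by_cluster(history, k=2, horizon=2)
print(labels)
print(forecasts)
```

In this toy run, two cluster-level models serve four products: the rising products (milk, curd) share one cluster and the falling ones (ghee, butter) share the other. The saving in training cycles grows with the ratio of products to clusters.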

2. Visualization techniques for the training of Empirical Deep Reinforcement Learning (DRL) agents with continuous state and action spaces

By Gaurav Adke, Senior Data Scientist at Michelin

Visualising the reinforcement learning environment and an agent’s learning dynamics is a vital step for debugging and for better understanding the learnt policy. In environments that optimise real-world multidimensional spaces with continuous variables, such as chemical process parameters, observing an agent’s behaviour through visualisation is challenging.

In his research paper, Gaurav presented a reinforcement learning agent developed to optimise the production process of rubber mix for the tyre industry. This research attempts to visualise an agent’s training and inference for high-dimensional state space problems with continuous state and action spaces. 

3. Weighted clustering on fast sentence embeddings to determine themes from large unstructured data

By Paritosh Sinha, Senior Data Scientist at Uber

Most engineering product improvements are driven by feedback from users and engineers. B2C products are used to target customers, send personalised communications, manage order requests, and track event-level actions and failures to improve product performance. However, the volume of failure logs and their unstructured nature often hinder the detection of underlying themes in event failures.

This paper by Paritosh discusses an efficient approach to tuning and using a language model for embedding generation. A weighted clustering technique then groups the failures into automatically detectable themes based on these embeddings. The paper also presents methods for managing embeddings that improve the algorithm’s performance while keeping the focus on efficiency and computation time.
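As a rough illustration of what weighted clustering over embeddings can look like, the sketch below runs a weighted k-means in which each embedding pulls its centroid in proportion to a weight. The article does not say how the paper defines its weights, so treating the weight as the number of times a distinct failure message occurred is an assumption, as are the 2-D toy vectors standing in for real sentence embeddings.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(vectors, weights, k, iters=20):
    """k-means where each centroid is the weight-averaged mean of its members."""
    # Deterministic farthest-point initialisation.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    dim = len(vectors[0])
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in idx)
            if total > 0:
                centroids[c] = [
                    sum(weights[i] * vectors[i][d] for i in idx) / total
                    for d in range(dim)
                ]
    return labels, centroids

# Hypothetical embeddings of four distinct failure messages and how often each occurred.
vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
weights = [9, 1, 1, 3]

labels, centroids = weighted_kmeans(vectors, weights, k=2)
# Theme size is the total weight in each cluster, not the number of distinct messages.
theme_sizes = [sum(w for w, lab in zip(weights, labels) if lab == c) for c in range(2)]
print(labels, centroids, theme_sizes)
```

Clustering distinct messages with weights, rather than every raw log line, keeps the number of points small while still letting frequent failures dominate where each theme's centre lands.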

4. EthicalFL – A Federated Learning Framework with Bias Mitigation

By Shekar Ramachandran, Senior Member Technical Staff at Intel

Federated learning makes it possible to apply AI/ML techniques while preserving the privacy of localised data. However, owing to its decentralised nature, federated learning faces several optimisation issues. This paper by Shekar identifies the problem of incoming network congestion at the aggregator in a federated scenario and proposes a statistical significance test to address it. The network is further optimised through a requirement-based, request–response communication architecture that reduces unnecessary training rounds. The research also targets the well-known problem of bias caused by label bias at the clients in a cross-device federated learning setting.

5. IntelliQSense: An intelligent, real-time Query Autocompletion Framework using GPT-2 

By Taaniya Arora, Senior Data Scientist at Crux Intelligence

Query Autocompletion (QAC) is a common feature for text-based input applications where a user’s partially-typed prefix input is completed. It has primarily been studied for applications involving search-based queries that are short sequences or phrases. 

Taaniya and her team presented a novel approach to QAC for a question–answering system in an augmented analytics platform, where queries are essentially business and analytical questions in natural language. The team proposed combining semantic search with natural language generation via beam search to complete questions. To enable generative completion in natural language and handle unseen prefixes, they used a pre-trained distilgpt2 model fine-tuned for the question completion task. They also described a method for synthesising training data from the limited past queries available, which is used to fine-tune the model and generate quality completions.
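The beam search step mentioned above can be illustrated with a small, self-contained sketch. It is not the IntelliQSense implementation: a hypothetical hand-written bigram table stands in for the fine-tuned distilgpt2 model, the semantic search stage is omitted, and the vocabulary and probabilities are invented for the example.

```python
import math

# Hypothetical toy "language model": P(next token | previous token).
BIGRAMS = {
    "show": {"sales": 0.6, "revenue": 0.4},
    "sales": {"by": 0.7, "<eos>": 0.3},
    "revenue": {"by": 0.9, "<eos>": 0.1},
    "by": {"region": 0.5, "month": 0.5},
    "region": {"<eos>": 1.0},
    "month": {"<eos>": 1.0},
}

def beam_search(prefix, beam_width=2, max_steps=5):
    """Return completed questions for a typed prefix, best first, with their probabilities."""
    beams = [(0.0, list(prefix))]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            for tok, p in BIGRAMS.get(tokens[-1], {}).items():
                candidates.append((score + math.log(p), tokens + [tok]))
        if not candidates:
            break
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == "<eos>":
                finished.append((score, tokens[:-1]))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return [(" ".join(tokens), math.exp(score)) for score, tokens in finished]

completions = beam_search(["show"], beam_width=2)
for text, prob in completions:
    print(f"{text}  (p={prob:.2f})")
```

With a beam width of 2, only the two most probable partial questions survive each step, so a short completion such as "show sales" is pruned in favour of longer, more probable continuations. A wider beam trades more computation for more diverse suggestions.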

Analytics India Magazine received close to 400 research paper submissions for MLDS 2023. The research reviewing committee selected the top 26 papers for presentation based on the quality of the research work. All of these papers are available on the Lattice website.

Dr. Vaibhav Kumar
Dr. Vaibhav Kumar is a seasoned data science professional with great exposure to machine learning and deep learning. He has good exposure to research, where he has published several research papers in reputed international journals and presented papers at reputed international conferences. He has worked across industry and academia and has led many research and development projects in AI and machine learning. Along with his current role, he has also been associated with many reputed research labs and universities where he contributes as visiting researcher and professor.
