The Machine Learning Developer Summit (MLDS) by Analytics India Magazine is India’s leading conference for machine learning practitioners.
From February 11 to 13, the third edition of MLDS virtually hosted top academics and industry professionals from the data science community. Tech talks, paper presentations, and workshops covered a multitude of topics in machine learning and data science.
Here, we give you a sneak peek at the top ten papers presented at the conference.
Predicting Demand Offset To React To Unforeseen Critical Events — Priyanka Telang
Supply chains depend heavily on actionable insights from demand forecasting. These data-driven insights help organisations allocate resources judiciously. Forecasting models factor in historic demand to make sensible projections. However, such models are not equipped to account for black swan events like a pandemic outbreak, given the limitations of their parameters. This paper tried to address this issue.
Telang has developed a method to continuously monitor external events in real time, stream and cluster them, and then arrive at an offset in the demand. The demand is based on what was seen in the past and the context in which the events occur, such as the point in the sales cycle when an event takes place. This helps increase the reliability of demand forecasts so that organisations can react to unforeseen events effectively.
ML-Based High-Cardinality Reduction Methods To Create Geo-Score To Improve Auto Insurance Tweedie Pricing Model — Suguna Jayaraj
Though traditional auto insurance pricing models consider risk factors like driver, vehicle, and policy characteristics, geography has largely stayed out of the equation owing to its high cardinality. Postal codes can be used to capture it, but this presents problems: the loss cost is almost always zero in areas with low exposure, and the model has low confidence because information on latent variables is lost.
This paper presented a case study where a geo-score was developed at a postal code level to improve risk segmentation. The base loss cost model was built using Tweedie Compound Poisson regression. Geospatial attributes were added to the model without changing the existing rating structure. To reduce the high cardinality of the geographic data while also including socio-demographic variables, the paper presented a hybrid approach built on a target-based encoding method.
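To illustrate what a target-based encoding of postal codes can look like, here is a minimal sketch of one common variant: an exposure-weighted mean loss cost that is shrunk towards the portfolio average. This is not the paper's hybrid method. The function name, the `prior_weight` parameter, and the sample records are all hypothetical.

```python
from collections import defaultdict

def smoothed_target_encoding(records, prior_weight=100.0):
    """Map each postal code to a smoothed loss cost per unit of exposure.

    records: iterable of (postal_code, exposure, loss) tuples.
    prior_weight: hypothetical smoothing strength, in exposure units.
    """
    exposure_by_code = defaultdict(float)
    loss_by_code = defaultdict(float)
    for code, exposure, loss in records:
        exposure_by_code[code] += exposure
        loss_by_code[code] += loss

    total_exposure = sum(exposure_by_code.values())
    global_mean = sum(loss_by_code.values()) / total_exposure

    # Low-exposure codes are pulled towards the portfolio-wide mean, so a
    # postal code with few policies is not scored as zero risk just
    # because it has no claims yet.
    return {
        code: (loss_by_code[code] + prior_weight * global_mean)
        / (exposure_by_code[code] + prior_weight)
        for code in exposure_by_code
    }

# Illustrative data only: (postal code, exposure, total loss).
records = [
    ("560001", 900.0, 45000.0),  # high exposure, some losses
    ("560002", 2.0, 0.0),        # low exposure, no losses observed
    ("560003", 98.0, 5000.0),
]
geo_score = smoothed_target_encoding(records)
```

The encoding replaces thousands of postal code categories with a single numeric column, which could then enter a pricing model as one extra rating factor.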
Solution Approach To Resolving Vehicle Routing Problem Using Deep Reinforcement Learning — Dr Monika Singh
Vehicle routing entails finding the optimal route to improve ETAs. The paper showed how deep reinforcement learning can be used to solve this classic optimisation problem. Instead of an explicitly programmed solver, a dynamic attention model was developed with an encoder-decoder architecture where each node is dynamically characterised in the context of the graph.
The algorithm was benchmarked against genetic algorithms and evaluated using two KPIs: travelling cost and computational time. According to Dr Singh, the comparison showed a 5X–20X reduction in cost and a 100X–1000X reduction in computational time.
Pneumothorax Detection And Classification On Chest Radiographs Using AI — Tejas Haritsa V K
Pneumothorax refers to an abnormal collection of air in the pleural space between the lung and the chest wall, leading to a partial or complete lung collapse. Radiography is used to diagnose various diseases and can help in the timely detection of pneumothorax.
The paper analysed the performance of AI in detecting pneumothorax by training the model on two different datasets. Two AI systems were evaluated: one used high-resolution complete images, and the other was given medium-resolution images in segments. The paper laid out the performance metrics and limitations of both methods. The segmented approach showed an accuracy of 96.83% while the full image approach showed an accuracy of 95.10%.
Modified Count Difference Feature Selection Method For Text Classification — Manik Garg
The lifecycle of text classification models includes inputting data, preprocessing data, extracting features, selecting features, and then applying machine learning classifiers. There are various methods for feature selection, and one of them is the Count Difference Method (CDM), which has several limitations. The method is restricted to binary classification, as it can only measure the difference in a particular feature across two different classes. It is also slow on very large datasets.
Garg proposed a method he developed with his coauthors to overcome these challenges and extend CDM to multi-class classification. This was done by comparing the relevance of a feature in the present class and calculating whether it is prominent across classes using the ‘centred maximum value’. The method is scalable, simple to implement, and computationally fast.
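To make the binary restriction concrete, here is a simplified sketch of a count-difference style score. It is a stand-in for illustration only: it is neither the published CDM formula nor Garg's multi-class extension, and the function name and toy documents are hypothetical. Each term is scored by how differently it is spread across exactly two classes, which is why a score of this shape does not extend directly to three or more classes.

```python
from collections import Counter

def count_difference_scores(docs_a, docs_b):
    """Score each term by how differently it is spread across two classes.

    docs_a, docs_b: lists of tokenised documents (lists of strings).
    Returns {term: score}, where the score is the absolute difference
    between the fraction of class-A and class-B documents containing
    the term.
    """
    # Document frequency: count each term once per document.
    df_a = Counter(term for doc in docs_a for term in set(doc))
    df_b = Counter(term for doc in docs_b for term in set(doc))
    terms = set(df_a) | set(df_b)
    return {
        term: abs(df_a[term] / len(docs_a) - df_b[term] / len(docs_b))
        for term in terms
    }

# Toy two-class corpus, for illustration only.
spam = [["win", "prize", "now"], ["win", "cash"]]
ham = [["meeting", "now"], ["lunch", "now"]]
scores = count_difference_scores(spam, ham)
best_term = max(scores, key=scores.get)
```

Here "win" appears in every spam document and in no ham document, so it gets the highest score, while "now" appears in both classes and scores lower.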
Efficient And Optimal Deep Learning Inference For Computer Vision Applications — Venkatesh Wadawadagi
‘Optimal deep learning inferencing’ can help process data in real time and ensure faster model validation. It also helps overcome limited computational, storage and memory resources. Otherwise, hardware becomes a bottleneck for applications run locally, and operational costs soar for applications running in the cloud.
Any device using computer vision needs to tackle these limitations to maintain a healthy balance between model performance (accuracy) and inference time. Wadawadagi presented a paper on various approaches to optimising deep learning inference. He covered methods including quantisation, network pruning, matrix factorisation, and inference accelerators. The presentation covered the advantages and trade-offs of each approach.
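As a rough illustration of the accuracy versus size trade-off behind quantisation, here is a minimal sketch of symmetric 8-bit quantisation of a list of weights. It is a generic textbook scheme, not the specific techniques covered in the talk, and the function names and sample weights are hypothetical.

```python
def quantise_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to guard against floating-point overshoot at the extremes.
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale

def dequantise(quantised, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantised]

weights = [-1.0, -0.25, 0.0, 0.4, 1.0]  # illustrative values only
quantised, scale = quantise_symmetric_int8(weights)
restored = dequantise(quantised, scale)

# The rounding error per weight is at most half a quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, at the cost of a small, bounded rounding error. Whether that error is acceptable depends on the model and the task.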
Telecom Churn And Valued Customer Retention — Srinivasarao Vallaru
The telecom industry is a highly competitive market, and customer retention is the lifeblood of its business. The paper proposed a progressive analytical approach that could help segment customers on the basis of factors like churn severity and churn priority. This can rank customers by churn risk, from low to high, which can then be used to take retention measures and, in turn, increase revenue.
Predicting Missing Product Taxonomy In Retail: An Embedded Approach Using N-gram Mixture Models And Newton’s Method — Neeraj Mishra
In retail, taxonomy is a hierarchical and logical arrangement of products that helps customers find what they need in the store or on the website. Building an effective taxonomy is expensive, as it needs human resources including taxonomists, information scientists, and linguists.
Mishra has developed a novel machine learning algorithm that leverages an N-gram mixture model, a cross-entropy function, and Newton’s optimisation method to set up a product taxonomy. A modified Naïve Bayes algorithm and up to 4-gram models were combined with general heuristics to produce a model that was deployed and tested on online retail data. It achieved 84% accuracy.
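For readers unfamiliar with n-grams, the sketch below shows how 1-gram to 4-gram features could be extracted from a product title. It assumes word-level n-grams, which the summary above does not specify, and it covers only feature extraction, not the mixture model or the optimisation step. The function name and the sample title are hypothetical.

```python
def word_ngrams(text, max_n=4):
    """Return all word n-grams of length 1 to max_n, shortest first."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

# Illustrative product title only.
features = word_ngrams("Organic Green Tea Bags 100g")
```

Longer n-grams such as "green tea bags" carry more context about the product category than single words, which is one reason a model may mix several n-gram orders.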
Machine Learning Approach To Predict Patient Position For Preventing Bedsores — Sujoy De and Aditya Agarwal
The cost of patients developing bedsores is very high, with the US alone spending more than $9.11 billion annually in treating around 2.5 million individuals. Currently, there are smart beds that alert patients who have not changed their position. However, these devices are expensive.
De and Agarwal came up with a solution that uses low-cost load cells to estimate the patient's position with an accuracy of 98.8%. They used various feature engineering techniques to distinguish one position from another. This helped them generate meaningful, intuitive features that are used by various machine learning models to generate alerts when a patient has been in the same position for a prolonged period of time.
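The alerting step described above can be sketched as a simple timer over predicted positions. This is an illustrative assumption about how such alerts might be raised, not the authors' implementation. The position labels, the function name, and the 120-minute threshold are hypothetical, and the threshold is not a clinical recommendation.

```python
def prolonged_position_alerts(readings, max_minutes=120):
    """Return the minutes at which an alert should fire.

    readings: list of (minute, position_label) pairs in time order,
    where position_label comes from some upstream classifier.
    max_minutes: hypothetical threshold, not a clinical recommendation.
    """
    alerts = []
    start_minute, current, alerted = None, None, False
    for minute, position in readings:
        if position != current:
            # Position changed: restart the timer.
            start_minute, current, alerted = minute, position, False
        elif not alerted and minute - start_minute >= max_minutes:
            alerts.append(minute)
            alerted = True  # one alert per uninterrupted stay
    return alerts

# Illustrative sequence of predicted positions.
readings = [(0, "supine"), (60, "supine"), (120, "supine"),
            (150, "left"), (200, "left"), (280, "left")]
alerts = prolonged_position_alerts(readings)
```

In this example an alert fires at minute 120 for the supine stay and again at minute 280, once the patient has been on the left side for more than two hours.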
A Noninvasive Model To Detect Dengue Based On Symptoms Using Artificial Intelligence and Machine Learning — Dr Ruban S
AI is significantly impacting the patient care system. Dr Ruban, along with his team, has developed a non-invasive model to detect dengue based on symptoms using AI and machine learning techniques. It was developed using data from hospitals in the rural areas of coastal Karnataka. By trying different machine learning approaches and balancing the datasets with oversampling techniques, the team built a model that provides insight into different symptoms and predicts dengue with an accuracy of 98%. The team plans to improve the model by incorporating data across multiple geographies.
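The summary does not say which oversampling technique the team used. As one simple possibility, the sketch below shows plain random oversampling, where minority-class rows are duplicated until every class has the same number of examples. The function name, the symptom flags, and the labels are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical symptom vectors: (fever, headache, rash) as 0/1 flags.
rows = [(1, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = ["negative", "negative", "negative", "negative", "dengue"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only. Duplicating rows before splitting would let copies of the same record appear in both training and test data and inflate the reported accuracy.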