Sruthi Sekar, data scientist at Gojek, spoke about ML-driven fraud prevention systems at Analytics India Magazine’s Women in AI conference, The Rising 2022. She has worked with the GoPay Data Science team for four years and specialises in the fraud and risk domain.
Data science models play an important role in building a sense of security around e-wallet transactions. In her session, Sruthi talked about the lessons learned from building an ML model to predict fraudulent logins aimed at hijacking wallets. She also touched upon handling insufficient labels, data drift and selecting a sustainable model for monitoring metrics.
From creating an account to the final transaction, every stage in the user’s e-wallet journey is susceptible to fraud. To solve this challenge, ML models need to proactively monitor different stages of a customer journey.
Supervised vs unsupervised learning
Simple rule-based systems don’t work because far too many factors are involved in the transaction process. A diverse and complex system is needed to tackle different types of problems in real time. Meanwhile, fraudsters are upping their game, and it has become increasingly difficult to tell a genuine account from a fake one.
In her experience, Sruthi said, supervised models have better predictive power than unsupervised models in fraud detection. Supervised models rely heavily on data labelling, which is time-consuming and costly. Unsupervised models, on the other hand, are hampered by data imbalance and by the overlapping nature of the data, which makes it difficult for the model to differentiate between genuine and fraudulent cases. She said active learning and semi-supervised learning are the best approaches for fraud detection.
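As a rough illustration of how active learning can stretch a limited labelling budget, the sketch below uses uncertainty sampling: the transactions whose fraud score is closest to 0.5 are sent to human reviewers first. The talk did not name a specific strategy, so uncertainty sampling, the function name `select_for_review` and the example scores are all assumptions made for illustration.

```python
def select_for_review(scores, budget):
    """Pick the transactions the model is least sure about.

    scores: dict mapping a transaction id to a model fraud probability (0..1).
    budget: how many transactions human validators can label in this round.
    """
    # Distance from 0.5 measures confidence; the smallest distance is the most uncertain.
    ranked = sorted(scores.items(), key=lambda item: abs(item[1] - 0.5))
    return [txn_id for txn_id, _ in ranked[:budget]]


# Hypothetical scores from an existing model.
model_scores = {"txn_a": 0.02, "txn_b": 0.48, "txn_c": 0.97, "txn_d": 0.60}
to_label = select_for_review(model_scores, budget=2)
print(to_label)
```

The labels that come back from reviewers would then be added to the training set before the next retraining round, so each round of human effort goes to the cases where the model has the most to learn.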
According to Sruthi, data scientists should put a lot of emphasis on feature generation. To generate good features, the data scientist must understand the fraudster’s modus operandi. She also suggested using graph-based features, location-based features and sequence information, along with insights from human validators. For model selection, she said tree-based models work best.
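To make the idea of location-based and sequence features concrete, here is a minimal sketch that derives a few login features from a user's history. The `Login` schema, the feature names and the example coordinates are hypothetical and were not described in the talk; the distance calculation is the standard haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Login:
    timestamp: float  # seconds since epoch
    device_id: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def login_features(history, current):
    """Build features for the current login from the user's earlier logins."""
    known_devices = {login.device_id for login in history}
    features = {
        "is_new_device": current.device_id not in known_devices,
        "distinct_devices_seen": len(known_devices),
        "seconds_since_last_login": None,
        "implied_speed_kmh": None,
    }
    if history:
        previous = history[-1]
        elapsed = current.timestamp - previous.timestamp
        features["seconds_since_last_login"] = elapsed
        if elapsed > 0:
            distance = haversine_km(previous.lat, previous.lon, current.lat, current.lon)
            # A very high implied speed suggests the two logins are not the same person.
            features["implied_speed_kmh"] = distance / (elapsed / 3600.0)
    return features


# Hypothetical example: a login near Jakarta, then one near Singapore an hour later.
history = [Login(timestamp=0.0, device_id="d1", lat=-6.2, lon=106.8)]
current = Login(timestamp=3600.0, device_id="d2", lat=1.35, lon=103.8)
features = login_features(history, current)
print(features)
```

Features like these could then be fed to a tree-based model, which handles the mix of boolean, count and continuous values without much preprocessing.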
Model decay happens when the dependency between the input features and the class variable changes gradually over time. There are two types of model decay:
Concept drift: The relationship between the independent variables and the target variable shifts.
Covariate shift: The distribution of the independent variables changes.
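One common way to watch for covariate shift is the Population Stability Index (PSI), which compares how a feature is distributed in the training data with how it is distributed in live traffic. The talk did not say which drift metric Gojek uses, so PSI, the bin count and the sample data below are assumptions for illustration only.

```python
import math


def population_stability_index(reference, current, n_bins=10):
    """PSI for one numeric feature; bin edges are taken from the reference sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # avoid division by zero when the reference feature is constant

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clipped into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor at a small epsilon so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: training data versus shifted live data.
training_values = [i / 100 for i in range(100)]
live_values = [v + 0.5 for v in training_values]

psi_same = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, live_values)
print(psi_same, psi_shifted)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and should be tuned against real data. PSI only looks at input distributions, so it cannot detect concept drift on its own; that requires tracking model performance against fresh labels.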
She suggested periodic retraining, incremental learning and feature dropping to counteract model decay.
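Incremental learning means updating an existing model with each new batch of labelled data instead of retraining from scratch. The sketch below shows the idea with a small logistic regression trained by stochastic gradient descent. The class `OnlineLogisticRegression`, its learning rate and the toy data are hypothetical; the talk did not describe the models or update scheme used in production.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression that is updated one batch at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(min(z, 35.0), -35.0)  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch_x, batch_y):
        """Apply one gradient step per example, keeping previously learned weights."""
        for x, y in zip(batch_x, batch_y):
            error = self.predict_proba(x) - y
            self.weights = [
                w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
            ]
            self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)

# Initial period: a positive feature value indicates fraud (label 1).
for _ in range(50):
    model.partial_fit([[1.0], [-1.0]], [1, 0])
score_before_drift = model.predict_proba([1.0])

# Simulated concept drift: the relationship flips, and the model adapts batch by batch.
for _ in range(200):
    model.partial_fit([[1.0], [-1.0]], [0, 1])
score_after_drift = model.predict_proba([1.0])

print(score_before_drift, score_after_drift)
```

The trade-off is that an incrementally updated model can follow drift quickly but may also forget older patterns, which is one reason to combine it with scheduled full retraining.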
Sruthi said building a robust, error-free fraud detection solution is nothing short of finding a needle in a haystack. However, continuous testing, experimentation, diagnostics and data validation setups can result in high-performing ML models.
Sruthi took questions from the audience after her session. One attendee asked: “I’m a data scientist and I have worked on a model for a period of time. Later, when I leave the project and someone replaces me to work on the same model, does Gojek have a record of the variations introduced to the model during my time that the new data scientist can work with?”
To this, Sruthi responded: “Documentation is the way to go! The immediate point is that one must ensure that each modification of the model is documented. Usually part of the problems developed in the model doesn’t go away in an allotted period of time. Tackling some problems might need constant and continuous modifications. To prevent confusion, one must document the types of tests done by you and features tried during your tenure. Documentation is the only way with which one can properly transit the role to the next data scientist.”