For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Darragh Hanley. Darragh is a Kaggle grandmaster, one of only 150 GMs across the world. He is an AI engineer at a healthcare company, Optum, and also lectures at UC Berkeley.
Darragh has a bachelor’s degree in Math and Engineering from Trinity College Dublin, which has educated famous poets, playwrights and authors, including Oscar Wilde and Jonathan Swift. Adding to his diverse repertoire, he has also taken programs in Computer Science and Data Mining at Georgia Tech and Stanford.
Though Darragh’s academics set him up to become what he is today, the turning point in his career came when he learnt about the potential of machine learning algorithms in the ImageNet challenge back in 2013. The fact that a piece of code could be made to recognise images as well as a human intrigued him, and there has been no looking back since.
Today, Darragh works as an AI Engineer at Optum, a healthcare company that delivers transformational health solutions. He also teaches a Master’s course in UC Berkeley’s MIDS program, which he considers to be as much of a learning experience for him as it is for the students. Emphasising the key role his students play, Darragh spoke about how a module presented by one of them helped him in working on the DeepFake Detection Challenge on Kaggle.
I became interested in AI soon after Deep Learning was used to win the ImageNet challenge for the first time in 2013.
Talking about Kaggle, he explained how the experience has come in handy in his own professional work. For example, at Optum, his team processes five billion claims annually, of which one billion are received as faxed images.
His team has also collaborated with Siddharth Garg, Arbind Singhi and Jyoti Nahata of Optum Gurugram, India, to create macro-level visibility into how COVID-19 is impacting US healthcare facilities and patients.
The kind of data he gets to work on is a dream come true for any AI engineer, and the type of challenges that Kaggle presents gives AI engineers a great opportunity to sharpen their ML tools.
Achieving top 10 seemed like an insurmountable challenge
Darragh’s first Kaggle challenge was the Higgs Boson identification, in which he played around with sample scripts from the forums and gradually progressed to learning about XGBoost, glmnet and a number of other techniques in great detail.
On Kaggle, Darragh is now a grandmaster in competitions, which requires one to be in the top 1% in multiple challenges. However, he admits that this seemed an insurmountable challenge in his early days. Yet in his second contest, Crowdflower Search Results Relevance, he and his team of rookies made it to the top ten.
When asked what it takes to get to the top, Darragh quoted one of his favourite lines from Jeremy Howard: “The best practitioners I know in machine learning all share one particular trait in common, which is they’re very, very tenacious, and are proficient coders. They are exceptional at turning their ideas into code.”
The best practitioners in machine learning are very, very tenacious
Darragh puts this tenacity into practice by investing time in learning a new technology, getting good at it and then quickly moving on to new approaches. He urges newcomers to do the same: to challenge themselves, to change approaches, and to become an expert at something.
Tools Of A Grand Master
Darragh’s Kaggle ritual typically starts with taking an existing pipeline from the forums in order to understand the data and the metric.
Once he has a decent grasp of the data, he spends a lot of time reading about existing approaches, devouring the literature, studying successful solutions and skimming through GitHub.
When it comes to hands-on work, Darragh mainly leverages AWS and prototypes on his MacBook Pro, which he believes is enough to check whether a pipeline is working well before deploying. He usually runs code from the command line and the Spyder IDE.
As for frameworks, Darragh prefers PyTorch for the freedom it offers to experiment.
Given the exposure to new techniques that he gets through Kaggle, Darragh is also keen on giving back to the community. He publishes his solutions in a lucid manner so that others can see what a winning solution looks like. One such effort was his very informative write-up on the RSNA Intracranial Hemorrhage Detection Challenge solution.
He enjoys working with really big data, and for him, learning how to build efficient pipelines is as much fun as the modelling. He hopes to see more “data science for good” competitions in areas such as medical imaging and wildlife conservation.
A Note To The Aspirants From An Insider
The unprecedented hype around the field of data science can be overwhelming, and sometimes intimidating, for a newcomer. A few insiders might even fool themselves into thinking that AI is the cure for all ills. Touching upon this, Darragh is puzzled that many people dream of flourishing in AI just because Google or Stanford is doing well. A common trait among companies that get the most out of AI, says Darragh, is their leaders’ knack for applying algorithms.
AI is still at a point where it is overpromised and underdelivered within most organisations.
Darragh firmly believes that AI is no substitute for customer engagement, domain knowledge, solid data engineering, efficient code and rigorous testing.
Explaining how data science rookies go down rabbit holes, Darragh shared one of his own experiences of working in an analytics-based healthcare company. He and his team are currently looking to leverage vast volumes of data in hospital records to address recent COVID outbreaks. In preparation for this problem, they are asking questions about the availability of healthcare staff and ventilators, and about whether prior diagnoses make patients more susceptible to a severe manifestation of the virus.
Added to this are the challenges of privacy and the ethical use of patients’ data, which become the top priority when working with cloud services. It is the responsibility of an ML engineer to verify that a cloud deployment complies with government regulations, which in this case means the U.S. Health Insurance Portability and Accountability Act (HIPAA).
Real-world challenges are more complex, especially in critical use cases like the one above. So, Kaggle success should not be mistaken for expertise at the industry level.
While Kaggle helps one learn how to approach problems, says Darragh, working in the industry helps one learn which questions to answer in the first place. Once you have the right questions and the right data, simple algorithms are most often sufficient to solve a problem.
Darragh forecasts that over the next 10 years, approaches such as linear modelling, decision trees and neural nets will become mainstream. For that to happen, traditional companies may have to re-engineer their processes from the ground up.
“Even if there were no progress in AI in the next 20 years, we would still have a revolution on our hands.” - Francois Chollet
Taking the example of healthcare again, Darragh projects that technologies related to ambient computing will be a space to watch. A device such as the Apple Watch, for example, holds a lot of promise. However, he also admits that healthcare is a very personal service, and that we can never replace the great work which carers, nurses and clinicians provide. Instead, AI might help them remotely monitor a patient’s progress.
To this end, his team has been working over the past few years to understand how healthcare data can be recorded, structured and administered more efficiently. Beyond this, within more technically advanced organisations, Darragh believes that significant advances will be made in reinforcement learning.
The success of any scientific domain is deeply rooted in its replicability and verifiability, which can only be achieved through a community that is thorough with the fundamentals.
So, for beginners, Darragh strongly recommends the famous “An Introduction to Statistical Learning (ISL)” by James, Witten, Hastie and Tibshirani.
He also suggests mixing it up with the fast.ai courses and the Stanford Deep Learning lectures, such as CS231n and CS224n, all of which are freely available on YouTube.
Just download any free course and listen to them on your commute.
Besides, Darragh believes that future data scientists can gain a lot by consuming contrarian views. His personal favourites include the works of Howard Marks, Burton Malkiel, John Templeton and Daniel Kahneman, which he says encourage a different type of thinking.
He urges aspirants to keep listening to and learning from others while believing in their own abilities and ideas, and not to be intimidated.