For this week’s ML practitioner’s series, Analytics India Magazine (AIM) got in touch with Théo Viel, a Kaggle Grandmaster and a deep learning researcher at a French startup called Damae Medical. In this interview, Théo shares his rich experiences from an impressive career in the world of algorithms.
AIM: How did your fascination with algorithms begin and what does a typical day look like for a deep learning researcher?
Théo: I recently graduated from a French engineering school where I studied applied Mathematics and Computer Science. I attended École des Ponts ParisTech, which is a Grande École: it is a great school to attend if you aspire to be a Data Scientist in France. I was introduced to Machine Learning at school, but mostly got into it during my internships and through Kaggle competitions.
I currently work as a Research Scientist at a French Startup called Damae Medical. Damae develops an optical device that allows you to see the inside of your skin, and I build deep learning models on top of what it sees.
My job mostly consists of the following things:
- Building high performance Deep Learning pipelines for various tasks: classification, segmentation, style transfer etc.
- Reading papers to keep up to date with the state of the art.
- Discussing technical issues with the Data Science team.
- Talking with the Clinical Team to better understand the clinical challenges.
- Communicating progress and results to everybody.
AIM: What were the initial challenges? How did you address them?
Théo: I think the biggest challenge I encountered was how to make myself stand out. When I first looked for internships about two years ago, I realised that I had difficulty differentiating myself from the hundreds of other aspiring Data Scientists. The name of my school was almost the only thing on my CV that could make me stand out, and I was not comfortable with that. I also had little confidence in my abilities, as do a lot of undergraduates, because I lacked experience.
My first two internships went well, which helped me a lot, and it was also during this period that I learnt about Kaggle. I took part in a lot of competitions, as I wanted to be able to tackle a wide range of problems. In the end, my strong results in Kaggle competitions gave me experience, confidence, and something uncommon to put on my CV. There are only three competition Grandmasters in France for now, and I’m very glad to be one of them.
AIM: Can you talk about your Kaggle journey and the community?
Théo: The first competition I really invested time in was the Quora Insincere Questions Classification one, which took place two years ago. I joined it because it was close to my internship topic. I was still very inexperienced at the time, and learnt PyTorch as I wanted to get reproducible results. I finished 26th, which I was really happy about and which made me want to keep improving.
I got my first gold medal during the Freesound Audio Tagging competition. The key for this one was teaming up with new people. I was stuck in the silver zone before merging with a team in the same situation as mine, and with combined efforts, we were able to grind up to the 9th spot.
At that time, I was still a bit sloppy when approaching problems. In fact, one of the keys to being consistent in Kaggle competitions is being able to quickly set up a clean pipeline, so you have more time to experiment. I started becoming more consistent with my approach, which can be simplified to this short list:
- I reuse an already existing Deep Learning pipeline I have and adapt it to the problem.
- I start with a baseline architecture and tweak some parameters (epochs, learning rate, etc) to have a robust set-up to start experimenting.
- I start with basic experiments (e.g., trying different model architectures) before moving to fancier stuff, always making sure my cross-validation is robust.
- The final things I do are trying to find a way to artificially boost the results with post-processing, and adding diversity to my models to build a robust ensemble.
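The baseline-first part of this workflow can be sketched in code. The snippet below is a minimal illustration (not from the interview): a simple model trained inside a stratified cross-validation loop, establishing a robust score before any fancier experiments. The dataset, model, and parameters are all placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a competition dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Fixed, seeded folds: the same splits are reused for every experiment,
# so improvements in the CV score are comparable across runs.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_idx, valid_idx) in enumerate(cv.split(X, y)):
    # Baseline model with default-ish parameters; swap in stronger
    # architectures only once this pipeline is stable.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[valid_idx])
    score = accuracy_score(y[valid_idx], preds)
    fold_scores.append(score)
    print(f"fold {fold}: accuracy = {score:.3f}")

print(f"CV mean accuracy: {np.mean(fold_scores):.3f}")
```

Once a loop like this is in place, each new idea (architecture change, post-processing, ensembling) is just another experiment scored against the same folds.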
This proved to be particularly efficient in early 2020, at the time I was doing a lot of NLP competitions. My team ended up 10th in the Google QUEST Q&A Labeling competition, and I re-used a lot of acquired knowledge in order to win the Tweet Sentiment Extraction one.
I had managed to snatch a solo gold in the Generative Dog Images competition, so I was one gold medal away from Grandmaster. Fortunately, Kaggle hosted an audio competition. My journey to GM ended the same way as it started, with a gold medal in an audio competition. What really impressed me was how much the level of top solutions improves as time passes; you really need to improve a lot to stay at the top.
AIM: Looking beyond the hype, what ML techniques, use cases, and applications do you think will stand the test of time?
Théo: Machine Learning is an ensemble of mathematical techniques that has proven powerful for tackling a lot of applications. If you are using Machine Learning in your company, I think the key to standing the test of time is a good understanding of what you are doing. Mostly, one has to know the weaknesses of one’s models, what is doable and what is not.
Because a lot of people only consider AI as an opportunity to make money, the market is flooded with people that do not have a clue what Machine Learning really is.
If you have a startup and, at some point, investors stop believing in what you’re saying because they realise you pitched something you are not capable of delivering, then you’re out. And I know for sure there are hundreds, if not thousands, of startups that claim to be doing AI but aren’t doing anything; those won’t stand the test of time.
AIM: What does your ML tool stack look like, and can you share any learning resources for beginners?
Théo: I do almost everything using VS Code remote SSH: I have two machines with a 2080Ti (one for work and one for Kaggle), which I access using my laptop.
I consider my tool stack to be relatively simple: I use PyTorch, which is more convenient for research than TensorFlow, with no fancy stuff on top of it. A few libraries I use are Qubvel’s Segmentation Models PyTorch and Huggingface’s Transformers. They allow you to quickly load pretrained architectures and are really convenient to use in their respective domains (image segmentation and NLP).
Regarding learning resources, I think any course available online that is given by a renowned researcher and/or at a top university is fine. I followed David Silver’s Reinforcement Learning course, which is great.