“There are too many ML resources out there and so many people suffer from decision paralysis.”
For this week’s ML practitioner’s series, we got in touch with AI Epiphany founder, Aleksa Gordić. He is a machine learning engineer from Serbia who works at Microsoft. In this interview, he shares highlights from his vibrant journey in the world of data science.
AIM: How did your journey in machine learning begin?
Aleksa: I have been inquisitive since I was a kid. I loved biology, genetics, the cosmos and science fiction. But my exposure to the tech world started pretty late in my life. Unfortunately, as I was growing up, nobody in my environment was involved with programming, AI or tech in general. That’s the case with many people in today’s world – which gives me a unique ability to empathise a lot more with people who are just starting out.
Having said that, I was always passionate about algorithmic thinking, although I only got exposed to programming when I was 19. Those first programming days feel like I was raised in the 80s – coding on paper and writing programs in Pascal.
I like to think about my education as having two separate but related threads. On one side, there is my formal education; on the other, I’ve constantly been working on improving myself in my free time since I was at least 14 years old. I have studied everything from hardware to mathematics:
- Hardware – both analogue and digital electronics. I constructed 8-bit digital dividers in bare metal and silicon.
- Low-level programming – bare-metal embedded programming for microcontrollers (C), RTOS programming for microcontrollers (C), and FPGA programming (VHDL).
- Higher-level programming languages – mainly C++ and Java.
- Lots of mathematics, digital image processing, etc.
I started doing ML 2.5 years ago when I attended Microsoft’s ML summer camp, but thanks to my solid foundations in electronics, CS, maths and algorithms – and a lot of hard work – I managed to catch up with the best people in the field. I thought that starting later was my weakness, but it became my strength, as I have a more diverse background than others.
Over the last couple of years, my main area of focus has been on machine learning/deep learning with an emphasis on computer vision. But throughout 2020, I’ve been exploring various other areas of DL like NLP (transformers), graph neural networks, etc.
AIM: What were the challenges and how did you address them?
Aleksa: Mostly balancing between my full-time work at Microsoft and learning on my own in my free time – which doesn’t leave me with a lot of spare time, and burnout becomes a real issue. It took me a lot of time to find that sweet spot where I’m super productive but also sustainable over the long run.
The fact that I was born in Serbia automatically gave me fewer opportunities than my peers who were born in Silicon Valley and were exposed to tech from the age of five.
Hard work and a passion for learning and success were the key drivers for me. The Coursera course “Learning how to learn” helped me find the work-life balance that I needed so desperately.
Nowadays, I don’t treat the work I do as a separate entity from my life – it is my life. I also take care to keep my social relationships and my body healthy.
AIM: Can you tell us about your role at Microsoft?
Aleksa: I currently work as a machine learning engineer at Microsoft. My job depends on the kind of project I’m working on – ranging from ideation to research, shipping and more. My role is a mix of research and engineering. A fun fact: I was officially hired as a software engineer, but because of my serious investment in ML, my efforts got recognised internally, and I was shifted to ML-based projects.
I’m involved in everything from data collection, data engineering and visualisation all the way to training different computer vision models, measuring their compute budgets, and communicating the results with the team. As another example, I developed the metrics pipeline from scratch, monitored labelling pipelines, and wrote scripts to robustify them.
Even though I’m in a big corporation, it feels a lot more like a startup. Earlier this year, I took ~3 months to reproduce one deep learning paper. I communicated my progress with the broader team, and I learned a lot about research along the way.
Sometimes, depending on the project phase, I also present papers at internal reading groups with the Cambridge and Zurich teams. We collaborate a lot with Cambridge, and most of them have had some previous Microsoft Research experience.
AIM: Can you talk a bit about the tools you use as an ML engineer?
Aleksa: Python/PyTorch/AML is what I mostly use, and it does the job, so why change it? Python has a big ecosystem built around it, so you can find a library for pretty much anything.
In my first year of work at Microsoft, I used C++ extensively, but not anymore. Nowadays, I use C++ only occasionally, when I’m working close to the hardware and trying to optimise the Python code. We work on cutting-edge devices in my team (like Microsoft HoloLens), so that’s something that pops up from time to time. Given my solid background in electronics, I don’t have any problems getting my hands dirty (although I have never done truly low-level programming while at Microsoft).
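That workflow usually starts on the Python side: before rewriting anything in C++, you profile to confirm a loop actually dominates the runtime. A minimal sketch of that step, using only the standard library (the `sum_of_squares` hot spot here is a made-up stand-in for a real bottleneck):

```python
import cProfile
import pstats
import io

def sum_of_squares(n):
    # Naive pure-Python loop: a typical candidate for a C++ or vectorised
    # rewrite *if* profiling shows it dominates the runtime.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Profile the candidate hot spot.
profiler = cProfile.Profile()
profiler.enable()
result = sum_of_squares(200_000)
profiler.disable()

# Print the three most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(3)
print(stream.getvalue())
print(f"result = {result}")
```

Only once the profile confirms where the time goes does dropping down to C++ (or to a vectorised library call) pay off.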
“I’m a big believer in not losing your precious time learning a bunch of frameworks and languages.”
PyTorch is my framework of choice for now, but I’d like to explore JAX in my free time. I think it’s a suboptimal strategy to “learn the framework first” via some course. Just go out there, get your hands dirty, develop something, and learn on the fly.
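A first hands-on PyTorch experiment can be as small as the sketch below – a toy regression problem with a hypothetical, made-up dataset – which already exercises the full loop of forward pass, backprop, and parameter update:

```python
import torch
import torch.nn as nn

# Toy data for a made-up task: learn y = 2x + 1 from four points.
x = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
y = 2 * x + 1

torch.manual_seed(0)                       # reproducible initialisation
model = nn.Linear(1, 1)                    # one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()                  # clear gradients from the last step
    loss = loss_fn(model(x), y)            # forward pass
    loss.backward()                        # backpropagate
    optimizer.step()                       # update parameters
    losses.append(loss.item())

print(f"loss went from {losses[0]:.3f} to {losses[-1]:.5f}")
```

Everything you need to know to write this – autograd, optimisers, modules – is learned much faster by tinkering with a loop like this than by finishing a framework course first.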
I did some projects in Keras/TF as well, but I find PyTorch much nicer – Andrej Karpathy has expressed a similar preference on Twitter.
It takes me a maximum of two weeks to learn a new programming language – even those with a very different paradigm, like Haskell (functional – and I love it, I must say).
Same goes for cloud providers — all of the biggest ones will do the job — AML, GCP, AWS, whatever makes the most sense for your context/company.
AIM: What does it take for a machine learning engineer to go from good to great?
Aleksa: Make, make, make, code, code, code. I’m bullish on my GitHub portfolio. I like creating my own projects and sharing them with the world. There are many benefits to doing that:
- You learn a lot
- You help others
- People hear about you
Good engineers focus on learning frameworks and packages (should I learn NumPy or pandas?), figuring out which course is the best one (should I do this Udacity course or that Coursera one?) or which book to read next. The great ones care about solving the problem and see all of those as just tools. Problem-solving skill is much more critical than learning 100 DL frameworks in parallel.
“If you’re still reading books and doing courses – you’re at an intermediate level at best.”
It takes a lot of patience and time! You can’t learn ML in 3 months or even a year. It takes a lot of hard work, and it also takes making the right decisions along the way, i.e. knowing where to invest your time and what to ignore. There are too many ML resources out there, and so many people suffer from decision paralysis.
Also, I see way too many “advanced” practitioners, as well as ML influencers, spending too much time reading books and doing courses and sharing how they’ve completed some new course on LinkedIn.
The only way to keep up with the field is by regularly reading research papers and surrounding yourself, both online and in real life if possible, with the best researchers and engineers in the field. LinkedIn and Twitter are good platforms for that.
AIM: Can you name some resources that helped you and could help others in their journey?
Aleksa: Here are a few resources:
- For Calculus: 3Blue1Brown’s YouTube playlist (amazing channel in general!)
- For linear algebra: the linear algebra course at MIT by Professor Gilbert Strang.
- For Python, good old Stack Overflow would do.
- On Learning
- On NLP
It’s sometimes useful to take a step back and learn how to learn – many people suffer from suboptimal learning strategies. The Coursera course “Learning How to Learn”, for example, helped me tremendously.
I create a lot of content myself. I did YouTube series on neural style transfer and DeepDream, and I’ve open-sourced many projects, including one on GANs. I have covered transformers extensively on my channel, and I have open-sourced a project on them as well.
I’m a big believer in coding up a project from scratch once you gain some solid theoretical understanding of the field. It’s hard to explain just how much I’ve learned in 2020 – probably close to ¾ of a PhD in units of effort!
(Here is a great blog by Aleksa where he talks more about starting the ML journey.)
AIM: What is in store for machine learning in the coming decade?
Aleksa: Deep learning is still a young field, and many problems are not yet solved – so it will take a lot of time before that knowledge gets distilled into thick books. What’s specific to ML is that people believe the systems we engineer today are truly intelligent. The outside world is prone to anthropomorphising the tech we create, so it’s easy to spark a lot of fake news around it. We’ve seen this happen a lot – the Sophia robot from Hanson Robotics is a good example.
I don’t believe a single area of AI will win out any time soon. Rather, we will see a combination of Bayesian learning (for better probability modelling and, hopefully, increased model interpretability), graph neural networks (for finding good representations in knowledge graphs, etc.), reinforcement learning (for learning when you don’t have differentiable functions), and smart attention methods (transformers, GATs, etc.).
I also think that causality will play an important role. I still don’t have enough knowledge to argue in depth about causal frameworks, but that’s just my gut feeling.
Deep learning is great at modelling perception, so I don’t think it will go away anytime soon. The bubble may boom and bust, but it’s here to stay. I like to think about deep learning as our best model of the cerebellum and different parts of the brain that care about perception (visual cortex, auditory perception, etc.). Those parts are characterised by the lack of consciousness and by a vast amount of computation happening inside them.
On the other hand, we still need to improve the systems that handle the cognition part of the equation. It’s not entirely impossible to imagine that even “obsolete” symbolic AI will have a role in developing the “true” intelligence of tomorrow. I also expect graph neural networks to play their part here, in modelling knowledge, memory, etc.
Finally, overly complicated heuristics and systems with way too much human knowledge baked into them will go away. “The Bitter Lesson” by Sutton is a nice read on that topic.