In the developer series Behind The Code, we reach out to developers from the community to gain insights into how their journey in data science started, the tools and skill sets they use, and what's essential for their day-to-day work. For this week's column, Analytics India Magazine got in touch with Bishwarup Bhattacharjee, Principal Data Scientist at Here Technologies, the Open Location Platform company.
21st Oct, Bangalore
Bhattacharjee's work revolves around multiple computer vision projects aimed at making the map better and richer in content, and at enabling real-time assistance and feedback that helps with navigation and ensures the safety of vehicles around the world.
Bhattacharjee studied Statistics and started working as a freelance data scientist in 2009 with multiple small and medium-sized companies. He reached a major milestone on Kaggle, becoming a grandmaster and at one point securing a worldwide rank of #9. He also worked on a project for the Organisation for Economic Co-operation and Development (OECD) for a couple of years. With a passion for doing more analytical work, Bhattacharjee made the transition from freelance to full-time data scientist.
Coming from a statistical background, Bhattacharjee faced some challenges in coding. He overcame this by spending a significant amount of time learning more than one programming language. He picked up advanced techniques such as processing large datasets, visualising them to look for potential indicators, building advanced models like XGBoost, LightGBM and CatBoost, validating the models to avoid overfitting, and many others.
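One of the techniques he mentions, validating models to avoid overfitting, is most commonly done with K-fold cross-validation. As a minimal illustrative sketch (the function name and fold count are our own, not from the interview), the index bookkeeping behind it can be done in plain NumPy:

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)  # everything not in the current fold
        yield train, fold

# Every sample lands in exactly one validation fold
n = 103
seen = np.concatenate([val for _, val in kfold_indices(n)])
print(len(np.unique(seen)) == n)  # → True
```

Scoring a model on each held-out fold and averaging the results gives an estimate of generalisation error that is much harder to fool than training accuracy.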
Code, Toolkit & Skillsets
At Here Technologies, Bhattacharjee leverages most of the state-of-the-art models in image classification, object localisation and detection, OCR and segmentation. He started his journey with the R language and later switched to Python to work on neural network models.
In terms of frameworks, Bhattacharjee has used Keras (first with a Theano backend and then with TensorFlow). He also uses cloud platforms such as AWS, GCP, and Paperspace.
Asked about his most preferred programming languages, Bhattacharjee replied that R and Python are the two most popular ones; both have their advantages and disadvantages, but those only become apparent at an advanced level. Python makes it easy to learn new things, is efficient, is the go-to language for ML at this point, supports almost all ML libraries, has a very active community and neat documentation, and is still growing rapidly. He added, “People also use Scala, C++, Java – there’s no good or bad about picking a language as long as you can get the work done in any one of them.”
Bhattacharjee deals with tabular data by implementing models like XGBoost, LightGBM, and CatBoost, as they offer the best accuracy in their class and are also fast to run on large datasets, especially LightGBM.
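XGBoost, LightGBM and CatBoost all build on the same gradient-boosting idea: fit a sequence of small trees, each one to the residuals of the predictions so far. A toy NumPy sketch of that additive scheme (one-feature regression stumps only; every name here is illustrative, and the real libraries are far more sophisticated) could look like:

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-feature split that best reduces squared error."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # threshold, left-leaf value, right-leaf value

def boost(x, y, n_rounds=50, lr=0.1):
    """Additive model: each stump is fit to the residuals of the current prediction."""
    pred = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, lv, rv)  # shrink each tree's contribution
        stumps.append((t, lv, rv))
    return stumps, pred

# Toy data: a step function recovered round by round
x = np.linspace(0, 10, 100)
y = np.where(x < 5, 1.0, 3.0)
stumps, pred = boost(x, y)
print(round(float(np.abs(pred - y).mean()), 3))  # → 0.005
```

The learning rate (`lr`) deliberately under-fits each round so that later trees can correct the remaining error, which is the same shrinkage idea behind `eta` in XGBoost and `learning_rate` in LightGBM.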
Bhattacharjee learned most of what he knows from Kaggle, StackOverflow and Google forums. He usually follows a top-down approach. Recently, he explored one-shot learning in more detail for one of his organisation's projects. One-shot learning, and more specifically the Siamese network, has been widely used in the past for face recognition tasks and content-based image retrieval (CBIR). He has also been looking into Deep Reinforcement Learning.
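A Siamese network, as used in face recognition and CBIR, passes both inputs through the same embedding network and decides “same or different” from the distance between the embeddings. A toy NumPy sketch of that comparison step (the random matrix stands in for a trained network, and all names here are our own) might be:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # stand-in for a trained embedding network

def embed(x):
    """Both inputs pass through the SAME weights -- the 'Siamese' part."""
    return np.tanh(x @ W)

def same_identity(a, b, threshold=1.0):
    """Decide 'same or different' from the distance between embeddings."""
    return np.linalg.norm(embed(a) - embed(b)) < threshold

anchor = rng.normal(size=8)
positive = anchor + 0.01 * rng.normal(size=8)  # near-duplicate of the anchor
negative = rng.normal(size=8)                  # unrelated sample

print(same_identity(anchor, positive))  # → True
```

In a real system the embedding is a deep network trained with a contrastive or triplet loss, so that one labelled example per identity is enough at inference time, which is what makes the approach “one-shot”.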
Some of the courses and resources he suggests are mentioned below.
- CS229 – Stanford (Machine Learning)
- CS231N – Stanford (Deep Learning and Computer Vision)
- CS224N – Stanford (NLP)
- 3blue1brown – Great explanations of mathematical theories; makes complex concepts easy to grasp.
- Kaggle Reading Group – Rachel explains the latest research papers and goes hands-on with them to some extent.
- Coursera – DeepLearning.ai (Andrew Ng): Fantastic course; it implements a lot of fundamental stuff in NumPy from scratch, which helps your understanding of what goes on behind mainstream neural network architectures.
- Udacity – Deep Learning Nanodegree: I’ve heard it’s very hands-on and goes into a fair bit of depth to explain the concepts.
- Udacity – Self-Driving Car Engineer Nanodegree: it strikes a perfect balance between computer vision and robotics.
- Fast.ai – Jeremy gives you practical ideas around the fundamental concepts of deep learning and implements them in fastai, the library they created on top of PyTorch.
Word of Advice
Asked for his word of advice to millennials, Bhattacharjee said, “I would say that you shouldn’t spend too much time thinking about where to start, how to start, what to learn first and so on – just grab a project from Kaggle, MachineHack or any other website that you find interesting, download the dataset and start coding.” He added, “For any data science challenge on Kaggle, there are a plethora of public scripts available in the forums – follow the code and start coding on your own – don’t just copy and paste to get things done quickly. You would also need an understanding of linear algebra and calculus at some level, but if you start studying those without the context of a particular, real-life problem – you might easily get bored.”
What The Future Holds
According to him, the tools and techniques that we use today might morph into something else, but AI will thrive. Healthcare is one of those areas where people can see a lot of benefits from AI in the near future. A lot of ML techniques are being used to diagnose advanced diseases already, but ML can also significantly help in preventing those diseases in the future by analysing different health markers over time.
Talking about personal endeavours, Bhattacharjee said, “Apart from my day-to-day job, I plan to invest a majority of my time in Computer Vision and Deep Learning. I also plan to dive a bit deeper into Reinforcement Learning. I like to play real-time strategy (RTS) games when I get the time, and I want to write an AI bot that can outperform me consistently in one of those games.”