For our weekly developer column ‘Behind The Code’, we get in touch with some of the brilliant minds from the developer community in India and take a look at their journey, from the way they work to the tools they use. This week, we got a chance to interact with Saurabh Choudhary, the Data Science Lead at Uber R&D, Bangalore. Saurabh gave us a clear insight into the data science domain and talked about his journey in the industry.
An electrical engineering grad from the Delhi College of Engineering with an MBA in strategy and marketing from the Indian School of Business, Saurabh found his way into the data space in 2006 when BI was the hot new thing and SAS/SPSS ruled the roost.
Saurabh started his journey by working with telecom operators based out of Europe and Southeast Asia, helping them set prices for services, manage customer churn, and understand price elasticity. The interaction between data and real-world business impact fascinated Choudhary so much that he chose to formalize what he had learned and broaden his expertise.
Today, as the Data Science Lead for Uber’s Bangalore R&D centre, he helps shape the strategy of Uber and the site for tapping into the rich data science talent ecosystem in Bangalore. Saurabh also works on end-to-end problems at Uber scale.
“Data is integral to how we make decisions at an organization, and having a full-stack team across product, data science, engineering, research and design co-located within the same centre lets us take the insights we obtain that much further,” Saurabh further added.
Saurabh’s ML Preferences
While he walked us through his data science journey, we asked him about his preferences when it comes to ML frameworks, programming languages, cloud platforms and ML tools. He said he started out using MATLAB to run numerical optimisations. Eventually, as the open-source ecosystem evolved, he started using both R and Python. In the end, however, Saurabh settled on Python simply because of how easy the code is to read.
In terms of cloud platform, Saurabh has used AWS in the past, but right now at Uber, the entire data science team has its own internal tools to build and scale models.
Talking about machine learning algorithms, Saurabh said that at Uber, the choice of an ML algorithm is driven by business need. He and his team have used a smattering of algorithms and models such as BERT, XGB and state-space models to tackle problems across agent automation, customer behaviour modelling and geospatial analysis, to name a few.
“I consciously try not to have preferences for ML algorithms because once you have a hammer, everything starts looking like a nail,” Saurabh added.
Tools And Codes
Saurabh prefers to spend his time familiarizing himself with the data, especially when he is working with new sources. In this scenario, he uses simple visualization libraries such as seaborn for a lot of bivariate analysis and just to see the data. He also said that he uses tools such as pandas-profiling to generate quick summaries.
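A first look of this kind might resemble the minimal sketch below. It is only an illustration: the column names (`trip_km`, `fare`) and the six rows of data are made up for the example and do not come from Saurabh or Uber. It uses pandas for a quick summary and correlation, and seaborn's `scatterplot` for a simple bivariate view.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "trip_km": [1.2, 3.5, 5.0, 7.8, 10.1, 12.4],
        "fare": [45, 90, 120, 185, 230, 290],
    }
)

# Quick numeric summary of every column (count, mean, std, quartiles).
summary = df.describe()

# Pearson correlation between the two variables.
corr = df["trip_km"].corr(df["fare"])

# Bivariate view: one variable against the other.
ax = sns.scatterplot(data=df, x="trip_km", y="fare")

print(summary)
print(f"correlation between trip_km and fare: {corr:.3f}")
```

On a new data source, a summary and a handful of such plots are often enough to spot missing values, outliers and obvious relationships before any modelling starts.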
When asked which coding language he thinks is important, he said, “Personally, I believe that the language wars have become meaningless. The community has evolved dramatically over the last few years and pretty much all major languages are now converging to nearly the same place. The only real choice, in my opinion, is stylistic.”
He also emphasised that it is important to be able to think algorithmically. The ability to anticipate the pipeline and build towards it is also key to being effective as a data scientist.
The Learning Phase
Coming to the learning phase, Saurabh said that as an early-stage practitioner, imposter syndrome was real for him. This was before MOOCs were well established, and other learning resources were thin. “There was a constant desire for using the ‘best’ algorithm,” said Saurabh.
The data science lead also said that the hardest part for him was learning that there were no silver bullets. He realised that while specialized technical knowledge adds value, it does not tell you how to structure a problem and connect the dots to business impact.
Today, however, the amount of content available is massive. He believes that anyone considering whether to invest further in the space should take the Machine Learning course by Andrew Ng (Machine Learning, Stanford). For a more detailed study, he highly recommends Pattern Recognition and Machine Learning by Bishop, especially as a reference book. “Other than that, the Linear Algebra lectures by Prof. Gilbert Strang are phenomenal,” Saurabh added.
Talking about something that he has learned recently, Saurabh said, “I’m trying to learn experimental causal inference at the moment. At Uber, we’re trying to understand the effectiveness of a few customer interventions that have low opt-in rates. Therefore, typical experimental methods like A/B tests are not very useful and we need a causal inference method to assess impact. So far it’s been a fascinating study.”
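To see why a low opt-in rate blunts a plain A/B comparison, consider the small simulation below. It is a sketch under stated assumptions, not a description of Uber's method: the 5% opt-in rate, the true effect of 10 and the noise level are all invented, and the estimator shown (dividing the offered-versus-control difference by the difference in uptake, often called the Wald or instrumental-variable estimator) is just one common causal inference technique for this situation.

```python
import random


def simulate(n_users=200_000, opt_in_rate=0.05, true_effect=10.0, seed=0):
    """Simulate an experiment where only a few offered users opt in.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_users):
        offered = rng.random() < 0.5                        # random assignment
        opted_in = offered and rng.random() < opt_in_rate   # control cannot opt in
        outcome = rng.gauss(0.0, 1.0) + (true_effect if opted_in else 0.0)
        rows.append((offered, opted_in, outcome))
    return rows


def mean(values):
    return sum(values) / len(values)


def estimate(rows):
    """Return (diluted A/B difference, uptake difference, effect on those who opted in)."""
    y_offered = [y for z, _, y in rows if z]
    y_control = [y for z, _, y in rows if not z]
    d_offered = [1.0 if d else 0.0 for z, d, _ in rows if z]
    d_control = [1.0 if d else 0.0 for z, d, _ in rows if not z]

    itt = mean(y_offered) - mean(y_control)      # plain offered-vs-control difference
    uptake = mean(d_offered) - mean(d_control)   # share of users moved to opt in
    return itt, uptake, itt / uptake             # Wald estimator


itt, uptake, cace = estimate(simulate())
print(f"offered vs control difference: {itt:.2f}")
print(f"opt-in rate difference:        {uptake:.3f}")
print(f"estimated effect on opt-ins:   {cace:.2f}")
```

The plain comparison averages the effect over every offered user, most of whom never opted in, so it reports only a small fraction of the true effect. Rescaling by the uptake recovers an estimate for the users who actually received the intervention, under the assumption that the offer itself changes nothing for those who ignore it.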
Here Are Some Pieces Of Advice For Budding Data Scientists
Start with the Basics: Understand fundamental calculus, statistics, core concepts such as gradient descent, kernels, etc.
Temper expectations: Most DS professionals spend 80% or more of their time just getting the data into the right shape. Then, if the problem demands it, maybe a model is created. Often, simpler solutions are better at creating business value.
There is “Science” in Data Science: Experimentation is an integral part of a Data Scientist’s role. I would assert that understanding how to craft experiments and evaluate them is more important to a DS than being able to implement a Convolutional Neural Network (CNN) to detect cat faces.
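For readers new to the basics mentioned in the first piece of advice, gradient descent fits in a few lines. The sketch below is a minimal illustration on a made-up one-dimensional function, f(x) = (x - 3)^2; the function, learning rate and step count are assumptions chosen for the example rather than anything Saurabh specified.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(f"estimated minimum at x = {minimum:.4f}")
```

Each step moves x a little in the direction that lowers f, and the same idea, applied to many parameters at once, underlies the training of most machine learning models.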
Plans For The Next Few Years
As a data science professional, Saurabh would like to devote some time to absorbing reinforcement learning and genetic algorithms, because every day at Uber he sees great applications for them when developing simulations.