Data science is a loosely defined field, and the professionals who succeed in it tend to be eclectic. Today, we shine a spotlight on an eclectic data scientist who started his journey in biology and neuroscience, picked up image processing and computer vision skills, got interested in 3D printing, then discovered a love for data science and computer engineering, and ended up building state-of-the-art systems for 3D bin packing optimisation.
3D bin packing is a combinatorial optimisation problem in which one tries to fit as many cuboids as possible into a 3D space. The system for container loading is subject to real-world operational constraints and also comprises a data backend, an API layer and a 3D packing visualiser.
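To make the problem concrete, here is a minimal sketch of one simple greedy heuristic: sort the boxes by volume and drop each one at the first free corner point where it fits. This is not the system described in this article. The names `Placement`, `overlaps` and `pack` are made up for illustration, and the sketch ignores operational constraints such as weight limits and whether a box is fully supported from below. A greedy pass like this can also miss arrangements that an exact solver would find.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Placement:
    """Axis-aligned cuboid: minimum corner (x, y, z) and size (dx, dy, dz)."""
    x: int
    y: int
    z: int
    dx: int
    dy: int
    dz: int


def overlaps(a, b):
    """Return True if the interiors of two placements intersect."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx
            and a.y < b.y + b.dy and b.y < a.y + a.dy
            and a.z < b.z + b.dz and b.z < a.z + a.dz)


def pack(container, boxes):
    """Greedily place boxes in the container; return (placed, unplaced)."""
    cx, cy, cz = container
    placed, unplaced = [], []
    points = {(0, 0, 0)}  # candidate corners where the next box may go
    # Largest volume first: big boxes are the hardest to fit later on.
    for box in sorted(boxes, key=lambda b: b[0] * b[1] * b[2], reverse=True):
        chosen = None
        # Try the lowest candidate points first (smallest z, then y, then x).
        for x, y, z in sorted(points, key=lambda p: (p[2], p[1], p[0])):
            # Try every distinct axis-aligned rotation of the box.
            for dx, dy, dz in sorted(set(permutations(box))):
                if x + dx > cx or y + dy > cy or z + dz > cz:
                    continue  # sticks out of the container
                candidate = Placement(x, y, z, dx, dy, dz)
                if not any(overlaps(candidate, p) for p in placed):
                    chosen = candidate
                    break
            if chosen is not None:
                break
        if chosen is None:
            unplaced.append(box)
            continue
        placed.append(chosen)
        # The used corner is gone; the new box exposes three new corners.
        points.discard((chosen.x, chosen.y, chosen.z))
        points.update({
            (chosen.x + chosen.dx, chosen.y, chosen.z),
            (chosen.x, chosen.y + chosen.dy, chosen.z),
            (chosen.x, chosen.y, chosen.z + chosen.dz),
        })
    return placed, unplaced


if __name__ == "__main__":
    # Toy input: eight unit cubes plus one box that is too long to fit.
    placed, unplaced = pack((2, 2, 2), [(1, 1, 1)] * 8 + [(3, 1, 1)])
    print(len(placed), "placed;", len(unplaced), "left over")
```

Production systems typically replace the greedy loop with a stronger search and add the operational rules on top, but the core checks stay the same: a box must remain inside the container and must not overlap any other box.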
To understand precisely how one can undertake, and if at all plan, a serendipitous journey into data science, we talked to Pushkar Paranjpe, senior data scientist in the Big Data and Data Science team at SIPL, popularly known as Star TV.
Pushkar started out in chemistry and biology, where he studied and researched topics such as the diagnostics of malaria infection and plant tissues. He was selected to pursue a doctorate at the prestigious National Centre for Biological Sciences, which is part of the Tata Institute of Fundamental Research. There he got interested in the neuroscience of flies, specifically behavioural studies in Drosophila.
When he started his postdoctoral journey and took up a job at the Centre for Cellular and Molecular Platforms (C-CAMP), he took a liking to hardware systems design and image processing algorithms. He went on to create video segmentation algorithms and build hardware to capture insect activity.
This constant work on specific projects turned Pushkar into an expert and a quick learner. He thinks taking a machine learning course is good, but that one also needs to get their hands dirty. Pushkar says, “... true expertise/depth in any domain will surely but only come after struggling with it consistently over a long period. So don’t wait until you get a perfect understanding of the domain – avoid analysis-paralysis and get your hands dirty quickly.”
Tackling Early Data Science Challenges
Pushkar talks candidly about the kinds of challenges early-career data scientists face. He puts it this way, “Largely, there are two kinds of challenges – understanding the business problem and data availability.” He says that data scientists have a tendency to fall in love with tools rather than focus on the job at hand. Pushkar warns, “Often we fall in love with a shiny new data science technique, and then the hammer starts looking for nails. This is a trap. I try to avoid these consciously and instead strive to take a ‘problem first’ approach.”
Pushkar also warns about the assumptions data scientists make and invites them to question those assumptions at every step. He says, “One needs to ascertain the assumptions one makes about what a given column in a table represents. Equally, one should be aware of the assumptions the engineer made while preparing that data table.”
Current Role And Project Focus
Currently, Pushkar is a senior data scientist in SIPL’s Big Data and Data Science team and completed a year at Star a few months ago. He thinks of his team as a startup embedded in a large corporation. He underlines his experience of working in mixed teams of engineers, data scientists and business people when he says, “In our team there are data scientists, data engineers and developers on the technical side. Also, there are semi-technical folks who interface with the business units of Star. Together we try to understand and articulate business problems from the perspective of data science principles.”
In the current role, Pushkar splits time between being an individual contributor and a manager. He mentors young data scientists and also presents his work to Star’s senior management.
Most of his current focus is on understanding Star TV’s audience and its preferences. He says, “We want to understand our audience in depth and infer their likes and dislikes from viewership data. Our thesis is that having a consumption-centric view of our users will help us identify content whitespaces that are novel and relevant. It will also give marketers a modern handle on assigning value to different audience segments.”
Personal Time For Fun And Ambitious Projects
Pushkar also dedicates a lot of time and energy to personal projects. Some are revealed to the world, and some stay hidden. His side projects are varied, ranging from designing board games to building Android apps for one reason or another. Pushkar has a natural flair for solving everyday problems using technology and other devices. He is also massively interested in 3D printing, and in an earlier life he tried to start a 3D printing revolution in India.
Among his more conventional side projects, he lists an interesting one that reflects how his interests are divided between engineering and data science. He describes the project in the following words, “I learnt to set up a data ingestion pipeline to consume PubMed citation index into Amazon S3, then make it query-able from Athena – my data-pond.” He plans to use Latent Dirichlet Allocation on this data set to automatically identify the topics these articles cover.
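For readers unfamiliar with Latent Dirichlet Allocation, the following is a minimal pure-Python sketch of the technique using collapsed Gibbs sampling. It is only an illustration under stated assumptions: the function name `lda_gibbs`, its parameters and the four tiny "documents" are made up here, and nothing below reflects Pushkar's actual pipeline or the PubMed data. In practice one would more likely reach for an established library implementation.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, phi): theta[d] is the
    topic mixture of document d, phi[k] maps each word to its probability
    under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                           # tokens per topic
    assignments = []                                         # topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample: how much does doc d like topic t, times how much
                # topic t likes word v (both smoothed by the priors).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        {vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)}
        for k in range(num_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    # Made-up toy "abstracts", already tokenised.
    toy_docs = [
        ["malaria", "parasite", "blood"],
        ["neuron", "fly", "behaviour"],
        ["malaria", "blood", "diagnostics"],
        ["fly", "neuron", "behaviour"],
    ]
    theta, phi = lda_gibbs(toy_docs, num_topics=2)
    for k, topic in enumerate(phi):
        top_words = sorted(topic, key=topic.get, reverse=True)[:3]
        print("topic", k, top_words)
```

The topic indices themselves carry no meaning, and results on a corpus this small are unstable. On real citation data the tokenisation, stop-word handling and choice of `num_topics` matter far more than the sampler.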
He offers a great piece of advice for choosing side projects, “My side projects are often open-ended but generally tend to feed into and enrich my professional work projects, and I find these as a great way to stay up-to-date with the tech.”
Pushkar also emphasises that he sees great potential in probabilistic graphical models (PGMs) alongside deep learning methods. “Ultimately I would like to master the art of composing machine learning systems from these two ingredients,” Pushkar says when asked about his current learning goals.
Advice For Early Career Data Scientists
Pushkar strongly believes that data scientists should focus on the design and testing of algorithms and understand the iterative nature of the process. The choice of language is up to the data scientist, but a certain level of proficiency is a must.
Pushkar comments, “Fluency in any language which reduces this iteration time is desirable. I feel data science is a creative discipline and one will get a lot of ideas to tackle a particular problem. Choose the language in which you take the least amount of time to test an idea. Other factors that can impact language choice are readability, availability of libraries, quality of documentation, expertise in your team and community.”
He prefers Python because he feels he can express and validate data science ideas rapidly in it. Pushkar highly recommends JupyterLab for rapid prototyping of algorithms and the PyCharm IDE for writing production-grade code. Since Pushkar is a smart engineer, he prefers vi over Emacs in a shell.
One important quality, Pushkar stresses, is the skill of asking questions. He says, “Asking interesting questions is an important skill but one that can be learnt over time. Make a bet (formulate a hypothesis) and then test it. Surround yourselves with people who are rich in data but would be glad to outsource its analysis to you.”
He also has interesting insights into how he learns. He starts with the classic textbooks, the one by Russell and Norvig and the one by Ian Goodfellow, though he is quick to add that social media also plays a huge role. Pushkar uses everything from Stack Exchange to subreddits and even YouTube to his advantage. He says, “I use my commutes to and from work to catch up on the latest trends in ML and AI by tuning in to podcasts.”
The Future And Importance of Collaboration
Pushkar sees many problems across various fields that machine learning has not yet addressed. But he follows this up by saying that applying ML and AI techniques to many of these problems is tough. He again advocates a sane and balanced approach, saying, “But despite all the hype and provocation about AI/ML in the popular media – a data science practitioner cannot help but be humbled by the actual difficulties of putting these algorithms into production. True real-world breakthroughs may be possible only by collaboration between machine learning experts and people with domain expertise and sustained interactive development efforts.”
He believes that a meaningful coming together of data scientists and semi-technical folks will lead to a great future.