If you are entering higher ed and envision becoming a data scientist, you’re probably wondering where to start. To land the sexiest job of the 21st century, it’s pivotal to first understand the prerequisites (strong analytical and computational skills) and start finding ways to hone them.
According to Michael Page India’s ‘The Humans of Data Science’ report, data science is poised to create more than 11.5 million job openings by 2026. And if you want to stand out, you need to do more than the bare minimum. What you need is a data science portfolio.
What is a data science portfolio?
Think of a data science portfolio as an extension to your resume. It is essentially public evidence of the projects you’ve worked on, showcasing your creative, technical, and soft skills, approach to effectively analysing data and drawing insights, and ability to communicate the outcome to audiences.
“A portfolio must essentially have projects that show your interest or expertise in different areas of data science. They should cover concepts like supervised learning, unsupervised learning, deep learning, etc. When you put these up in your resume or on platforms like GitHub, Deepnote, and Kaggle or your portfolio website, recruiters can see your capabilities first hand. A diverse, well-groomed portfolio can make all the difference in getting a job!” said Mohan C R, Project Engineer, Wipro.
Ways to build a data science portfolio
From a theoretical perspective, massive open online courses (MOOCs) are a good place to start. They will help you get your fundamentals right while letting you learn the required skills at your own pace. Listed below are some of the most popular courses in 2022:
Data Science Specialization by Johns Hopkins University
Available on: Coursera
- Use R to clean, analyse, and visualise data
- Navigate the entire data science pipeline from data acquisition to publication
- Use GitHub to manage data science projects
- Perform regression analysis, least squares and inference using regression models
Available on: Metis
- CS/Statistics/Linear Algebra
- Exploratory Data Analysis and Visualisation
- Data Modelling: Supervised/Unsupervised Learning and Model Evaluation, Feature Selection, Engineering, and Data Pipelines, Advanced Supervised/Unsupervised Learning, Advanced Model Evaluation and Data Pipelines
Applied Data Science with Python Specialization by the University of Michigan
Available on: Coursera
- Conduct an inferential statistical analysis
- Discern whether a data visualisation is good or bad
- Enhance a data analysis with applied machine learning
- Analyse the connectivity of a social network
Data Science MicroMasters by UC San Diego
Available on: edX
- Load and clean real-world data
- Make reliable statistical inferences from noisy data
- Use machine learning to learn models for data
- Visualise complex data
- Use Apache Spark to analyse data that does not fit within the memory of a single computer
Available on: Udemy
- Python for Data Science and Machine Learning
- Spark for Big Data Analysis
- Implement Machine Learning Algorithms
- Use NumPy for Numerical Data, Pandas for Data Analysis, Matplotlib for Python Plotting, Seaborn for statistical plots, Plotly for interactive dynamic visualisations, and scikit-learn for machine learning tasks
- K-Means Clustering
- Logistic Regression and Linear Regression
- Random Forests and Decision Trees
- Natural Language Processing and Spam Filters
- Neural Networks
Apart from these courses, students can also participate in hackathons, which serve as a test bed for aspiring data scientists. Each year, data science communities and platforms such as MachineHack, DataCrunch and DataHack launch these events in collaboration with tech giants such as Genpact and IBM, enabling tech enthusiasts to boost their skill sets and earn cash prizes or certificates while keeping the “fun” quotient high. Kaggle is also one of the best places to look for such competitions.
Lastly, and most importantly, you need real projects to hone your data science skills.
What kind of projects should students look for?
In his student days, Mohan looked for projects that helped him hone essential data science skills such as data cleaning, exploratory data analysis, data visualisation, and machine learning. Here’s why:
Data Cleaning: Data scientists are often said to spend nearly 80% of their time cleaning data before they can find something useful. So it’s good to work on challenging projects in which the data is spread over multiple files and contains null values. This will put your skills to the test and push you to experiment with different data cleaning methods.
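As a rough illustration of that kind of project, here is a minimal sketch that combines two small tables and fills in their null values. Everything in it is an assumption made for the example: the two in-memory "files", the column names, and the choice of median filling and an inner join. It uses only the Python standard library so that it runs anywhere; in a real project you would more likely reach for pandas.

```python
import csv
import io
from statistics import median

# Two hypothetical "files", held in memory so the sketch is self-contained.
CUSTOMERS_CSV = """customer_id,age,income
1,34,52000
2,,61000
3,45,
4,29,48000
"""

LOANS_CSV = """customer_id,loan_amount
1,12000
2,8000
3,15000
5,7000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, turning empty strings into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


def fill_with_median(rows, column):
    """Convert a column to float, replacing missing values with the median."""
    observed = [float(r[column]) for r in rows if r[column] is not None]
    fill = median(observed)
    for r in rows:
        r[column] = float(r[column]) if r[column] is not None else fill
    return rows


def inner_join(left, right, key):
    """Keep only the records whose key appears in both tables."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


customers = read_rows(CUSTOMERS_CSV)
for column in ("age", "income"):
    fill_with_median(customers, column)

loans = read_rows(LOANS_CSV)
for row in loans:
    row["loan_amount"] = float(row["loan_amount"])

cleaned = inner_join(customers, loans, "customer_id")
for row in cleaned:
    print(row)
```

Median filling is only one option. Dropping incomplete rows or filling with a group-wise value are equally valid choices, and trying several of them on the same dataset is a good way to see how much the cleaning step affects the final result.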
Exploratory Data Analysis (EDA): Performing EDA will help you gain insights from your cleaned data. It will test your graphical interpretation skills and statistical knowledge as you plot your data using mean plots, standard deviation plots, or any other type of plot.
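Before plotting anything, a first pass at EDA is often just a handful of summary statistics. The sketch below assumes a tiny, made-up dataset of house sizes and prices and computes the mean, median, standard deviation, and Pearson correlation with the standard library alone. The data and function names are illustrative, not taken from any of the projects mentioned here.

```python
from statistics import mean, median, stdev

# Hypothetical cleaned data: house sizes (square metres) and prices (thousands).
sizes = [50, 65, 80, 95, 110, 125]
prices = [150, 190, 240, 275, 330, 370]


def summarise(values):
    """Return the basic descriptive statistics for one numeric column."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


print("size:", summarise(sizes))
print("price:", summarise(prices))
print("correlation:", round(pearson(sizes, prices), 3))
```

A correlation close to 1 here suggests that price rises almost linearly with size, which is exactly the sort of observation that tells you which plots and models are worth trying next.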
Data Visualisation: This process puts your storytelling abilities to the test! It will give you an idea of the many ways you can communicate and translate data using visual aids like graphs, charts, bars, or even images. There are many publicly available datasets that you can use to practice data visualisation and tell your story to the world.
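Even the simplest chart can carry a story. As a dependency-free sketch, the snippet below draws a text bar chart from made-up category counts. The counts and the `text_bar_chart` helper are assumptions for illustration only; for a real portfolio piece you would normally use a plotting library such as Matplotlib, Seaborn, or Plotly.

```python
def text_bar_chart(counts, width=30):
    """Return one line per category, with bars scaled to the largest count."""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        bar = "#" * round(count / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {count}")
    return lines


# Hypothetical counts of labelled news articles in a dataset.
articles = {"real": 120, "fake": 45, "unverified": 15}

chart = text_bar_chart(articles)
print("\n".join(chart))
```

Scaling every bar against the largest category keeps the chart readable at any width, and printing the raw count next to each bar lets readers check the picture against the numbers.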
Machine Learning: Having a good grasp of machine learning fundamentals will go a long way in your data science career. Projects revolving around ML help you solve real-world problems with creative solutions. For example, if you can learn how to build a predictive model, you’d be able to predict the likely outcome and automate the downstream process every time you have new data with an unknown outcome. However, never make the mistake of skipping the basics of ML and directly jumping to advanced/trending concepts.
By the time Mohan got out of college, his portfolio included projects like Loan Prediction Problem Dataset, The Boston Housing Dataset, and Finding Donors for CharityML.
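To make the predictive-model idea from the machine learning point concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The house sizes and prices are invented for the example, and the `fit_line` and `predict` helpers are hypothetical names. In practice you would typically use a library such as scikit-learn, but writing the basics by hand once is a good way to avoid skipping them.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Predict the outcome for a new input with an unknown outcome."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: house size (square metres) -> price (thousands).
train_x = [50, 65, 80, 95, 110]
train_y = [150, 190, 240, 275, 330]

# One held-out example the model never sees during fitting.
test_x, test_y = 125, 370

model = fit_line(train_x, train_y)
prediction = predict(model, test_x)
error = abs(prediction - test_y)

print(f"slope={model[0]:.3f}, intercept={model[1]:.3f}")
print(f"predicted={prediction:.1f}, actual={test_y}, absolute error={error:.1f}")
```

Holding back one example, as done here, is the smallest possible version of a train/test split: the error on data the model has not seen is what tells you whether its predictions can be trusted on new inputs.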
Here are some of the best projects you can work on as a beginner:
- Fake News Detection
- Climate Change Impacts on the Global Food Supply
- Human Action Recognition
- Forest Fire Prediction
- Road Lane Line Detection
Apart from these must-dos, aspirants can start writing blogs, network by attending virtual events on data science, and keep themselves updated on the latest developments in the field.
What do recruiters think?
Given the buzz around data science, it’s important to filter out the noise and get a clear understanding of what you need to be doing to get hired.
“There is an odd misconception that they can pursue this career without using mathematics. Nothing could be farther from the truth! When you are stuck with a result and need to understand what it’s trying to say, you need to understand how you arrived at that result and evaluate it accordingly. This requires mathematics! And many such misconceptions need to be cleared before making a move,” says Puneet Tripathi, Head of Data Science, Wakefit.co. “I’d recommend students to get themselves familiar with different programming languages and open-source platforms by taking on as many projects as possible. This is the best way to ensure a strong career in data science.”