How to build a data science portfolio in college?

A diverse, well-groomed portfolio can make all the difference in getting a job!

If you are entering higher education and envision becoming a data scientist, you’re probably wondering where to start. To land the sexiest job of the 21st century, it’s pivotal to first understand the prerequisites, namely strong analytical and computational skills, and then find ways to hone them.

From an academic standpoint, data science is flourishing. More than 596 data science, big data and analytics courses are offered by nearly 470 colleges pan-India. Most of these institutes collaborate with SMEs to prepare a curriculum that exposes students to subjects like data analytics, machine learning, business analysis, statistics, data modelling, data visualisation, cloud computing and database systems, alongside programming languages such as Python, JavaScript, Scala, R, SQL and Julia. If you are lucky, an internship might be included too. But that’s not enough!

According to Michael Page India’s ‘The Humans of Data Science’ report, data science is poised to create more than 11.5 million job openings by 2026. To stand out, you need to do more than the bare minimum. What you need is a data science portfolio.


What is a data science portfolio?

Think of a data science portfolio as an extension of your resume. It is essentially public evidence of the projects you’ve worked on, showcasing your creative, technical and soft skills, your approach to analysing data and drawing insights, and your ability to communicate the results to an audience.

“A portfolio must essentially have projects that show your interest or expertise in different areas of data science. They should cover concepts like supervised learning, unsupervised learning, deep learning, etc. When you put these up in your resume or on platforms like GitHub, Deepnote, and Kaggle or your portfolio website, recruiters can see your capabilities first hand. A diverse, well-groomed portfolio can make all the difference in getting a job!” said Mohan C R, Project Engineer, Wipro.

Ways to build a data science portfolio 

From a theoretical perspective, massive open online courses (MOOCs) are a good place to start. They help you get your fundamentals right while letting you learn the required skills at your own pace. Listed below are some of the most popular courses in 2022:

Data Science Specialization by Johns Hopkins University

Available on: Coursera

Key Takeaways: 

  • Use R to clean, analyse, and visualise data
  • Navigate the entire data science pipeline from data acquisition to publication          
  • Use GitHub to manage data science projects 
  • Perform regression analysis, least squares and inference using regression models

Introduction to Data Science 

Available on: Metis

Key Takeaways:  

  • CS/Statistics/Linear Algebra
  • Exploratory Data Analysis and Visualisation
  • Data Modelling: Supervised/Unsupervised Learning and Model Evaluation; Feature Selection, Feature Engineering and Data Pipelines; Advanced Supervised/Unsupervised Learning; Advanced Model Evaluation and Data Pipelines

Applied Data Science with Python Specialization by the University of Michigan

Available on: Coursera

Key Takeaways: 

  • Conduct an inferential statistical analysis
  • Discern whether a data visualisation is good or bad
  • Enhance a data analysis with applied machine learning
  • Analyse the connectivity of a social network

Data Science MicroMasters by UC San Diego

Available on: edX

Key Takeaways:

  • Load and clean real-world data
  • Make reliable statistical inferences from noisy data
  • Use machine learning to learn models for data
  • Visualise complex data
  • Use Apache Spark to analyse data that does not fit within the memory of a single computer

Python for Data Science and Machine Learning Bootcamp

Available on: Udemy

Key Takeaways:

  • Python for Data Science and Machine Learning
  • Spark for Big Data Analysis
  • Implement Machine Learning Algorithms
  • Use NumPy for Numerical Data, Pandas for Data Analysis, Matplotlib for Python Plotting, Seaborn for statistical plots, Plotly for interactive dynamic visualisations, and scikit-learn for machine learning tasks
  • K-Means Clustering
  • Logistic Regression and Linear Regression
  • Random Forests and Decision Trees
  • Natural Language Processing and Spam Filters
  • Neural Networks

Apart from these courses, students can also take part in hackathons, which serve as a test-bed for aspiring data scientists. Each year, data science communities and platforms such as MachineHack, DataCrunch and DataHack launch these events in collaboration with tech giants such as Genpact and IBM, letting tech enthusiasts boost their skill sets and earn cash prizes or certificates while keeping the “fun” quotient high. Kaggle is one of the best places to find such competitions.

Last, and most importantly, you need real projects to hone your data science skills.

What kind of projects should students look for?

As a student, Mohan looked for projects that helped him hone essential data science skills such as data cleaning, exploratory data analysis, data visualisation and machine learning. Here’s why each one matters:

Data Cleaning: Data scientists are often said to spend nearly 80% of their time cleaning data before they find anything useful. So it’s worth working on challenging projects whose data is spread across multiple files and contains null values. This will put your skills to the test and push you to experiment with different data cleaning methods.
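A minimal pandas sketch of that kind of cleaning, assuming two hypothetical files that share an `applicant_id` key: the files are merged, missing values are counted, and the gaps are filled. The column names and values are invented for illustration, the "files" are inlined with `io.StringIO` so the example runs on its own, and median imputation is only one of several reasonable strategies.

```python
import io

import pandas as pd

# Two hypothetical "files", inlined so the sketch is self-contained.
# In a real project these would be separate CSVs read with pd.read_csv(path).
applicants_csv = io.StringIO(
    "applicant_id,gender,income\n"
    "1,Male,5000\n"
    "2,,3000\n"
    "3,Female,\n"
    "4,Male,4200\n"
)
loans_csv = io.StringIO(
    "applicant_id,loan_amount,loan_status\n"
    "1,120,Y\n"
    "2,,N\n"
    "3,90,Y\n"
    "5,150,N\n"
)

applicants = pd.read_csv(applicants_csv)  # empty fields are read as NaN
loans = pd.read_csv(loans_csv)

# Combine the files on their shared key. An inner join keeps only the
# applicants present in both sources (ids 1, 2 and 3 here).
df = applicants.merge(loans, on="applicant_id", how="inner")

# Count missing values per column before deciding how to treat them.
print(df.isna().sum())

# Numeric gaps: impute with the column median, which is robust to outliers.
for col in ["income", "loan_amount"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical gaps: make "Unknown" an explicit category instead of dropping rows.
df["gender"] = df["gender"].fillna("Unknown")

print(df)
```

Whether to impute, drop or flag missing values depends on why they are missing; trying more than one approach and comparing the outcomes is the kind of experimentation a messy dataset forces on you.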

Exploratory Data Analysis (EDA): Performing EDA helps you gain insights from your cleaned data. It tests your statistical knowledge and your ability to interpret graphs as you plot your data, whether with mean plots, standard deviation plots or any other type of plot.
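As a rough illustration, the numbers behind mean and standard deviation plots can be computed with pandas before any chart is drawn. The sales figures and region names below are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned dataset: monthly sales (in units) for three regions.
df = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South", "South", "West", "West", "West"],
    "sales": [120, 135, 128, 90, 150, 60, 110, 112, 108],
})

# Overall summary statistics: count, mean, std, min, quartiles and max.
print(df["sales"].describe())

# Per-group mean and (sample) standard deviation: the values a mean plot
# and a standard deviation plot would show for each region.
summary = df.groupby("region")["sales"].agg(["mean", "std"])
print(summary)
```

Here South and West have similar averages (100 and 110 units) but very different spreads: South's standard deviation is about 45.8 units against West's 2. Spotting contrasts like that is what EDA is for.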

Data Visualisation: This step puts your storytelling abilities to the test! It gives you a sense of the many ways you can communicate data using visual aids like graphs, charts, bar plots or even images. There are many publicly available datasets you can use to practise data visualisation and tell your story to the world.
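A minimal Matplotlib sketch of turning a few summary numbers into a labelled bar chart. The regions, values and output filename are hypothetical, and the non-interactive `Agg` backend is selected so the script writes a PNG file instead of opening a window.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical figures to visualise: average monthly sales per region.
regions = ["North", "South", "West"]
avg_sales = [127.7, 100.0, 110.0]
output_path = Path("sales_by_region.png")

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(regions, avg_sales, color="steelblue")

# A clear title and labelled axes carry much of the "story".
ax.set_title("Average monthly sales by region")
ax.set_xlabel("Region")
ax.set_ylabel("Units sold")

# Print each bar's value on top of it so readers don't have to estimate heights.
ax.bar_label(bars, fmt="%.1f")

fig.tight_layout()
fig.savefig(output_path, dpi=150)
plt.close(fig)
```

Titles, axis labels and value annotations often matter more to the story than the choice of chart type.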

Machine Learning: A good grasp of machine learning fundamentals will go a long way in your data science career. ML projects help you solve real-world problems with creative solutions. For example, once you can build a predictive model, you can predict the likely outcome for every new record whose outcome is unknown and automate the downstream process. However, never make the mistake of skipping the basics of ML and jumping straight to advanced or trending concepts.

By the time Mohan graduated, his portfolio included projects such as the Loan Prediction Problem Dataset, the Boston Housing Dataset and Finding Donors for CharityML.
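The predictive-model workflow described under Machine Learning can be sketched with scikit-learn. This is an illustration under stated assumptions, not any specific project: `make_classification` generates a synthetic labelled dataset as a stand-in for real data such as past loan applications, and logistic regression is just one simple, well-understood baseline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a labelled dataset with a known outcome
# (e.g. approved/rejected): 500 rows, 5 numeric features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)

# Hold out 20% of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.2f}")

# Once trained, the same model can score new records whose outcome is unknown,
# which is what makes the prediction step easy to automate.
X_new = X_test[:3]  # placeholder for genuinely new rows
print(model.predict(X_new))
print(model.predict_proba(X_new).round(2))
```

Checking performance on a held-out split before trusting a model on new rows is one of the ML basics worth mastering before moving on to advanced techniques.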

Here are some of the best projects you can work on as beginners: 

Apart from these must-dos, aspirants can start writing blogs, network by attending virtual data science events, and keep themselves up to date with the latest developments in the field.

What do recruiters think?

Given the buzz around data science, it’s important to filter out the noise and get a clear understanding of what you need to be doing to get hired.

“There is an odd misconception that they can pursue this career without using mathematics. Nothing could be farther from the truth! When you are stuck with a result and need to understand what it’s trying to say, you need to understand how you arrived at that result and evaluate it accordingly. This requires mathematics! And many such misconceptions need to be cleared before making a move,” says Puneet Tripathi, Head of Data Science. “I’d recommend students to get themselves familiar with different programming languages and open-source platforms by taking on as many projects as possible. This is the best way to ensure a strong career in data science.”

Sri Krishna
Sri Krishna is a technology enthusiast with a professional background in journalism. He believes in writing on subjects that evoke a thought process towards a better world. When not writing, he indulges his passion for automobiles and poetry.