How can I become a data scientist? What are the most valuable skills to learn for a data scientist now? Could I learn how to be a data scientist by going through online tutorials? What does a data scientist do?
These are only some of the questions that are being discussed online, on blogs, on forums and on knowledge-sharing platforms like Quora.
The data science sector is flourishing to such an extent that our earlier jobs study found more than 97,000 job openings for analytics and data science in India.
It is true that the “hottest job of the 21st century” has all the buzz, glamour and traffic, but many enthusiasts are still confused about what the job entails. Fewer still understand what it takes to become a data scientist.
In this article, Analytics India Magazine will give you a step-by-step guide to becoming a data scientist. It’s not an easy journey, but the results are worth it.
Note: This is a beginner’s guide
1. Learn Mathematics And Statistics
If you are a data science aspirant, you need a strong background in mathematics and a basic knowledge of statistics. Amid the hype around data-driven decision making, these basics are often sidelined, yet the data science boom calls for stronger statistics and maths skills, right up to the executive level. Some of the fundamental concepts expected of a business analyst are correlation, causation and how to test a hypothesis statistically.
We highly recommend starting with the basics of linear algebra and then gradually moving on to calculus. These areas may be hard to master at first, but with time and practice they will become familiar and comfortable to work with.
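To make these ideas concrete, here is a minimal Python sketch, using only the standard library, of two of the concepts mentioned above: the Pearson correlation coefficient and a simple permutation test for a difference in group means. The function names `pearson_r` and `permutation_test` and the sample numbers are our own illustrative choices, not a standard API; in practice you would probably reach for a library such as SciPy.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one
    (with a +1 correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Made-up numbers, for illustration only.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(pearson_r(ad_spend, sales))  # ≈ 0.775
print(permutation_test([1, 2, 3], [10, 11, 12]))
```

A correlation of about 0.77 says the two series move together; it says nothing about which one causes the other, which is exactly the correlation/causation distinction above.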
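To see why calculus matters in practice, here is a small sketch of gradient descent fitting a straight line by minimising mean squared error. The update rule uses the partial derivatives of the error with respect to each parameter. The function name, learning rate and number of steps are illustrative assumptions, not tuned values.

```python
def fit_line_gd(xs, ys, lr=0.01, steps=5000):
    """Fit y ≈ w*x + b by gradient descent on mean squared error (MSE).

    The gradients come straight from calculus:
      dMSE/dw = (2/n) * sum((w*x + b - y) * x)
      dMSE/db = (2/n) * sum(w*x + b - y)
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Points on the line y = 2x + 1; the fit should recover w ≈ 2, b ≈ 1.
print(fit_line_gd([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))
```

The same idea, written with vectors and matrices instead of loops, is where linear algebra and calculus meet in most ML libraries.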
2. Practice Programming
Numerous studies, including our own, have concluded that Python is the most important language for a data scientist to learn. In 2019 this view is widespread: almost 75% of the industry, practitioners included, say so.
Beginners should therefore spend at least their first six months learning Python programming and how to interact with databases. Once you have a good grasp of Python and programming in general, you can move on to other languages such as R and Java, and then to machine learning packages such as scikit-learn.
Sites like Kaggle and DataCamp are extremely good for testing code and collaborating with peers and developers. Forums like Stack Overflow are also excellent for discussing programming problems and queries.
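As a first taste of Python talking to a database, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database, so nothing needs to be installed. The table name `sales`, its columns and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3
from contextlib import closing


def average_by_group(rows):
    """Load (store, amount) rows into a throwaway SQLite table and return
    the average amount per store, computed with plain SQL."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
        # Parameterised inserts ('?' placeholders) avoid SQL injection.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT store, AVG(amount) FROM sales GROUP BY store ORDER BY store"
        )
        return cursor.fetchall()


print(average_by_group([("A", 10.0), ("B", 4.0), ("A", 20.0)]))
# [('A', 15.0), ('B', 4.0)]
```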
3. Dip Your Feet Into Machine Learning
Many people make the mistake of learning every algorithm in ML while losing sight of where each one actually helps solve a problem. Beginners are better off learning the popular, standard algorithms first. A complicated algorithm is not always the answer for a complex application; what matters is solving the ML problem optimally.
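k-nearest neighbours is a good example of a standard algorithm that is simple enough to write from scratch. The sketch below classifies a query point by majority vote among its k closest training points. It is a teaching sketch, assuming small in-memory data and Euclidean distance; the toy points are made up, and a real project would normally use scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance via math.dist)."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Two made-up clusters, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.2, 0.2)))  # a
```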
Here are a few blogs which brilliantly explain the process:
- Machine Learning Mastery by Jason Brownlee – An amazing blog by expert Jason Brownlee. He explores the fascinating world of ML and captures its essence in the real world.
- Adam Geitgey’s blog – interesting write-ups on ML and Python
- Arthur Juliani’s blog on Reinforcement Learning – an absolute gem of a blog which particularly focuses on reinforcement learning in ML.
- Edwin Chen’s blog – it explores core ML concepts such as neural networks and deep learning, as well as the math behind them.
If you do decide to take a course in ML, keep in mind that most of these programmes give only a brief idea of basic algorithms such as Support Vector Machines (SVM) and neural networks. They do, however, strengthen concepts such as matrix operations and linear regression, and they teach supervised and unsupervised learning. Most also introduce a few projects in a programming language, generally Matlab, R, Python or Octave, such as ‘text recognition’, ‘spam classifier’ and ‘movie recommender systems’.
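As an example of the kind of ‘spam classifier’ project these courses include, here is a minimal multinomial Naive Bayes sketch in plain Python. It assumes whitespace tokenisation and add-one (Laplace) smoothing; the tiny training set is made up for illustration, and real projects would use a proper tokeniser and a labelled dataset.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model on whitespace-tokenised text."""
    class_docs = Counter(labels)                      # documents per class
    word_counts = {c: Counter() for c in class_docs}  # word counts per class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, len(docs)


def predict_nb(model, doc):
    """Return the class with the highest log-posterior score for `doc`."""
    class_docs, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label, count in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for word in doc.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["win money now", "cheap money offer", "meeting at noon", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "win cheap money"))  # spam
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on longer messages.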
4. Create And Build Machine Learning Projects
Learning is just the beginning; you then need to put what you have learnt into practice. Knowledge is appreciated only when it is demonstrated. Taking up live projects and understanding the architecture behind the scenes helps a lot. Hands-on experience in data science is very much in demand at the moment, as large firms look for people with experience and an analytical mindset.
You can learn via the following projects which are perfect for beginners:
- The Iris flower classification project
- MovieLens 100k
- Turkiye student evaluation dataset
- BigMart sales prediction
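Whichever project you pick, a useful first step is to hold out a test set and measure a trivial baseline that any real model must beat. For a regression task such as BigMart sales prediction, a natural baseline is to always predict the training-set mean. The sketch below assumes the targets are a plain list of numbers; the function names and the 80/20 split are illustrative choices, not part of any particular dataset's instructions.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted)) ** 0.5


def mean_baseline_rmse(targets, test_fraction=0.2, seed=42):
    """Test-set RMSE of always predicting the training-set mean."""
    train, test = train_test_split(targets, test_fraction, seed)
    guess = mean(train)
    return rmse(test, [guess] * len(test))
```

If your model's test RMSE is not clearly lower than this baseline's, it has not learnt anything useful yet.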
5. Create A Portfolio
While a resume is an important way to showcase your abilities to potential employers, a data scientist should also be able to showcase his/her coding and other software skills. Being able to code is a crucial part of data science jobs, and GitHub is a perfect platform for employers to assess coding skills and for you to display hands-on problem-solving ability. Here are the points to remember:
- Good author information: Details such as the candidate’s username, location, email address, current employer, etc., must be included.
- Large follower count: The number of followers your profile has is a good indication of the work you have done in the past. More than 50 is usually a decent number.
- Contribution graph: This showcases your keenness to explore other areas and your activity level in the coding community. The greener the graph, the higher your contribution rate.
- Improving on stars: Stars show how you have engaged with the community. 100 stars is usually considered decent, but the more the better.
- Forking and creating repositories: The more people who have forked a project, the more popular it is in the GitHub community. A large amount of activity indicates that the developer is working on a popular project.
- Writing employer-targeted code: Writing code related directly to the employer’s business is a good way to catch their attention. It showcases your coding abilities while demonstrating your interest in the job.
6. Focus On Soft Skills Too
Industry experts say that simply hiring a data scientist is not enough. Managers need to take special care to align business and data teams, enabling data scientists to be self-sufficient. Otherwise, companies may not get the expected ROI from data science, a problem almost 80% of them face. Here are the skills a good data scientist should focus on:
- Ability to draw parallels to real-world problems
- Business acumen
7. Apply For Jobs Wisely
As tools evolve, data science roles are maturing and becoming more mainstream in companies. The number of data science openings is also at an all-time high, and given the number of opportunities, these roles are being opened up to professionals from non-technical backgrounds as well. Because many positions face a shortage of ideal candidates, it is quite possible for a single candidate to end up with more than one job offer for similar roles.
That’s why it is necessary to ask questions about:
- Tools used
- Methodologies used
- Type of data used in the organisation
- Time spent on aspects of the role like analysis, data management
8. Keep Upskilling
To keep up with changing times, most organisations try to hire candidates who show a clear willingness to learn and upskill. We have seen in the past how companies like Cognizant laid off employees who were unable to keep up with changing technologies and failed to upskill.
Companies in 2019 are focussing on training not just a single skill but a cluster of skills that will stay relevant for longer. Some of the skills currently gaining traction are:
- Artificial intelligence
- Connected devices
- Data analytics