10 Most Frequently Asked Questions In Data Science Interview

Data Science, Machine Learning and Artificial Intelligence are broad fields, and anyone working in them needs a firm grasp of the core concepts. In this article, we jot down the 10 most frequently asked questions in a data science interview.

1| What is regularisation? Explain L1 and L2 regularisation.

Regularisation is a mathematical way of solving the problem of overfitting. It refers to modifying a learning algorithm to favour "simpler" prediction rules, which helps choose a preferred model complexity so that the model predicts better on unseen data.


L1 regularisation, also known as the L1 norm or Lasso, can shrink some parameters exactly to zero. It therefore performs feature selection, assigning insignificant input features a zero weight and useful features a non-zero weight.

On the other hand, L2 regularisation, or Ridge regularisation, spreads the error among all the features. It forces the weights to be small but does not make them exactly zero, so the solution is non-sparse. It is also not robust to outliers, since the squared error terms blow up the differences for outliers and the regularisation term tries to compensate by penalising the weights. Ridge regression performs better when all the input features influence the output and the weights are of roughly equal size.
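The contrast can be seen in how each penalty updates a single weight. The sketch below (toy weights and penalty strength chosen for illustration, not a full solver) shows the one-step proximal updates: the L1 step snaps small weights exactly to zero, while the L2 step only scales every weight down.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: the proximal step for the L1 (Lasso) penalty."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are snapped exactly to zero

def l2_shrink(w, lam):
    """The proximal step for the L2 (Ridge) penalty: uniform scaling."""
    return w / (1.0 + lam)

weights = [3.0, 0.4, -0.2, -2.5]  # hypothetical fitted weights
lam = 0.5                         # hypothetical penalty strength

lasso = [l1_shrink(w, lam) for w in weights]  # [2.5, 0.0, 0.0, -2.0]
ridge = [l2_shrink(w, lam) for w in weights]  # all shrunk, none zero
```

This is exactly the sparsity argument from the text: Lasso zeroes out the two weak weights, while Ridge keeps all four weights non-zero.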

2| How does Data Science differ from Big Data and Data Analytics?

Data Science is a field that uses various tools and algorithms to gain useful insights from raw data. It involves methods for data modelling and other data-related tasks such as data cleansing, preprocessing and analysis. Big Data refers to the enormous volumes of structured, semi-structured and unstructured data generated through various channels and organisations. Data Analytics, in turn, provides operational insights into complex business situations and predicts upcoming opportunities which the organisation can exploit.

3| How do Data Scientists use statistics?

Statistics plays a powerful role in Data Science. It is one of the most important disciplines providing tools and methods to find structure in data and to give deeper insight into it. It has a great impact on data acquisition, exploration, analysis and validation.
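As a small illustration of statistics in exploratory analysis, the sketch below summarises two made-up feature columns with the mean, standard deviation and Pearson correlation, computed from their textbook definitions using only the standard library.

```python
import statistics

# Hypothetical feature columns for exploration (values are made up)
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 65, 70, 80]

mean_h = statistics.mean(heights)
stdev_h = statistics.stdev(heights)   # sample standard deviation

# Pearson correlation from the definition: cov(X, Y) / (sd(X) * sd(Y))
mean_w = statistics.mean(weights)
cov = sum((h - mean_h) * (w - mean_w)
          for h, w in zip(heights, weights)) / (len(heights) - 1)
corr = cov / (stdev_h * statistics.stdev(weights))
```

A correlation near 1 here tells the analyst the two columns move together, which is exactly the kind of structure statistics helps surface before any modelling begins.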

4| Why is data cleansing important?

Data cleansing is a process in which you go through all of the data within a database and either remove or update information that is incomplete, incorrect, improperly formatted, duplicated or irrelevant. It usually involves cleaning up data compiled in one area. For an organisation, data cleansing is important because it improves data quality and, in doing so, increases overall productivity.
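The steps listed above can be sketched on a tiny made-up record set: normalising formatting, dropping incomplete rows and removing duplicates. The field names and values are hypothetical, chosen only to exercise each case.

```python
raw = [
    {"name": "Alice", "email": "alice@example.com", "age": "34"},
    {"name": "alice ", "email": "alice@example.com", "age": "34"},  # duplicate
    {"name": "Bob", "email": None, "age": "not available"},         # incomplete
]

def clean(records):
    seen, out = set(), []
    for r in records:
        if not r["email"]:
            continue                          # drop incomplete rows
        if r["email"] in seen:
            continue                          # drop duplicates (keyed on email)
        seen.add(r["email"])
        name = r["name"].strip().title()      # fix improper formatting
        age = int(r["age"]) if r["age"].isdigit() else None
        out.append({"name": name, "email": r["email"], "age": age})
    return out

cleaned = clean(raw)  # one clean, well-typed record survives
```

In practice a library such as Pandas would do this at scale, but the logic is the same: each rule targets one of the defects named in the definition.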

5| What is Linear and Logistic Regression?

Linear regression involves a continuous dependent variable, with one independent variable in Simple Linear Regression and multiple independent variables in Multiple Linear Regression. The outcome (dependent variable) is continuous and can take any one of an infinite number of possible values. Linear regression gives an equation of the form Y = mX + C, i.e. an equation of degree 1, and the method is used when the response variable is continuous. For instance: weight, height or number of hours.

In logistic regression, by contrast, the outcome (dependent variable) takes only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature, for instance yes/no, true/false or red/green/blue. The method models the probability of a class with the logistic (sigmoid) function, p = 1 / (1 + e^(-X)), which squashes a linear score into the interval (0, 1).
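The two functional forms can be written out directly. The slope, intercept and inputs below are arbitrary illustrative values; the point is that the linear form is unbounded while the logistic form always lands strictly between 0 and 1.

```python
import math

def linear(x, m=2.0, c=1.0):
    # Linear regression form: Y = mX + C (unbounded, continuous output)
    return m * x + c

def logistic(x):
    # Logistic regression form: p = 1 / (1 + e^(-x)), always in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

y = linear(3.0)        # 7.0: any real value is possible
p_mid = logistic(0.0)  # 0.5: a score of zero means maximum uncertainty
p_low = logistic(-5.0) # close to 0: the model is confident in class 0
```

This is why the outputs are interpreted differently: a linear prediction is read as a quantity, while a logistic prediction is read as a class probability and thresholded (typically at 0.5) to pick a category.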

6| What is Normal Distribution?

The Normal Distribution is a very common distribution, known in statistical terms as the Gaussian distribution. It has the following characteristics: the mean, median and mode of the distribution coincide; the curve of the distribution is bell-shaped and symmetrical about the line x = μ; the total area under the curve is 1; and exactly half of the values lie to the left of the centre and the other half to the right.
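These characteristics can be checked empirically by drawing samples. The sketch below (with an arbitrary mean and standard deviation, and a fixed seed for reproducibility) verifies that the sample mean and median coincide and that the mass splits evenly around the centre.

```python
import random
import statistics

random.seed(0)
mu, sigma = 10.0, 2.0  # hypothetical parameters of the distribution

samples = [random.gauss(mu, sigma) for _ in range(100_000)]

m = statistics.mean(samples)
med = statistics.median(samples)

# Symmetry: about half of the samples fall on each side of the median
left = sum(1 for s in samples if s < med)
```

With 100,000 draws the mean and median land within a few hundredths of μ = 10, and `left` is (essentially exactly) half the sample, matching the three properties stated above.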

7| Difference between Interpolation and Extrapolation

Both interpolation and extrapolation estimate hypothetical values of a variable from other observations. Interpolation estimates a value between two known values in a sequence, while extrapolation estimates a value by extending a known sequence of values or facts beyond the range that is certainly known.
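The distinction is easiest to see with a straight line through two known points (the points below are made up). Evaluating inside the known x-range is interpolation; evaluating beyond it is extrapolation, which rests on the riskier assumption that the trend continues.

```python
def linear_fit(points):
    """Fit y = m*x + c through two known (x, y) points."""
    (x0, y0), (x1, y1) = points
    m = (y1 - y0) / (x1 - x0)
    c = y0 - m * x0
    return lambda x: m * x + c

f = linear_fit([(1.0, 10.0), (3.0, 30.0)])  # hypothetical observations

interpolated = f(2.0)  # x = 2 lies inside the known range [1, 3]
extrapolated = f(5.0)  # x = 5 lies beyond it: the estimate is less reliable
```

Both calls use the same formula; only the location of x relative to the observed range decides which word applies, and hence how much trust the estimate deserves.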

8| What is a recommender system?

Recommender systems are among the most widespread applications of machine learning in organisations. They help users discover relevant items from large catalogues. The machine learning algorithms in recommender systems are typically classified into two categories, content-based and collaborative filtering methods, although modern recommenders combine both approaches. Content-based methods rely on the similarity of item attributes, while collaborative methods compute similarity from user-item interactions.
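The collaborative idea can be sketched with cosine similarity over user-item rating rows. The users, items and ratings below are invented; the point is that users with similar interaction patterns score close to 1, so one user's high ratings can be recommended to the other.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical user-item rating rows (0 = item not rated)
alice = [5, 4, 0, 1]
bob   = [4, 5, 0, 1]
carol = [1, 0, 5, 4]

sim_ab = cosine(alice, bob)    # high: Alice and Bob share tastes
sim_ac = cosine(alice, carol)  # low: their interaction patterns differ
```

A content-based method would apply the same similarity function to item attribute vectors instead of interaction rows, which is precisely the distinction drawn above.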

9| Between R and Python, which one would you choose for text analysis?

Between R and Python, Python would often be the better choice, as its Pandas library provides high-performance data analysis tools and easy-to-use data structures. However, you can go with either language depending on the complexity of the data being analysed.
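Whichever library you settle on, a typical first text-analysis step is tokenising and counting words. The sketch below uses only Python's standard library on a made-up sentence; with Pandas the same counts would live in a `Series` for further analysis.

```python
import re
from collections import Counter

text = "Data science is fun. Data analysis is useful."  # sample text

# Tokenise: lowercase, then keep alphabetic runs only
tokens = re.findall(r"[a-z]+", text.lower())

# Count word frequencies
freq = Counter(tokens)
```

Even this tiny pipeline (normalise, tokenise, count) is the backbone of word-frequency analysis, and Python's ecosystem extends it smoothly to stemming, n-grams and TF-IDF.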

10| Explain A/B Testing

A/B testing is a statistical method of comparing two or more versions to determine which works better and whether the difference between them is statistically significant. It is a powerful tool for product development. In technical terms, an A/B test refers to any experiment in which random assignment is used to tease out a causal relationship between a treatment, typically some change to a website, and an outcome, often a metric the business wants to change.
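The significance check is commonly done with a pooled two-proportion z-test. The conversion counts below are invented for illustration; libraries such as statsmodels provide this test ready-made, but the arithmetic fits in a few lines.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for an A/B conversion experiment."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: variant B converts 13% vs 10% for A
z, p = two_proportion_z(200, 2000, 260, 2000)
```

Here z exceeds the 1.96 threshold and the p-value falls below 0.05, so under the usual 5% significance level the difference between the variants would not be attributed to chance.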

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
