Analysing data, irrespective of its form, can be chaotic and challenging. This is where feature engineering steps in. Feature engineering eases data analysis by transforming raw data into inputs that machine learning models can read and use effectively.
A feature, or variable, is a numerical representation of some aspect of raw data, whether structured or unstructured. Feature engineering is a vital part of predictive modelling: it applies mathematical functions to the data and, done well, improves the performance of machine learning algorithms.
Today, we list six resources for learning feature engineering:
Courses
Feature Engineering for Machine Learning
Offered by Udemy, this course teaches multiple techniques for imputing missing data, handling infrequent, rare and unseen categories, and extracting meaningful features from date and time variables. After taking the course, learners will know how to engineer features and build more powerful machine learning models.
Taught by the Lead Data Scientist and Founder of Train in Data, the course is spread over 14 sections and 134 lectures and takes 10 hours and 27 minutes to complete.
Prerequisites for this course include a Python and Jupyter Notebook installation, familiarity with machine learning algorithms and Scikit-Learn, and experience with NumPy and pandas.
For more information, click here.
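The course's handling of infrequent, rare and unseen categories can be illustrated with a small sketch. The plain-Python example below is not taken from the course; the function names, the 20% frequency threshold and the "Rare" placeholder label are illustrative assumptions. It learns which categories are frequent from the training data only and maps everything else, including categories never seen during training, to a single label.

```python
from collections import Counter


def fit_rare_label_encoder(values, min_frequency=0.05):
    """Hypothetical helper: return the categories whose relative frequency
    in the training data is at least min_frequency."""
    counts = Counter(values)
    total = len(values)
    return {cat for cat, n in counts.items() if n / total >= min_frequency}


def transform_rare_labels(values, frequent, rare_label="Rare"):
    """Hypothetical helper: replace infrequent and unseen categories
    with a single placeholder label."""
    return [v if v in frequent else rare_label for v in values]


# Learn the frequent categories from training data only.
train = ["London", "Paris", "London", "Paris", "London", "Oslo"]
frequent = fit_rare_label_encoder(train, min_frequency=0.2)
print(sorted(frequent))  # ['London', 'Paris']

# "Oslo" is rare in training and "Madrid" was never seen; both become "Rare".
test = ["Paris", "Oslo", "Madrid"]
print(transform_rare_labels(test, frequent))  # ['Paris', 'Rare', 'Rare']
```

Fitting on the training set and only transforming new data means a model never receives a category it did not see during training.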
Feature Engineering for Machine Learning in Python
Offered by DataCamp, this course teaches learners to create new features that improve the performance of their machine learning models. The course covers:
- Creating features: Learners explore what feature engineering is and how to start applying it to real-world data. They load, explore and visualise a survey response dataset, and use the pandas package to create new features from both categorical and continuous columns.
- Dealing with messy data
- Conforming to statistical assumptions
- Dealing with text data
The course is taught by Robert O’Callaghan, Director of Data Science at Ordergroove, and consists of 16 videos and 53 exercises; it takes around four hours to complete. On completing the course, learners will have hands-on experience preparing data for their own machine learning models.
For more information, click here.
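To give a flavour of creating features from categorical and continuous columns, here is a minimal sketch in plain Python rather than pandas (pandas offers `get_dummies` and `cut` for the same jobs). It is not code from the DataCamp course; the survey columns, bin edges and function names are hypothetical.

```python
def one_hot(values):
    """Hypothetical helper: one-hot encode category labels into 0/1 indicator columns."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


def bin_values(values, edges, labels):
    """Hypothetical helper: assign each number to the left-closed bin
    [edges[i], edges[i + 1]) and return that bin's label."""
    out = []
    for v in values:
        for i, label in enumerate(labels):
            if edges[i] <= v < edges[i + 1]:
                out.append(label)
                break
        else:
            out.append(None)  # value falls outside every bin
    return out


# Hypothetical survey columns: one categorical, one continuous.
status = ["Full-time", "Student", "Full-time"]
age = [23, 35, 61]

print(one_hot(status))
# {'is_Full-time': [1, 0, 1], 'is_Student': [0, 1, 0]}
print(bin_values(age, edges=[0, 30, 50, 120], labels=["<30", "30-49", "50+"]))
# ['<30', '30-49', '50+']
```

One-hot encoding turns a text category into numbers a model can use, while binning turns a continuous value into a coarser categorical one.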
Feature Engineering for Improving Learning Environments
Available on edX, this feature engineering course is offered by the University of Texas at Arlington. It focuses on the ways in which researchers and data scientists transform raw data into features for machine learning algorithms. It covers how to draw on practice and logic to create features, strategies for using prior research, and how to build and evaluate machine learning models.
Prerequisites for this course include some knowledge of programming and statistics. After completing it, learners will know how to transform and visualise data in R, apply selected machine learning algorithms to regression and classification tasks in R, and use data-intensive research workflows for feature engineering and model building. The course takes approximately three weeks to complete.
For more information, click here.
Books
Feature Engineering for Machine Learning
In this book, authors Alice Zheng and Amanda Casari teach techniques for extracting features, that is, numeric representations of raw data, and transforming them into formats suitable for machine learning models. Each chapter guides the reader through a single data problem. Rather than simply teaching the principles of feature engineering, the authors emphasise practical application, with exercises throughout the book.
For more information, click here.
Feature Engineering and Selection: A Practical Approach for Predictive Models
Written by Max Kuhn, a software engineer at RStudio, and Kjell Johnson, owner and founder of Stat Tenacity, this book describes techniques for finding the best representations of predictors for modelling.
The book covers the following topics:
- Imputing missing data
- Categorical encoding
- Numerical feature transformation
For more information, click here.
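Two of the topics listed for this book, imputing missing data and transforming numerical features, can be illustrated with a short Python sketch. It is not code from the book; the income values and function names are made up for illustration, and median imputation plus a log(1 + x) transform are just one common pairing among many options.

```python
import math
from statistics import median


def impute_median(values):
    """Hypothetical helper: replace None entries with the median of the observed values.
    In a real workflow the median should be computed on the training set only."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Hypothetical helper: apply log(1 + x) to reduce right skew; assumes every value is >= 0."""
    return [math.log1p(v) for v in values]


# Hypothetical income column with missing entries and one extreme value.
incomes = [30_000, None, 45_000, 1_200_000, None, 52_000]

filled = impute_median(incomes)
print(filled)  # [30000, 48500.0, 45000, 1200000, 48500.0, 52000]

# The log transform compresses the gap between typical incomes and the outlier.
print([round(x, 2) for x in log_transform(filled)])
```

The median is used instead of the mean here because it is not pulled upwards by the single very large income.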
Python Feature Engineering Cookbook
This book by Soledad Galli provides over 70 recipes for creating, engineering and transforming features to build machine learning models. The book covers the following topics:
- Simplifying feature engineering pipelines with powerful Python packages
- Getting to grips with imputing missing values
- Encoding categorical variables with a wide set of techniques
- Extracting insights from text quickly and effortlessly
- Deriving new features by combining existing variables
- Creating informative variables from date and time
For more information, click here.
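Two of the cookbook's topics, deriving new features by combining existing variables and creating variables from dates and times, can be sketched with the Python standard library alone. This is not a recipe from the book; the timestamp, the spend-per-order ratio and the function names are hypothetical examples.

```python
from datetime import datetime


def date_features(timestamp):
    """Hypothetical helper: split an ISO 8601 timestamp string into calendar features."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),  # Monday = 0, Sunday = 6
        "hour": dt.hour,
        "is_weekend": int(dt.weekday() >= 5),
    }


def ratio_feature(numerator, denominator):
    """Hypothetical helper: combine two existing variables into a ratio.
    Returns None when the denominator is 0 rather than raising an error."""
    return None if denominator == 0 else numerator / denominator


print(date_features("2021-03-06T14:30:00"))
# {'year': 2021, 'month': 3, 'day_of_week': 5, 'hour': 14, 'is_weekend': 1}

# e.g. total spend divided by number of orders gives average spend per order
print(ratio_feature(250, 4))  # 62.5
```

A raw timestamp is hard for most models to use directly, whereas its month, weekday or weekend flag can capture seasonal and weekly patterns.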