10 Useful R Packages Data Science & ML Enthusiasts Should Know

Ambika Choudhury

R has become increasingly popular over the past few years and is widely applied in data science and machine learning. In this article, we list 10 useful R packages for data science and machine learning.

1| lattice

The lattice package, written by Deepayan Sarkar, attempts to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships. In particular, the package supports the creation of trellis graphs: graphs that display a variable, or the relationship between variables, conditioned on one or more other variables. A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data, lattice is sufficient for typical graphics needs and flexible enough to handle most nonstandard requirements.
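As a minimal sketch of a conditioned (trellis) plot, the example below uses R's built-in mtcars data and lattice's xyplot(); the variables and panel layout are illustrative choices. The formula mpg ~ wt | factor(cyl) reads as "mpg against weight, conditioned on the number of cylinders", so one panel is drawn per cylinder count.

```r
library(lattice)

# Trellis scatter plot: mpg against weight, one panel per number of cylinders.
# The "| factor(cyl)" part of the formula is the conditioning variable.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),                  # three panels side by side
            xlab = "Weight (1000 lbs)",
            ylab = "Miles per gallon",
            main = "Fuel economy by weight, conditioned on cylinders")

print(p)  # xyplot() returns a "trellis" object; printing it draws the plot
```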

2| DataExplorer

Exploratory Data Analysis (EDA) is the initial, and an important, phase of data analysis and predictive modeling. During this process, analysts and modelers take a first look at the data, generate relevant hypotheses and decide on next steps. However, the EDA process can be a hassle at times. DataExplorer aims to automate most of the data handling and visualization, so that users can focus on studying the data and extracting insights.

The package can be installed directly from CRAN. To install it, type:

install.packages("DataExplorer")
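Once installed, a typical first pass might look like the sketch below. It assumes the built-in airquality data set (chosen because it contains missing values) and uses DataExplorer's introduce(), plot_intro(), plot_missing() and plot_histogram(); create_report() is left commented out because it writes an HTML report to disk.

```r
library(DataExplorer)

# Built-in airquality data: 153 rows, 6 columns, with some missing values
intro <- introduce(airquality)  # one-row summary: rows, columns, missing counts, memory use
print(intro)

plot_intro(airquality)      # share of discrete/continuous columns and missing values
plot_missing(airquality)    # missing-value profile per column
plot_histogram(airquality)  # histograms of all continuous columns

# create_report(airquality)  # full HTML EDA report (writes a file to disk)
```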


3| DALEX (Descriptive mAchine Learning EXplanations)

DALEX is an R library with tools that help to understand how complex models work. It contains various explainers that help to understand the link between input variables and model output. For example, the single_variable() explainer extracts the conditional response of a model as a function of a single selected variable; in current releases of DALEX, this functionality is provided by model_profile().

To install from CRAN, type:

install.packages("DALEX")
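A minimal sketch of the explainer workflow, assuming a current DALEX release (1.0 or later) and a plain linear model on the built-in mtcars data; the model and variables are illustrative choices. explain() wraps a fitted model together with its data, model_profile() computes the model's response as a function of one selected variable (wt here), and model_parts() estimates permutation-based variable importance.

```r
library(DALEX)

# Illustrative model: ordinary linear regression on the built-in mtcars data
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Wrap the model in an explainer: data = predictors, y = observed response
explainer <- explain(model,
                     data    = mtcars[, c("wt", "hp", "qsec")],
                     y       = mtcars$mpg,
                     label   = "lm on mtcars",
                     verbose = FALSE)

# Model response as a function of a single selected variable
profile_wt <- model_profile(explainer, variables = "wt")
plot(profile_wt)

# Permutation-based variable importance
importance <- model_parts(explainer)
plot(importance)
```

Because the explainer is model-agnostic, the same calls work unchanged if lm() is swapped for a random forest or gradient-boosting model.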


4| dplyr

dplyr is a powerful R package to transform and summarise tabular data with rows and columns. It is a grammar of data manipulation: a consistent set of functions (or “verbs”) that solve the most common data manipulation challenges, such as filtering rows (filter()), selecting specific columns (select()), re-ordering rows (arrange()), adding new columns (mutate()) and summarising data (summarise()). In addition, group_by() lets any of these verbs operate on groups of rows, which covers another common task: the “split-apply-combine” pattern.
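The sketch below chains the verbs named above on the built-in mtcars data using the %>% pipe, then combines group_by() with summarise() for split-apply-combine. The horsepower threshold and the derived column are arbitrary, illustrative choices.

```r
library(dplyr)

# Single-table verbs chained with the %>% pipe (built-in mtcars data)
top_mpg <- mtcars %>%
  filter(hp > 100) %>%              # keep rows with more than 100 hp
  select(mpg, cyl, wt) %>%          # keep three columns
  mutate(wt_kg = wt * 453.6) %>%    # new column: wt is in 1000 lbs (~453.6 kg)
  arrange(desc(mpg))                # sort rows by mpg, highest first
head(top_mpg, 3)

# Split-apply-combine: split by cyl, summarise each group, combine into one table
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg))
by_cyl
```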

5| esquisse

The purpose of this R package is to let you explore your data quickly and extract the information it holds. It lets you interactively explore your data by visualizing it with the ggplot2 package: you can draw bar graphs, curves, scatter plots and histograms, then export the graph or retrieve the code that generates it.

To install from CRAN, type:

install.packages("esquisse")
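esquisse's main entry point, esquisser(), launches an interactive Shiny gadget in which you drag variables onto plot aesthetics and can copy the generated ggplot2 code. A minimal sketch, assuming the built-in mtcars data; the call is guarded by interactive() so that a batch script does nothing rather than trying to open a gadget.

```r
library(esquisse)

# Open the drag-and-drop ggplot2 builder on mtcars (interactive sessions only)
if (interactive()) {
  esquisser(mtcars)
}
```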



6| caret

The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process of creating predictive models for complex regression and classification problems. The package contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, variable importance estimation and other functionality. caret draws on a large number of other R packages but does not load them all at start-up: because they are not formal package dependencies, the package's start-up time is greatly reduced.
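A minimal sketch of the split, pre-process, tune and evaluate workflow described above, assuming the built-in iris data and caret's k-nearest-neighbours method ("knn"). The 80/20 split, 5-fold cross-validation and tuneLength value are illustrative choices, not recommendations; centering and scaling are included because k-NN is distance-based.

```r
library(caret)

set.seed(123)

# Stratified 80/20 split of the built-in iris data
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Tune k by 5-fold cross-validation, centering and scaling predictors first
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(Species ~ ., data = train_set,
             method     = "knn",
             preProcess = c("center", "scale"),
             trControl  = ctrl,
             tuneLength = 5)
print(fit)  # cross-validated accuracy for each candidate k

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test_set)
accuracy <- mean(pred == test_set$Species)
print(table(predicted = pred, actual = test_set$Species))

# Variable importance estimate
print(varImp(fit))
```

Swapping the method string (for example to "rpart") reuses the same splitting, resampling and prediction code, which is the streamlining the package aims for.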

7| janitor

janitor has simple functions for examining and cleaning dirty data. It was built with beginning and intermediate R users in mind and is optimised for user-friendliness. Advanced R users can already do everything janitor covers, but with janitor they can do it faster and save their thinking for the fun stuff. The main janitor functions perfectly format data.frame column names; create and format frequency tables of one, two, or three variables (think an improved table()); and isolate partially-duplicate records.
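These three tasks correspond to janitor's clean_names(), tabyl() and get_dupes(). The sketch below applies them to a small, made-up data frame with spreadsheet-style column names; the data are purely illustrative.

```r
library(janitor)

# Small made-up data frame with messy, spreadsheet-style column names
raw <- data.frame(`First Name` = c("Ann", "Bob", "Ann", "Cy"),
                  `Dept.`      = c("Sales", "IT", "Sales", "IT"),
                  `% Bonus`    = c(5, 3, 5, 2),
                  check.names = FALSE)

clean <- clean_names(raw)   # unique, snake_case, syntactic names
names(clean)

dept_counts <- tabyl(clean, dept)   # frequency table with counts and percentages
dept_counts

# Rows repeated on the chosen columns, with an added dupe_count column
dupes <- get_dupes(clean, first_name, dept)
dupes
```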

8| rpart

The rpart code builds classification or regression models of a very general structure using a two-stage procedure; the resulting models can be represented as binary trees. The package implements many of the ideas found in the CART (Classification and Regression Trees) book and programs of Breiman, Friedman, Olshen, and Stone. Because CART is the trademarked name of a particular software implementation of these ideas, and the name “tree” was already used for the S-PLUS routines of Clark and Pregibon, a different acronym, Recursive PARTitioning or rpart, was chosen.
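A minimal classification-tree sketch using the kyphosis data set that ships with rpart: whether a spinal deformity is present after surgery, given the patient's age and the number and starting position of the vertebrae operated on. printcp() shows the complexity table from rpart's built-in cross-validation, which is what you would consult before pruning the tree with prune().

```r
library(rpart)

# kyphosis ships with rpart: Kyphosis (absent/present) after surgery,
# Age (months), Number (vertebrae involved), Start (first vertebra operated on)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

print(fit)    # the splits, one line per node of the binary tree
printcp(fit)  # complexity-parameter table with cross-validated error (xerror)

# Draw the tree with base graphics
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE)

# Predicted classes for the first five patients
pred <- predict(fit, newdata = kyphosis[1:5, ], type = "class")
pred
```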

9| prophet

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Prophet is open source software released by Facebook’s Core Data Science team. It is available for download on CRAN and PyPI.
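A minimal forecasting sketch. prophet() expects a data frame with a date column named ds and a value column named y; make_future_dataframe() extends the dates, and predict() returns the forecast with uncertainty intervals. The series here is synthetic (a linear trend plus a weekly cycle and noise), generated only for illustration, and the 90-day horizon is arbitrary.

```r
library(prophet)

# Synthetic daily series for illustration: trend + weekly cycle + noise
set.seed(42)
ds <- seq(as.Date("2020-01-01"), by = "day", length.out = 730)
weekday <- as.numeric(format(ds, "%u"))   # 1 = Monday, 7 = Sunday
y <- 10 + 0.01 * seq_along(ds) +
  2 * sin(2 * pi * weekday / 7) +
  rnorm(length(ds), sd = 0.5)
history <- data.frame(ds = ds, y = y)     # prophet requires columns ds and y

m <- prophet(history)                             # fit the additive model
future <- make_future_dataframe(m, periods = 90)  # history plus 90 future days
fcst <- predict(m, future)

tail(fcst[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
plot(m, fcst)                     # forecast with uncertainty band
prophet_plot_components(m, fcst)  # trend and seasonal components
```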

10| plotly

Plotly is an R package for creating interactive web-based graphs via the open source JavaScript graphing library plotly.js. By default, Plotly for R runs locally in your web browser or in the RStudio viewer. The plot_ly() function provides a ‘direct’ interface to plotly.js with some additional abstractions to help reduce typing. There are two main ways to create a plotly object: either by transforming a ggplot2 object into a plotly object (via ggplotly()) or by directly initializing a plotly object with plot_ly()/plot_geo()/plot_mapbox().
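The two construction routes can be sketched as follows on the built-in mtcars data; the variables mapped are illustrative. In plot_ly(), the ~ syntax refers to columns of the supplied data frame. Rendering is guarded by interactive() because printing a plotly object opens the RStudio viewer or a web browser.

```r
library(plotly)
library(ggplot2)

# Route 1: initialise directly with plot_ly(); ~ refers to columns of mtcars
p1 <- plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~factor(cyl),
              type = "scatter", mode = "markers")

# Route 2: build a ggplot2 object and convert it with ggplotly()
g  <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplotly(g)

# Printing renders the interactive charts (viewer or browser)
if (interactive()) {
  print(p1)
  print(p2)
}
```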

Copyright Analytics India Magazine Pvt Ltd
