15 Most Popular R Libraries You Need To Know in 2022

With an ever-expanding list of supported libraries, R today is stronger than ever.

While many people opt for Python for machine learning tasks today, R remains a staple in many developers’ toolkits. With its concise syntax and the pipe operator for chaining functions, R makes many simple tasks easy. It also holds its own in complex tasks such as forecasting and modelling.

Overall, R today is stronger than ever, with an ever-expanding list of supported libraries.

Here are the 15 R libraries for machine learning released in 2022!

fastTopics

The package implements algorithms for fitting topic models and non-negative matrix factorization (NMF) to count data. The methods exploit the relationship between probabilistic latent semantic indexing (pLSI) and Poisson NMF.

fastTopics provides tools to compare, annotate and visualise models. It creates ‘structure plots’ and identifies key features.
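The link between Poisson NMF and topic models can be shown with a minimal NumPy sketch. It fits Poisson NMF with the classic multiplicative updates of Lee and Seung. This is a textbook algorithm swapped in for illustration, not fastTopics' own algorithms. The sketch then rescales the factors into per-document topic proportions and per-topic word probabilities. The names `poisson_nmf` and `poisson_to_multinom` are hypothetical and are not the package's API. The code is in Python only to show the idea.

```python
import numpy as np


def poisson_nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Fit counts X (n x m) as X ~ Poisson(L @ F.T) using Lee and Seung's
    multiplicative updates for the generalised KL divergence."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    L = rng.uniform(0.1, 1.0, size=(n, k))  # loadings: one row per document
    F = rng.uniform(0.1, 1.0, size=(m, k))  # factors: one row per word
    for _ in range(n_iter):
        # Ratio of observed to expected counts drives both updates
        R = X / (L @ F.T + eps)
        L *= (R @ F) / (F.sum(axis=0) + eps)
        R = X / (L @ F.T + eps)
        F *= (R.T @ L) / (L.sum(axis=0) + eps)
    return L, F


def poisson_to_multinom(L, F):
    """Rescale Poisson NMF factors into topic-model parameters."""
    u = F.sum(axis=0)   # total weight of each factor
    F_topic = F / u     # each column sums to 1: word probabilities per topic
    L_scaled = L * u    # move that weight into the loadings
    # Each row sums to 1: topic proportions per document
    L_topic = L_scaled / L_scaled.sum(axis=1, keepdims=True)
    return L_topic, F_topic


# Usage on a small synthetic count matrix
counts = np.random.default_rng(1).poisson(5, size=(20, 30)).astype(float)
L_fit, F_fit = poisson_nmf(counts, k=3)
topic_props, word_probs = poisson_to_multinom(L_fit, F_fit)
```

The rescaling only moves each factor's total weight from `F` into the loadings. Up to each document's total count, the topic-model parameters therefore describe the same expected counts as `L @ F.T`.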

Check the documentation here.

Metrica

The package includes more than 80 functions for evaluating the prediction performance of regression and classification point-forecast models, such as the output of crop and agro-ecosystem models like DNDC, APSIM and DSSAT.

Metrica offers a wide range of error metrics, indices and coefficients that capture different aspects of the agreement between predicted and observed values. It also provides basic visualisation functions, built on ggplot2, that present model performance in a customisable format.
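The minimal Python sketch below illustrates the kind of metrics such a toolbox computes. It covers five common measures of agreement between observed and predicted values. The function name `agreement_metrics` and the metric keys are hypothetical. Metrica's own function names, sign conventions and definitions may differ, so check its documentation for the exact formulas.

```python
import math


def agreement_metrics(observed, predicted):
    """A few common point-forecast metrics (illustrative, not Metrica's API)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sse = sum(e * e for e in errors)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    # Population (1/n) variances and covariance, as used in Lin's CCC
    var_o = sst / n
    var_p = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)) / n
    return {
        "MBE": sum(errors) / n,                   # mean bias error (predicted - observed)
        "MAE": sum(abs(e) for e in errors) / n,   # mean absolute error
        "RMSE": math.sqrt(sse / n),               # root mean squared error
        "NSE": 1 - sse / sst if sst > 0 else float("nan"),  # Nash-Sutcliffe efficiency
        "CCC": 2 * cov / (var_o + var_p + (mean_obs - mean_pred) ** 2),  # Lin's CCC
    }


# Usage: predictions that are systematically one unit too high
metrics = agreement_metrics([1, 2, 3, 4], [2, 3, 4, 5])
```

A constant offset like this leaves the correlation perfect. It still lowers the concordance coefficient and the efficiency score, which is why toolboxes report several complementary metrics rather than one.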

Check the documentation here.

SparseVFC (Sparse Vector Field Consensus for Vector Field Learning)

The SparseVFC package implements the sparse vector field consensus (SparseVFC) algorithm for robust vector field learning. It is largely translated from the MATLAB functions in https://github.com/jiayi-ma/VFC.

Check the documentation here.

agua

Based on the h2oparsnip package, agua lets users fit, tune and evaluate models on H2O using tidymodels syntax. Most users will access these features through the new parsnip computational engine ‘h2o’.

When a model is fitted, the data are passed directly to the h2o server. For tuning, the data are passed to the server once, and h2o.grid() is then given instructions for processing the tuning candidates.

Check the documentation here.

OpenAI

The openai package is an R wrapper for the OpenAI API endpoints. It covers the Engines, Completions, Edits, Files, Fine-tunes and Embeddings endpoints, as well as the legacy Searches, Classifications and Answers endpoints.

To use the OpenAI API, you need to provide an API key. To begin, sign up for the OpenAI API on this page. Once you have signed up and logged in, open this page, click on ‘Personal’ and select ‘View API keys’ from the drop-down menu. You can then copy the key by clicking on the green ‘Copy’ text.

Check the documentation here.

webmorphR

With a focus on face stimuli, webmorphR aims to make the construction of image stimuli more consistent.

Face stimuli used in research often cannot be shared for ethical reasons. webmorphR instead lets researchers share the recipes for creating them, which makes it easier to test whether findings generalise to new faces.

Check the documentation here.

cito

‘cito’ helps you build and train neural networks using standard R syntax. Creating and training a model takes a single line of code, and generic R methods can be used on the resulting object.

cito is built on the ‘torch’ framework for R. Because torch for R runs natively without relying on Python, this package does not need a Python installation.

Check the documentation here.

etree

The goal of etree is to provide a friendly implementation of Energy Trees, a model for classification and regression with structured and mixed-type data. The package currently covers functions and graphs as structured covariates.

Check the documentation here.

mildsvm

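The package provides a simple way to learn from multiple-instance data by training Support Vector Machine (SVM)-based classifiers. In this kind of data, labels are attached to bags of instances rather than to individual instances. The package also contains useful functions for building and printing multiple-instance data frames.

Check the documentation here.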

aorsf

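aorsf fits oblique random survival forests, which are ensembles of decision trees. A decision tree is grown by splitting the training data into two new subsets so that observations are more similar within each subset than between the subsets. The splitting process is repeated on the resulting subsets until a stopping criterion is met. In an oblique tree, each split is based on a linear combination of predictors rather than on a single predictor.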
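The generic splitting procedure can be sketched in a few lines of Python. This minimal example grows an axis-aligned regression tree. At each node it chooses the single-feature threshold that minimises the within-subset sum of squares. It stops at a maximum depth, at a minimum node size, or when no split reduces dissimilarity. It is not aorsf's algorithm: aorsf uses oblique splits and survival outcomes. All names here are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean: within-subset dissimilarity."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y, min_size):
    """Return (score, feature, threshold) minimising total within-subset SSE."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue  # respect the minimum subset size
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Split recursively until a stopping criterion is met; leaves store the mean."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_size:
        return leaf
    split = best_split(X, y, min_size)
    if split is None or split[0] >= sse(y):  # no split improves similarity
        return leaf
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left],
                          depth + 1, max_depth, min_size),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right],
                           depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the splits down to a leaf."""
    while "value" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


# Usage: two well-separated groups are split at the midpoint 6.5
tree = grow_tree([[1], [2], [3], [10], [11], [12]], [1, 1, 1, 5, 5, 5])
```

An oblique tree would replace the single-feature comparison `row[j] <= t` with a test on a weighted sum of several features.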

Check the documentation here.

calibrationband

calibrationband is an R package for assessing the calibration of binary outcome predictions. It was authored by Timo Dimitriadis (Heidelberg University), Alexander Henzi (University of Bern) and Marius Puke (University of Hohenheim).

Following the approach of ‘Honest calibration assessment for binary outcome predictions’, the package provides functions to assess the calibration of probabilistic classifiers using confidence bands for monotonic functions. It also makes it possible to construct inverted goodness-of-fit tests. Rejecting such a test allows the sought-after conclusion that a model is sufficiently well calibrated.
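Constructing the confidence bands is beyond a short example, but the monotone calibration curve they surround can be illustrated. Suppose the probability of the event is nondecreasing in the forecast, as the monotonic-function setting above implies. A standard point estimate of the curve is then isotonic regression via the pool-adjacent-violators (PAV) algorithm. Here is a minimal Python sketch of that point estimate only. The function names are hypothetical, and this is not calibrationband's implementation.

```python
def pav(sums, counts):
    """Weighted pool-adjacent-violators: the least-squares nondecreasing fit
    to the group means sums[i] / counts[i]."""
    blocks = []  # each block: [sum of outcomes, number of outcomes, groups pooled]
    for s, c in zip(sums, counts):
        blocks.append([float(s), c, 1])
        # Pool while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2, g2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += g2
    fitted = []
    for s, c, g in blocks:
        fitted.extend([s / c] * g)  # every pooled group gets the block mean
    return fitted


def calibration_curve(probs, outcomes):
    """Return sorted distinct forecasts and the isotonic estimate of P(Y = 1 | forecast)."""
    groups = {}
    for p, y in zip(probs, outcomes):
        s, c = groups.get(p, (0.0, 0))
        groups[p] = (s + y, c + 1)  # tied forecasts share one fitted value
    xs = sorted(groups)
    fitted = pav([groups[x][0] for x in xs], [groups[x][1] for x in xs])
    return xs, fitted


# Usage: forecasts of a binary event and the observed 0/1 outcomes
xs, fitted = calibration_curve([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

For a well-calibrated classifier the fitted values track the forecasts themselves. Large gaps between `xs` and `fitted` point to miscalibration, and bands such as calibrationband's quantify how much of that gap sampling noise could explain.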

Check the documentation here.

tidytags

The purpose of tidytags is to make the collection of Twitter data more accessible and robust. tidytags retrieves tweet data collected by a Twitter Archiving Google Sheet (TAGS), gets additional metadata from Twitter via the rtweet R package, and provides additional functions to facilitate systematic yet flexible analyses of data from Twitter. TAGS is based on Google spreadsheets. A TAGS tracker continuously collects tweets from Twitter based on predefined search criteria and collection frequency.

Check the documentation here.

mlim

Currently implemented as an R package, mlim uses machine learning to provide a versatile solution for imputing missing data of various types: continuous, binary, multinomial and ordinal. Its authors expect mlim to outperform other available missing-data imputation software on several grounds.

The high performance of mlim comes mainly from fine-tuning an elastic net (ELNET) algorithm. According to the authors, this often outperforms standard statistical procedures and untuned machine learning algorithms, and it generalises well.

Check the documentation here.

kernelshap

The ‘kernelshap’ package implements a multidimensional refinement of the Kernel SHAP algorithm described in Covert and Lee (2021). It can calculate Kernel SHAP values exactly, by iterative sampling (as in Covert and Lee, 2021), or by a hybrid of the two. Whenever sampling is involved, the algorithm iterates until convergence and provides standard errors.
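The minimal NumPy sketch below shows what the exact variant of Kernel SHAP computes. It enumerates every proper, non-empty coalition of features and weights each one with the Shapley kernel. It then solves the constrained weighted least-squares problem, whose solution equals the Shapley values. The sketch assumes an interventional value function that averages predictions over a background dataset. It covers only the exact computation for a single observation with at least two features. The sampling and hybrid variants, standard errors and the multidimensional refinement are not shown. The function names are hypothetical, not kernelshap's API.

```python
import itertools
import math

import numpy as np


def value(f, x, background, S):
    """v(S): mean prediction when features in S come from x and the
    remaining features from each background row (interventional)."""
    Xs = background.copy()
    idx = list(S)
    if idx:
        Xs[:, idx] = x[idx]
    return float(np.mean(f(Xs)))


def exact_kernel_shap(f, x, background):
    """Shapley-kernel weighted least squares over all proper, non-empty
    coalitions, subject to the efficiency constraint."""
    p = len(x)
    if p < 2:
        raise ValueError("need at least two features")
    v0 = value(f, x, background, ())
    v_full = value(f, x, background, tuple(range(p)))
    rows, targets, weights = [], [], []
    for size in range(1, p):
        # Shapley kernel weight, identical for every coalition of this size
        w = (p - 1) / (math.comb(p, size) * size * (p - size))
        for S in itertools.combinations(range(p), size):
            z = np.zeros(p)
            z[list(S)] = 1.0
            rows.append(z)
            targets.append(value(f, x, background, S) - v0)
            weights.append(w)
    Z, y, W = np.array(rows), np.array(targets), np.diag(weights)
    # KKT system: minimise (y - Z phi)' W (y - Z phi) s.t. sum(phi) = v_full - v0
    A = np.zeros((p + 1, p + 1))
    A[:p, :p] = Z.T @ W @ Z
    A[:p, p] = 1.0
    A[p, :p] = 1.0
    b = np.append(Z.T @ W @ y, v_full - v0)
    phi = np.linalg.solve(A, b)[:p]
    return v0, phi  # baseline prediction and one SHAP value per feature


def model(X):
    """A small non-linear model with an interaction between features 0 and 1."""
    return X[:, 0] * X[:, 1] + np.sin(X[:, 2])


# Usage: explain one prediction against a random background sample
bg = np.random.default_rng(0).normal(size=(50, 3))
baseline, shap_values = exact_kernel_shap(model, np.array([1.0, 2.0, 3.0]), bg)
```

The number of coalitions grows as 2^p, which is why sampling-based and hybrid strategies become necessary once a model has more than a handful of features.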

Check the documentation here.

survex

Based on DALEX, this package provides model-agnostic explanations for survival models. Users unfamiliar with explainable machine learning can refer to the book Explanatory Model Analysis (EMA). Most of the methods included in survex extend those described in EMA and implemented in DALEX to models with functional output.

Check the documentation here.


Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
