
Introduction To Basic Concepts of R Programming Language


R is an open-source programming language and environment for statistical computing and graphics. Its suite of software packages can be used for a wide range of tasks, such as data mining, time series analysis, machine learning, multivariate statistical analysis, spatial data analysis and graphical plotting.

Origin of R

R is an alternative implementation of S, a statistical programming language; S-PLUS was later developed as a commercial version of S. R was created by Ross Ihaka and Robert Gentleman, starting in 1991. Although R is independent of S-PLUS, much code written for S runs in R without any alteration. In 1995, R was released as free, open-source software under the GNU General Public License, and the first stable version, R 1.0.0, followed in 2000.

Fundamental operations and concepts

Here, we briefly explain some basic yet essential concepts and functions that an R beginner should know. Each sub-topic is demonstrated using the R Console (RGui (32-bit)), which is installed along with R from CRAN.

  • help.start(): opens R’s official HTML documentation for general help on the available functionality.
  • help("sum") or ?sum: opens the documentation for the sum() function

Note: If no documentation exists for the given name, R prints a message on the console stating that there is no documentation for it in the specified packages and libraries. E.g. help("add") gives:

No documentation for ‘add’ in specified packages and libraries:
you could try ‘??add’

  • help.search("sum") or ??sum: searches the help system for documentation matching the string "sum"

  • apropos("sum", mode = "function"): lists all available functions whose names contain the string "sum"
  • data(): lists the example datasets available in the attached packages (in RGui, the list appears in a new window named ‘R data sets’)
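
Both helpers above also return ordinary R values, so their results can be inspected in a script. The sketch below assumes a standard R installation in which the datasets package is available; it reads dataset names from the $results matrix returned by data(package = ...) rather than opening the ‘R data sets’ window.

```r
# Functions whose names contain "sum" (apropos() ignores case by default)
sum_funs <- apropos("sum", mode = "function")
print(head(sum_funs))

# Dataset names from the 'datasets' package; the "Item" column of the
# $results matrix holds the names shown in the 'R data sets' window
ds_items <- data(package = "datasets")$results[, "Item"]
print(head(ds_items))
```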

Some general purpose functions

  • getwd(): returns the current working directory
  • setwd(PATH): sets the specified path as the current working directory (the change can be verified with getwd())
  • ls(): lists the objects in the current workspace
  • rm(objects): removes the specified object(s) from the current workspace

For example, after objects x, y and z are created, ls() displays their names. Executing rm(x, y) then removes x and y, so running ls() again returns only "z".
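
A minimal sketch of the functions just described. It uses tempdir() (the base R function returning the session's temporary directory) as an assumed target for setwd(), so no real path is needed. In a fresh session, ls() would show only the objects created here.

```r
old_wd <- getwd()      # remember the current working directory
setwd(tempdir())       # switch to this session's temporary directory
print(getwd())
setwd(old_wd)          # restore the original working directory

x <- 1:3
y <- "hello"
z <- TRUE
print(ls())            # includes "x", "y" and "z"
rm(x, y)               # remove x and y from the workspace
print(ls())            # "z" remains; x and y are gone
```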

  • history(num): opens a new window named ‘R History’ containing the last ‘num’ executed commands. If no argument is given, the last 25 commands are displayed by default.

e.g. history(5) shows the last five commands.

  • savehistory("fname"): saves the command history to a file named ‘fname’, which can be loaded into a later session using loadhistory("fname").
  • save.image("my_workspace"): saves the current workspace (all of its objects) to a file named ‘my_workspace’, which can later be restored using load("my_workspace").
  • q(): exits R; in an interactive session, a dialog box first asks whether to save the current workspace.
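
The sketch below round-trips a workspace through save.image() and load(). It writes to a file in tempdir() (the file name is an illustrative choice) and leaves out savehistory()/loadhistory(), which depend on an interactive console with command history and may not work when R runs a script.

```r
a <- 42
b <- letters[1:3]                         # "a" "b" "c"
ws_file <- file.path(tempdir(), "my_workspace.RData")

save.image(ws_file)   # write every object in the global workspace to the file
rm(a, b)              # remove the objects from the workspace
load(ws_file)         # restore them from the saved file
print(a)              # 42
print(b)              # "a" "b" "c"
```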

Packages in R

A package in R is a collection of data, functions and compiled code in a well-defined format. The directory in which packages are installed is called a library.

  • .libPaths(): shows the path(s) of the library directories
  • library(): lists all the packages installed in the library

  • Package installation: install.packages("package_name") downloads and installs a package from CRAN. If no CRAN mirror has been selected yet, R first displays a list of CRAN mirror sites to choose from.

  • update.packages(): updates the installed packages for which newer versions are available
  • installed.packages(): lists all the installed packages along with additional information such as version number and dependencies
  • A particular package can be loaded into the current session using library("package_name").
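
A sketch of inspecting the library from a script. It assumes only a standard R installation (the stats package ships with R); the install.packages() and update.packages() calls are commented out because they need internet access, and "ggplot2" is only an example package name.

```r
print(.libPaths())                 # directories that make up the library
ip <- installed.packages()         # matrix with one row per installed package
print(head(ip[, c("Package", "Version")]))

library("stats")                   # attach a package in the current session
print("package:stats" %in% search())

# install.packages("ggplot2")      # example only: needs internet and a CRAN mirror
# update.packages(ask = FALSE)     # example only: update outdated packages without prompting
```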

Objects in R

In R, anything that can be assigned to a variable is an object. Every object has two intrinsic attributes:

  1. length: the number of elements in the object
  2. mode: the type of the object’s data (e.g. numeric, character, complex or logical)

Note: In R, a number such as 10 is stored as a ‘numeric’ (double-precision) value by default, not as an integer. E.g. if we assign x = 10 and then check is.integer(x), it returns FALSE. The value can be converted to the integer type using as.integer().
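
A short sketch of the conversion described in the note. The 10L literal is an extra base R shortcut that the text above does not mention.

```r
x <- 10
print(is.integer(x))   # FALSE: 10 is stored as a double
print(class(x))        # "numeric"

x <- as.integer(x)     # convert to integer
print(is.integer(x))   # TRUE

y <- 10L               # the L suffix creates an integer literal directly
print(typeof(y))       # "integer"
```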

This article covers six common types of R objects:

  1. Vector: a one-dimensional collection of elements that all have the same data type.

Ways to create a vector:

  • vector1 <- 1:10 (a vector with the elements 1 to 10)
  • Use seq() to create a regular sequence

e.g. seq(from = 1, to = 10, by = 2) (elements from 1 to 10 in steps of 2, i.e. 1 3 5 7 9)

  • Use rep() to create a vector by repeating an element or another vector

e.g. rep("Hi", 4)

  • Use c(), where ‘c’ stands for ‘combine’

e.g. vector1 <- c(1, 2, 3, 4, 5)
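
The construction methods above, collected into one runnable sketch; variable names other than vector1 are illustrative, and the rep(c(1, 2), times = 3) line shows the "repeat another vector" case.

```r
vector1 <- 1:10                           # 1 2 3 4 5 6 7 8 9 10
odds    <- seq(from = 1, to = 10, by = 2) # 1 3 5 7 9
hi4     <- rep("Hi", 4)                   # "Hi" "Hi" "Hi" "Hi"
pattern <- rep(c(1, 2), times = 3)        # 1 2 1 2 1 2 (repeating a whole vector)
vector2 <- c(1, 2, 3, 4, 5)

print(length(vector1))   # 10
print(mode(hi4))         # "character"
```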

Element(s) of a vector are accessed by indexing with square brackets; indices start at 1.
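
A sketch of common indexing forms on an example vector v. Negative and logical indexing are standard base R features added here beyond the original text.

```r
v <- c(10, 20, 30, 40, 50)

second  <- v[2]          # 20 (indexing starts at 1, not 0)
some    <- v[c(1, 3)]    # 10 30
middle  <- v[2:4]        # 20 30 40
no_head <- v[-1]         # a negative index drops element 1: 20 30 40 50
big     <- v[v > 25]     # a logical index keeps matching elements: 30 40 50
print(big)
```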

  2. Matrix: a two-dimensional arrangement of elements that all have the same data type (internally, a vector with row and column dimensions).

A matrix is created with the matrix() function, where nrow and ncol denote the number of rows and columns respectively, and byrow = TRUE fills the matrix row by row (by default, it is filled column by column).
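
A sketch contrasting the two fill orders; the values 1:6 are an arbitrary example.

```r
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(M)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

M_col <- matrix(1:6, nrow = 2, ncol = 3)   # byrow = FALSE (default): filled column by column
print(M_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
```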

Ways to access element(s) of a matrix:

  • M[n]: the nth element of matrix M (elements are counted column-wise, with n = 1 denoting the first element)
  • M[n,]: the nth row of matrix M (n = 1 denotes the first row)
  • M[,n]: the nth column of matrix M (n = 1 denotes the first column)
  • M[x,y]: the element in the xth row and yth column
  • M[,c(x,y)]: extracts the xth and yth columns together
  • M[c(x,y),]: extracts the xth and yth rows together
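
Each access form above applied to an example 2 x 3 matrix (a sketch; the values are arbitrary). Note how M[2] follows the column-wise counting.

```r
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # rows: 1 2 3 and 4 5 6

e2     <- M[2]          # 4: second element counted column-wise (column 1 holds 1, 4)
row2   <- M[2, ]        # 4 5 6
col3   <- M[, 3]        # 3 6
e12    <- M[1, 2]       # 2: row 1, column 2
cols13 <- M[, c(1, 3)]  # columns 1 and 3, as a 2 x 2 matrix
rows12 <- M[c(1, 2), ]  # rows 1 and 2 (here, the whole matrix)
print(cols13)
```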

  3. Array: an array can have one or more dimensions. A 1D array and a 2D array are (almost) the same as a vector and a matrix, respectively; an array with three or more dimensions is called a multidimensional array.
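
A sketch of a three-dimensional array, plus a check that a 2D array is treated as a matrix; the dimensions are an arbitrary example.

```r
A <- array(1:24, dim = c(2, 3, 4))   # 2 rows x 3 columns x 4 layers
print(dim(A))                        # 2 3 4
print(A[1, 2, 3])                    # row 1, column 2 of layer 3: 15

B <- array(1:6, dim = c(2, 3))       # a 2D array...
print(is.matrix(B))                  # ...is a matrix: TRUE
```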

  4. List: a collection of elements that can be of different data types. A list can also be extended on the fly.
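
A sketch of a list holding mixed types and growing after creation. The names and values ("Asha", "Pune" and so on) are hypothetical sample data.

```r
person <- list(name = "Asha", age = 30, scores = c(85, 92))  # mixed types
person$city <- "Pune"          # a new element is added on the fly
print(length(person))          # 4
print(person[["scores"]][2])   # 92
str(person)                    # compact overview of each element and its type
```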

  5. Factor: a data object used for categorical variables, i.e. variables with a fixed set of possible values, such as ‘gender’ or ‘month’. Each factor has a levels attribute that lists the permitted values of the variable. The usefulness of a factor can be seen from the following short example.

e.g. Suppose there is a vector x1 containing some month names. We create a factor y1 from the data in x1 and set its ‘levels’ attribute to a vector named ‘months’ that contains the names of all twelve months in calendar order. Sorting x1 directly gives the names in alphabetical order, but sorting the factor y1, whose levels are well defined, gives x1’s elements in the order in which those months occur in a year.
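
A sketch of this example. It assumes the built-in constant month.name (the twelve month names, January to December) as the ‘months’ vector instead of typing the names out; the values in x1 are illustrative.

```r
months <- month.name                      # built-in: "January" ... "December"
x1 <- c("December", "April", "January", "March")

print(sort(x1))                           # alphabetical: "April" "December" "January" "March"

y1 <- factor(x1, levels = months)
print(sort(y1))                           # calendar order: January March April December
```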

Now suppose the data contains a value that does not match any of the levels. That value is converted to NA in the factor, and because sort() drops NA values by default, the mismatched element is missing from the output when the factor is sorted.
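
A sketch of the mismatch case, again assuming month.name as the levels; "Apirl" is a deliberately misspelt value. The na.last = TRUE call is an extra option of sort() that keeps the NA.

```r
x2 <- c("December", "Apirl", "January")     # "Apirl" is a deliberate typo
y2 <- factor(x2, levels = month.name)
print(y2)                                   # the typo becomes <NA>
print(sort(y2))                             # NA dropped by default: January December
print(sort(y2, na.last = TRUE))             # keep the NA, placed at the end
```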

If the levels are not defined explicitly, they are taken to be the data’s unique values sorted in alphabetical order.

The levels of a factor can be retrieved using levels(), passing the factor as its argument.
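
A sketch comparing default and explicit levels; the ‘sizes’ data is a hypothetical example.

```r
sizes <- factor(c("medium", "small", "large", "small"))
print(levels(sizes))    # "large" "medium" "small": unique values, sorted alphabetically

sizes2 <- factor(c("medium", "small", "large", "small"),
                 levels = c("small", "medium", "large"))
print(levels(sizes2))   # "small" "medium" "large": the order given explicitly
print(nlevels(sizes2))  # 3
```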

  6. Data frame: a data table in which the columns can be of different types, but all values within a column have the same type.

Built-in datasets such as the Iris flower dataset are provided by the ‘datasets’ package, which is attached by default in a standard R session. The dataset can be loaded with data(iris); it is already a data frame, so it can be inspected directly.
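
A sketch of loading and inspecting the Iris dataset; the library(datasets) call is redundant in a default session and is shown only for completeness.

```r
library(datasets)      # attached by default; shown here for completeness
data(iris)             # load the Iris flower dataset into the workspace
print(class(iris))     # "data.frame"
print(head(iris, 3))   # first three rows
str(iris)              # column names and types
```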

A custom data frame can also be created with data.frame(), passing each column as a named vector of the same length.

The number of rows and columns of a data frame can be obtained using nrow() and ncol(), respectively.
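
A sketch of a custom data frame with columns of three different types; the student names and values are hypothetical sample data.

```r
students <- data.frame(
  name   = c("Ravi", "Meera", "John"),   # character column
  age    = c(21, 23, 22),                # numeric column
  passed = c(TRUE, FALSE, TRUE)          # logical column
)
print(students)
print(nrow(students))           # 3
print(ncol(students))           # 3
print(sapply(students, class))  # type of each column
```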

End Note

We have covered some fundamental R concepts and functions that an R programmer needs to know to leverage R’s functionality. Each of these topics has many more details, and there are several other concepts of the language that an R programmer needs to deal with. For an in-depth understanding of such topics, refer to the following sources:


Nikita Shiledarbaxi

A zealous learner aspiring to advance in the domain of AI/ML. Eager to grasp emerging techniques to get insights from data and hence explore realistic Data Science applications as well.