Top Commands In Shell Scripting Every Data Scientist Must Know


While programming languages such as Python, R and Swift are the everyday tools of data science, shell scripting predates all of them: the Unix shell dates back to the 1970s, and Bash, the most widely used shell today, was first released in 1989. Even so, shell scripting is little known among data scientists, and the command-line tools for analytics and data science tasks are even less familiar. With shell scripting, data scientists can build data pipelines by chaining together small command-line tools, also known as filters, each of which reads text from standard input and writes its result to standard output.
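The pipeline idea can be sketched as follows. This is a minimal illustration, not a recipe: the file name temps.csv and its contents are made up for the example, and tail and sort are standard tools that are not covered in the list below.

```shell
# Hypothetical sample data: one "city,temp" record per line, with a header.
workdir=$(mktemp -d)
printf 'city,temp\nOslo,4\nCairo,31\nLima,19\nDelhi,35\n' > "$workdir/temps.csv"

# Each filter reads the previous filter's output:
#   tail -n +2   skips the header line
#   cut -d, -f2  keeps the second comma-separated column
#   sort -n      sorts the values numerically
#   tail -n 1    keeps the last (largest) value
max_temp=$(tail -n +2 "$workdir/temps.csv" | cut -d, -f2 | sort -n | tail -n 1)
echo "$max_temp"

rm -r "$workdir"
```

Each stage does one small job, so a stage can be swapped or removed without touching the others.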

Some of the important command-line tools for data analysts/scientists are mentioned below:



(The list is in roughly alphabetical order.)

1| awk

Awk is one of the most popular command-line tools for building shell scripts. It scans text files or input streams line by line, selects the lines that match a pattern, and runs an action on them, which makes it well suited to extracting information and manipulating data. It also provides numeric functions, string functions, logical operators, etc. It is useful for transforming data files and for creating formatted reports.

Syntax: awk [options] 'selection_criteria {action}' input-file > output-file
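As a rough sketch of the selection-criteria-plus-action form, the snippet below sums one column for the rows that pass a numeric test. The file staff.csv and its name,age,salary layout are assumptions made up for the example.

```shell
# Hypothetical data: name,age,salary
workdir=$(mktemp -d)
printf 'alice,30,5000\nbob,25,4200\ncarol,41,6100\n' > "$workdir/staff.csv"

# -F, sets the field separator to a comma.
# Selection criteria: $2 > 28 (age above 28).
# Action: add the salary in field 3 to a running total.
# The END block runs once after the last line and prints a formatted report line.
awk_report=$(awk -F, '$2 > 28 { total += $3 } END { printf "total=%d\n", total }' "$workdir/staff.csv")
echo "$awk_report"

rm -r "$workdir"
```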

2| cat

The cat command concatenates files and prints them on the standard output. It can display the contents of a single file, combine several files into one, and, together with output redirection, create new text files.

Syntax: $ cat filename (This command will show the content of the file)
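The three uses described above (creating, combining and viewing files) might look like this in practice. The file names part1.txt, part2.txt and all.txt are placeholders chosen for the example.

```shell
workdir=$(mktemp -d)

# Create two small files: cat copies its standard input (a here-document)
# to the file named after the > redirection.
cat > "$workdir/part1.txt" <<'EOF'
id,value
1,10
EOF
cat > "$workdir/part2.txt" <<'EOF'
2,20
3,30
EOF

# Concatenate both files, in the order given, into a third file.
cat "$workdir/part1.txt" "$workdir/part2.txt" > "$workdir/all.txt"

# View the combined file.
cat_output=$(cat "$workdir/all.txt")
echo "$cat_output"

rm -r "$workdir"
```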

3| cut

This command cuts out sections from each line of its input files and writes the result to standard output. It can extract columns from delimited formats like CSV, and it can also select portions of each line by byte position, character position, or delimited field.

Syntax: cut OPTION… [FILE]…
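A short sketch of two of the selection modes, by delimited field and by character position. The sample record is invented for the example.

```shell
# Hypothetical record: date,city,temperature
line='2021-03-15,Oslo,4.5'

# -d sets the delimiter and -f selects fields: here the second field.
cut_field=$(printf '%s\n' "$line" | cut -d, -f2)

# -c selects character positions: here the first four characters (the year).
cut_chars=$(printf '%s\n' "$line" | cut -c1-4)

echo "$cut_field $cut_chars"
```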

4| find

The find command-line tool searches a directory tree for files or directories so that specific operations can be performed on them. It supports searching by name, type (file or directory), modification date, owner and permissions; searching by creation date depends on the find implementation and the file system.

Syntax: $ find [where to start searching from] [expression that determines what to find] [-options] [what to find]
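For instance, collecting every CSV file under a project directory could be sketched like this. The directory layout (raw/, clean/) and the file names are assumptions for illustration only.

```shell
# Hypothetical project layout with a mix of file types.
workdir=$(mktemp -d)
mkdir "$workdir/raw" "$workdir/clean"
touch "$workdir/raw/a.csv" "$workdir/raw/notes.txt" "$workdir/clean/b.csv"

# Start searching at $workdir, keep only regular files (-type f)
# whose name matches the pattern *.csv. sort makes the order stable.
csv_files=$(find "$workdir" -type f -name '*.csv' | sort)
echo "$csv_files"

# Count how many files were found.
csv_count=$(printf '%s\n' "$csv_files" | wc -l | tr -d ' ')
echo "$csv_count"

rm -r "$workdir"
```

The pattern is quoted so that the shell passes it to find unchanged instead of expanding it in the current directory.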

5| Gnuplot

Gnuplot is a portable, command-line driven graphing utility used to visualise mathematical functions and data; it also supports non-interactive uses such as web scripting. It can be used interactively to plot functions and data points in both two- and three-dimensional plots in many different styles and many different output formats. It can also be used as a scripting language to automate the generation of plots.
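Automating a plot from a shell script might be sketched as below. The data file squares.dat and the script name plot.gp are made up for the example, and the sketch assumes the dumb (ASCII art) terminal is available in the installed gnuplot build. The plot is drawn only if gnuplot is found on the system.

```shell
workdir=$(mktemp -d)

# Hypothetical data: x and y = x^2, separated by a space.
printf '1 1\n2 4\n3 9\n4 16\n' > "$workdir/squares.dat"

# Write a three-line gnuplot script. The 'dumb' terminal renders the plot
# as ASCII art on standard output, so no image viewer is needed.
cat > "$workdir/plot.gp" <<EOF
set terminal dumb
set title "y = x^2"
plot "$workdir/squares.dat" using 1:2 with linespoints title "squares"
EOF

# Number of lines in the generated script.
plot_commands=$(grep -c '' "$workdir/plot.gp")

# Run the script only when gnuplot is installed.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot "$workdir/plot.gp" || echo "gnuplot failed" >&2
fi

rm -r "$workdir"
```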

6| grep

The name grep comes from the ed editor command g/re/p (globally search for a regular expression and print). It is a command-line tool used to search for a pattern of characters in one or more files. The grep filter searches its input for the pattern and displays all lines that contain it.

Syntax: grep [options] pattern [files]
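A small sketch of searching a log file, using the -i (ignore case) and -c (count matching lines) options. The log contents are invented for the example.

```shell
# Hypothetical log file.
workdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO done\nerror retry\n' > "$workdir/app.log"

# Print every line that contains the pattern; -i ignores case,
# so both "ERROR" and "error" match.
grep_matches=$(grep -i 'error' "$workdir/app.log")
echo "$grep_matches"

# -c prints the number of matching lines instead of the lines themselves.
# Without -i, only the upper-case "ERROR" line matches.
grep_count=$(grep -c 'ERROR' "$workdir/app.log")
echo "$grep_count"

rm -r "$workdir"
```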

7| head

The head command prints the first N lines of the given input. By default, it prints the first 10 lines of each specified file. This makes it useful for previewing the beginning of a file or of another command's output.

Syntax: head [options] [file(s)]
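The default and the -n option can be compared with a quick sketch. The file numbers.txt is a placeholder, and seq (used here only to generate 25 numbered lines) is assumed to be installed.

```shell
# Hypothetical input: a file with the numbers 1 to 25, one per line.
workdir=$(mktemp -d)
seq 1 25 > "$workdir/numbers.txt"

# With no options, head prints the first 10 lines; count them with wc -l.
head_default=$(head "$workdir/numbers.txt" | wc -l | tr -d ' ')
echo "$head_default"

# -n 3 limits the output to the first 3 lines.
head_three=$(head -n 3 "$workdir/numbers.txt")
echo "$head_three"

rm -r "$workdir"
```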

8| history

The history command is used to view previously executed commands. Run in an interactive session, it prints a numbered list of the commands that have been entered. Bash saves this list between sessions in a history file, ~/.bash_history by default. Bash's history expansion can also rerun earlier commands: !pattern, for example, reruns the most recent command that starts with pattern.

Syntax: $ history

9| shuf

Shuffle or shuf is a command-line tool, part of GNU coreutils, for generating random permutations of its input lines. It is useful for shuffling the rows of a dataset or drawing a random sample from it. By default, it shuffles the lines of a given file and writes the result to standard output.

Syntax: shuf [OPTION]… [FILE]
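Shuffling and sampling might be sketched as follows. The file rows.txt is a stand-in for a real dataset, and the sketch assumes GNU shuf and seq are installed. Because the output is random, it differs from run to run.

```shell
# Hypothetical dataset: ten rows numbered 1 to 10.
workdir=$(mktemp -d)
seq 1 10 > "$workdir/rows.txt"

# Shuffle all lines: same rows, random order.
shuffled=$(shuf "$workdir/rows.txt")
echo "$shuffled"

# Sorting the shuffled rows recovers the original content,
# which shows that no row was lost or duplicated.
sorted_back=$(printf '%s\n' "$shuffled" | sort -n)

# -n 3 outputs at most 3 lines: a random sample without replacement.
sample=$(shuf -n 3 "$workdir/rows.txt")
sample_size=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
echo "$sample_size"

rm -r "$workdir"
```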

10| sed

The sed command-line tool is a stream editor that can insert, delete, find and replace text, among other operations. It edits files without opening them in an interactive editor. It supports regular expressions, which allow it to perform complex pattern matching.

Syntax: sed OPTIONS… [SCRIPT] [INPUTFILE…] 
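As an illustration of replace and delete in one pass, the sketch below turns a semicolon-separated file into a comma-separated one and drops blank lines. The file scores.txt and its contents are assumptions for the example.

```shell
# Hypothetical input: semicolon-separated values with a stray empty line.
workdir=$(mktemp -d)
printf 'name;score\nann;8\n\nbob;7\n' > "$workdir/scores.txt"

# Each -e adds one editing command to the script:
#   s/;/,/g  replaces every semicolon on a line with a comma
#   /^$/d    deletes lines that are empty
# The input file itself is left unchanged; the result goes to standard output.
sed_output=$(sed -e 's/;/,/g' -e '/^$/d' "$workdir/scores.txt")
echo "$sed_output"

rm -r "$workdir"
```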

