How To Build A Docker Playground For Data Science

In one of our previous articles, we learned how to set up Docker and use the latest TensorFlow 2.0 image to create a development environment for data science projects.

Docker is one of the most widely used containerization tools, and pre-built images are available for most of the major frameworks, software and tools in the industry. Docker Hub, Docker’s repository for images, hosts official images for tools popular with data scientists across the world, and the tensorflow:nightly-py3-jupyter image that we used last time is one of them.

In this article, we will take the conventional approach of setting up Docker containers from scratch. Although this approach is not as straightforward as downloading an image and running it, it gives us the flexibility to create a custom environment for any project.

To follow along, you must have a basic understanding of Docker and must have it installed on your local machine.

What you will learn:

  • Understanding The Dockerfile
  • Creating A Custom Docker Image Using The Dockerfile

Understanding The Dockerfile

Before we begin, let us go back a bit and revise some of the terms that we often use when dealing with Docker.

Images: A Docker image is a lightweight, read-only snapshot of a Linux environment: a filesystem together with the software installed in it. Docker Hub, Docker’s official repository, hosts thousands of images which can be used to create containers.

Containers: Containers are the running instances of a Docker image. A single image can be used to fire up multiple containers.

Dockerfile: A Dockerfile is a simple text file that defines the environment to be built. It is used to build a Docker image and contains all the commands a user could call on the command line to assemble an image.

Breaking Down A Dockerfile

Let’s look at a simple Dockerfile and try to understand how it works.
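A minimal sketch of such a Dockerfile, consistent with the steps walked through in this section, could look like the following. It is an illustration under stated assumptions rather than a definitive file: the label value, the locale value, the /Volume path and the DEBIAN_FRONTEND build argument are all assumptions chosen for the example. It is written as a shell snippet that saves the file as Dockerfile in the current directory.

```shell
# Write an illustrative Dockerfile into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > Dockerfile <<'EOF'
# Base image that every later instruction builds on.
FROM ubuntu:18.04

# Metadata; the version value here is an assumption.
LABEL version="1.0"

# Assumption: avoid interactive prompts from apt during the build only.
ARG DEBIAN_FRONTEND=noninteractive

# Language/locale setting for the Linux environment (assumed value).
ENV LANG=C.UTF-8

# Empty directory in the root folder, e.g. for sharing files with the host.
RUN mkdir /Volume

# Refresh the package index and upgrade the installed packages.
RUN apt-get update && apt-get upgrade -y

# Clean up the package cache.
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Install Python 3 (3.6 on Ubuntu 18.04) and pip; the index is refreshed
# again because the previous step removed it.
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
EOF
```

Each uppercase keyword starts a new instruction, and everything after RUN is an ordinary shell command executed while the image is being built.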

Reading through the above file, you will notice that most of it consists of ordinary shell commands. If you are a Linux person, writing a Dockerfile is a piece of cake. Even otherwise, it’s not much of an effort. So let’s try to understand the above script.

Although the major portion of a Dockerfile is occupied by Linux commands, there are also some Docker-specific instructions that tell the Docker engine how to create the image. For example, the keywords in uppercase letters such as FROM, RUN and LABEL are Docker-specific instructions.

FROM: Initializes a new build stage and sets the Base Image for subsequent instructions. The specified image forms the base layer of the container.

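RUN: Executes the command that follows in a new layer on top of the current image and commits the result.

LABEL: Adds metadata to an image as key-value pairs. It is often used to record a version for the image.

All the available instructions are listed in the official Dockerfile reference in the Docker documentation.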

The Dockerfile performs the following steps:

  • Initializes a new image from the specified base image, ubuntu:18.04, with the FROM instruction.
  • Adds versioning to the image with the LABEL instruction.
  • Sets the language environment variable for the Linux environment.
  • Creates an empty directory called Volume in the root folder of the new image/container.
  • Installs and updates packages for the new image.
  • Runs the cleanup command to clean up packages.
  • Installs Python 3.6 and pip inside the new image.

Creating A Custom Docker Image Using The Dockerfile

So far we have only written the instructions for building a Docker image. The Docker engine will now use these instructions to build the image. Let’s do that.

To build a Docker image from a Dockerfile:

Create a Dockerfile

vim Dockerfile

The file is conventionally named Dockerfile, with a capital D, which is the name the build command looks for by default.

Copy the contents of the given Dockerfile and save it.

Building Image 

Within the directory where the Dockerfile resides, execute the following command to build an image.

docker build -t <repository:tag> .

Eg. docker build -t datascience/python3.6:1.0 .

Note: Do not forget the dot/period at the end of the command. It tells Docker to use the current directory as the build context.

The build command builds a Docker image using the instructions specified in the Dockerfile. The -t flag allows us to specify a name and tag for the new image.

Output:

Successfully built 3b8897fc91c4
Successfully tagged datascience/python3.6:1.0

Once the process completes, enter the following command to list the images.

docker images

To fire up a container from the newly created image, use the docker run command as follows:

docker run -it <repository:tag>

Eg. docker run -it datascience/python3.6:1.0

The -it flags run the container in interactive mode: -i keeps standard input open and -t allocates a terminal. This allows us to enter the container directly as it is fired up.

The above command runs the container and places us in its root directory.

We now have a separate Linux instance running with all the dependencies specified in the Dockerfile.

Test whether Python 3 and pip are installed in the container by checking their versions from the container’s shell.
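One way to run this check from the shell inside the container is sketched below. The helper name check_tools is hypothetical and introduced only for this example; it reports, for each tool name passed to it, either the tool's version string or that the tool was not found. It assumes pip is exposed as pip3, which is the usual name when pip is installed for Python 3 through the Ubuntu package manager.

```shell
# Hypothetical helper: report whether each named tool is on the PATH
# and, if it is, print what it says for --version.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $("$tool" --version 2>&1)"
    else
      echo "$tool: not found"
    fi
  done
}

# Check the two tools this image is expected to provide.
check_tools python3 pip3
```

Running python3 --version and pip3 --version directly works just as well. The helper only makes a missing tool show up as a clear message instead of a shell error.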

Thus we can have our own custom environment depending on the project requirements. We can also use the newly created image as the base image for creating new images. 
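As a rough sketch of reusing the image as a base, a second Dockerfile can start FROM the tag built earlier and add project-specific libraries on top. The directory name derived, the label value and the choice of numpy and pandas are assumptions made purely for illustration.

```shell
# Keep the derived image's Dockerfile in its own directory (name assumed).
mkdir -p derived

cat > derived/Dockerfile <<'EOF'
# Start from the custom image built earlier instead of a stock Ubuntu image.
FROM datascience/python3.6:1.0

# Metadata for the derived image (assumed value).
LABEL version="1.1"

# Add example libraries on top of the base environment.
RUN pip3 install --no-cache-dir numpy pandas
EOF
```

This image could then be built in the same way as before, pointing the build command at the derived directory instead of the current one, for example: docker build -t <repository:tag> derived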

Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com