In one of our previous articles, we learned how to set up Docker and use the latest TensorFlow 2.0 image to create a development environment for data science projects.
Docker is one of the most widely used and established tools for containerization, so pre-built images exist for all the major frameworks, software, and tools in the industry. Docker Hub, Docker’s repository for images, hosts images for popular tools used by data scientists around the world, and the tensorflow:nightly-py3-jupyter image we used last time is one of them.
In this article, we will take a more conventional approach and set up a Docker container from scratch. Although this approach is not as straightforward as downloading an image and running it, it gives us the flexibility to create a custom environment for any project.
To follow along, you need a basic understanding of Docker and a working Docker installation on your local machine.
What you will learn:
- Understanding The Dockerfile
- Creating A Custom Docker Image Using The Dockerfile
Understanding The Dockerfile
Before we begin, let’s revisit some of the terms we use often when working with Docker.
Images: An image (or Docker image) is a lightweight, read-only snapshot of a Linux environment from which containers are created. Docker Hub, Docker’s official repository, contains thousands of images that can be used to create containers.
Containers: A container is a running instance of a Docker image. A single image can be used to start multiple containers.
Dockerfile: A Dockerfile is a plain text file that defines the environment to be built. Docker uses it to build an image; it contains all the commands a user could call on the command line to assemble that image.
Breaking Down A Dockerfile
Let’s look at a simple Dockerfile and try to understand how it works.
Reading through a Dockerfile, you will notice that most of its content is ordinary shell commands. If you are comfortable with Linux, writing a Dockerfile is a piece of cake, and even if you aren’t, it takes little effort. So let’s break the file down.
Although most of a Dockerfile consists of Linux commands, it also contains Docker-specific instructions that tell the Docker engine how to build the image. These are the keywords written in uppercase, such as FROM, RUN, and LABEL. (Instructions are actually case-insensitive; uppercase is the convention that makes them easy to tell apart from the shell commands.)
FROM: Initializes a new build stage and sets the base image for subsequent instructions. The specified image forms the base layer of the new image.
RUN: Executes the command that follows in a new layer on top of the current image and commits the result.
LABEL: Adds metadata to an image as key-value pairs, for example to record a version number.
The full list of instructions is documented in the Dockerfile reference on Docker’s documentation site.
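To make the split between Docker instructions and ordinary shell commands concrete, here is a small sketch that prints the instruction keyword of each instruction in a Dockerfile. It assumes a POSIX shell with awk, and the function name list_instructions is hypothetical. It only handles the simple cases: comments, blank lines, and instructions continued onto the next line with a trailing backslash.

```shell
#!/bin/sh
# list_instructions (hypothetical name): print the keyword of every instruction
# in a Dockerfile, in order. The rest of each instruction is its argument; for
# RUN, that argument is an ordinary shell command.
list_instructions() {
  awk '
    # Continuation of a multi-line instruction: skip it, and keep going while
    # lines still end with a backslash (comment lines do not end it).
    cont { if (!/^[ \t]*#/) cont = /\\$/; next }
    # Skip comments and blank lines.
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    # First word of a new instruction. Instructions are case-insensitive, so
    # normalize to the conventional uppercase spelling.
    { print toupper($1); cont = /\\$/ }
  ' "${1:-Dockerfile}"
}
```

Running list_instructions in a directory that holds a Dockerfile prints one keyword per line; everything it does not print is the argument text, which for RUN is plain shell.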
The Dockerfile performs the following steps:
- Uses FROM to initialize a new image from the ubuntu:18.04 base image.
- Uses LABEL to add version metadata to the image.
- Sets the language environment variable for the Linux environment.
- Creates an empty directory called Volume at the root of the image’s filesystem.
- Updates and installs packages for the new image.
- Runs a cleanup command to remove unneeded package files.
- Installs Python 3.6 and pip inside the new image.
Creating A Custom Docker Image Using The Dockerfile
So far we have only written the instructions; the Docker engine uses them to build the image. Let’s do that now.
To build a Docker image from a Dockerfile:
Create a Dockerfile:
vim dockerfile
Paste in the contents of the Dockerfile described above and save the file.
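The Dockerfile itself isn’t reproduced in this section, so the following is a reconstruction based on the steps listed above, not the author’s exact file. Instead of opening an editor, it writes the file from the shell with a heredoc. The LABEL values and the LANG setting are assumptions, and the cleanup is placed after the Python install (in the same RUN) so that apt’s package lists still exist when Python is installed.

```shell
#!/bin/sh
# Write the Dockerfile into the current directory, which will be the build context.
# "Dockerfile" is the file name docker build looks for by default.
# Quoting the delimiter ('EOF') stops the shell from expanding $ or backslashes,
# so the file is written exactly as shown.
cat > Dockerfile <<'EOF'
# Base image (the first layer of the new image).
FROM ubuntu:18.04

# Version metadata; the key/value pairs here are placeholders.
LABEL maintainer="you@example.com" version="1.0"

# Language setting for the environment (assumed value).
ENV LANG=C.UTF-8

# Empty directory called Volume at the root of the filesystem.
RUN mkdir /Volume

# Update and upgrade packages, install Python 3.6 and pip, then clean up the
# apt cache in the same RUN so the deleted files never end up in a layer.
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
       python3.6 python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
EOF
```

Chaining the install and cleanup with && in one RUN means one layer is committed, and a failure in any step stops the build.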
Building The Image
From the same directory that contains the Dockerfile, run the following command to build an image:
docker build -t <repository>:<tag> .
E.g. docker build -t datascience/python3.6:1.0 .
Note: Do not forget the dot (period) at the end of the command. It is the build context: the directory whose contents are sent to the Docker engine and where docker build looks for the Dockerfile by default.
The build command builds a Docker image from the instructions in the Dockerfile. The -t flag gives the new image a name and, optionally, a tag, in the form repository:tag.
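Because the trailing dot is a positional argument, it is easy to drop. The sketch below wraps the same docker build -t command in a hypothetical shell function, build_image, that takes the name:tag and an optional context directory (defaulting to ".") and fails with a clear message if that directory has no Dockerfile. The function name and its pre-check are illustrative, not part of Docker.

```shell
#!/bin/sh
# build_image (hypothetical helper): build and tag an image from a context directory.
#   $1  name and tag, e.g. datascience/python3.6:1.0
#   $2  build context directory (defaults to ".", the trailing dot in the command)
build_image() {
  tag="$1"
  context="${2:-.}"

  # Fail early with a clear message instead of letting the build error out.
  # Accept the conventional name or the lowercase spelling used with vim above.
  if [ ! -f "$context/Dockerfile" ] && [ ! -f "$context/dockerfile" ]; then
    echo "build_image: no Dockerfile in $context" >&2
    return 1
  fi

  # Same command as in the text; the context directory is the last argument.
  docker build -t "$tag" "$context"
}
```

Usage mirrors the example command: build_image datascience/python3.6:1.0 .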
Output:
Successfully built 3b8897fc91c4
Successfully tagged datascience/python3.6:1.0
Once the build completes, run the following command to list your images.
docker images
To start a container from the newly created image, use the docker run command as follows:
docker run -it <repository>:<tag>
E.g. docker run -it datascience/python3.6:1.0
The -it flags run the container interactively: -i keeps standard input open and -t allocates a pseudo-terminal, so you land in a shell inside the container as soon as it starts.
The command starts the container and opens that shell in the container’s root directory.
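If you start containers this way often, a small wrapper keeps the flags in one place. The sketch below is a hypothetical function, run_dev_container. Besides -it it adds two standard docker run options: --rm, which deletes the container when you exit, and -v, which bind-mounts the current host directory at /Volume. Using that empty directory as a mount point for project files is an assumption about its purpose. Setting DOCKER=echo prints the command instead of running it.

```shell
#!/bin/sh
# run_dev_container (hypothetical helper): start an interactive, throwaway container.
#   -it   keep STDIN open (-i) and allocate a pseudo-terminal (-t)
#   --rm  remove the container automatically when it exits
#   -v    bind-mount the current host directory at /Volume inside the container
# DOCKER defaults to "docker"; set DOCKER=echo to print the command instead.
run_dev_container() {
  image="$1"   # e.g. datascience/python3.6:1.0
  ${DOCKER:-docker} run -it --rm -v "$(pwd):/Volume" "$image"
}
```

For example, run_dev_container datascience/python3.6:1.0 opens a shell in the container, and files you save under /Volume appear in the host directory you started from.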
We now have an isolated Linux environment running with all the dependencies specified in the Dockerfile.
Check that Python 3 and pip are installed in the container, for example by running python3 --version and pip3 --version.
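For a check you can rerun after changing the Dockerfile, here is a small POSIX shell sketch to run inside the container. The function name report_tool is hypothetical; it reports where each command is found on the PATH and counts any that are missing.

```shell
#!/bin/sh
# Run inside the container.
# report_tool (hypothetical name) prints "<name>: <path>" when the command is on
# the PATH, or "<name>: missing" (and returns 1) when it is not.
report_tool() {
  if path=$(command -v "$1" 2>/dev/null); then
    echo "$1: $path"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check the two tools the Dockerfile installs and count any that are missing.
missing=0
for tool in python3 pip3; do
  report_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```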
In this way, we can build a custom environment tailored to each project’s requirements. We can also use the newly created image as the base image for building new images.
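As a sketch of that last point, the script below writes a second Dockerfile, in its own directory, that starts FROM the datascience/python3.6:1.0 image built earlier and layers an extra package on top. The directory name derived, the new tag, and the numpy package are placeholders for whatever your project needs.

```shell
#!/bin/sh
# Give the derived image its own build context so it doesn't overwrite the
# original Dockerfile.
mkdir -p derived
cat > derived/Dockerfile <<'EOF'
# Start from the custom image built earlier instead of plain Ubuntu.
FROM datascience/python3.6:1.0

# Add project-specific packages (placeholder list).
RUN pip3 install --no-cache-dir numpy
EOF

# Build it like before, with a new tag and the derived directory as the context:
#   docker build -t datascience/python3.6-numpy:1.0 derived
```

Only the new RUN layer is built; the layers of the base image are reused as they are.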