Understanding Docker From Scratch: A Beginner’s Guide

Data scientists and researchers actively look for new and better algorithms to solve existing problems. Many of these algorithms are mathematically and computationally heavy and involve a lot of code. Say, for example, you’ve built an algorithm that predicts the stock price of Google with 95% accuracy.

The usual approach is to keep running the code on your own machine to get predictions. You are comfortable reading that code and understand its output perfectly, but what if the output has to be interpreted by someone who has never coded in their life?



This is when we should think about deployment, which makes the algorithm easy for the general public to consume. Deploying your model as a web app or as a plugin for a website lets users benefit from it without having to worry about the code.

This article will give you a basic understanding of Docker, its components, and its use cases. 

  • Types of Deployment
  • Possible solutions
  • Drawbacks of those solutions
  • What is a container?
  • Containers vs VMs
  • What is Docker?
  • Other types of containers
  • Some terminology to get you started
  • Conclusion

Types of Deployment:

There are four main types of deployment:

  • Batch: This is usually used when we are not concerned about immediate responses or streaming data. In this case the model could be retrained, say, every 24 hours on the data collected over that time.
  • Near Real Time: This could be used when building recommender systems. If you watch “Money Heist” on Netflix today, you might get suggestions for movies and series of the same genre the next time you hop on Netflix.
  • Real Time: A very good example is stock price prediction. Because the market is volatile, the model must recompute its prediction every time the price of a stock changes.
  • Edge: Edge deployment means deploying your model on a device such as a phone or a Raspberry Pi. Here we have to quantize the machine learning model to get quicker responses. It is up to you to decide between better accuracy and faster response time when deploying models on the edge.
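
The accuracy versus response time trade-off on the edge comes from quantization. As a minimal sketch, assuming simple 8-bit affine quantization of a plain list of weights (the function names and the sample weights are illustrative and not taken from any particular library), each weight is stored as a one-byte integer code at the cost of a small rounding error:

```python
def quantize(weights, num_bits=8):
    """Map float weights to integer codes in [0, 2**num_bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    # Guard against a zero range when all weights are equal.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from their integer codes."""
    return [c * scale + lo for c in codes]


weights = [-1.2, -0.3, 0.0, 0.45, 2.1]  # made-up example weights
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(codes)
print(f"largest rounding error: {max_error:.4f}")
```

The rounding error is bounded by half of `scale`, so a wider range of weights or fewer bits means a coarser approximation. Real frameworks ship their own quantization tooling, which should be preferred over a hand-rolled version like this one.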

The main thing required for the model deployment is the saved model file. There are several ways to save your model depending on the environment and tools you are working with. 
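
As one illustration, Python’s built-in pickle module can write a trained model to a file and read it back. This is only a sketch under stated assumptions: the "model" here is a hypothetical dictionary of fitted parameters produced by a toy `fit_line` helper, and the file name is arbitrary. Libraries such as scikit-learn or TensorFlow have their own recommended formats.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # toy data lying on y = 2x + 1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    # Save the fitted model to disk ...
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # ... and load it back, as the serving code would do at startup.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.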

Possible solutions:

The easiest and quickest way is to wrap your saved model in a web app using Flask, Django, Streamlit, etc., as your requirements dictate. Along with the model, you can also include a data transformation pipeline if you need one. At this point the web app runs on your local machine, which means it is still not publicly accessible. To make it publicly available, you would deploy it on some hosting service and work around whatever issues come up. This might work for you once, but it may not work every time. Let’s see why.
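
Flask and Django applications are WSGI applications under the hood. As a rough, framework-free sketch of the same idea, the following defines a bare WSGI callable using only the standard library. The `predict` function, its coefficients, and the `x` query parameter are placeholders, not a real model or API.

```python
import json
from urllib.parse import parse_qs


def predict(x):
    """Placeholder for a call into the saved model."""
    return 2.0 * x + 1.0


def app(environ, start_response):
    """WSGI app: a request with query string x=3 gets a JSON prediction."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    headers = [("Content-Type", "application/json")]
    try:
        x = float(query["x"][0])
    except (KeyError, ValueError):
        # Missing or non-numeric input is reported as a client error.
        start_response("400 Bad Request", headers)
        return [json.dumps({"error": "pass a numeric x"}).encode()]
    start_response("200 OK", headers)
    return [json.dumps({"prediction": predict(x)}).encode()]


# Call the app directly, the way a WSGI server would for GET /?x=3.
statuses = []
response = app({"QUERY_STRING": "x=3"}, lambda status, hdrs: statuses.append(status))
print(statuses[0], response[0].decode())
```

To serve it locally, the standard library’s `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` is enough for experiments. Bound to 127.0.0.1, the app is reachable only from your own machine, which is exactly the limitation described above.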

Drawbacks of the above methods:

The issue with traditional deployment is compatibility. Suppose you’ve built a web app around your model in Flask/Django on a Windows machine, it works perfectly, and you pass the code on to your DevOps team for deployment. While deploying your web app for mass consumption, the DevOps team might be plunged into a pool of errors caused by compatibility problems across different operating systems or hardware configurations. To overcome such problems, we use containers.

What is a container? 

  • Just as the name suggests, it’s a container to store everything that you would need to run your web app. The container would consist of the dependencies, the OS requirements, saved models (with or without data transformation pipelines), datasets used (in case of clustering), and other essential files depending on your use case.
  • This container can be taken and deployed anywhere without running into compatibility issues, because it isolates the app and standardizes its environment.
  • Containers have existed for more than 15 years, but they have gained attention only recently. Prior to containers, we relied on VMs to satisfy our needs. Let’s understand the pros and cons of using containers and VMs.

Containers vs VMs:

Containers | Virtual Machines (VMs)
Make it simple to run multiple instances of an app. | Make it slightly tedious to run multiple instances of an app.
You won’t have to worry about compatibility as long as you have specified all layers of the image correctly. | You need to know the underlying OS before deploying your web app to make sure the app is compatible with the OS it will run on.
Share the host kernel, and the Docker daemon runs with root privileges by default, so isolation is weaker than with VMs; this is often considered a risk in production environments. | Offer the strongest isolation, as the guest OS does not interact directly with the host OS.
Virtualise the OS. | Virtualise the hardware.
Utilise the user space. | Utilise user space plus kernel space.
Share the OS kernel. | Have their own OS and apps.
Several containers can run on a virtual machine. | Virtual machines generally do not run inside containers.
The Docker daemon manages Docker containers. | A hypervisor manages virtual machines.

What is Docker?

Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels.

Other types of containers and container providers:

  • Java Containers: Jetty and Tomcat are servlet containers that host Java web applications. They are “containers” in the application-server sense rather than the OS-level sense used by Docker.
  • Unikernels: These are often mentioned alongside containers. Unlike Docker containers, they don’t need a host OS: the application is bundled with just the OS libraries it needs into a single image that runs directly on a hypervisor.
  • LXD: It’s similar to Docker, but it is designed to run full system containers and does not need to rely on other applications for scaling, whereas Docker is commonly paired with an orchestrator such as Kubernetes to take care of scaling.
  • OpenVZ: This is one of the oldest container technologies used in production. OpenVZ requires both the host and the guest OS to be Linux. It differs from traditional virtual machines in that it shares the kernel of the host OS.
  • Hyper-V Containers: These run each container inside a lightweight virtual machine, which gives better isolation and abstraction.
  • Some others (rkt, Windows Server containers, WildFly, Spring Boot)

Some terminology to know before you start working with Docker:

  • Docker Daemon: The Docker daemon listens for requests from the Docker client. It manages Docker objects such as images, containers, networks, and volumes. One Docker daemon can communicate with other Docker daemons as well.
  • Docker Client: The Docker client is how you interact with Docker. Commands such as docker run and docker build are sent to the Docker daemon for execution. A Docker client can communicate with multiple Docker daemons.
  • Docker Registry: A registry stores Docker images. Docker Hub is a public registry.
  • Docker Images: An image is a read-only template, something like the ISO images used to install Windows or Linux on a machine. It holds the instructions for creating a container. Images are often based on other images. For example, you may build an image that is based on Windows but also installs the software you want and satisfies your custom dependencies so that your application runs. You can make your own images or use an image published by someone else in a registry (Docker Hub). If you plan to make your own image, create a Dockerfile containing the steps needed to build it. Each instruction creates a layer in the image, and when you change an instruction and rebuild, only the changed layer and the layers after it are rebuilt; the earlier layers are reused from the cache. This is part of what makes images so lightweight and quick to build.
  • Container Host/Host OS: The Host OS is the operating system on which the Docker client and Docker Daemon run. In the case of Linux and non-Hyper-V containers, the Host OS shares its kernel with running containers. For Hyper-V each container has its own Hyper-V kernel as these would be containers running inside virtual machines. 
  • Container OS/Base OS: The base OS refers to an image that contains an operating system such as Ubuntu, CentOS, or windowsservercore. Typically, you build your own image on top of a base OS image so that you can use parts of the OS. Note that Windows containers require a base OS, while Linux containers do not.
  • OS Kernel: The kernel manages lower level functions such as memory management, file system, network, and process scheduling. 
  • Volumes: Volumes are storage attached to containers. They reside in a part of the host filesystem that is managed by Docker, and processes outside Docker should not modify that part of the filesystem. A volume is created inside a directory on the host, and that directory is mounted into the container as storage. A given volume can be mounted into several containers simultaneously, and when no running container is using the volume, it is still available to Docker.
  • Bind mount: Similar to a volume, but here you give the path to a directory on the host OS, and host OS processes can manage or change the directory used by the container.
  • Tmpfs mount: This is available on Linux and is used for data that you won’t need after the container stops. The data is kept in the memory of the host OS for as long as the container is running.
  • Named pipes: These are used on Windows. A named pipe mount can be used for communication between the host and a container. A common use case is to run a third-party tool inside a container and connect to the Docker Engine API using a named pipe.
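
The layer caching mentioned under Docker Images can be pictured with a small simulation. This is only a conceptual sketch, not how Docker is actually implemented: each layer’s cache key is a hash of its instruction chained with the key of the layer before it, so a changed instruction invalidates itself and every layer after it. The Dockerfile instructions and the `layer_keys` and `layers_to_rebuild` helpers are made up for illustration, and the sketch ignores details such as the contents of copied files.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction, chained to the previous key."""
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def layers_to_rebuild(old, new):
    """List the instructions in `new` whose layers are missing from the old cache."""
    cached = set(layer_keys(old))
    return [ins for ins, key in zip(new, layer_keys(new)) if key not in cached]


# A hypothetical Dockerfile for the web app, one instruction per layer.
old = [
    "FROM python:3.11-slim",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY app.py .",
    'CMD ["python", "app.py"]',
]
# Only the fourth instruction changes: an extra file is copied in.
new = old[:3] + ["COPY app.py model.pkl ./", 'CMD ["python", "app.py"]']

for instruction in layers_to_rebuild(old, new):
    print("rebuild:", instruction)
```

The first three layers are reused, while the changed COPY and the unchanged CMD after it are rebuilt. This is why instructions that change rarely, such as installing dependencies, are usually placed before instructions that change often, such as copying application code.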


Conclusion:

Containerized applications are portable and lightweight, but they might not fit every use case. Because the technology has gained attention only recently, not many companies run containerized applications in production yet, but Docker is catching up, especially when paired with technologies like Kubernetes that help it perform better.

Rithwik Chhugani
I am a final year Data Science student with good experience in working with startups across India and Australia in the Machine Learning and AI space. I am always in search of tasks that challenge me to broaden my vision and enhance the level of experience. Looking for a full-time position after my graduation in April 2021. Hit me up if you have an opportunity for me.
