Build Computer Vision Applications with Few Lines of Code using MONK AI

Monk is a low-code deep learning toolkit for building computer vision applications. It covers many computer vision use cases, such as image processing and image classification.

Tessellate Imaging is an India-based AI startup that helps businesses scale with machine learning, computer vision, image processing and analysis, deep learning, and ML/DL DevOps. Its key members are Adhesh Shrivastava (CEO), Akash Deep Singh (COO), and Abhishek Kumar Annamraju (CTO). The company has worked with clients across industries, from healthcare to retail, including AlemHealth, NetLink, GEO Graph, blooskai, and Packt. It has also released several open-source libraries on GitHub. Let’s have a look at these.


GitHub Repository –

Documentation –

Demonstrations –


  • It is best suited for beginners (with a complete roadmap). Even developers and researchers can use it to build an end-to-end model for a specified use case quickly.
  • Auto hyperparameter tuning with its hyperparameter analyzer.
  • It works with several deep learning frameworks as backends, such as TensorFlow, PyTorch, and MXNet, and offers a wide range of transfer learning models.
  • It has separate packages for Google Colab (pip install -U monk-colab), Kaggle notebooks (pip install -U monk-kaggle), and other data science competition platforms.
  • Supports managing multiple projects and prototyping.
  • Quick data loading, training, validation and deployment.

CPU (non-GPU): pip install -U monk-cpu

All backends: pip install -U monk-cpu

Gluon backend: pip install -U monk-gluon-cpu

Pytorch backend: pip install -U monk-pytorch-cpu

Keras backend: pip install -U monk-keras-cpu

For a complete list visit: repository


Clone the repository: git clone


Create an image classifier

# Create an experiment
ptf.Prototype("project-1", "experiment-1")
# Load data
# Train


predictions = ptf.Infer(img_name="image.png", return_raw=True)

Compare Experiments

# Create comparison project
# Add all your experiments
ctf.Add_Experiment("project-1", "experiment-1")
ctf.Add_Experiment("project-2", "experiment-2")
# Generate statistics

Weather classification Demo:

Explore the different experiments from here.

The Monk Object Detection library covers object detection using transfer learning, image segmentation and localization, activity recognition, pose estimation, OCR, and other use cases.


  • Wide range of SOTA algorithms
  • Easy installation and usage
  • Build object detection pipelines
  • Support for custom annotation formats – COCO, YOLO, PASCAL VOC, etc
  • Easy deployment
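The annotation formats named in the list store the same bounding box in different ways: COCO uses [x_min, y_min, width, height] in pixels, PASCAL VOC uses [x_min, y_min, x_max, y_max] in pixels, and YOLO uses [x_center, y_center, width, height] normalized by the image size. The sketch below illustrates the conversions in plain Python. It is not Monk's own conversion code, and the function names are hypothetical.

```python
def coco_to_voc(box):
    """COCO [x_min, y_min, width, height] -> VOC [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_coco(box):
    """VOC [x_min, y_min, x_max, y_max] -> COCO [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def voc_to_yolo(box, img_w, img_h):
    """VOC pixel corners -> YOLO normalized [x_center, y_center, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def yolo_to_voc(box, img_w, img_h):
    """YOLO normalized box -> VOC pixel corners."""
    xc, yc, w, h = box
    return [
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
    ]
```

Converting through VOC corners keeps each function small; a COCO-to-YOLO conversion is then just coco_to_voc followed by voc_to_yolo.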

Pothole detection: Source Code

To use the pretrained model, use these few lines of code:

import os
import sys
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")

from infer_base import *
class_file = set_class_list("pothole_trained/classes.txt")

set_model_params(model_name="vgg16", model_path="pothole_trained/model_vgg16-0050.params")

set_hyper_params(gpus="0", batch_size=1)
set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

sym = set_network()
mod = load_model(sym)
set_output_params(vis_thresh=0.9, vis=True)
output = Infer("pothole_trained/test/img1.jpg", mod)

pothole  0.9950724244117737 [109.65555071009518, 303.28242910581656, 441.5280866987249, 492.36453810607287]
pothole  0.9883765578269958 [503.1345823081632, 246.1549497164476, 658.0930820964118, 353.84971546768463]
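
The img_short_side=600 and img_long_side=1000 settings match the usual Faster R-CNN resizing convention: scale the image so that its shorter side reaches 600 pixels, unless that would push the longer side past 1000, in which case the longer side is capped instead. The mean and std values are then applied per channel. The plain-Python sketch below illustrates that rule under this assumption about what the parameters mean. The helper names are hypothetical and are not Monk functions.

```python
def resize_scale(height, width, short_side=600, long_side=1000):
    """Return the scale factor applied to an image of the given size."""
    scale = short_side / min(height, width)
    # Cap the scale if the longer side would exceed the allowed maximum.
    if round(scale * max(height, width)) > long_side:
        scale = long_side / max(height, width)
    return scale


def normalize_pixel(rgb, mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0)):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


# A 480x640 image is scaled up by 1.25 (to 600x800).
print(resize_scale(480, 640))
# A 400x1200 image is limited by the long side instead.
print(resize_scale(400, 1200))
```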

To train a custom detector, first download the dataset and annotation files (provided in the repository).

Training using mxrcnn:

import os
import sys
from train_base import *

root_dir = "./"
coco_dir = "potholes"
img_dir = "images"
set_dataset_params(root_dir=root_dir, coco_dir=coco_dir, imageset=img_dir)

# Model type

set_hyper_params(gpus="0", lr=0.001, lr_decay_epoch="1", epochs=2, batch_size=1)
set_output_params(log_interval=100, save_prefix="model_vgg16")

set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

# Initializing parameters

# Invoke dataloader

roidb = set_dataset()

 INFO:root:computing cache ./cache/coco_images_roidb.pkl
 INFO:root:saving cache ./cache/coco_images_roidb.pkl
 INFO:root:coco_images num_images 665
 INFO:root:filter roidb: 665 -> 665
 INFO:root:coco_images append flipped images to roidb
 loading annotations into memory...
 Done (t=0.01s)
 creating index...
 index created! 


sym = set_network();

# Train

train(sym, roidb);

 INFO:root:max input shape
 {'bbox_target': (1, 36, 62, 62),
  'bbox_weight': (1, 36, 62, 62),
  'data': (1, 3, 1000, 1000),
  'gt_boxes': (1, 100, 5),
  'im_info': (1, 3),
  'label': (1, 1, 558, 62)}
 INFO:root:max output shape
 {'bbox_loss_reshape_output': (1, 128, 8),
  'blockgrad0_output': (1, 128),
  'cls_prob_reshape_output': (1, 128, 2),
  'rpn_bbox_loss_output': (1, 36, 62, 62),
  'rpn_cls_prob_output': (1, 2, 558, 62)}
 INFO:root:locking params
 ['conv1_1_weight',  'conv1_1_bias',  'conv1_2_weight',  'conv1_2_bias',  'conv2_1_weight',  'conv2_1_bias',  'conv2_2_weight',  'conv2_2_bias',  'conv3_1_weight',  'conv3_1_bias',  'conv3_2_weight',  'conv3_2_bias',  'conv3_3_weight',  'conv3_3_bias',  'conv4_1_weight',
  'conv4_1_bias',  'conv4_2_weight',  'conv4_2_bias',  'conv4_3_weight',  'conv4_3_bias']
 INFO:root:lr 0.001000 lr_epoch_diff [1] lr_iters [1330]
 INFO:root:Epoch[0] Batch [0-100] Speed: 4.71 samples/sec RPNAcc=0.926091 RPNLogLoss=0.251855 RPNL1Loss=0.887560 RCNNAcc=0.874691 RCNNLogLoss=0.319362 RCNNL1Loss=2.404060
 INFO:root:Epoch[0] Batch [0-200] Speed: 4.23 samples/sec RPNAcc=0.934255 RPNLogLoss=0.218708 RPNL1Loss=0.808457 RCNNAcc=0.869170 RCNNLogLoss=0.320706 RCNNL1Loss=2.343518
 INFO:root:Epoch[0] Batch [0-300] Speed: 4.43 samples/sec RPNAcc=0.936786 RPNLogLoss=0.207275 RPNL1Loss=0.819963 RCNNAcc=0.876739 RCNNLogLoss=0.296186 RCNNL1Loss=2.326637
 INFO:root:Epoch[0] Batch [0-400] Speed: 4.51 samples/sec RPNAcc=0.938338 RPNLogLoss=0.195882 RPNL1Loss=0.786549 RCNNAcc=0.879988 RCNNLogLoss=0.286368 RCNNL1Loss=2.306008
 INFO:root:Epoch[0] Batch [0-500] Speed: 4.41 samples/sec RPNAcc=0.940432 RPNLogLoss=0.184636 RPNL1Loss=0.763770 RCNNAcc=0.882797 RCNNLogLoss=0.277150 RCNNL1Loss=2.277596 
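
The lr_iters [1330] value in the log can be reproduced from numbers already shown: 665 images, doubled by the appended flipped copies, give 1330 samples per epoch at batch_size=1, and lr_decay_epoch="1" places the learning-rate decay after one epoch. The short calculation below sketches that arithmetic, assuming flipping exactly doubles the dataset. The function name is hypothetical.

```python
def lr_decay_iters(num_images, batch_size, decay_epochs, flipped=True, start_epoch=0):
    """Iteration counts at which the learning rate decays."""
    # Horizontal flipping adds one extra copy of every image.
    roidb_size = num_images * 2 if flipped else num_images
    epoch_diff = [e - start_epoch for e in decay_epochs if e > start_epoch]
    return [int(e * roidb_size / batch_size) for e in epoch_diff]


# lr_decay_epoch is passed as a comma-separated string in the training call.
decay_epochs = [int(e) for e in "1".split(",")]
print(lr_decay_iters(665, 1, decay_epochs))  # [1330]
```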


from infer_base import *
class_file = set_class_list("./potholes/annotations/classes.txt")

# Model: select the checkpoint according to the number of iterations it was trained for
set_model_params(model_name="vgg16", model_path="trained_model/model_vgg16-0002.params")

# Hyperparameters
set_hyper_params(gpus="0", batch_size=1)

# Preprocessing
set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

# Build the network and load the trained weights
sym = set_network()
mod = load_model(sym)

# Load image and infer
set_output_params(vis_thresh=0.9, vis=True)
output = Infer("potholes/test/img1.jpg", mod)
pothole  0.9788532853126526 [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]
pothole  0.8982967138290405 [532.2888671875, 268.3476318359375, 610.94091796875, 316.1823974609375]
pothole  0.8982408046722412 [166.78828125, 163.675048828125, 226.873828125, 200.3756591796875]
pothole  0.8538920283317566 [451.009326171875, 240.5260986328125, 521.778076171875, 298.5822998046875]
['pothole\n', 0.9788532853126526, [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]] 
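
The list printed above has the shape [label, score, box], where the box values look like [x_min, y_min, x_max, y_max] and the label carries a trailing newline from the classes file. Below is a minimal post-processing sketch, assuming detections arrive as a list of such triples. That list-of-triples shape, the helper name, and the newline on the second label are assumptions for illustration, not documented Monk behaviour.

```python
def clean_detections(detections, min_score=0.9):
    """Keep confident detections, strip label whitespace, add box size."""
    cleaned = []
    for label, score, box in detections:
        if score < min_score:
            continue
        x_min, y_min, x_max, y_max = box
        cleaned.append({
            "label": label.strip(),
            "score": score,
            "box": box,
            "width": x_max - x_min,
            "height": y_max - y_min,
        })
    return cleaned


# Two of the detections shown above, written as [label, score, box] triples.
detections = [
    ["pothole\n", 0.9788532853126526,
     [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]],
    ["pothole\n", 0.8982967138290405,
     [532.2888671875, 268.3476318359375, 610.94091796875, 316.1823974609375]],
]

for det in clean_detections(detections):
    print(det["label"], round(det["score"], 3), det["width"], det["height"])
```

With min_score=0.9, only the first detection is kept, which mirrors the vis_thresh=0.9 setting used in the inference call.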

Check out the other implemented applications: link

Monk GUI is a no-code GUI platform built on top of Monk and Monk object detection libraries to provide an interface for computer vision problems. It is built using the PyQt5 library.

Weed Classification Demo:

End Notes

Monk is an amazing library for building computer vision solutions. With just a few lines of code, one can develop models. The repository provides datasets, annotations (for object detection), and pre-trained models to get work done quickly. In upcoming releases, we can expect custom training interfaces and integration with more frameworks and models.

Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
