
Build Computer Vision Applications with Few Lines of Code using MONK AI

Monk is a low-code deep learning toolkit to leverage computer vision resources. It has plenty of use cases in computer vision problems such as image processing and image classification.

Tessellate Imaging is an India-based AI startup that helps businesses scale with Machine Learning, Computer Vision, Image Processing and Analysis, Deep Learning, and ML/DL DevOps. Its key members are Adhesh Shrivastava (CEO), Akash Deep Singh (COO) and Abhishek Kumar Annamraju (CTO). The company has worked with clients across industries, from healthcare to retail, including AlemHealth, NetLink, GEO Graph, blooskai, Packt and many others. Apart from this, it has released several open-source libraries on GitHub. Let's have a look at these.

Monk is a low-code deep learning toolkit to leverage computer vision resources. It has plenty of use cases in computer vision problems such as image processing and image classification.



GitHub Repository –

Documentation –

Demonstrations –


  • It is best suited for beginners (with a complete roadmap), but developers and researchers can also use it to quickly build an end-to-end model for a specific use case.
  • Automatic hyperparameter tuning with its hyperparameter analyzer.
  • It works with several deep learning frameworks as backends (TensorFlow, PyTorch, MXNet, etc.) and offers a wide range of transfer learning models.
  • It has separate packages for Google Colab (pip install -U monk-colab), Kaggle notebooks (pip install -U monk-kaggle) and other data science competition platforms.
  • Allows management of multiple projects and prototyping.
  • Quick data loading, training, validation and deployment.

CPU (non-GPU) installation:

All backends: pip install -U monk-cpu

Gluon backend: pip install -U monk-gluon-cpu

PyTorch backend: pip install -U monk-pytorch-cpu

Keras backend: pip install -U monk-keras-cpu

For a complete list visit: repository


Clone the repository: git clone


Create an image classifier

 #Create an experiment
 ptf.Prototype("project-1", "experiment-1")
 #Load Data
 # Train


predictions = ptf.Infer(img_name="image.png", return_raw=True)

Compare Experiments

 #Create comparison project
 #Add all your experiments
 ctf.Add_Experiment("project-1", "experiment-1")
 ctf.Add_Experiment("project-2", "experiment-2")
 # Generate statistics

Weather classification Demo:

Explore the different experiments from here.

The Monk Object Detection library covers object detection using transfer learning, image segmentation and localization, activity recognition, pose estimation, OCR and other use cases.


  • Wide range of SOTA algorithms
  • Easy installation and usage
  • Build object detection pipelines
  • Support for custom annotation formats – COCO, YOLO, PASCAL VOC, etc
  • Easy deployment
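
The annotation formats listed above differ mainly in how a bounding box is written down. As a rough illustration in plain Python (this is not part of the Monk API, and the function names are made up for this sketch): COCO stores a box as [x_min, y_min, width, height] in pixels, PASCAL VOC as [x_min, y_min, x_max, y_max] in pixels, and YOLO as [x_center, y_center, width, height] normalized by the image size.

```python
# Illustrative helpers only; names are hypothetical, not Monk functions.

def coco_to_voc(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_coco(box):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def voc_to_yolo(box, img_w, img_h):
    """Pixel corners -> normalized [x_center, y_center, width, height]."""
    x1, y1, x2, y2 = box
    return [
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    ]


def yolo_to_voc(box, img_w, img_h):
    """Normalized [x_center, y_center, width, height] -> pixel corners."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]


if __name__ == "__main__":
    voc_box = coco_to_voc([10, 20, 30, 40])
    print(voc_box)
    print(voc_to_yolo([100, 50, 300, 150], img_w=400, img_h=200))
```

Converting through the corner format keeps each helper small, since every other format can be reached from it in one step.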

Pothole detection: Source Code

To use the pretrained model, a few lines of code are enough:

import os
import sys
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")

from infer_base import *

# Class list and trained model
class_file = set_class_list("pothole_trained/classes.txt")
set_model_params(model_name="vgg16", model_path="pothole_trained/model_vgg16-0050.params")

# Hyperparameters and image preprocessing
set_hyper_params(gpus="0", batch_size=1)
set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

# Build the network and load the trained weights
sym = set_network()
mod = load_model(sym)

# Run inference on a test image
set_output_params(vis_thresh=0.9, vis=True)
output = Infer("pothole_trained/test/img1.jpg", mod)

pothole  0.9950724244117737 [109.65555071009518, 303.28242910581656, 441.5280866987249, 492.36453810607287]
pothole  0.9883765578269958 [503.1345823081632, 246.1549497164476, 658.0930820964118, 353.84971546768463]
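
The preprocessing values used above (img_short_side=600, img_long_side=1000, and a per-channel mean with unit std) look like the usual Faster R-CNN convention: scale the image so that its shorter side reaches 600 pixels, unless that would push the longer side past 1000, then subtract the mean from each channel. The sketch below shows that arithmetic in plain Python. It assumes this is how the library interprets the parameters; `compute_resize_scale`, `resized_shape` and `normalize_pixel` are hypothetical helpers written for illustration, not functions from infer_base.

```python
# Sketch of the assumed resize rule and mean/std normalization.

def compute_resize_scale(height, width, short_side=600, long_side=1000):
    """Scale factor that brings the shorter side to `short_side`,
    capped so the longer side never exceeds `long_side`."""
    scale = short_side / min(height, width)
    if scale * max(height, width) > long_side:
        scale = long_side / max(height, width)
    return scale


def resized_shape(height, width, short_side=600, long_side=1000):
    """Image (height, width) after applying the scale factor."""
    scale = compute_resize_scale(height, width, short_side, long_side)
    return round(height * scale), round(width * scale)


def normalize_pixel(rgb, mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0)):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    print(resized_shape(480, 640))    # short side is the limiting one
    print(resized_shape(500, 2000))   # long side cap applies
    print(normalize_pixel((255, 255, 255)))
```

Under this rule no side of the network input exceeds 1000 pixels, which matches the 'data': (1, 3, 1000, 1000) maximum input shape reported in the training log further down.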

To train a custom detector, first download the dataset and annotation files (provided in the repository).

Training using mxrcnn:

import os
import sys
from train_base import *

# Dataset params
root_dir = "./"
coco_dir = "potholes"
img_dir = "images"
set_dataset_params(root_dir=root_dir, coco_dir=coco_dir, imageset=img_dir)

#model type

# Hyperparameters and output settings
set_hyper_params(gpus="0", lr=0.001, lr_decay_epoch="1", epochs=2, batch_size=1)
set_output_params(log_interval=100, save_prefix="model_vgg16")

# Preprocessing
set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

#initializing parameters

# Invoke Dataloader
roidb = set_dataset()

 INFO:root:computing cache ./cache/coco_images_roidb.pkl
 INFO:root:saving cache ./cache/coco_images_roidb.pkl
 INFO:root:coco_images num_images 665
 INFO:root:filter roidb: 665 -> 665
 INFO:root:coco_images append flipped images to roidb
 loading annotations into memory...
 Done (t=0.01s)
 creating index...
 index created! 


# Network
sym = set_network()

# Train
train(sym, roidb)

 INFO:root:max input shape
 {'bbox_target': (1, 36, 62, 62),
  'bbox_weight': (1, 36, 62, 62),
  'data': (1, 3, 1000, 1000),
  'gt_boxes': (1, 100, 5),
  'im_info': (1, 3),
  'label': (1, 1, 558, 62)}
 INFO:root:max output shape
 {'bbox_loss_reshape_output': (1, 128, 8),
  'blockgrad0_output': (1, 128),
  'cls_prob_reshape_output': (1, 128, 2),
  'rpn_bbox_loss_output': (1, 36, 62, 62),
  'rpn_cls_prob_output': (1, 2, 558, 62)}
 INFO:root:locking params
 ['conv1_1_weight',  'conv1_1_bias',  'conv1_2_weight',  'conv1_2_bias',  'conv2_1_weight',  'conv2_1_bias',  'conv2_2_weight',  'conv2_2_bias',  'conv3_1_weight',  'conv3_1_bias',  'conv3_2_weight',  'conv3_2_bias',  'conv3_3_weight',  'conv3_3_bias',  'conv4_1_weight',
  'conv4_1_bias',  'conv4_2_weight',  'conv4_2_bias',  'conv4_3_weight',  'conv4_3_bias']
 INFO:root:lr 0.001000 lr_epoch_diff [1] lr_iters [1330]
 INFO:root:Epoch[0] Batch [0-100] Speed: 4.71 samples/sec RPNAcc=0.926091 RPNLogLoss=0.251855 RPNL1Loss=0.887560 RCNNAcc=0.874691 RCNNLogLoss=0.319362 RCNNL1Loss=2.404060
 INFO:root:Epoch[0] Batch [0-200] Speed: 4.23 samples/sec RPNAcc=0.934255 RPNLogLoss=0.218708 RPNL1Loss=0.808457 RCNNAcc=0.869170 RCNNLogLoss=0.320706 RCNNL1Loss=2.343518
 INFO:root:Epoch[0] Batch [0-300] Speed: 4.43 samples/sec RPNAcc=0.936786 RPNLogLoss=0.207275 RPNL1Loss=0.819963 RCNNAcc=0.876739 RCNNLogLoss=0.296186 RCNNL1Loss=2.326637
 INFO:root:Epoch[0] Batch [0-400] Speed: 4.51 samples/sec RPNAcc=0.938338 RPNLogLoss=0.195882 RPNL1Loss=0.786549 RCNNAcc=0.879988 RCNNLogLoss=0.286368 RCNNL1Loss=2.306008
 INFO:root:Epoch[0] Batch [0-500] Speed: 4.41 samples/sec RPNAcc=0.940432 RPNLogLoss=0.184636 RPNL1Loss=0.763770 RCNNAcc=0.882797 RCNNLogLoss=0.277150 RCNNL1Loss=2.277596 
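
The `lr_iters [1330]` entry in the log can be reproduced by hand. The data loader reports 665 images and then appends flipped copies, which doubles the training set to 1330 entries; with batch_size=1 and lr_decay_epoch="1", the first decay point therefore falls at iteration 1330. The sketch below redoes that arithmetic in plain Python. The helper names are hypothetical, and the decay factor of 0.1 is an assumption (the log does not show it).

```python
# Reproducing the learning-rate decay point from the training log.

def lr_decay_iters(num_images, flip=True, batch_size=1,
                   lr_decay_epoch="1", begin_epoch=0):
    """Iterations at which the learning rate is decayed."""
    # Appending horizontally flipped copies doubles the dataset.
    roidb_size = num_images * 2 if flip else num_images
    decay_epochs = [int(e) for e in lr_decay_epoch.split(",")]
    epoch_diff = [e - begin_epoch for e in decay_epochs if e > begin_epoch]
    return [int(e * roidb_size / batch_size) for e in epoch_diff]


def lr_at_iteration(iteration, base_lr, decay_iters, factor=0.1):
    """Learning rate after applying every decay point already reached.
    The factor of 0.1 is an assumed default, not read from the log."""
    steps = sum(1 for it in decay_iters if iteration >= it)
    return base_lr * factor ** steps


if __name__ == "__main__":
    iters = lr_decay_iters(665)
    print(iters)
    print(lr_at_iteration(0, 0.001, iters))
    print(lr_at_iteration(1330, 0.001, iters))
```

With epochs=2 this means the first epoch runs at the base rate of 0.001 and the second at the reduced rate.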


# Inference with the trained model
from infer_base import *
class_file = set_class_list("./potholes/annotations/classes.txt")

# Model: select the checkpoint matching the number of epochs it has been trained for
set_model_params(model_name="vgg16", model_path="trained_model/model_vgg16-0002.params")

# Hyperparameters
set_hyper_params(gpus="0", batch_size=1)

# Preprocessing
set_img_preproc_params(img_short_side=600, img_long_side=1000,
                       mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0))

# Network
sym = set_network()
mod = load_model(sym)

# Load image and infer
set_output_params(vis_thresh=0.9, vis=True)
output = Infer("potholes/test/img1.jpg", mod)
pothole  0.9788532853126526 [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]
pothole  0.8982967138290405 [532.2888671875, 268.3476318359375, 610.94091796875, 316.1823974609375]
pothole  0.8982408046722412 [166.78828125, 163.675048828125, 226.873828125, 200.3756591796875]
pothole  0.8538920283317566 [451.009326171875, 240.5260986328125, 521.778076171875, 298.5822998046875]
['pothole\n', 0.9788532853126526, [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]]
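
Four detections are printed, but the returned list holds only the one scoring above 0.9, which suggests that vis_thresh=0.9 acts as a confidence cutoff on what is kept. The sketch below applies the same cutoff to the printed values in plain Python. It assumes each box is [x_min, y_min, x_max, y_max] in pixels, which the output does not state explicitly; `filter_by_score` and `box_size` are hypothetical helpers.

```python
# The four detections printed above: (label, score, box).
detections = [
    ("pothole", 0.9788532853126526,
     [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]),
    ("pothole", 0.8982967138290405,
     [532.2888671875, 268.3476318359375, 610.94091796875, 316.1823974609375]),
    ("pothole", 0.8982408046722412,
     [166.78828125, 163.675048828125, 226.873828125, 200.3756591796875]),
    ("pothole", 0.8538920283317566,
     [451.009326171875, 240.5260986328125, 521.778076171875, 298.5822998046875]),
]


def filter_by_score(dets, thresh=0.9):
    """Keep detections whose confidence is at least `thresh`."""
    return [d for d in dets if d[1] >= thresh]


def box_size(box):
    """Width and height, assuming [x_min, y_min, x_max, y_max]."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


if __name__ == "__main__":
    for label, score, box in filter_by_score(detections):
        width, height = box_size(box)
        print(f"{label}: score={score:.3f}, {width:.0f}x{height:.0f} px")
```

Lowering the threshold to 0.85 would keep all four detections from this image, at the cost of more false positives on other images.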

Check out the other implemented applications: link

Monk GUI is a no-code GUI platform built on top of Monk and Monk object detection libraries to provide an interface for computer vision problems. It is built using the PyQt5 library.

Weed Classification Demo:

End Notes

Monk is an amazing library for building computer vision solutions; models can be developed in just a few lines of code. The repository contains datasets, annotations (for object detection) and pre-trained models to get work done quickly. In upcoming releases, we can expect custom training interfaces and integration with more frameworks and models.


Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
