Last updated February 18, 2021
In AI Mysteries

Build Computer Vision Applications with Few Lines of Code using MONK AI

Monk is a low-code deep learning toolkit to leverage computer vision resources. It has plenty of use cases in computer vision-based problems such as image processing, image classification.

Share

Published on December 21, 2020

by Jayita Bhattacharyya

Tessellate Imaging is an Indian based AI startup company helping businesses grow scalably with the power of Machine Learning, Computer Vision, Image Processing and Analysis, Deep Learning, and ML/DL DevOps. Key members – Adhesh Shrivastava(CEO), Akash Deep Singh(COO), Abhishek Kumar Annamraju(CTO). They have worked with all kinds of industries, from healthcare to retail companies such as AlemHealth, NetLink, GEO Graph, blooskai, Packt and many others. Apart from this, they have released some amazing open-source libraries on GitHub. Let’s have a look at these.

Monk is a low-code deep learning toolkit to leverage computer vision resources. It has plenty of use cases in computer vision-based problems such as image processing, image classification.

GitHub Repository – https://github.com/Tessellate-Imaging/monk_v1

Documentation – https://li8bot.github.io/monkai/#/home

Demonstrations – https://github.com/abhi-kumar/monk_cls_demos

Features:

It is best suited for beginners (with a complete roadmap). Even developers and researchers can use it to build an end-to-end model for a specified use case quickly.
Auto hyperparameter tuning with its hyperparameter analyzer.
It can be integrated with any deep learning framework such as backend – Tensorflow, PyTorch, MXNet, etc. with a wide range of transfer learning models.
It has separate modules that can be used in Google Colab ( pip install -U monk-colab ), Kaggle notebooks ( pip install -U monk-kaggle )and other data science competition platforms.
Allows project multiple project management and prototyping.
Quick data loading, training, validation and deployment.

Installation
CPU (Non GPU) : pip install -U monk-cpu

All backend: pip install -U monk-cpu

Gluon backend: pip install -U monk-gluon-cpu

Pytorch backend: pip install -U monk-pytorch-cpu

Keras backend: pip install -U monk-keras-cpu

For a complete list visit: repository

Clone the repository: git clone https://github.com/Tessellate-Imaging/monk_v1.git

Usage:

Create an image classifier

 #Create an experiment
 ptf.Prototype("project-1", "experiment-1")
 #Load Data
 ptf.Default(dataset_path="dataset/", 
              model_name="resnet50", 
              num_epochs=10)
 # Train
 ptf.Train()

Inference

predictions = ptf.Infer(img_name="image.png", return_raw=True);

Compare Experiments

 #Create comparison project
 ctf.Comparison("Comparison-1");
 #Add all your experiments
 ctf.Add_Experiment("project-1", "experiment-1");
 ctf.Add_Experiment("project-2", "experiment-2");
 # Generate statistics
 ctf.Generate_Statistics();

Weather classification Demo:

Explore the different experiments from here.

Monk Object Detection library contains object detection using transfer learning, image segmentation and localization, activity recognition, pose estimation, OCR and some other use cases.

Features:

Wide range of SOTA algorithms
Easy installation and usage
Build object detection pipelines
Support for custom annotation formats – COCO, YOLO, PASCAL VOC, etc
Easy deployment

Pothole detection: Source Code

To use the pretrained model use this 10 lines of code:

import os
import sys
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/") sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")

from infer_base import *
class_file = set_class_list("pothole_trained/classes.txt"); 

set_model_params(model_name="vgg16", model_path="pothole_trained/model_vgg16-0050.params"); 

set_hyper_params(gpus="0", batch_size=1); 
set_img_preproc_params(img_short_side=600, img_long_side=1000,                         mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0)); 

initialize_rpn_params(); 
initialize_rcnn_params(); 

sym = set_network(); mod = load_model(sym); 
set_output_params(vis_thresh=0.9, vis=True) output = Infer("pothole_trained/test/img1.jpg", mod); 

pothole  0.9950724244117737 [109.65555071009518, 303.28242910581656, 441.5280866987249, 492.36453810607287] pothole  0.9883765578269958 [503.1345823081632, 246.1549497164476, 658.0930820964118, 353.84971546768463]

To train custom detector, dataset and annotation files need to be downloaded (Provided in the repository).

Training using mxrcnn:

 import os
 import sys
 sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
 sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")
 from train_base import *

#dataset

 root_dir = "./";
 coco_dir = "potholes";
 img_dir = "images";
 set_dataset_params(root_dir=root_dir, coco_dir=coco_dir, imageset=img_dir);

#model type

set_model_params(model_name="vgg16");

#hyperparameter

set_hyper_params(gpus="0", lr=0.001, lr_decay_epoch="1", epochs=2, batch_size=1);
 set_output_params(log_interval=100, save_prefix="model_vgg16");

#preprocessing

 set_img_preproc_params(img_short_side=600, img_long_side=1000, 
                      mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0));

#initializing parameters

 initialize_rpn_params();
 initialize_rcnn_params();

# Invoke Dataloader

roidb = set_dataset();

 INFO:root:computing cache ./cache/coco_images_roidb.pkl
 INFO:root:saving cache ./cache/coco_images_roidb.pkl
 INFO:root:coco_images num_images 665
 INFO:root:filter roidb: 665 -> 665
 INFO:root:coco_images append flipped images to roidb
 loading annotations into memory...
 Done (t=0.01s)
 creating index...
 index created!

#Network

sym = set_network();

# Train

train(sym, roidb);

 INFO:root:max input shape
 {'bbox_target': (1, 36, 62, 62),
  'bbox_weight': (1, 36, 62, 62),
  'data': (1, 3, 1000, 1000),
  'gt_boxes': (1, 100, 5),
  'im_info': (1, 3),
  'label': (1, 1, 558, 62)}
 INFO:root:max output shape
 {'bbox_loss_reshape_output': (1, 128, 8),
  'blockgrad0_output': (1, 128),
  'cls_prob_reshape_output': (1, 128, 2),
  'rpn_bbox_loss_output': (1, 36, 62, 62),
  'rpn_cls_prob_output': (1, 2, 558, 62)}
 INFO:root:locking params
 ['conv1_1_weight',  'conv1_1_bias',  'conv1_2_weight',  'conv1_2_bias',  'conv2_1_weight',  'conv2_1_bias',  'conv2_2_weight',  'conv2_2_bias',  'conv3_1_weight',  'conv3_1_bias',  'conv3_2_weight',  'conv3_2_bias',  'conv3_3_weight',  'conv3_3_bias',  'conv4_1_weight',
  'conv4_1_bias',  'conv4_2_weight',  'conv4_2_bias',  'conv4_3_weight',  'conv4_3_bias']
 INFO:root:lr 0.001000 lr_epoch_diff [1] lr_iters [1330]
 INFO:root:Epoch[0] Batch [0-100] Speed: 4.71 samples/sec RPNAcc=0.926091 RPNLogLoss=0.251855 RPNL1Loss=0.887560 RCNNAcc=0.874691 RCNNLogLoss=0.319362 RCNNL1Loss=2.404060
 INFO:root:Epoch[0] Batch [0-200] Speed: 4.23 samples/sec RPNAcc=0.934255 RPNLogLoss=0.218708 RPNL1Loss=0.808457 RCNNAcc=0.869170 RCNNLogLoss=0.320706 RCNNL1Loss=2.343518
 INFO:root:Epoch[0] Batch [0-300] Speed: 4.43 samples/sec RPNAcc=0.936786 RPNLogLoss=0.207275 RPNL1Loss=0.819963 RCNNAcc=0.876739 RCNNLogLoss=0.296186 RCNNL1Loss=2.326637
 INFO:root:Epoch[0] Batch [0-400] Speed: 4.51 samples/sec RPNAcc=0.938338 RPNLogLoss=0.195882 RPNL1Loss=0.786549 RCNNAcc=0.879988 RCNNLogLoss=0.286368 RCNNL1Loss=2.306008
 INFO:root:Epoch[0] Batch [0-500] Speed: 4.41 samples/sec RPNAcc=0.940432 RPNLogLoss=0.184636 RPNL1Loss=0.763770 RCNNAcc=0.882797 RCNNLogLoss=0.277150 RCNNL1Loss=2.277596

#inference

 from infer_base import *
 class_file = set_class_list("./potholes/annotations/classes.txt");

#Model – Select the model as per number of iterations it has been trained for

set_model_params(model_name="vgg16", model_path="trained_model/model_vgg16-0002.params");

#Hyper Params

set_hyper_params(gpus="0", batch_size=1);

# Preprocessing

 set_img_preproc_params(img_short_side=600, img_long_side=1000, 
                        mean=(123.68, 116.779, 103.939), std=(1.0, 1.0, 1.0));

#Initialization

 initialize_rpn_params();
 initialize_rcnn_params();

#Network

 sym = set_network();
 mod = load_model(sym);

#Load Image and infer

 set_output_params(vis_thresh=0.9, vis=True)
 output = Infer("potholes/test/img1.jpg", mod);

pothole  0.9788532853126526 [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625] 
pothole 0.8982967138290405 [532.2888671875, 268.3476318359375, 610.94091796875, 316.1823974609375] 
pothole  0.8982408046722412 [166.78828125, 163.675048828125, 226.873828125, 200.3756591796875] pothole  0.8538920283317566 [451.009326171875, 240.5260986328125, 521.778076171875, 298.5822998046875] 
['pothole\n', 0.9788532853126526, [245.0558349609375, 219.602734375, 437.7943359375, 312.4978515625]]

Check out the other implemented applications: link

Monk GUI is a no-code GUI platform built on top of Monk and Monk object detection libraries to provide an interface for computer vision problems. It is built using the PyQt5 library.

Weed Classification Demo:

End Notes

Monk is an amazing library to build computer vision solutions. With just a few lines, one can develop models. The repository provided contains datasets and annotations(for object detection) and pre-trained models to get our work done quickly. In the upcoming releases, we can expect to get custom training interfaces and integration with more frameworks and models.

Access all our open Survey & Awards Nomination forms in one place