Inside HugsVision, An Open-Source Hugging Face Wrapper For Computer Vision

A researcher from Avignon University recently released an open-source, easy-to-use wrapper to Hugging Face for Healthcare Computer Vision, called HugsVision. This new toolkit is used to develop state-of-the-art computer vision technologies, including systems for image classification, semantic segmentation, object detection, image generation, denoising, etc. 

The source code for HugsVision is available on GitHub.


HugsVision can be installed from PyPI like a standard Python library. It supports both CPU and GPU computation; however, for most recipes a GPU is necessary during training, and CUDA must be installed to use one.

All the model checkpoints provided by Hugging Face Transformers that are compatible with the supported tasks can be seamlessly integrated from the Hugging Face model hub, where they are uploaded directly by users and organisations.
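For illustration, a minimal sketch of pulling such a checkpoint from the Hub with the transformers library could look like this (the ViT checkpoint name is only an example):

```python
from transformers import ViTFeatureExtractor, ViTForImageClassification

# Example checkpoint hosted on the Hugging Face Hub; any compatible
# image-classification checkpoint can be referenced by name in the same way.
checkpoint = "google/vit-base-patch16-224"

feature_extractor = ViTFeatureExtractor.from_pretrained(checkpoint)
model = ViTForImageClassification.from_pretrained(checkpoint)
```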

Currently, Hugging Face Transformers provides several architectures for computer vision, such as ViT, DeiT and DETR.

Example of HugsVision 

In the article ‘How to train a custom vision transformer (ViT) image classifier to help endoscopists in less than five minutes,’ the creator of HugsVision, Yanis Labrak, showed how to train an image classifier model based on transformer architecture to help endoscopists automate the detection of various anatomical landmarks, pathological findings, or endoscopic procedures in the gastrointestinal tract.

Here are the steps to follow when building an image-classification model: 

Install HugsVision 

To begin with, set up an Anaconda environment. According to the author, Anaconda is a good way to reduce compatibility issues between package versions across projects because it provides an isolated Python environment.

After this, install HugsVision from PyPI. Doing this provides a fast way to install the toolkit without worrying about dependency conflicts, said Labrak.
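As a rough sketch of these two steps (the environment name and Python version are arbitrary; the package is published on PyPI as hugsvision):

```bash
# Create and activate an isolated Anaconda environment
conda create --name hugsvision-env python=3.8 -y
conda activate hugsvision-env

# Install HugsVision from PyPI
pip install hugsvision
```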

Download Kvasir V2 dataset & load it 

For this study, the researcher used the Kvasir Dataset v2, which weighs ~2.3 GB. The dataset comprises eight classes of 1,000 images each, for a total of 8,000 images. The JPEG images are stored in separate folders according to the class they belong to. Each class shows anatomical landmarks, pathological findings or endoscopic procedures in the gastrointestinal tract.

Once the dataset has been downloaded and extracted, the next step is to load the data. Here, the first parameter is the path to the dataset folder, followed by the size (as a percentage) of the test split, an option to balance the number of documents per class in the training set, and an option to enable data augmentation, which randomly changes the contrast of the images.
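A sketch of this loading step, assuming HugsVision's VisionDataset.fromImageFolder helper as used in the project's examples (parameter names are approximate; check the documentation for the exact signature):

```python
from hugsvision.dataio.VisionDataset import VisionDataset

# Path to the extracted Kvasir V2 folder (one sub-folder per class),
# 15% of the images kept for testing, class balancing and contrast
# augmentation enabled for the training split.
train, test, id2label, label2id = VisionDataset.fromImageFolder(
    "./kvasir-dataset-v2/",
    test_ratio=0.15,
    balanced=True,
    augmentation=True,
)
```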

Kvasir V2 dataset: a data sample for each class (Source: Yanis Labrak)

Choose an image classification model 

The researcher selected the Hugging Face transformers package, which provides access to the Hugging Face Hub and its pretrained models and pipelines for various tasks in domains such as NLP, computer vision (CV) and automatic speech recognition (ASR).

Once the base model is selected, you can fine-tune it to fit your needs. Fine-tuning continues the training of a generic model that was pre-trained on a closely related task (here, image classification) but on a much larger amount of data. In many tasks, this approach has shown better results than training a model from scratch on the target data alone.

Advantages of using a pre-trained model: 


  • Since only the classification layer is trained while the other layers are frozen, the training process becomes faster (a small sketch of this idea follows the list) 
  • Because the embeddings are already trained, the model becomes more effective 
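
To make the first point concrete, the sketch below freezes every parameter of a ViT model except its classification head using plain transformers; the checkpoint name and the eight Kvasir V2 classes are example values:

```python
from transformers import ViTForImageClassification

# Load a pre-trained backbone and attach a fresh classification head
# for the 8 Kvasir V2 classes (example values).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=8,
)

# Train only the classification head: freeze every other parameter.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier")
```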

To ensure that the model is compatible with HugsVision, it needs to be implemented in PyTorch and compatible with the image-classification task. Check out the models that meet these criteria here.

Set up the Trainer and start the fine-tuning 

Once the model is selected, you can build the Trainer and start the fine-tuning.
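The exact call is in the linked tutorial; a rough sketch of what building the Trainer looks like, assuming HugsVision's VisionClassifierTrainer and approximate argument names, is shown below (hyper-parameters are examples):

```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from hugsvision.nnet.VisionClassifierTrainer import VisionClassifierTrainer

huggingface_model = "google/vit-base-patch16-224-in21k"  # example checkpoint

trainer = VisionClassifierTrainer(
    model_name="KvasirV2Classifier",   # illustrative output model name
    train=train,                       # splits returned by the loading step above
    test=test,
    output_dir="./out/",
    max_epochs=1,                      # example hyper-parameters
    batch_size=32,
    model=ViTForImageClassification.from_pretrained(
        huggingface_model,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    ),
    feature_extractor=ViTFeatureExtractor.from_pretrained(huggingface_model),
)
```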

Evaluate the performance of the model 

The researcher used the F1-Score metric to better represent the predictions across all labels and to spot anomalies for a specific label. The F1-Score is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall).

(Source: Yanis Labrak)

The author believes that the F1-Score is a nice way to get an overview of the results, but it is not enough to deeply understand the cause of the errors, which can stem from an imbalanced dataset, a lack of data, or even high proximity between classes.

So, to understand the model's decisions or to fix the model, drawing a confusion matrix of the classification confusions between classes can help.

Confusion matrix between the Kvasir V2 classes (Source: Yanis Labrak)
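Once the reference labels and the model's predictions on the test set are collected (the tutorial shows how to obtain them with HugsVision), the per-class scores and the confusion matrix can be computed with scikit-learn, for example:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative placeholders: in practice these are the reference and
# predicted class indices collected on the Kvasir V2 test split.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

# Per-class precision, recall and F1-Score
print(classification_report(y_true, y_pred, digits=4))

# Rows are the true classes, columns the predicted classes
print(confusion_matrix(y_true, y_pred))
```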

Use Hugging Face to run inference on images 

Here, you will have to rename the ‘./out/MODEL_PATH/config.json’ file in the model output directory to ‘./out/MODEL_PATH/preprocessor_config.json’.
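A minimal inference sketch with plain transformers, assuming the fine-tuned model and the renamed preprocessor config are stored under ‘./out/MODEL_PATH/’ (the path and image file name are placeholders):

```python
import torch
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

model_path = "./out/MODEL_PATH/"  # placeholder for the actual output folder

feature_extractor = ViTFeatureExtractor.from_pretrained(model_path)
model = ViTForImageClassification.from_pretrained(model_path)
model.eval()

# Pre-process a single endoscopic image and run it through the classifier
image = Image.open("sample_image.jpg").convert("RGB")
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])
```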

Wrapping up 

HugsVision is still in the early stages of development and is evolving; new features, tutorials and documentation are expected to be released soon. Check out the complete code for training your custom vision transformer (ViT) image classifier here, and find more tutorials on using HugsVision on GitHub.
