Inside HugsVision, An Open-Source Hugging Face Wrapper For Computer Vision


A researcher from Avignon University recently released an open-source, easy-to-use wrapper to Hugging Face for Healthcare Computer Vision, called HugsVision. This new toolkit is used to develop state-of-the-art computer vision technologies, including systems for image classification, semantic segmentation, object detection, image generation, denoising, etc. 

The source code for HugsVision is available on GitHub.



HugsVision can be installed via PyPI. It supports both CPU and GPU computation; however, for most recipes, a GPU is necessary during training, and CUDA must be installed to use one. 

All model checkpoints provided by Hugging Face Transformers that are compatible with the supported tasks can be seamlessly integrated from the Hugging Face model hub, where they are uploaded directly by users and organisations. 

Currently, Hugging Face Transformers provides several architectures for computer vision.

Example of HugsVision 

In the article ‘How to train a custom vision transformer (ViT) image classifier to help endoscopists in less than five minutes,’ the creator of HugsVision, Yanis Labrak, showed how to train an image classifier model based on transformer architecture to help endoscopists automate the detection of various anatomical landmarks, pathological findings, or endoscopic procedures in the gastrointestinal tract.

Here are the steps to follow when building an image-classification model: 

Install HugsVision 

To begin with, set up the Anaconda environment. The author said Anaconda is a good way to reduce compatibility issues between package versions for all your projects by providing you with an isolated Python environment. 

After this, install HugsVision from PyPI. Doing this provides a fast way to install the toolkit without worrying about dependency conflicts, said Labrak.

Download Kvasir V2 dataset & load it 

For this study, the researcher used the Kvasir Dataset v2, which weighs ~2.3 GB. The dataset comprises eight classes of 1,000 images each, for a total of 8,000 images. The JPEG images are stored in separate folders according to the class they belong to. Each class shows anatomical landmarks, pathological findings, or endoscopic procedures in the gastrointestinal tract. 

Once the dataset has been converted, the next step is to load the data. Here, the first parameter is the path to the dataset folder, followed by the percentage of data to hold out as the test set; an option to balance the number of documents in each class of the training dataset; and a flag to enable data augmentation, which randomly changes the contrast of the images. 
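Conceptually, the loader described above splits each class folder into train and test portions, optionally capping class sizes so the sets are balanced. The following is a minimal stdlib sketch of that idea with toy file names standing in for the real images (it is not HugsVision's actual implementation, whose loader and argument names may differ by version):

```python
import random

def load_image_folder(dataset, test_ratio=0.15, balanced=True, seed=0):
    """Split {class_name: [image_paths]} into train/test lists of
    (path, label) pairs, optionally balancing class sizes first."""
    rng = random.Random(seed)
    train, test = [], []
    # Balance: cap every class at the size of the smallest one.
    cap = min(len(v) for v in dataset.values()) if balanced else None
    for label, files in dataset.items():
        files = list(files)
        rng.shuffle(files)
        if cap is not None:
            files = files[:cap]
        n_test = max(1, int(len(files) * test_ratio))
        test += [(f, label) for f in files[:n_test]]
        train += [(f, label) for f in files[n_test:]]
    return train, test

# Toy folder structure: two classes with uneven sizes.
toy = {"polyps": [f"polyps/{i}.jpg" for i in range(10)],
       "ulcerative-colitis": [f"uc/{i}.jpg" for i in range(8)]}
train, test = load_image_folder(toy, test_ratio=0.25)
```

With balancing on, both classes are capped at 8 images, of which 2 per class go to the test set.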

Kvasir V2 dataset data sample for each class (Source: Yanis Labrak)

Choose an image classification model 

The researcher selected the Hugging Face Transformers package, which provides access to the Hugging Face Hub, a collection of pretrained models and pipelines for tasks in domains such as natural language processing (NLP), computer vision (CV), and automatic speech recognition (ASR). 

Once the base model is selected, you can fine-tune it to fit your needs. Fine-tuning continues the training of a generic model that was pre-trained on a related task (here, image classification) with a much larger amount of data. In many tasks, this approach has shown better results than training a model from scratch on the targeted data.

Advantages of using a pre-trained model: 

  • Since only the classification layer is trained while the other layers are frozen, the training process becomes faster 
  • Because the embeddings are already trained, the model becomes more effective 
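The first bullet above can be sketched with a toy model represented as a dict of named layers with trainable flags (in real PyTorch code you would instead set `requires_grad = False` on the backbone parameters; the layer names here are illustrative, not the actual ViT module names):

```python
# Toy stand-in for a vision transformer: named layers with trainable flags.
model = {
    "patch_embed": {"trainable": True},
    "encoder_block_0": {"trainable": True},
    "encoder_block_1": {"trainable": True},
    "classifier": {"trainable": True},  # the new task-specific head
}

def freeze_backbone(model, head="classifier"):
    """Freeze every layer except the classification head, so that
    only the head's weights are updated during fine-tuning."""
    for name, layer in model.items():
        layer["trainable"] = (name == head)

freeze_backbone(model)
trainable = [name for name, layer in model.items() if layer["trainable"]]
```

After freezing, only the classification head remains trainable, which is why fine-tuning converges much faster than full training.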

To ensure that the model is compatible with HugsVision, you need a model implemented in PyTorch and compatible with the image-classification task. Check out the models matching these criteria here

Set up the Trainer and start the fine-tuning 

Once the model is selected, you can build the Trainer and start the fine-tuning. Here are the outputs

Evaluate the performance of the model 

The researcher used the F1-score metric to better represent predictions across all labels and to find anomalies with a specific label. The F1-score is the harmonic mean of precision and recall: F1 = 2 · (precision · recall) / (precision + recall). 

(Source: yanis labrak)
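As a concrete illustration, per-class precision, recall and F1 can be computed one-vs-rest directly from the predictions. Below is a stdlib sketch with made-up toy labels (libraries such as scikit-learn provide the same numbers via a classification report):

```python
def f1_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed one-vs-rest."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

# Toy ground truth and predictions (hypothetical class names).
y_true = ["polyps", "polyps", "ulcer", "ulcer", "normal"]
y_pred = ["polyps", "ulcer", "ulcer", "ulcer", "normal"]
scores = f1_per_class(y_true, y_pred)
```

A low F1 for one class while the others stay high is exactly the kind of per-label anomaly the metric is meant to surface.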

The author believes that the F1-score is a nice way to get an overview of the results, but it is not enough to deeply understand the reason for errors, which can be caused by an imbalanced dataset, a lack of data, or even high proximity between classes. 

So, to understand the model's decisions or to fix it, it helps to draw a confusion matrix showing which classes the model confuses with one another. 

(Source: Yanis Labrak)
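A confusion matrix is simply a count of (true label, predicted label) pairs; the off-diagonal entries reveal which classes get mistaken for which. A minimal stdlib sketch, reusing the toy labels from above (hypothetical class names, not the Kvasir results):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (true_label, predicted_label) pairs; off-diagonal entries
    show which classes the model confuses with each other."""
    return Counter(zip(y_true, y_pred))

y_true = ["polyps", "polyps", "ulcer", "ulcer", "normal"]
y_pred = ["polyps", "ulcer", "ulcer", "ulcer", "normal"]
cm = confusion_matrix(y_true, y_pred)
# cm[("polyps", "ulcer")] == 1: one polyp image was mistaken for an ulcer.
```

If many errors cluster in a single off-diagonal cell, the two classes involved are likely visually close, pointing at the "high proximity between classes" failure mode mentioned above.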

Use Hugging Face to run inference on images 

Here, you will have to rename the ‘./out/MODEL_PATH/config.json’ file present in the model output to ‘./out/MODEL_PATH/preprocessor_config.json‘.
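This rename step can be done in one line with `os.rename`. The sketch below simulates it in a temporary directory with a dummy config file standing in for a real training output (the file contents here are hypothetical):

```python
import os
import tempfile

# Simulate the model output folder with a dummy config.json
# (hypothetical contents; a real run produces this file during training).
model_path = tempfile.mkdtemp()
with open(os.path.join(model_path, "config.json"), "w") as f:
    f.write('{"model_type": "vit"}')

# Rename config.json so the preprocessor can find its settings.
os.rename(os.path.join(model_path, "config.json"),
          os.path.join(model_path, "preprocessor_config.json"))

renamed = os.path.exists(os.path.join(model_path,
                                      "preprocessor_config.json"))
```

After the rename, the original `config.json` no longer exists at that path, so keep a copy if anything else still needs it.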

Wrapping up 

HugsVision is still in the early stages of development and evolving; new features, tutorials and documentation are expected to be released soon. Check out the complete code for training your custom vision transformer (ViT) image classifier here. Also, find more tutorials about using HugsVision on GitHub.


Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
