Guide to VISSL: Vision Library for Self-Supervised Learning

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research, built on PyTorch. The key idea behind the library is to speed up the entire self-supervised learning workflow: from prototyping a new model design to training and evaluation, VISSL does it all.

Requirements

VISSL requires Linux, Python 3.6+, and PyTorch 1.4+ with a matching torchvision build; NVIDIA Apex is needed for mixed-precision training, and a CUDA-capable GPU is recommended.


Installation

For a Google Colab notebook, the following are the instructions to install VISSL.

  1. Install the dependencies.
 !pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
 # install opencv
 !pip install opencv-python
 !pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/{version_str}/download.html 
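The `{version_str}` placeholder in the Apex wheel URL must be filled in with a string derived from your Python, CUDA and PyTorch versions. A minimal sketch of how to compute it, following the recipe in the VISSL installation docs (it assumes a CUDA build of PyTorch is already installed; on a CPU-only build it falls back to None):

```python
import sys

try:
    import torch
    # e.g. Python 3.7 + CUDA 10.1 + PyTorch 1.5.1 -> "py37_cu101_pyt151"
    version_str = "".join([
        f"py3{sys.version_info.minor}_cu",
        torch.version.cuda.replace(".", ""),   # "10.1" -> "101"
        f"_pyt{torch.__version__[0:5:2]}",     # "1.5.1" -> "151"
    ])
    print(version_str)
except (ImportError, AttributeError):
    # torch missing, or a CPU-only build with no torch.version.cuda
    version_str = None
```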
  2. Then install VISSL via pip.
 !pip install vissl
 # verify installation
 !python -c 'import vissl' 

Check this link for other methods of installation.

Quick Start with VISSL

This quick-start demo shows how to train a model with the VISSL framework using a YAML configuration.

  1. Before getting started with the training part, let us discuss the YAML config files provided by VISSL. VISSL uses Hydra for configuration management. All the YAML files it provides are available here. For this demo, we will use the YAML config file for training a supervised ResNet-50 model on 1 GPU, which can be downloaded from here, or:
 !mkdir -p configs/config/
 !wget -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py 
 !wget -O configs/config/supervised_1gpu_resnet_example.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/supervised_1gpu_resnet_example.yaml 

    To understand the YAML file in more detail, check this link.
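As a rough sketch of the layout (abbreviated, with values reconstructed from the override flags used in the training command later in this demo, so treat the exact keys and defaults as illustrative), the config is a nested tree along these lines:

```yaml
config:
  DATA:
    TRAIN:
      DATA_SOURCES: [disk_folder]
      LABEL_SOURCES: [disk_folder]
      DATASET_NAMES: [dummy_data_folder]
      BATCHSIZE_PER_REPLICA: 2
  OPTIMIZER:
    num_epochs: 2
  DISTRIBUTED:
    NUM_NODES: 1
    NUM_PROC_PER_NODE: 1
  CHECKPOINT:
    DIR: "./checkpoints"
```

Every command-line override in the training step below addresses one of these nodes by its dotted path.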

  2. For training purposes, VISSL provides a helper tool that handles both feature extraction and training. It supports training on a single GPU or multiple GPUs, and can also run in a distributed environment. The file can be downloaded as:

!wget https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py

  3. Create a custom dataset for training ResNet-50 (you can also use the ImageNet dataset; the code for it is available here). To use custom data with VISSL, we have to register it with VISSL by providing metadata and the path to the dataset. For this, we create a simple JSON file with the metadata and save it to the `configs/config/dataset_catalog.json` file.
 json_data = {
     "dummy_data_folder": {
         "train": [
             "/content/dummy_data/train", "/content/dummy_data/train"
         ],
         "val": [
             "/content/dummy_data/val", "/content/dummy_data/val"
         ]
     }
 }
 # each split maps to [data_path, label_path]; for a disk_folder source both are the same
 # use VISSL's API to save, or you can use your own custom code
 from vissl.utils.io import save_file
 save_file(json_data, "/content/configs/config/dataset_catalog.json")
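The catalog above assumes a `disk_folder` layout, i.e. `<root>/<split>/<class_name>/<images>`. A minimal sketch for creating that directory skeleton (the class names and file names here are made up, and the placeholder files are empty; for real training they must be replaced with actual images):

```python
import os
import tempfile

def make_dummy_dataset(root, classes=("class_a", "class_b"), n_per_class=2):
    """Create a disk_folder-style skeleton: <root>/<split>/<class>/<image>."""
    for split in ("train", "val"):
        for cls in classes:
            d = os.path.join(root, split, cls)
            os.makedirs(d, exist_ok=True)
            for i in range(n_per_class):
                # empty placeholder files -- swap in real .jpg images before training
                open(os.path.join(d, f"img_{i}.jpg"), "wb").close()
    return root

# On Colab you would pass root="/content/dummy_data" to match the catalog entry.
root = make_dummy_dataset(os.path.join(tempfile.gettempdir(), "dummy_data"))
```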

    You can verify whether the dataset is registered with VISSL using the following commands:

 from vissl.data.dataset_catalog import VisslDatasetCatalog
 # list all the datasets that exist in catalog
 print(VisslDatasetCatalog.list())
 # get the metadata of dummy_data_folder dataset
 print(VisslDatasetCatalog.get("dummy_data_folder")) 
  4. Train the model with the following command.
 !python3 run_distributed_engines.py \
     hydra.verbose=true \
     config=supervised_1gpu_resnet_example \
     config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
     config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
     config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] \
     config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 \
     config.DATA.TEST.DATA_SOURCES=[disk_folder] \
     config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
     config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] \
     config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 \
     config.DISTRIBUTED.NUM_NODES=1 \
     config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
     config.OPTIMIZER.num_epochs=2 \
     config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
     config.OPTIMIZER.param_schedulers.lr.milestones=[1] \
     config.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
     config.CHECKPOINT.DIR="./checkpoints" 

The trained model is saved at checkpoints/model_final_checkpoint_phase2.torch. The command dumps all the training logs, checkpoints and metrics into the ./checkpoints directory.
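Once training finishes, you can locate the final checkpoint from Python before loading it. A small stdlib-only sketch (the file-name patterns are assumed from the checkpoint name mentioned above, so adjust the glob if your VISSL version names files differently):

```python
from pathlib import Path

def find_final_checkpoint(ckpt_dir):
    """Return the final checkpoint file in ckpt_dir, or None if absent."""
    finals = sorted(Path(ckpt_dir).glob("model_final_checkpoint_*.torch"))
    if finals:
        return finals[-1]
    # fall back to the most recent intermediate phase checkpoint, if any
    phases = sorted(Path(ckpt_dir).glob("model_phase*.torch"))
    return phases[-1] if phases else None

ckpt = find_final_checkpoint("./checkpoints")
if ckpt is not None:
    print("final checkpoint:", ckpt)
    # weights = torch.load(ckpt, map_location="cpu")  # to inspect the state dict
```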

    In the above command,

  • config=supervised_1gpu_resnet_example : selects the config file for supervised training.
  • config.DATA.TRAIN.DATA_SOURCES=[disk_folder] config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] : define the data and label sources for training. In this case, both come from a disk_folder.
  • config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] : defines the dataset name, i.e. dummy_data_folder, which we registered above.
  • config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] : another way of specifying where the data lives on disk. If you are using the ImageNet dataset, specify the path as /path/to/my/imagenet/folder/train.
  • config.DATA.TEST.DATA_SOURCES=[disk_folder] config.DATA.TEST.LABEL_SOURCES=[disk_folder] config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] : specify the sources and dataset name for the test dataset, similar to the train dataset.
  • config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] : another way of specifying where the validation data lives on disk.
  • config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 : set the batch size per GPU, i.e. 2 images per GPU for both TRAIN and TEST.
  • config.DISTRIBUTED.NUM_NODES=1 config.DISTRIBUTED.NUM_PROC_PER_NODE=1 : settings for distributed training. In this example we use 1 machine and 1 GPU.
  • config.OPTIMIZER.num_epochs=2 config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] config.OPTIMIZER.param_schedulers.lr.milestones=[1] : train for 2 epochs and drop the learning rate from 0.01 to 0.001 after epoch 1.
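Each of these flags is a Hydra-style dotted override: the key path selects a node in the YAML tree and the value replaces it. As a toy illustration of that mapping (this is not Hydra's actual implementation, just a sketch of the semantics):

```python
def parse_value(s):
    """Parse a scalar or [a,b,...] list from an override value string."""
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        return [parse_value(v) for v in s[1:-1].split(",") if v.strip()]
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass
    if s in ("true", "false"):
        return s == "true"
    return s

def apply_override(cfg, override):
    """Apply one 'a.b.c=value' override to a nested dict, creating nodes as needed."""
    key, _, value = override.partition("=")
    parts = key.split(".")
    node = cfg
    for p in parts[:-1]:
        node = node.setdefault(p, {})
    node[parts[-1]] = parse_value(value)
    return cfg

cfg = {}
apply_override(cfg, "config.OPTIMIZER.num_epochs=2")
apply_override(cfg, "config.DATA.TRAIN.DATA_SOURCES=[disk_folder]")
print(cfg)
```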

You can check the full demo here.

Conclusion

In this article, we discussed the VISSL framework and its basics. All the advanced tutorials are available at this link.

Official code, docs and tutorials are available on the VISSL website and its GitHub repository.

Aishwarya Verma
A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.
