Guide to VISSL: Vision Library for Self-Supervised Learning

Aishwarya Verma

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research, built on PyTorch. The key idea of the library is to speed up the whole self-supervised learning workflow, from trying out a new design through to evaluation. The following are the characteristics of the VISSL framework:



The following instructions install VISSL in a Google Colab notebook.

  1. Install the dependencies.
 !pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f
 # install opencv
 !pip install opencv-python
 !pip install apex -f{version_str}/download.html 
  2. Then install VISSL via pip.
 !pip install vissl
 # verify installation
 !python -c 'import vissl' 

Check this link for other methods of installation.
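
Before moving on, it can help to confirm from Python that the packages installed above are importable. The following is a minimal sketch that uses only the standard library; the helper name `check_installed` is hypothetical and is not part of VISSL.

```python
import importlib.util


def check_installed(package_names):
    """Map each top-level package name to True if Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in package_names
    }


if __name__ == "__main__":
    # Import names of the packages installed above
    # (opencv-python is imported as cv2).
    wanted = ["torch", "torchvision", "cv2", "apex", "vissl"]
    for name, found in check_installed(wanted).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only checks that each package can be located; it does not check versions or CUDA support.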

Quick Start with VISSL

This quick-start demo shows how to train a model with the VISSL framework using a YAML configuration.

  1. Before getting started with training, let us look at the YAML config files provided by VISSL. VISSL uses Hydra for configuration management. All the YAML files it provides are available here. For this demo, we are going to use the YAML config file for training a supervised ResNet-50 model on 1 GPU, which can be downloaded from here or with the following commands:
 !mkdir -p configs/config/
 !wget -O configs/ 
 !wget -O configs/config/supervised_1gpu_resnet_example.yaml 

    To understand the YAML file in more detail, check this link.
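
Hydra lets individual values in a YAML config be overridden from the command line with dotted keys such as config.OPTIMIZER.num_epochs=2, which the training command in this guide relies on. The sketch below only illustrates the idea of a dotted key addressing a nested value. It is not Hydra's implementation, the function `apply_override` is hypothetical, and the starting value of 105 is a placeholder.

```python
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    dotted_key, raw_value = override.split("=", 1)
    keys = dotted_key.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for key in keys[:-1]:
        # Walk down the nesting, creating intermediate levels as needed.
        node = node.setdefault(key, {})
    # The value is kept as a plain string here; Hydra additionally
    # converts override values to typed objects.
    node[keys[-1]] = raw_value
    return updated


# Placeholder config: 105 is an illustrative value, not taken from VISSL.
base = {"config": {"OPTIMIZER": {"num_epochs": 105}}}
patched = apply_override(base, "config.OPTIMIZER.num_epochs=2")
print(patched["config"]["OPTIMIZER"]["num_epochs"])  # prints 2
```

The original dictionary is left untouched, which mirrors how an override changes a single run rather than the YAML file on disk.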

  2. For training, VISSL provides a helper tool that handles both feature extraction and training. It is designed to run on 1 GPU or on multiple GPUs, and it can also set up a distributed environment for training. The file can be downloaded as:


  3. Create a custom dataset for training ResNet-50; you can also use the ImageNet dataset. The code for it is available here. To use custom data with VISSL, we have to register it with VISSL (providing the metadata and the path to the dataset). For this, we create a simple JSON file with the metadata and save it to the `configs/config/` directory.
 json_data = {
   "dummy_data_folder": {
     "train": [
       "/content/dummy_data/train", "/content/dummy_data/train"
     ],
     "val": [
       "/content/dummy_data/val", "/content/dummy_data/val"
     ]
   }
 }
 # use VISSL's API to save or you can use your custom code.
 from vissl.utils.io import save_file
 save_file(json_data, "/content/configs/config/dataset_catalog.json") 

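    You can verify whether the dataset is registered with VISSL with the following commands:

 from vissl.data.dataset_catalog import VisslDatasetCatalog
 # list all the datasets that exist in catalog
 print(VisslDatasetCatalog.list())
 # get the metadata of dummy_data_folder dataset
 print(VisslDatasetCatalog.get("dummy_data_folder"))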
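
As the comment in the saving snippet notes, custom code can be used to write the catalog file instead of VISSL's helper. A minimal sketch with the standard `json` module follows. The function name `write_catalog` is hypothetical, and a temporary directory stands in for /content/configs/config/ so that the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_catalog(catalog, path):
    """Write the dataset catalog as JSON and return the parsed file contents."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(catalog, indent=2))
    # Read the file back to confirm that it is valid JSON.
    return json.loads(path.read_text())


catalog = {
    "dummy_data_folder": {
        "train": ["/content/dummy_data/train", "/content/dummy_data/train"],
        "val": ["/content/dummy_data/val", "/content/dummy_data/val"],
    }
}

# A temporary directory stands in for /content/configs/config/ in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_catalog(catalog, Path(tmp) / "dataset_catalog.json")

print(sorted(saved["dummy_data_folder"]))  # prints ['train', 'val']
```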
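  4. Train the model with the following command.
 !python3 run_distributed_engines.py \
     hydra.verbose=true \
     config=supervised_1gpu_resnet_example \
     config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
     config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
     config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] \
     config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 \
     config.DATA.TEST.DATA_SOURCES=[disk_folder] \
     config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
     config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] \
     config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 \
     config.DISTRIBUTED.NUM_NODES=1 \
     config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
     config.OPTIMIZER.num_epochs=2 \
     config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
     config.OPTIMIZER.param_schedulers.lr.milestones=[1] \
     config.CHECKPOINT.DIR="./checkpoints"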

This command dumps all the training logs, checkpoints and metrics into the ./checkpoints directory. The trained model is available at checkpoints/model_final_checkpoint_phase2.torch.
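
To check what the run produced, the checkpoint files can be listed from Python. The following is a minimal sketch using only `pathlib`; the helper `list_checkpoints` is hypothetical, and it assumes checkpoints carry the `.torch` extension, as the file named above does.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="./checkpoints"):
    """Return the names of the `.torch` files in `checkpoint_dir`, sorted."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        # Nothing has been written yet, or the path is wrong.
        return []
    return sorted(p.name for p in directory.glob("*.torch"))


print(list_checkpoints())
```

An empty list means that the directory does not exist or that no checkpoint has been written to it yet.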

    In the above command,

  • config=supervised_1gpu_resnet_example : selects the config file for supervised training.
  • config.DATA.TRAIN.DATA_SOURCES=[disk_folder] config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] : define the data source and the label source for training. In this case, both are disk_folder.
  • config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] : defines the dataset name, i.e. dummy_data_folder. We registered this dataset above.
  • config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] : another way of specifying where the data is on disk. If you are using the ImageNet dataset, specify the path as /path/to/my/imagenet/folder/train.
  • config.DATA.TEST.DATA_SOURCES=[disk_folder] config.DATA.TEST.LABEL_SOURCES=[disk_folder] config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] : specify the data source, label source and dataset name for the test set, in the same way as for the training set.
  • config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] : another way of specifying where the test data is on disk.
  • config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 : set the batch size to 2 images per GPU for both TRAIN and TEST.
  • config.DISTRIBUTED.NUM_NODES=1 config.DISTRIBUTED.NUM_PROC_PER_NODE=1 : settings for distributed training. In this example, we use 1 machine and 1 GPU.
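  • config.OPTIMIZER.num_epochs=2 config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] config.OPTIMIZER.param_schedulers.lr.milestones=[1] : train for 2 epochs and drop the learning rate from 0.01 to 0.001 after 1 epoch.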
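
The last bullet describes a step schedule: one learning rate is used until a milestone epoch, and a lower one after it. The lookup can be sketched in plain Python as below. The function `lr_for_epoch` is hypothetical and is not VISSL's scheduler; the sketch assumes 0-indexed epochs and that a milestone marks the first epoch that uses the next value.

```python
from bisect import bisect_right


def lr_for_epoch(epoch, values, milestones):
    """Pick the learning rate for a 0-indexed epoch under a step schedule."""
    if len(values) != len(milestones) + 1:
        raise ValueError("need exactly one more value than milestones")
    # Count how many milestones have been reached and use that as the index.
    return values[bisect_right(milestones, epoch)]


# Values and milestone taken from the training command: [0.01, 0.001] and [1].
for epoch in range(2):
    print(epoch, lr_for_epoch(epoch, [0.01, 0.001], [1]))
```

With these settings, epoch 0 trains at 0.01 and epoch 1 at 0.001.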

You can check the full demo here.


In this article, we have discussed the VISSL framework and its basics. All the advanced tutorials are available at this link.

The official code, docs and tutorials are available at:

Copyright Analytics India Magazine Pvt Ltd
