Guide to VISSL: Vision Library for Self-Supervised Learning


VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research, built on PyTorch. The key idea behind the library is to speed up the entire self-supervised learning workflow: from implementing a new design to evaluating the trained model, VISSL does it all.


To install VISSL in a Google Colab notebook, follow the instructions below.

  1. Install the dependencies.
 !pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f
 # install opencv
 !pip install opencv-python
 !pip install apex -f{version_str}/download.html 
  2. Then install VISSL via pip.
 !pip install vissl
 # verify installation
 !python -c 'import vissl' 

Check this link for other methods of installation.

Quick Start with VISSL

This quick-start demo shows how to train a model with the VISSL framework and a YAML configuration.

  1. Before getting started with the training part, let us look at the YAML config files provided by VISSL. VISSL uses Hydra for configuration management, and all of the YAML files it ships are available here. For this demo, we are going to use the YAML config file for training a supervised ResNet-50 model on 1 GPU, which can be downloaded from here or,
 !mkdir -p configs/config/
 !wget -O configs/ 
 !wget -O configs/config/supervised_1gpu_resnet_example.yaml 

    To understand the YAML file in more detail, check this link.
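    Hydra lets you override any nested key in the YAML file from the command line using a dotted path (for example, `config.OPTIMIZER.num_epochs=2`, as used in the training command later in this guide). The mechanics can be sketched in plain Python; this illustrates the idea only and is not Hydra's actual implementation (note that real Hydra also parses the value's type, while this sketch keeps it as a string):

    ```python
    def apply_override(cfg, override):
        """Apply a single 'a.b.c=value' style override to a nested dict."""
        dotted, value = override.split("=", 1)
        keys = dotted.split(".")
        node = cfg
        # walk down the nested dict, creating levels as needed
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # set the leaf key (Hydra would also convert the type here)
        node[keys[-1]] = value
        return cfg

    cfg = {"OPTIMIZER": {"num_epochs": 1}}
    apply_override(cfg, "OPTIMIZER.num_epochs=2")
    print(cfg)  # {'OPTIMIZER': {'num_epochs': '2'}}
    ```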

  2. For training, VISSL provides a helper tool, `tools/run_distributed_engines.py`, which handles both training and feature extraction. The tool works on 1 GPU or multiple GPUs and also supports distributed training across machines. Download the script from the VISSL repository before continuing.

  3. Create a custom dataset for training ResNet-50 (you can also use the ImageNet dataset; the code for it is available here). To use custom data with VISSL, we have to register it with VISSL by providing the metadata and the path to the dataset. For this, we create a simple JSON file with the metadata and save it as `configs/config/dataset_catalog.json`.
 json_data = {
     "dummy_data_folder": {
         "train": [
             "/content/dummy_data/train", "/content/dummy_data/train"
         ],
         "val": [
             "/content/dummy_data/val", "/content/dummy_data/val"
         ]
     }
 }
 # use VISSL's API to save, or use your own custom code
 from vissl.utils.io import save_file
 save_file(json_data, "/content/configs/config/dataset_catalog.json")
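    The disk_folder data source registered above expects an ImageFolder-style directory layout: one sub-directory per class inside each split. A minimal sketch that creates such a dummy tree with standard-library calls only (the class names and placeholder files are illustrative; a temporary directory is used here, whereas the Colab demo writes to /content/dummy_data):

    ```python
    import os
    import tempfile

    # disk_folder expects: <root>/<split>/<class_name>/<image files>
    root = os.path.join(tempfile.mkdtemp(), "dummy_data")
    for split in ("train", "val"):
        for cls in ("class_a", "class_b"):
            class_dir = os.path.join(root, split, cls)
            os.makedirs(class_dir)
            for i in range(2):
                # empty placeholder files; a real run needs actual images
                open(os.path.join(class_dir, f"img_{i}.jpg"), "wb").close()

    print(sorted(os.listdir(os.path.join(root, "train"))))  # ['class_a', 'class_b']
    ```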

    You can verify that the dataset is registered with VISSL using the following commands:

 from vissl.data.dataset_catalog import VisslDatasetCatalog
 # list all the datasets that exist in the catalog
 print(VisslDatasetCatalog.list())
 # get the metadata of the dummy_data_folder dataset
 print(VisslDatasetCatalog.get("dummy_data_folder"))
  4. Train the model with the following command.
 !python3 run_distributed_engines.py \
     hydra.verbose=true \
     config=supervised_1gpu_resnet_example \
     config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
     config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
     config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] \
     config.DATA.TEST.DATA_SOURCES=[disk_folder] \
     config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
     config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
     config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] \
     config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 \
     config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 \
     config.DISTRIBUTED.NUM_NODES=1 \
     config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
     config.OPTIMIZER.num_epochs=2 \
     config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
     config.OPTIMIZER.param_schedulers.lr.milestones=[1]

This command dumps all the training logs, checkpoints and metrics into the ./checkpoints directory. The final trained model is available at checkpoints/model_final_checkpoint_phase2.torch.

    In the above command,

  • config=supervised_1gpu_resnet_example : defines the config file for supervised training.
  • config.DATA.TRAIN.DATA_SOURCES=[disk_folder] config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] : define the data and label sources for training. In this case, it is disk_folder.
  • config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] : defines the dataset name, i.e. dummy_data_folder, which we registered above.
  • config.DATA.TRAIN.DATA_PATHS=[/content/dummy_data/train] : another way of specifying where the data is on disk. If you are using the ImageNet dataset, specify the path as /path/to/my/imagenet/folder/train.
  • config.DATA.TEST.DATA_SOURCES=[disk_folder] config.DATA.TEST.LABEL_SOURCES=[disk_folder] config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] : specify the sources for the test dataset, analogous to the train dataset.
  • config.DATA.TEST.DATA_PATHS=[/content/dummy_data/val] : another way of specifying where the test data is on disk.
  • config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 : specify the resource usage, i.e. 2 images per GPU, for both TRAIN and TEST.
  • config.DISTRIBUTED.NUM_NODES=1 config.DISTRIBUTED.NUM_PROC_PER_NODE=1 : settings for distributed training. In this example we use 1 GPU on 1 machine.
  • config.OPTIMIZER.num_epochs=2 config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] config.OPTIMIZER.param_schedulers.lr.milestones=[1] : train for 2 epochs and drop the learning rate from 0.01 to 0.001 after 1 epoch.
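To make the learning-rate schedule concrete, a step schedule with values [0.01, 0.001] and a milestone at epoch 1 can be sketched in plain Python. This mimics the semantics only; it is not VISSL's optimizer code:

```python
def step_schedule(values, milestones, num_epochs):
    """Return the learning rate used in each epoch for a step schedule.

    values[i] is used until the i-th milestone epoch is reached;
    once an epoch passes a milestone, the schedule advances one value.
    """
    lrs = []
    for epoch in range(num_epochs):
        # count how many milestones this epoch has already passed
        idx = sum(1 for m in milestones if epoch >= m)
        lrs.append(values[idx])
    return lrs

# 2 epochs: LR drops from 0.01 to 0.001 after epoch 1
print(step_schedule([0.01, 0.001], [1], 2))  # [0.01, 0.001]
```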

You can check the full demo here.


In this article, we have discussed the VISSL framework and its basics. All the advanced tutorials are available at this link.

Official code, docs and tutorials are available in the VISSL repository.

Aishwarya Verma
A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.
