The UCF-101 dataset contains 101 action classes and 13,320 clips of human actions collected from YouTube. It was introduced in 2012 by Khurram Soomro, Amir Roshan Zamir and Mubarak Shah of the Center for Research in Computer Vision at the University of Central Florida, Orlando, USA. The clips in each action class are divided into 25 groups, each containing 4-7 clips; clips in the same group share common features such as the background or the actor.
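The group and clip indices are encoded in each file name (for example, v_ApplyEyeMakeup_g08_c01.avi is clip 1 of group 8 of the ApplyEyeMakeup action); a minimal parsing sketch in Python:

import re

# UCF-101 file names follow the pattern v_<Action>_g<group>_c<clip>.avi
name = "v_ApplyEyeMakeup_g08_c01.avi"
action, group, clip = re.match(r"v_(\w+)_g(\d+)_c(\d+)\.avi", name).groups()
print(action, int(group), int(clip))  # ApplyEyeMakeup 8 1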
UCF Sports, UCF11, UCF50 and UCF101 are datasets released by UCF in sequence, each one incorporating its predecessor. UCF-101 is the largest among them, with 101 classes. The dataset offers the greatest diversity in terms of actions, with large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background and illumination conditions.
Here, we will discuss the dataset and see how to load it using TensorFlow and PyTorch. Further, we will work through a practical implementation on the UCF-101 dataset in TensorFlow.
About the dataset
The dataset can be downloaded from the UCF Center for Research in Computer Vision website. It consists of web videos recorded under a variety of lighting conditions, including low-quality frames. The 101 human action classes are divided into 5 types: Human-Object Interaction, Human-Human Interaction, Playing Musical Instruments, Body-Motion Only, and Sports.

Load the dataset using different deep learning frameworks.
TensorFlow
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the training split of UCF-101 through TensorFlow Datasets
x_train = tfds.load('ucf101', split='train', shuffle_files=True, batch_size=64)
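Each element is a dictionary with 'video' and 'label' features. Since clips vary in length, a quick way to inspect the data is to load it without batching; a minimal sketch:

# Sketch: load without batching and inspect one example
ds = tfds.load('ucf101', split='train', shuffle_files=True)
for example in ds.take(1):
    video = example['video']  # (num_frames, height, width, 3), uint8
    label = example['label']  # scalar class id in [0, 101)
    print(video.shape, label.numpy())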
PyTorch
import torch
import torchvision

# Placeholder paths (assumed): point these at a local copy of UCF-101
# and the official train/test split (annotation) files
root = "UCF-101/"
annotation_path = "ucfTrainTestlist/"

ucf_data = torchvision.datasets.UCF101(root, annotation_path,
                                       frames_per_clip=16,  # example value
                                       step_between_clips=1,
                                       frame_rate=None, fold=1,
                                       train=True, transform=None)
data_loader = torch.utils.data.DataLoader(ucf_data, batch_size=4,
                                          shuffle=True, num_workers=2)
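Each item yielded by this dataset is a (video, audio, label) tuple, so a minimal iteration sketch looks like this (illustrative only; collating audio tensors of different lengths may need a custom collate_fn in practice):

# Sketch: draw one batch from the loader
for video, audio, label in data_loader:
    # video: (batch, frames_per_clip, H, W, C) tensor of uint8 frames
    print(video.shape, label)
    break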
Let’s look at the parameters of the UCF101 class:
· root – The root directory of the UCF-101 dataset.
· annotation_path – Path to the folder containing the official split files.
· frames_per_clip – Number of frames in each clip.
· step_between_clips – Number of frames between each clip.
· fold – Which fold to use; should be between 1 and 3.
· train – If True, creates a dataset from the train split, otherwise from the test split (see the sketch after this list).
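For example, the matching test split for the same fold could be created as follows (a sketch reusing the root and annotation_path assumed earlier):

# Sketch: test split of fold 1, matching the training set above
ucf_test = torchvision.datasets.UCF101(root, annotation_path,
                                       frames_per_clip=16, fold=1,
                                       train=False)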
Practical Implementation Using TensorFlow
# Import all the libraries required for this project
import tensorflow as tf
import tensorflow_hub as hub
import random
import os
import ssl
import cv2
import numpy as np
import imageio
from IPython import display
from tensorflow_docs.vis import embed  # used by to_gif below
from urllib import request
import re
import tempfile
Reading the Video dataset
# Fetch videos from the UCF101 dataset
UCF_ROOT = "https://www.crcv.ucf.edu/THUMOS14/UCF101/UCF101/"
_VIDEO_LIST = None
_CACHE_DIR = tempfile.mkdtemp()
# The UCF server does not present a trusted certificate, so use an
# unverified SSL context for the downloads below
unverified_context = ssl._create_unverified_context()

def list_ucf_videos():
    """Return a sorted list of the video file names on the UCF server."""
    global _VIDEO_LIST
    if not _VIDEO_LIST:
        index = request.urlopen(UCF_ROOT, context=unverified_context).read().decode("utf-8")
        videos = re.findall(r"(v_[\w_]+\.avi)", index)
        _VIDEO_LIST = sorted(set(videos))
    return list(_VIDEO_LIST)

def fetch_ucf_video(video):
    """Download a video to the cache directory and return its local path."""
    cache_path = os.path.join(_CACHE_DIR, video)
    if not os.path.exists(cache_path):
        urlpath = request.urljoin(UCF_ROOT, video)
        print("Fetching %s => %s" % (urlpath, cache_path))
        data = request.urlopen(urlpath, context=unverified_context).read()
        open(cache_path, "wb").write(data)
    return cache_path

def crop_center_square(frame):
    """Crop the largest centered square out of a frame."""
    y, x = frame.shape[0:2]
    min_dim = min(y, x)
    start_x = (x // 2) - (min_dim // 2)
    start_y = (y // 2) - (min_dim // 2)
    return frame[start_y:start_y+min_dim, start_x:start_x+min_dim]

def load_video(path, max_frames=0, resize=(224, 224)):
    """Read a video, center-crop, resize and scale pixel values to [0, 1]."""
    cap = cv2.VideoCapture(path)
    frames = []
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            frame = crop_center_square(frame)
            frame = cv2.resize(frame, resize)
            frame = frame[:, :, [2, 1, 0]]  # BGR -> RGB
            frames.append(frame)
            if len(frames) == max_frames:
                break
    finally:
        cap.release()
    return np.array(frames) / 255.0

def to_gif(images):
    """Save a clip as animation.gif and embed it in the notebook."""
    converted_images = np.clip(images * 255, 0, 255).astype(np.uint8)
    imageio.mimsave('./animation.gif', converted_images, fps=25)
    return embed.embed_file('./animation.gif')
Get the list of videos in the dataset
ucf_videos = list_ucf_videos()

categories = {}
for video in ucf_videos:
    category = video[2:-12]
    if category not in categories:
        categories[category] = []
    categories[category].append(video)
print("Found %d videos in %d categories." % (len(ucf_videos), len(categories)))

for category, sequences in categories.items():
    summary = ", ".join(sequences[:2])
    print("%-20s %4d videos (%s, ...)" % (category, len(sequences), summary))
Load a sample video
# Get a sample volleyball video.
video_path = fetch_ucf_video("v_VolleyballSpiking_g01_c01.avi")
sample_video = load_video(video_path)

# Load the pre-trained I3D model (trained on Kinetics-400) from TF Hub
i3d = hub.load("https://tfhub.dev/deepmind/i3d-kinetics-400/1").signatures['default']
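The loaded clip can be previewed with the to_gif helper defined earlier; this writes animation.gif to the working directory:

# Preview the sample clip as an animated GIF
to_gif(sample_video)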
Prediction on a sample video
# Kinetics-400 labels used by the I3D model
KINETICS_URL = "https://raw.githubusercontent.com/deepmind/kinetics-i3d/master/data/label_map.txt"
with request.urlopen(KINETICS_URL) as obj:
    labels = [line.decode("utf-8").strip() for line in obj.readlines()]

def predict(sample_video):
    # Add a batch axis to the sample video.
    model_input = tf.constant(sample_video, dtype=tf.float32)[tf.newaxis, ...]
    logits = i3d(model_input)['default'][0]
    probabilities = tf.nn.softmax(logits)
    print("Top 5 actions:")
    for i in np.argsort(probabilities)[::-1][:5]:
        print(f"  {labels[i]:22}: {probabilities[i] * 100:5.2f}%")
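Calling the function on the clip loaded above prints the five most probable Kinetics-400 actions for the volleyball video:

predict(sample_video)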
State of the Art
The current state of the art on the UCF-101 dataset is R(2+1)D-BERT, which reaches an accuracy of 98.69%. LGD-3D Two-stream and Two-Stream I3D also perform well on this benchmark, with accuracies above 98%.
Final Thoughts
In this article, we have presented UCF-101, one of the most challenging datasets for action recognition compared to earlier benchmarks. It includes 101 action classes and over 13k clips. Research on the dataset is still in progress, so model accuracy can be expected to improve further. Hope this article is useful to you.