The UCF-101 dataset contains 101 action classes and 13,320 clips of human actions collected from YouTube. It was introduced in 2012 by Khurram Soomro, Amir Roshan Zamir and Mubarak Shah of the Center for Research in Computer Vision at the University of Central Florida, Orlando, USA. The clips in each action class are divided into 25 groups, each containing 4-7 clips; clips in the same group share common features such as the background or the actor.
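The group and clip indices are encoded in each file name (for example, v_ApplyEyeMakeup_g08_c01.avi is clip 1 of group 8 of the ApplyEyeMakeup action); a minimal parsing sketch in Python:

import re

# UCF-101 file names follow the pattern v_<Action>_g<group>_c<clip>.avi
name = "v_ApplyEyeMakeup_g08_c01.avi"
action, group, clip = re.match(r"v_(\w+)_g(\d+)_c(\d+)\.avi", name).groups()
print(action, int(group), int(clip))  # ApplyEyeMakeup 8 1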
UCF Sports, UCF11, UCF50 and UCF101 are datasets released by UCF in sequence, each one incorporating its predecessor. UCF-101 is the largest among them, with 101 classes. The dataset offers the greatest diversity in terms of actions, with large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background and illumination conditions.
Here, we will discuss the dataset and see how to load it using TensorFlow and PyTorch. Further, we will work through a practical implementation on the UCF-101 dataset in TensorFlow.
About the dataset
The dataset can be downloaded from the UCF Center for Research in Computer Vision website. It consists of web videos recorded under a variety of lighting conditions, including low-quality frames. The 101 human action classes are divided into 5 types: Human-Object Interaction, Human-Human Interaction, Playing Musical Instruments, Body-Motion Only, and Sports.

Load the dataset using different deep learning frameworks.
TensorFlow
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the training split of UCF-101 through TensorFlow Datasets
x_train = tfds.load('ucf101', split='train', shuffle_files=True, batch_size=64)
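Each element is a dictionary with 'video' and 'label' features. Since clips vary in length, a quick way to inspect the data is to load it without batching; a minimal sketch:

# Sketch: load without batching and inspect one example
ds = tfds.load('ucf101', split='train', shuffle_files=True)
for example in ds.take(1):
    video = example['video']  # (num_frames, height, width, 3), uint8
    label = example['label']  # scalar class id in [0, 101)
    print(video.shape, label.numpy())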
PyTorch
import torch
import torchvision

# Placeholder paths (assumed): point these at a local copy of UCF-101
# and the official train/test split (annotation) files
root = "UCF-101/"
annotation_path = "ucfTrainTestlist/"

ucf_data = torchvision.datasets.UCF101(root, annotation_path,
                                       frames_per_clip=16,  # example value
                                       step_between_clips=1,
                                       frame_rate=None, fold=1,
                                       train=True, transform=None)
data_loader = torch.utils.data.DataLoader(ucf_data, batch_size=4,
                                          shuffle=True, num_workers=2)
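Each item yielded by this dataset is a (video, audio, label) tuple, so a minimal iteration sketch looks like this (illustrative only; collating audio tensors of different lengths may need a custom collate_fn in practice):

# Sketch: draw one batch from the loader
for video, audio, label in data_loader:
    # video: (batch, frames_per_clip, H, W, C) tensor of uint8 frames
    print(video.shape, label)
    break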
Let’s look at the parameters of the UCF101 class:
· root – The root directory of the UCF-101 dataset.
· annotation_path – Path to the folder containing the official split files.
· frames_per_clip – Number of frames in each clip.
· step_between_clips – Number of frames between each clip.
· fold – Which fold to use; should be between 1 and 3.
· train – If True, creates a dataset from the train split, otherwise from the test split (see the sketch after this list).
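For example, the matching test split for the same fold could be created as follows (a sketch reusing the root and annotation_path assumed earlier):

# Sketch: test split of fold 1, matching the training set above
ucf_test = torchvision.datasets.UCF101(root, annotation_path,
                                       frames_per_clip=16, fold=1,
                                       train=False)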
Practical Implementation Using TensorFlow
# Import all the libraries required for this project
import tensorflow as tf
import tensorflow_hub as hub
import random
import os
import ssl
import cv2
import numpy as np
import imageio
from IPython import display
from tensorflow_docs.vis import embed  # used by to_gif below
from urllib import request
import re
import tempfile
Reading the Video dataset
# Fetch videos from the UCF101 dataset
UCF_ROOT = "https://www.crcv.ucf.edu/THUMOS14/UCF101/UCF101/"
_VIDEO_LIST = None
_CACHE_DIR = tempfile.mkdtemp()
# The UCF server does not present a trusted certificate, so use an
# unverified SSL context for the downloads below
unverified_context = ssl._create_unverified_context()

def list_ucf_videos():
    """Return a sorted list of the video file names on the UCF server."""
    global _VIDEO_LIST
    if not _VIDEO_LIST:
        index = request.urlopen(UCF_ROOT, context=unverified_context).read().decode("utf-8")
        videos = re.findall(r"(v_[\w_]+\.avi)", index)
        _VIDEO_LIST = sorted(set(videos))
    return list(_VIDEO_LIST)

def fetch_ucf_video(video):
    """Download a video to the cache directory and return its local path."""
    cache_path = os.path.join(_CACHE_DIR, video)
    if not os.path.exists(cache_path):
        urlpath = request.urljoin(UCF_ROOT, video)
        print("Fetching %s => %s" % (urlpath, cache_path))
        data = request.urlopen(urlpath, context=unverified_context).read()
        open(cache_path, "wb").write(data)
    return cache_path

def crop_center_square(frame):
    """Crop the largest centered square out of a frame."""
    y, x = frame.shape[0:2]
    min_dim = min(y, x)
    start_x = (x // 2) - (min_dim // 2)
    start_y = (y // 2) - (min_dim // 2)
    return frame[start_y:start_y+min_dim, start_x:start_x+min_dim]

def load_video(path, max_frames=0, resize=(224, 224)):
    """Read a video, center-crop, resize and scale pixel values to [0, 1]."""
    cap = cv2.VideoCapture(path)
    frames = []
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            frame = crop_center_square(frame)
            frame = cv2.resize(frame, resize)
            frame = frame[:, :, [2, 1, 0]]  # BGR -> RGB
            frames.append(frame)
            if len(frames) == max_frames:
                break
    finally:
        cap.release()
    return np.array(frames) / 255.0

def to_gif(images):
    """Save a clip as animation.gif and embed it in the notebook."""
    converted_images = np.clip(images * 255, 0, 255).astype(np.uint8)
    imageio.mimsave('./animation.gif', converted_images, fps=25)
    return embed.embed_file('./animation.gif')
Get the list of videos in the dataset
ucf_videos = list_ucf_videos()

categories = {}
for video in ucf_videos:
    category = video[2:-12]
    if category not in categories:
        categories[category] = []
    categories[category].append(video)
print("Found %d videos in %d categories." % (len(ucf_videos), len(categories)))

for category, sequences in categories.items():
    summary = ", ".join(sequences[:2])
    print("%-20s %4d videos (%s, ...)" % (category, len(sequences), summary))
Load a sample video
# Get a sample volleyball video.
video_path = fetch_ucf_video("v_VolleyballSpiking_g01_c01.avi")
sample_video = load_video(video_path)

# Load the pre-trained I3D model (trained on Kinetics-400) from TF Hub
i3d = hub.load("https://tfhub.dev/deepmind/i3d-kinetics-400/1").signatures['default']
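The loaded clip can be previewed with the to_gif helper defined earlier; this writes animation.gif to the working directory:

# Preview the sample clip as an animated GIF
to_gif(sample_video)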
Prediction on a sample video
# Kinetics-400 labels used by the I3D model
KINETICS_URL = "https://raw.githubusercontent.com/deepmind/kinetics-i3d/master/data/label_map.txt"
with request.urlopen(KINETICS_URL) as obj:
    labels = [line.decode("utf-8").strip() for line in obj.readlines()]

def predict(sample_video):
    # Add a batch axis to the sample video.
    model_input = tf.constant(sample_video, dtype=tf.float32)[tf.newaxis, ...]
    logits = i3d(model_input)['default'][0]
    probabilities = tf.nn.softmax(logits)
    print("Top 5 actions:")
    for i in np.argsort(probabilities)[::-1][:5]:
        print(f"  {labels[i]:22}: {probabilities[i] * 100:5.2f}%")
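Calling the function on the clip loaded above prints the five most probable Kinetics-400 actions for the volleyball video:

predict(sample_video)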
State of the Art
The current state of the art on the UCF-101 dataset is R(2+1)D-BERT, which reaches an accuracy of 98.69%. LGD-3D Two-stream and Two-Stream I3D also perform well on this benchmark, with accuracies above 98%.
Final Thoughts
In this article, we have presented UCF-101, one of the most challenging datasets for action recognition compared to earlier benchmarks. It includes 101 action classes and over 13k clips. Research on the dataset is still in progress, so model accuracy can be expected to improve further. Hope this article is useful to you.