
How To Use UCF101, The Largest Dataset Of Human Actions


The UCF-101 dataset contains 101 action classes and 13,320 clips of human actions collected from YouTube. It was first introduced in 2012 by researchers Khurram Soomro, Amir Roshan Zamir and Mubarak Shah of the Center for Research in Computer Vision, Orlando, FL 32816, USA. The clips in each action class are divided into 25 groups, with each group containing 4-7 clips. Clips in the same group share common features such as background or actor.

UCF Sports, UCF11, UCF50 and UCF101 are the datasets released by UCF in sequence, each one incorporating its predecessor. UCF-101 is the largest among them, with 101 classes. The dataset offers the greatest diversity in terms of actions, with large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background and illumination conditions.

Here, we will discuss the dataset and see how to load it using TensorFlow and PyTorch. Further, we will walk through a practical implementation on the UCF-101 dataset in TensorFlow.

About the dataset

The dataset can be downloaded from the official website of the Center for Research in Computer Vision. It includes web videos recorded under various lighting conditions and with low-quality frames. The 101 human action classes are divided into five types: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports.


Load the dataset using different deep learning frameworks.

TensorFlow

import tensorflow as tf
import tensorflow_datasets as tfds

# Load the train split of UCF101 as a tf.data.Dataset
x_train = tfds.load('ucf101', split='train', shuffle_files=True, batch_size=64)
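
The returned object is a tf.data.Dataset. Note that clips in the tfds 'ucf101' build have a variable number of frames, so batching raw examples directly can fail; as a small sketch (the 'video' and 'label' feature keys come from the tfds catalog), you can load unbatched examples and inspect them first:

# Load unbatched examples; each one is a dict with 'video' and 'label' keys
ds = tfds.load('ucf101', split='train', shuffle_files=True)
for example in ds.take(1):
  print(example['video'].shape)  # (num_frames, 256, 256, 3); frame count varies per clip
  print(example['label'])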

PyTorch

import torch
import torchvision

# root points to the extracted video folders; annotation_path to the split files
ucf_data = torchvision.datasets.UCF101(root, annotation_path, frames_per_clip,
                                       step_between_clips=1, frame_rate=None,
                                       fold=1, train=True, transform=None)
data_loader = torch.utils.data.DataLoader(ucf_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=args.nThreads)

Let’s define the parameters of the UCF101 class; a minimal usage sketch follows the list:

·   root – The root directory of the UCF101 dataset.

·   annotation_path – Path to the folder containing the official train/test split files.

·   frames_per_clip – Number of frames in each clip.

·   step_between_clips – Number of frames between the start of consecutive clips.

·   fold – Which fold to use; must be between 1 and 3.

·   train – If True, creates a dataset from the train split; otherwise from the test split.
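
As a rough sketch (the paths below are hypothetical placeholders), note that each item returned by UCF101 is a (video, audio, label) tuple; since audio lengths vary between clips, a small custom collate_fn that drops the audio keeps default batching simple:

import torch
import torchvision

# Hypothetical paths -- point these at your extracted videos and split files
root = "/data/UCF-101"
annotation_path = "/data/ucfTrainTestlist"

# Drop the variable-length audio so videos and labels stack cleanly
def custom_collate(batch):
  videos = torch.stack([item[0] for item in batch])
  labels = torch.tensor([item[2] for item in batch])
  return videos, labels

train_set = torchvision.datasets.UCF101(root, annotation_path,
                                        frames_per_clip=16,
                                        step_between_clips=16,
                                        fold=1, train=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=4,
                                           shuffle=True, num_workers=2,
                                           collate_fn=custom_collate)

Since all UCF101 videos share a 320×240 resolution, fixed-length clips stack without resizing.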

Practical Implementation Using TensorFlow

#Import all the libraries required for this project
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow_docs.vis import embed  # needed by to_gif below
import random
import os
import ssl
import cv2
import numpy as np
import imageio
from IPython import display
from urllib import request
import re
import tempfile

Reading the Video dataset

# fetch videos from UCF101 dataset
UCF_ROOT = "https://www.crcv.ucf.edu/THUMOS14/UCF101/UCF101/"
_VIDEO_LIST = None
_CACHE_DIR = tempfile.mkdtemp()
# crcv.ucf.edu does not use a trusted SSL certificate, so an unverified context is used as a workaround.
unverified_context = ssl._create_unverified_context()
def list_ucf_videos():
  global _VIDEO_LIST
  if not _VIDEO_LIST:
    index = request.urlopen(UCF_ROOT, context=unverified_context).read().decode("utf-8")
    videos = re.findall(r"(v_[\w_]+\.avi)", index)
    _VIDEO_LIST = sorted(set(videos))
  return list(_VIDEO_LIST)
def fetch_ucf_video(video):
  cache_path = os.path.join(_CACHE_DIR, video)
  if not os.path.exists(cache_path):
    urlpath = request.urljoin(UCF_ROOT, video)
    print("Fetching %s => %s" % (urlpath, cache_path))
    data = request.urlopen(urlpath, context=unverified_context).read()
    open(cache_path, "wb").write(data)
  return cache_path
def crop_center_square(frame):
  y, x = frame.shape[0:2]
  min_dim = min(y, x)
  start_x = (x // 2) - (min_dim // 2)
  start_y = (y // 2) - (min_dim // 2)
  return frame[start_y:start_y+min_dim,start_x:start_x+min_dim]
def load_video(path, max_frames=0, resize=(224, 224)):
  cap = cv2.VideoCapture(path)
  frames = []
  try:
    while True:
      ret, frame = cap.read()
      if not ret:
        break
      frame = crop_center_square(frame)
      frame = cv2.resize(frame, resize)
      frame = frame[:, :, [2, 1, 0]]
      frames.append(frame)
      if len(frames) == max_frames:
        break
  finally:
    cap.release()
  return np.array(frames) / 255.0
def to_gif(images):
  converted_images = np.clip(images * 255, 0, 255).astype(np.uint8)
  imageio.mimsave('./animation.gif', converted_images, fps=25)
  return embed.embed_file('./animation.gif')
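
A quick smoke test of these helpers might look like this (a sketch; to_gif relies on the tensorflow_docs embed import added above):

# Index the dataset, download the first clip, load it and render a GIF
videos = list_ucf_videos()
print("Indexed %d clips" % len(videos))
path = fetch_ucf_video(videos[0])
frames = load_video(path, max_frames=50)
print(frames.shape)  # (num_frames, 224, 224, 3), values scaled to [0, 1]
to_gif(frames)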

Get the list of videos in the dataset

ucf_videos = list_ucf_videos()
categories = {}
for video in ucf_videos:
  category = video[2:-12]
  if category not in categories:
    categories[category] = []
  categories[category].append(video)
print("Found %d videos in %d categories." % (len(ucf_videos), len(categories)))
for category, sequences in categories.items():
  summary = ", ".join(sequences[:2])
  print("%-20s %4d videos (%s, ...)" % (category, len(sequences), summary))

Load a sample video

# Get a sample volleyball spiking video.
video_path = fetch_ucf_video("v_VolleyballSpiking_g01_c01.avi")
sample_video = load_video(video_path)
# Load the pretrained I3D (Inflated 3D ConvNet) action-recognition model from TF Hub
i3d = hub.load("https://tfhub.dev/deepmind/i3d-kinetics-400/1").signatures['default']
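
Per its TF Hub documentation, i3d-kinetics-400 expects a float32 batch of RGB frames shaped (batch, frames, 224, 224, 3) with values in [0, 1], which is exactly what load_video produces:

print(sample_video.shape)  # (num_frames, 224, 224, 3)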

Prediction on a sample video

def predict(sample_video):
  # Add a batch axis to the sample video.
  model_input = tf.constant(sample_video, dtype=tf.float32)[tf.newaxis, ...]
  logits = i3d(model_input)['default'][0]
  probabilities = tf.nn.softmax(logits)
  print("Top 5 actions:")
  for i in np.argsort(probabilities)[::-1][:5]:
    print(f"  {labels[i]:22}: {probabilities[i] * 100:5.2f}%")

State of the Art

The current state of the art on the UCF101 dataset is R(2+1)D-BERT, which achieves an accuracy of 98.69%. LGD-3D Two-stream and Two-Stream I3D also perform well on this dataset, with accuracies above 98%.

Final Thoughts

In this article, we have presented UCF101, one of the most challenging datasets for action recognition compared to existing ones. It comprises 101 action classes and over 13k clips. Research on the dataset is still in progress, so further improvements in model accuracy can be expected. Hope this article is useful to you.
