
One Of The Most Benchmarked Human Motion Recognition Datasets In Deep Learning


HMDB-51 is a human motion recognition dataset with 51 action classes. Together they contain around 7,000 manually annotated clips, drawn from a variety of sources ranging from digitized movies to YouTube. It was developed by H. Kuehne, H. Jhuang, E. Garrote, T. Poggio and T. Serre in 2011.

The dataset contains 51 distinct action classes. Each class has at least 101 clips, giving a total of 6,766 video clips taken from a wide range of sources. The annotations for each clip include the camera viewpoint, the video quality, and the number of actors involved in the action.


The action classes can be grouped into five types:

1) face actions: laugh, chew, talk 

2) face actions with object manipulation: smoke, eat, drink

3) body movements: clap hands, climb, dive, fall, backhand flip, handstand, walk, push up, run

4) body movements with object interaction: swing bat, kick football, brush hair, catch, draw sword, play tennis, hit something, kickball, pick, pour, ride bike, play badminton, shoot ball, throw

5) body movements for human interaction: hug someone, kick, kiss, punch, shake hands, sword fight.
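During evaluation, it can be useful to map each fine-grained label back to one of these five coarse groups. The sketch below does this for a subset of the classes listed above. The strings are the article's descriptive names, which may not match the dataset's on-disk folder names. `CLASS_GROUPS` and `group_of` are hypothetical names used only for illustration.

```python
# Coarse action groups, using a subset of the example classes listed in the article.
# These are descriptive names, not necessarily HMDB-51's on-disk folder names.
CLASS_GROUPS = {
    "face": ["laugh", "chew", "talk"],
    "face_with_object": ["smoke", "eat", "drink"],
    "body": ["clap hands", "climb", "dive", "fall", "walk", "push up", "run"],
    "body_with_object": ["brush hair", "catch", "draw sword", "pick", "pour",
                         "ride bike", "shoot ball", "throw"],
    "human_interaction": ["hug someone", "kick", "kiss", "punch",
                          "shake hands", "sword fight"],
}

# Reverse index: class name -> coarse group name
GROUP_OF = {cls: group for group, classes in CLASS_GROUPS.items() for cls in classes}


def group_of(class_name: str) -> str:
    """Return the coarse group of a class, or 'unknown' if it is not listed."""
    return GROUP_OF.get(class_name, "unknown")


print(group_of("kiss"))   # human_interaction
print(group_of("smoke"))  # face_with_object
```

With the reverse index, a confusion matrix over 51 classes can be collapsed into a 5x5 one. This makes it easier to see whether errors stay within a group (for example, eat versus drink) or cross groups.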

Here, we examine what the dataset contains, how it was collected, and which benchmark models achieve the highest accuracy on it. We then load HMDB-51 using the PyTorch and Keras libraries.

Data Collection

To collect human movements that represent everyday activities, a group of students watched videos from various web sources, such as YouTube and Google Videos. They annotated every segment that depicts a single human action. They were instructed to follow a minimum quality standard:

- a single action per clip
- a main actor at least 60 pixels tall
- a minimum contrast level
- a clip length of at least 1 second
- acceptable compression artefacts

Workers on Amazon Mechanical Turk (AMT) then verified whether each clip actually contains the labelled action. Because some clips could share footage from the same source video, the dataset was refined to ensure that only one clip is taken from each video.
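The objective selection rules above can be expressed as a simple filter. This is a minimal sketch that assumes a hypothetical annotation record: `CandidateClip` and its field names are illustrative and not part of HMDB-51. It checks the single-action, 60-pixel and 1-second rules and keeps at most one clip per source video. Contrast and compression quality were judged by the annotators, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class CandidateClip:
    # Hypothetical annotation record; field names are illustrative only.
    source_video: str      # identifier of the video the clip was cut from
    action: str            # annotated action label
    num_actions: int       # how many distinct actions appear in the clip
    actor_height_px: int   # height of the main actor in pixels
    duration_s: float      # clip length in seconds


MIN_ACTOR_HEIGHT_PX = 60
MIN_DURATION_S = 1.0


def passes_quality_check(clip: CandidateClip) -> bool:
    """Apply the single-action, actor-size and clip-length rules."""
    return (clip.num_actions == 1
            and clip.actor_height_px >= MIN_ACTOR_HEIGHT_PX
            and clip.duration_s >= MIN_DURATION_S)


def select_clips(candidates):
    """Keep clips that pass the checks, taking at most one clip per source video."""
    seen_videos = set()
    selected = []
    for clip in candidates:
        if clip.source_video in seen_videos or not passes_quality_check(clip):
            continue
        seen_videos.add(clip.source_video)
        selected.append(clip)
    return selected


clips = [
    CandidateClip("vid_a", "run", 1, 120, 2.5),
    CandidateClip("vid_a", "run", 1, 110, 3.0),    # second clip from vid_a: dropped
    CandidateClip("vid_b", "walk", 1, 40, 2.0),    # actor too small: dropped
    CandidateClip("vid_c", "climb", 1, 80, 0.5),   # too short: dropped
]
print([c.source_video for c in select_clips(clips)])  # ['vid_a']
```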

Loading the dataset using PyTorch

The dataset can be downloaded from the following link.

Import all the libraries required for this project.

import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.utils.data import random_split, DataLoader
from torch.optim.lr_scheduler import StepLR
import torchvision
from torchvision import get_video_backend
from torchvision.models.video import r3d_18 
from torchvision import transforms

Next, we transform the clips and apply data augmentation. Augmentation makes small random changes to the existing clips, such as flipping, resizing, cropping or adjusting brightness, which increases the variety of the training data.

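# T is the video transforms module (transforms.py) from the references/video_classification
# scripts of older torchvision releases, copied into the project folder;
# these transforms are not part of the installed torchvision package itself.
import transforms as T

# Video augmentation pipeline; HMDB51 returns clips as uint8 tensors of shape (T, H, W, C)
data = torchvision.transforms.Compose([
    T.ToFloatTensorInZeroOne(),   # (T, H, W, C) uint8 -> (C, T, H, W) float in [0, 1]
    T.Resize((128, 171)),
    T.RandomHorizontalFlip(),
    T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    T.RandomCrop((112, 112))
])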
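Video augmentation has one key requirement: a random flip or crop must be drawn once per clip and applied identically to every frame. Otherwise, the motion between frames is distorted. The NumPy sketch below shows the flip, normalize and crop steps of the pipeline on a clip laid out as (C, T, H, W). `augment_clip` is a hypothetical helper rather than the torchvision reference code, and it skips the resize step by assuming the clip is already 128x171.

```python
import numpy as np

# Per-channel statistics used by the Normalize step above
MEAN = np.array([0.43216, 0.394666, 0.37645], dtype=np.float32)
STD = np.array([0.22803, 0.22145, 0.216989], dtype=np.float32)


def augment_clip(clip, crop_size=(112, 112), rng=None):
    """Flip, normalize and crop a float clip of shape (C, T, H, W) with values in [0, 1].

    The flip decision and crop offsets are sampled once, so every frame
    of the clip receives exactly the same spatial transform.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, _, h, w = clip.shape
    ch, cw = crop_size

    if rng.random() < 0.5:               # random horizontal flip of the whole clip
        clip = clip[..., ::-1]

    # Per-channel normalization, broadcast over time, height and width
    clip = (clip - MEAN[:, None, None, None]) / STD[:, None, None, None]

    top = rng.integers(0, h - ch + 1)    # crop offsets shared by all frames
    left = rng.integers(0, w - cw + 1)
    return clip[:, :, top:top + ch, left:left + cw]


# A random 16-frame RGB clip standing in for a real one, already resized to 128x171
clip = np.random.default_rng(0).random((3, 16, 128, 171), dtype=np.float32)
out = augment_clip(clip, rng=np.random.default_rng(1))
print(out.shape)  # (3, 16, 112, 112)
```

The crop is applied after normalization, as in the pipeline above. Both steps act per pixel or per region, so swapping their order would not change the result.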

Next, we load the training split of fold 1 and wrap it in a DataLoader with a batch size of 32.

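# Example clip-sampling settings: frames per clip, frames between clip starts, indexing workers
num_frames, clip_steps, num_workers = 16, 50, 4
hmdb51_training = torchvision.datasets.HMDB51('video_data/', 'test_train_splits/', num_frames,
                                              step_between_clips=clip_steps, fold=1, train=True,
                                              transform=data, num_workers=num_workers)
batch_size = 32
# HMDB51 yields (video, audio, label); drop the audio, which is not needed for training
def collate_fn(batch):
    return torch.utils.data.dataloader.default_collate([(video, label) for video, _, label in batch])
data_loader = DataLoader(hmdb51_training, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)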
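Two details of the call above are worth spelling out.

First, the annotation folder holds the official split files. Each `<class>_test_split<fold>.txt` line pairs a video file name with a tag: 1 marks a training video, 2 a test video, and 0 a video not used in that fold. With `fold=1, train=True`, torchvision keeps the tag-1 videos.

Second, each video is cut into windows of `num_frames` frames whose starts are `step_between_clips` frames apart. These windows, not whole videos, are what the DataLoader batches.

The plain-Python sketch below reproduces both computations. The helper names are hypothetical. The window count uses the standard sliding-window formula, which may differ from torchvision's result when frame-rate resampling is enabled.

```python
import math
from pathlib import Path

TRAIN_TAG, TEST_TAG = 1, 2   # tag values used in the HMDB-51 split files


def videos_for_fold(split_dir, fold=1, train=True):
    """Collect the video file names assigned to the train or test side of a fold."""
    wanted = TRAIN_TAG if train else TEST_TAG
    selected = []
    for split_file in sorted(Path(split_dir).glob(f"*_test_split{fold}.txt")):
        for line in split_file.read_text().splitlines():
            parts = line.split()
            if len(parts) == 2 and int(parts[1]) == wanted:
                selected.append(parts[0])
    return selected


def clips_in_video(num_video_frames, frames_per_clip, step_between_clips):
    """Number of windows of `frames_per_clip` frames, with starts `step_between_clips` apart."""
    if num_video_frames < frames_per_clip:
        return 0
    return (num_video_frames - frames_per_clip) // step_between_clips + 1


def batches_per_epoch(num_clips, batch_size=32):
    """DataLoader batches per epoch when the last partial batch is kept."""
    return math.ceil(num_clips / batch_size)


# A 150-frame video with 16-frame clips started every 50 frames
print(clips_in_video(150, 16, 50))  # 3 (starts at frames 0, 50 and 100)
print(batches_per_epoch(100))       # 4
```

Lowering `step_between_clips` produces more, heavily overlapping clips per video. This enlarges each epoch without adding new videos.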

The figure below shows state-of-the-art recognition results on the HMDB-51 dataset.

[Figure: state-of-the-art recognition results on HMDB-51]

Loading the dataset using Keras

Install the keras-video-generators package with pip. Keras's ImageDataGenerator is used to augment the frames.

pip install keras-video-generators

import os
import glob
import keras
from keras_video import VideoFrameGenerator

Next, we define the class list, the generator parameters and the augmentation settings.

classes = [i.split(os.path.sep)[1] for i in glob.glob('videos/*')]
classes.sort()
# Parameters
Size = (112, 112)
channel = 3
Nbframe = 5
Batch_size = 32
# Data augmentation
data_augmentation = keras.preprocessing.image.ImageDataGenerator(
    zoom_range=.1,
    horizontal_flip=True,
    rotation_range=8,
    width_shift_range=.2,
    height_shift_range=.2)

Then we create the video frame generator with these parameters.

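# Create video frame generator; glob_pattern points to the videos of each class,
# matching the 'videos/<class>' folders used to build `classes` above
train = VideoFrameGenerator(
    glob_pattern='videos/{classname}/*.avi',
    classes=classes,
    nb_frames=Nbframe,
    split=.33,                 # fraction of the videos held out for validation
    shuffle=True,
    batch_size=Batch_size,
    target_shape=Size,
    nb_channel=channel,
    transformation=data_augmentation,
    use_frame_cache=True)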
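`nb_frames` controls how many frames are drawn from each video. A common strategy, and roughly what we understand `VideoFrameGenerator` to do, is to pick frames at evenly spaced positions across the whole video. The sketch below shows that strategy in plain Python, together with the batch shape it implies, assuming batches are laid out as (batch, frames, height, width, channels). `sample_frame_indices` is a hypothetical helper and is not part of keras-video-generators.

```python
def sample_frame_indices(num_video_frames, nb_frames):
    """Pick `nb_frames` indices spread evenly (rounded down) from the first to the last frame."""
    if nb_frames == 1:
        return [0]
    step = (num_video_frames - 1) / (nb_frames - 1)
    return [int(i * step) for i in range(nb_frames)]


# Same values as the parameters defined above
size, channels, nb_frame, batch = (112, 112), 3, 5, 32

print(sample_frame_indices(90, nb_frame))        # [0, 22, 44, 66, 89]
# Each batch then holds `batch` clips of `nb_frame` RGB frames:
print((batch, nb_frame) + size + (channels,))    # (32, 5, 112, 112, 3)
```

Sampling only five frames keeps memory low but discards most of the motion, so short or fast actions can be under-represented.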

State of the art

At the time of writing, the state of the art on the HMDB-51 dataset was R(2+1)D-BERT, with an accuracy of 85.10%. HAF+BoW is a close contender, with an accuracy of around 83%.
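HMDB-51 results are conventionally reported as top-1 accuracy averaged over the three official train/test splits. The sketch below shows that computation. The predictions are toy values chosen for illustration, not outputs of any model, and the function names are hypothetical.

```python
def top1_accuracy(predicted, actual):
    """Fraction of clips whose predicted class equals the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_over_splits(split_results):
    """Average the per-split accuracies over (predicted, actual) pairs, one per split."""
    accuracies = [top1_accuracy(pred, true) for pred, true in split_results]
    return sum(accuracies) / len(accuracies)


# Toy predictions for three splits (illustrative values only)
splits = [
    (["run", "kiss", "eat", "run"], ["run", "kiss", "eat", "walk"]),   # 3/4 correct
    (["hug", "hug", "eat", "run"], ["hug", "kiss", "eat", "run"]),     # 3/4 correct
    (["run", "kiss", "eat", "walk"], ["run", "kiss", "eat", "walk"]),  # 4/4 correct
]
print(f"{mean_over_splits(splits):.2%}")  # 83.33%
```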

Conclusion

In this article, we described a dataset for human activity recognition and loaded it with the PyTorch and Keras libraries. With only 51 action classes, HMDB-51 is still a long way from capturing the richness and full complexity of the video clips commonly found in movies or online videos.


Ankit Das
A data analyst with expertise in statistical analysis, data visualization ready to serve the industry using various analytical platforms. I look forward to having in-depth knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.
