
ML Model That Can Count Heartbeats And Workout Laps From Videos


Machine learning models are designed to find patterns in the data they are fed. But how do they fare when asked to spot patterns or repetitions in a video? Researchers have been trying to teach models not only to find patterns in videos but also to count the number of times an action is repeated. The applications range from spotting patterns in traffic-camera footage to counting heartbeats in ultrasound scans.

Researchers at Google AI have introduced RepNet, a single model that can recognise repetitions in a video and understand a broad range of repeating processes, from birds flapping their wings to pendulums swinging.

Overview Of RepNet

RepNet architecture

The model consists of three parts: a frame encoder, an intermediate representation called a temporal self-similarity matrix (TSM), and a period predictor.

First, the frame encoder passes each frame of the video through a ResNet-based per-frame model, yielding a sequence of embeddings, one per frame.

Next, the TSM is calculated by comparing each frame’s embedding with every other frame’s, producing a matrix that subsequent modules can easily analyse for counting repetitions.
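The paper reportedly computes each pairwise similarity as the negative squared distance between embeddings, followed by a row-wise softmax. A minimal NumPy sketch of that idea (function name and shapes are illustrative, not the authors' exact code):

```python
import numpy as np

def self_similarity_matrix(embeddings):
    """Build a temporal self-similarity matrix from per-frame embeddings.

    embeddings: (num_frames, dim) array, one row per frame.
    Returns a (num_frames, num_frames) matrix whose rows sum to 1.
    """
    # Pairwise negative squared L2 distances between all frames
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sims = -np.sum(diff ** 2, axis=-1)
    # Row-wise softmax turns each row into a distribution over frames
    exp = np.exp(sims - sims.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

emb = np.random.randn(64, 512)   # 64 frames, 512-d embeddings (toy data)
tsm = self_similarity_matrix(emb)
print(tsm.shape)                 # (64, 64)
```

For a periodic video, this matrix shows a characteristic diagonal-stripe pattern, since frames one period apart have similar embeddings.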

via Google AI

A Transformer network is then used to predict the period of repetition and the per-frame periodicity directly from the sequence of similarities in the TSM. The per-frame count is obtained by dividing the number of frames captured in a periodic segment by the period length; summing these per-frame counts gives the predicted number of repetitions in the video.
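The counting step can be sketched in a few lines, assuming per-frame predictions are already available (the function name and inputs here are hypothetical, not the authors' API):

```python
import numpy as np

def count_repetitions(period_len, periodic):
    """Aggregate per-frame predictions into a repetition count.

    period_len: predicted period length (in frames) at each frame.
    periodic:   1 if the frame lies in a repeating segment, else 0.
    """
    period_len = np.asarray(period_len, dtype=float)
    periodic = np.asarray(periodic, dtype=float)
    # Each periodic frame contributes 1/period_len of one repetition
    per_frame = np.where(period_len > 0, periodic / period_len, 0.0)
    return per_frame.sum()

# 40 periodic frames, each repetition spanning 8 frames -> 5 repetitions
print(count_repetitions([8] * 40, [1] * 40))  # 5.0
```

Non-periodic frames (periodicity 0) contribute nothing, so the count covers only the repeating portion of the video.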

The working of RepNet can be summarised as follows:

  • A video V is taken as a sequence of frames.
  • This video is fed to an image encoder to produce per-frame embeddings X.
  • Then, using the embeddings, the temporal self-similarity matrix (TSM) is obtained by computing pairwise similarities between all pairs of embeddings.
  • This similarity matrix is fed to the period predictor module, which outputs a period-length estimate and a periodicity score for each frame.
  • The period length is the number of frames one repetition spans, while the periodicity score indicates whether the frame lies within a periodic portion of the video.

For training, the authors propose the use of synthetically generated repetitions built from unlabeled YouTube videos. Synthetic periodic videos are generated from randomly selected clips and used as training data for predicting per-frame periodicity and period length.
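A toy illustration of the idea (not the authors' exact augmentation pipeline): a short clip is tiled several times, so the period length and repetition count are known by construction and can serve as per-frame labels.

```python
import numpy as np

def make_synthetic_repetition(video, start, period, reps):
    """Create a synthetic periodic video with known per-frame labels.

    video:  (num_frames, ...) array of frames.
    start:  index where the repeated clip begins.
    period: length of the repeated clip, in frames.
    reps:   how many times to repeat it.
    """
    clip = video[start:start + period]        # segment to repeat
    repeated = np.concatenate([clip] * reps)  # tile it `reps` times
    labels = np.full(len(repeated), period)   # per-frame period-length label
    return repeated, labels

video = np.random.randn(100, 8)               # toy "video": 100 feature frames
frames, labels = make_synthetic_repetition(video, start=20, period=10, reps=5)
print(frames.shape, labels[0])                # (50, 8) 10
```

Because the labels come for free, such synthetic data sidesteps the need for manual repetition annotation during training.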

The researchers have also introduced the Countix dataset, a subset of the Kinetics dataset annotated with segments of repeated actions and the corresponding counts. During collection, the authors first manually chose a subset of Kinetics classes likely to contain repetitions, e.g., jumping jacks or slicing an onion.

Key Takeaways

via Google AI

The authors argue that repeating processes provide us with unambiguous “action units”: semantically meaningful segments that make up an action. For example, if a person is chopping an onion, the action unit is the manipulation that is repeated to produce additional slices. These units may be indicative of more complex activity, and may allow such actions to be analysed automatically at a finer time-scale without a person having to annotate them.

A few applications of this model:

  • Monitoring speed changes in repeated actions, which is useful for exercise-tracking applications
  • Predicting the count and frequency of repeating phenomena in videos, e.g. biological processes such as heartbeats
  • Detecting periodicity and predicting counts across a diverse set of actors (humans, animals, etc.) and sensors (standard cameras, ultrasound, etc.)

The researchers believe that this work will lead to progress on more complex cases, such as multiple simultaneous repeating signals and the temporal arrangement of repeating sections, as in dance steps and music.

Link to paper


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.