How This Startup Is Using Deep Learning To Decipher Speech From Lip Movements

Recent advances in computer vision, pattern recognition, and signal processing have led to a budding curiosity in automating the challenging task of lip reading. Visual speech recognition (VSR) has received much attention in the last few decades for its potential use in applications such as human-computer interaction, audio-visual speech recognition, speaker recognition and more. 

One startup striving to develop Visual Speech Recognition (VSR) technology successfully is Belfast-based Liopa. Founded in 2015 by Fabian Campbell-West, Liam McQuillan, Darryl Stewart and Richard McConnell, Liopa is a spin-out from Queen’s University Belfast and the Centre for Secure Information Technologies (CSIT).

For this week’s startup column, Analytics India Magazine got in touch with Liam McQuillan, co-founder and CEO of Liopa, to gain a deeper insight into how the company uses AI and machine learning to provide a visual speech recognition platform.

AIM: Tell Us A Little About The Company

McQuillan: We were incorporated in November 2015 and are commercialising more than ten years of research in the field of speech and image processing with particular focus on the fusion of speech and lip movements for robust speech recognition in real-world environments. Liopa is a Visual Speech Recognition (VSR) technology developer, which deciphers speech through video by analysing discrete lip movements.

Our VSR technology is the product of over 50 man-years of research that utilises a combination of highly innovative techniques to track and extract speaker lip movements. We have also built an AI-Engine that combines several state-of-the-art modelling techniques and deep neural networks to derive the words spoken by the subject.

AIM: Tell Us About Your Flagship Product

McQuillan: Liopa’s mission is to provide an accurate, easy-to-use and robust Visual Speech Recognition platform, known as LipRead.

As a spin-out from the Centre for Secure Information Technologies (CSIT) at Queen’s University Belfast (QUB), Liopa is further developing and commercialising ten years of research carried out within the university on using lip movements in speech recognition. The company is leveraging QUB’s renowned excellence in speech and dialogue modelling to position itself as a leading independent provider of VSR technology.

AIM: What Are Your Innovative Ways To Use AI Techniques? 

McQuillan: Liopa is at the forefront of automatic lipreading technology, also known as visual speech recognition (VSR). Liopa uses automatic speech recognition, computer vision and deep learning to build fast and accurate services based on VSR. AI is at the core of who we are and what we do as a company. We use analytics to measure our system performance during use to improve accuracy and latency.

VSR is achieved using a sophisticated processing pipeline that starts with a video of someone speaking and finishes with a transcription of what they said. We use various deep neural network architectures such as auto-encoders, LSTMs and TDNNs, along with computer vision processes such as illumination compensation, feature detection, visual tracking, 3D spatial compensation and filtering.
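The pipeline McQuillan describes — video in, transcription out — can be caricatured in a few lines. The function bodies below are illustrative stand-ins only: Liopa’s actual system uses trained deep networks (auto-encoders, LSTMs, TDNNs) at each stage, not the toy cropping, averaging and threshold rules sketched here.

```python
# A minimal sketch of a VSR pipeline's stages (hypothetical stand-ins,
# not Liopa's implementation). A frame is a 2D list of pixel intensities.

def detect_mouth_roi(frame):
    # Stand-in for face detection + landmark-based mouth cropping:
    # take the lower-central region of the frame.
    h, w = len(frame), len(frame[0])
    return [row[w // 4 : 3 * w // 4] for row in frame[h // 2 :]]

def extract_features(roi):
    # Stand-in for a learned encoder (e.g. an auto-encoder bottleneck):
    # here, just the mean intensity of the cropped region.
    pixels = [p for row in roi for p in row]
    return sum(pixels) / len(pixels)

def decode_sequence(features):
    # Stand-in for a sequence model (LSTM/TDNN) mapping per-frame
    # features to words; here, a trivial threshold rule.
    return "open" if max(features) > 0.5 else "closed"

def transcribe(video):
    # Full pipeline: per-frame ROI -> per-frame features -> decoded label.
    features = [extract_features(detect_mouth_roi(f)) for f in video]
    return decode_sequence(features)
```

The structure, not the arithmetic, is the point: each stage of the real system replaces one of these stand-ins with a trained model, and the per-frame features form the sequence the recurrent decoder consumes.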

AIM: Tell Us About Liopa’s Core Technology Stack

McQuillan: Liopa’s core technology stack is a cloud-based computer vision pipeline wrapped in an API and accessed via clients. We use a mixture of DNN frameworks and libraries, including TensorFlow and Kaldi. Most of our components are custom-designed and built for our application, using a combination of C++, Java, Python and scripting languages.

Our front-end clients are built using Java (Android) and Swift (iOS). We have SDKs to allow partners to integrate our service into their applications.

The back-end is built on robust open-source foundations like Apache and Nginx. We use cloud infrastructure extensively to achieve global reach and scalability. The server applications are built in Java, C++ and Python depending on the component, the performance requirements and the amount of change we expect.

AIM: How Is Liopa Helping To Fight The Pandemic?

McQuillan: Liopa’s SRAVI lipreading application is being trialled in a pilot study with the Lancashire Teaching Hospitals NHS Trust, with patients in ICU at the Royal Preston Hospital. Some of them may have been recovering from COVID-19. 

SRAVI is an easy-to-use communications aid for patients who cannot speak, such as those who have a tracheostomy inserted. These patients can no longer make a sound; however, they can still mouth words. SRAVI enables them to mouth phrases into their mobile phone, and by reading their lip movements, the app communicates their phrases to a doctor, nurse, family or friend.  

AIM: Tell Us About Your Recent Funding

McQuillan: Liopa has recently won funding from Innovate UK in its “Funding Competition for Business-led Innovation in Response to Global Disruption.” The financial award is part of a £40 million package from the UK government to bolster technology and research-focused companies working to build resilience during the COVID crisis.

Liopa is a pre-revenue deep-tech startup funded by several venture capital companies that invest in highly scalable ventures with strong IP in the AI space. The latest funding round will support the release of the company’s first commercial product into the digital health marketplace and allow the company to establish and grow with a strong monthly recurring revenue (MRR) stream.

AIM: What Do You Look For When Hiring Talent?

McQuillan: The Liopa team is small but highly experienced in speech recognition, AI and computer vision. The company looks for PhD-educated researchers and development engineers who are highly motivated by the challenge of taking an entirely new technology to market via groundbreaking products that create new markets, e.g. health and security.

AIM: What Does The Future Roadmap Look Like? 

McQuillan: In the next five years, Liopa plans to grow to over fifty employees, release several AI-based innovative visual speech recognition products and establish a growing multi-million dollar MRR.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
