Recent advances in computer vision, pattern recognition, and signal processing have sparked growing interest in automating the challenging task of lip reading. Visual speech recognition (VSR) has received much attention in recent decades for its potential use in applications such as human-computer interaction, audio-visual speech recognition, speaker recognition and more.
One startup striving to turn Visual Speech Recognition (VSR) technology into a successful product is Belfast-based Liopa.ai. Founded in 2015 by Fabian Campbell-West, Liam McQuillan, Darryl Stewart and Richard McConnell, Liopa is a spin-out from Queen’s University Belfast and the Centre for Secure Information Technologies (CSIT).
For this week’s startup column, Analytics India Magazine got in touch with Liopa’s co-founder and CEO, Liam McQuillan, to gain deeper insight into how the company uses AI and machine learning to deliver its visual speech recognition platform.
AIM: Tell Us A Little About The Company
McQuillan: We were incorporated in November 2015 and are commercialising more than ten years of research in the field of speech and image processing with particular focus on the fusion of speech and lip movements for robust speech recognition in real-world environments. Liopa is a Visual Speech Recognition (VSR) technology developer, which deciphers speech through video by analysing discrete lip movements.
Our VSR technology is the product of over 50 man-years of research that utilises a combination of highly innovative techniques to track and extract speaker lip movements. We have also built an AI-Engine that combines several state-of-the-art modelling techniques and deep neural networks to derive the words spoken by the subject.
AIM: Tell Us About Your Flagship Product
McQuillan: Liopa’s mission is to provide an accurate, easy-to-use and robust Visual Speech Recognition platform, known as LipRead. Liopa is a spin-out from the Centre for Secure Information Technologies (CSIT) at Queen’s University Belfast (QUB).
Liopa is further developing and commercialising ten years of research carried out within the university on the use of lip movements in speech recognition. The company is leveraging QUB’s renowned excellence in speech and dialogue modelling to position itself as a leading independent provider of VSR technology.
AIM: What Are Your Innovative Ways To Use AI Techniques?
McQuillan: Liopa is at the forefront of automatic lipreading technology, also known as visual speech recognition (VSR). Liopa uses automatic speech recognition, computer vision and deep learning to build fast and accurate services based on VSR. AI is at the core of who we are and what we do as a company. We use analytics to measure our system performance during use to improve accuracy and latency.
VSR is achieved using a sophisticated processing pipeline that starts with a video of someone speaking and finishes with a transcription of what they said. We use various deep neural network architectures, such as auto-encoders, LSTMs (long short-term memory networks) and TDNNs (time-delay neural networks), alongside computer vision processes such as illumination compensation, feature detection, visual tracking, 3D spatial compensation and filtering.
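The pipeline McQuillan describes can be caricatured as a frames-in, words-out chain of stages. The toy sketch below illustrates only that shape; every stage, the feature summary and the thresholding "decoder" are illustrative assumptions standing in for the real models (face tracking, LSTM/TDNN decoding), not Liopa's actual implementation.

```python
# Toy frames-in, words-out VSR pipeline. Each function is a hypothetical
# stand-in for a real stage (illumination compensation, lip-feature
# extraction, sequence decoding) -- not Liopa's actual system.

def compensate_illumination(frame):
    # Normalise brightness so lighting changes don't swamp lip features.
    mean = sum(frame) / len(frame)
    return [px - mean for px in frame]

def extract_lip_features(frame):
    # Stand-in for face detection + lip-region feature extraction:
    # summarise the (already normalised) frame as one number.
    return sum(abs(px) for px in frame) / len(frame)

def decode(feature_sequence):
    # Stand-in for the sequence model (e.g. an LSTM/TDNN decoder):
    # map per-frame features to tokens by thresholding.
    return ["word" if f > 0.5 else "silence" for f in feature_sequence]

def lipread(video):
    frames = [compensate_illumination(f) for f in video]
    features = [extract_lip_features(f) for f in frames]
    return decode(features)

# Two toy "frames": one flat (no lip movement), one with variation.
video = [[0.5, 0.5, 0.5, 0.5], [0.0, 2.0, 0.0, 2.0]]
print(lipread(video))  # ['silence', 'word']
```

The point of the sketch is the separation of concerns: vision stages clean up and summarise each frame, and a sequence model turns the resulting feature stream into words.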
AIM: Tell Us About Liopa’s Core Technology Stack
McQuillan: Liopa’s core technology stack is a cloud-based computer vision pipeline wrapped in an API and accessed via clients. We use a mixture of DNN frameworks and libraries, including TensorFlow and Kaldi. Most of our components are custom-designed and built for our application, using a combination of C++, Java, Python and scripting languages.
Our front-end clients are built using Java (Android) and Swift (iOS). We have SDKs to allow partners to integrate our service into their applications.
The back-end is built on robust open-source foundations like Apache and Nginx. We use cloud infrastructure extensively to achieve global reach and scalability. The server applications are built in Java, C++ and Python depending on the component, the performance requirements and the amount of change we expect.
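As a rough illustration of the "pipeline wrapped in an API" pattern described above, the snippet below builds a JSON transcription request in Python. The helper, the field names and the base64 encoding choice are purely hypothetical assumptions for illustration, not Liopa's actual API.

```python
# Hypothetical client-side request builder for a cloud VSR service.
# Field names and encoding are assumptions, not Liopa's real API.
import base64
import json

def build_transcription_request(video_bytes, language="en"):
    # Base64-encode the video so it can travel inside a JSON body;
    # a production API might instead use a multipart upload.
    return json.dumps({
        "video": base64.b64encode(video_bytes).decode("ascii"),
        "language": language,
    })

payload = build_transcription_request(b"\x00\x01\x02")
print(json.loads(payload)["language"])  # en
```

A client SDK of the kind mentioned above would wrap a builder like this together with authentication and result polling, so partner applications never touch the wire format directly.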
AIM: How Liopa Is Helping To Fight The Pandemic
McQuillan: Liopa’s SRAVI lipreading application is being trialled in a pilot study with Lancashire Teaching Hospitals NHS Trust, with ICU patients at the Royal Preston Hospital, some of whom may be recovering from COVID-19.
SRAVI is an easy-to-use communication aid for patients who cannot speak, such as those who have had a tracheostomy. These patients can no longer voice sound; however, they can still mouth words. SRAVI lets them mouth phrases into their mobile phone and, by reading their lip movements, communicates those phrases to a doctor, nurse, family member or friend.
AIM: Tell Us About Your Recent Funding
McQuillan: Liopa has recently won funding from Innovate UK in its “Funding Competition for Business-led Innovation in Response to Global Disruption.” The financial award is part of a £40 million package from the UK government to bolster technology and research-focused companies working to build resilience during the COVID crisis.
Liopa is a pre-revenue deep-tech startup funded by several venture capital companies that invest in highly scalable ventures with strong IP in the AI space. The latest funding round will support the release of the company’s first commercial product into the digital health marketplace and allow the company to establish and grow with a strong monthly recurring revenue (MRR) stream.
AIM: What Do You Look For When Hiring Talent?
McQuillan: The Liopa team is small but highly experienced in speech recognition, AI and computer vision. The company looks for PhD-educated researchers and development engineers who are highly motivated by the challenge of taking an entirely new technology to market via groundbreaking products that create new markets, e.g. health and security.
AIM: What Does The Future Roadmap Look Like?
McQuillan: In the next five years, Liopa plans to grow to over fifty employees, release several AI-based innovative visual speech recognition products and establish a growing multi-million dollar MRR.