Just like last year, television sets may be the cynosure of CES 2020, given the expensive research that goes into making TV displays thinner, larger, and flexible enough to roll away or even disappear. In fact, even Steve Jobs had a vision for television before he died.
Jonathan Zittrain, professor of computer science at Harvard, wrote about how the PC and the Internet are two examples of generative technologies that adhered to open standards and allowed innovation in unforeseen directions. They also enabled a wide range of people, from hobbyists and programmers to tech companies, to tinker with them. That was not the case with the TV, which was not only dubbed an idiot box but also locked down by proprietary networks and cables. Even with the advent of “smart TVs”, paid apps and dongles still control the streaming content.
But now, “TV with voice input” has the potential to be generative for the first time.
Voice Input For Television
Voice assistants like Alexa, Google Assistant and Siri are a hit with maps, have significantly freed users’ hands and eyes while driving, and are no longer as frustrating with accent issues in natural language translation. Just over a year ago, at CES 2019, Google showed that its Assistant can translate 27 languages and add voice commands to maps shown on car displays.
When it comes to TVs, beyond just TV viewing and web browsing, voice assistants have enabled conversational AI for a range of use cases: video conferencing, placing an online order, booking a cab or a hotel, playing music, controlling home appliances from dimmable bulbs to ACs, switching to live camera feeds, flight check-ins and more.
But do they make our homes smarter? Not really, until you see the possibilities in a traditional pain point: senior care and quality of living. Each of the use cases mentioned here can transform a senior’s quality of living, especially if they are immobile or find it hard to see or type due to dimming faculties.
Smart Assistants Vs Voice Input Enabled TVs
Programming a voice assistant is similar to programming a mobile app, except through a set of voice interactions (or conversations) with the device instead of a touchscreen. Designing the conversation amounts to extending the voice platform provided by Amazon, Google, Apple, Microsoft and others with custom interactions built by voice UI developers. The build process maps the user’s intent to the voice application through a combination of skills, slots and utterances for Alexa, or a combination of actions, agents and apps for Google Assistant.
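As a rough illustration of that intent-to-handler mapping, here is a minimal sketch in plain Python. It is not the actual Alexa SDK (real skills use the ask-sdk and a JSON interaction model), and the intent and slot names are hypothetical:

```python
# Sketch of how a voice platform routes a recognised intent (plus its
# slots) to a handler. "PlayChannelIntent" and the "channel" slot are
# hypothetical names for illustration only.

def handle_play_channel(slots):
    """Handler for a hypothetical 'PlayChannelIntent'."""
    channel = slots.get("channel", "the default channel")
    return f"Switching to {channel}."

# Registry mapping intent names to handlers, analogous to skill handlers.
HANDLERS = {"PlayChannelIntent": handle_play_channel}

def route(request):
    """Dispatch a recognised request {intent, slots} to its handler."""
    handler = HANDLERS.get(request["intent"])
    if handler is None:
        return "Sorry, I didn't understand that."
    return handler(request["slots"])

print(route({"intent": "PlayChannelIntent",
             "slots": {"channel": "BBC News"}}))
# prints "Switching to BBC News."
```

On a real platform, the speech recogniser and NLU layer produce the `request` structure from the user's utterance; the developer's job is mostly defining the utterances that trigger each intent and writing the handlers.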
Spoken Language Technology (SLT), or conversational AI, is a mix of deep averaging networks for topic modelling, and attentional and bidirectional LSTMs to recognise speaker utterances and conversational context. Briefly: topic modelling networks help find hidden structure in text; attention mechanisms give a neural network the ability to focus on a subset of its inputs; and a bidirectional LSTM trains two LSTMs, one on the input sequence and the other on a reversed copy of it. An averaging ensemble of all these markedly improves model performance on sequence classification problems.
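The deep averaging network is the simplest of these components to sketch: average the word embeddings of an utterance, then pass the average through feed-forward layers. The toy below uses random placeholder embeddings and weights (not a trained model), just to show the shape of the computation:

```python
# Toy deep averaging network (DAN): embed each token, average the
# embeddings, then apply two dense layers to get class logits.
# Embeddings and weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"turn": 0, "on": 1, "the": 2, "lights": 3}
EMBED = rng.normal(size=(len(VOCAB), 8))   # one 8-dim vector per word
W1 = rng.normal(size=(8, 4))               # hidden layer weights
W2 = rng.normal(size=(4, 2))               # 2-class output weights

def dan_logits(tokens):
    """Average the token embeddings, then apply two dense layers."""
    avg = EMBED[[VOCAB[t] for t in tokens]].mean(axis=0)
    hidden = np.tanh(avg @ W1)
    return hidden @ W2

logits = dan_logits(["turn", "on", "the", "lights"])
print(logits.shape)  # (2,) — one score per class
```

Because the averaging step discards word order, DANs are cheap and work well for topic-level classification; the order-sensitive work (which word was said when) is what the attentional and bidirectional LSTM components add.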
Behind all of this sits a Natural Language Understanding (NLU) system (e.g., Dialogflow) that extracts the context and meaning of the spoken words. As the NLU system matures over time, the voice assistant gets better at recognising the user’s input style, needs and daily habits. For example, last October Google updated its search with BERT, an NLU algorithm that trains language models on the entire set of words within a sentence or query, allowing those models to learn word context from the surrounding words rather than just the words that precede or follow.
Because voice assistants are programmable and powered by conversational AI, they can further “voice-enable” the television and revolutionise a range of healthcare use cases, such as video conferencing with doctors. Developers can support voice commands alongside touch and remote controls, where available, and take advantage of automatic entity resolution for voice-based selection of on-screen elements.
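Entity resolution for on-screen selection can be pictured as matching a spoken phrase against whatever is currently displayed, either by ordinal position (“the second one”) or by name. The sketch below is a simplified, hypothetical resolver, not any platform’s actual API:

```python
# Toy entity resolver: map a spoken phrase to one of the items
# currently shown on screen, by ordinal word or by (partial) title.
ORDINALS = {"first": 0, "second": 1, "third": 2}

def resolve(phrase, on_screen):
    """Return the on-screen item the phrase refers to, or None."""
    words = phrase.lower().split()
    for word in words:                      # ordinal match first
        if word in ORDINALS and ORDINALS[word] < len(on_screen):
            return on_screen[ORDINALS[word]]
    for item in on_screen:                  # fall back to name match
        if item.lower() in phrase.lower():
            return item
    return None

items = ["Morning News", "Nature Documentary", "Cooking Show"]
print(resolve("play the second one", items))  # Nature Documentary
print(resolve("open cooking show", items))    # Cooking Show
```

Real platforms resolve entities against richer metadata (synonyms, fuzzy matching, recent context), but the core idea is the same: the screen’s current contents become candidate entities for the voice request.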
Beyond that, targeted use cases for assisted living and healthcare, reflected in CES topics of discussion such as “the surging currency of voice in health care” and “the explosion in health data, revolutionary sensors, 5G and artificial intelligence”, have the potential to redefine monitoring and even e-diagnosis ahead of timely intervention.
Meenakshi Sambamurthy is a part of the AIM Writers Programme. She is a cloud and AI expert based out of Bangalore, India.