CEO Mark Zuckerberg announced Project CAIRaoke, a fully end-to-end neural model for building on-device assistants, at Meta’s first virtual event since the rebranding.
“We’re excited to share details on Project CAIRaoke, a breakthrough in conversational AI. With this end-to-end system, we’ll be able to have much more personal, contextual conversations than we can with the systems people are familiar with today,” said Meta AI.
Alborz Geramifard, a senior research manager at Meta AI, expanded on how the company is taking conversational AI to a different level. “At Meta AI, we’re working on a system that could be personalised, embedded, embodied, and can interact with you in a contextual multimodal fashion. That way your interactions are as frictionless as possible,” he said.
In the future, the assistant may even follow the person into the metaverse. For now, however, the focus is on voice-only interactions.
“It can see what you see from your first-person perspective, hear what you hear and, most importantly, understand the context of the situations you are in,” he said.
Despite advances in natural language understanding, such supercharged assistants have yet to become a reality. The Meta AI team has combined the modules of the traditional pipeline, from natural language understanding through natural language generation, into a single end-to-end model. For the contextual aspect, Meta is relying on its AI model BART.
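To make the shift concrete, here is a minimal, purely illustrative sketch (not Meta's actual code; all function names and the toy intent logic are hypothetical) of how a traditional task-oriented assistant chains separate modules, and how an end-to-end model collapses them into one mapping from utterance to reply:

```python
# Illustrative sketch only: a classic modular dialogue pipeline vs. a single
# end-to-end mapping. Every name and rule here is a hypothetical toy.

def nlu(utterance: str) -> dict:
    """Natural language understanding: map text to a structured frame."""
    if "timer" in utterance:
        return {"intent": "set_timer", "minutes": 10}
    return {"intent": "unknown"}

def track_state(state: dict, frame: dict) -> dict:
    """Dialogue state tracking: merge the new frame into the running state."""
    return {**state, **frame}

def policy(state: dict) -> str:
    """Dialogue policy: choose the next system action."""
    return "confirm_timer" if state.get("intent") == "set_timer" else "ask_clarify"

def nlg(action: str, state: dict) -> str:
    """Natural language generation: render the chosen action as text."""
    if action == "confirm_timer":
        return f"Timer set for {state['minutes']} minutes."
    return "Sorry, could you rephrase that?"

def pipeline_assistant(utterance: str) -> str:
    """Four hand-wired stages; an error in any one cascades into the next."""
    state = track_state({}, nlu(utterance))
    return nlg(policy(state), state)

def end_to_end_assistant(utterance: str) -> str:
    """One function from utterance to reply. In a system like Project
    CAIRaoke this would be a single learned neural model rather than the
    stubbed pass-through used here."""
    return pipeline_assistant(utterance)  # stand-in for a learned model

print(pipeline_assistant("set a timer"))  # Timer set for 10 minutes.
```

The point of the single-model design is that the intermediate hand-offs (frames, states, action labels) disappear, so errors no longer accumulate across module boundaries and the whole system can be trained and upgraded as one unit.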
Meta is using the model in its video-calling Portal device. “We plan to augment the Project CAIRaoke model to handle multilingual and then multimodal inputs and outputs, as we hope its single-model architecture allows for a smoother upgrade process,” said Geramifard.
Meta demonstrated the potential of Project CAIRaoke in a cooking video.
“Yes, it sounded kind of creepy. Some of the examples in the blog post about Project CAIRaoke include an assistant helping you pick out a shirt that will go with your pants. The assistant will know your favorite color is red.”
— Queenie Wong (@QWongSJ) February 23, 2022
Meta plans to integrate Project CAIRaoke into augmented and virtual reality devices to enable even more immersive, multimodal interactions.