Researchers from the MIT Media Lab, the University of California at Santa Barbara, and Osaka University have built an open-source, easy-to-use character generation pipeline. It combines AI models for facial gestures, voice, and motion, and can be used to create a variety of audio and video outputs.
To distinguish the results from authentic video content, the pipeline marks its output with a traceable watermark, helping to prevent malicious use.
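The article does not describe how the watermark is implemented. As one illustrative sketch of the general idea (a recoverable, low-visibility mark), the following embeds an identifier's bits into the least significant bit of successive pixel values; the function names and the LSB scheme are assumptions for illustration, not the pipeline's actual method.

```python
def embed_watermark(pixels, tag):
    """Write each bit of `tag` (a bytes object) into the least significant
    bit of successive 8-bit pixel values; returns a new pixel list.
    Illustrative only -- not the pipeline's actual watermarking scheme."""
    bits = [(byte >> i) & 1 for byte in tag for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for watermark")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite only the lowest bit
    return out

def extract_watermark(pixels, n_bytes):
    """Recover `n_bytes` of watermark data from the pixel LSBs."""
    data = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte |= (pixels[b * 8 + i] & 1) << i
        data.append(byte)
    return bytes(data)
```

Because only the lowest bit of each pixel changes, every pixel value moves by at most 1, so the mark is imperceptible to viewers yet exactly recoverable by anyone who knows the scheme.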
Generative adversarial networks (GANs) combine two neural networks that compete against each other: a generator that produces synthetic samples and a discriminator that tries to tell them from real data. They have made it easier to create photorealistic images, animate faces, and clone voices.
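The adversarial idea can be sketched with a toy one-dimensional GAN; this is an assumed minimal illustration, not the models used in the pipeline. A linear generator learns to turn standard Gaussian noise into samples resembling data drawn from N(4, 1.25), while a logistic discriminator is simultaneously trained to tell real samples from generated ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.02, 64

for step in range(3000):
    real = rng.normal(4.0, 1.25, batch)   # samples from the "true" data
    z = rng.normal(0.0, 1.0, batch)       # noise fed to the generator
    fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    p_real, p_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - p_real) * real - p_fake * fake)
    c += lr * np.mean((1 - p_real) - p_fake)

    # Generator step (non-saturating loss): ascend log D(fake),
    # i.e. push generated samples toward regions D rates as real.
    p_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - p_fake) * w * z)
    b += lr * np.mean((1 - p_fake) * w)

# Mean of freshly generated samples after training.
fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
```

After training, the generator's output mean has moved from 0 toward the data mean of 4, showing how the competition alone, with no direct access to the data by the generator, pulls the generated distribution toward the real one.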
The researchers explored the pipeline's possibilities in a project called Machinoia, generating multiple alternative self-representations (as a child, as an old man, as a female) to hold a self-dialogue about life choices from different perspectives.
Such characters can make students more enthusiastic about learning and improve performance on cognitive tasks. In this way, the technology offers personalised instruction tailored to a learner's interests and context, delivered even by the learner's idols, with characters that can change over time.
“It will be a strange world indeed when AIs and humans begin to share identities. This paper does an incredible job of thought leadership, mapping out the space of what is possible with AI-generated characters in domains ranging from education to health to close relationships while giving a tangible roadmap on how to avoid the ethical challenges around privacy and misrepresentation,” said Jeremy Bailenson, Founding Director of the Stanford Virtual Human Interaction Lab.
Applications might include characters that help deliver therapy, alleviating the shortage of mental health professionals. AI-generated characters could even deliver exposure therapy to people with social anxiety. The technology can also anonymise faces in video while preserving facial expressions and emotions, which could prove useful in sessions where people share sensitive personal information, or for whistleblowers and witness accounts.