Recently, DeepMind researchers announced Transframer, a new general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. The model unifies a broad range of tasks, including image segmentation, view synthesis and video interpolation.
The framework uses U-Net and Transformer components to condition on annotated context frames and outputs sequences of sparse, compressed image features.
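DeepMind has not released this pipeline in code, but its overall shape can be sketched at a toy scale: encode each context frame into a short sequence of feature tokens, let an attention layer condition on all context tokens jointly, and read off predicted tokens for the next frame. Everything below is an illustrative assumption (the random-projection "encoder" stands in for the U-Net, and the token counts are arbitrary), not DeepMind's actual implementation.

```python
import numpy as np

# Toy sketch of Transframer-style conditioning on context frames.
# All components and shapes are illustrative assumptions.
rng = np.random.default_rng(0)

N_TOKENS, D_MODEL = 16, 32  # assumed tokens per frame and feature width

def encode_frame(frame):
    """Stand-in for the U-Net encoder: compress a frame into a short
    sequence of feature tokens (here just a fixed random projection)."""
    h, w = frame.shape
    proj = rng.standard_normal((h * w, N_TOKENS * D_MODEL)) * 0.01
    return (frame.reshape(1, -1) @ proj).reshape(N_TOKENS, D_MODEL)

def self_attention(x):
    """Single-head scaled dot-product attention over all tokens,
    standing in for the Transformer component."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def predict_next_frame_tokens(context_frames):
    """Condition on the encoded context frames and emit a sequence of
    compressed feature tokens for the predicted next frame."""
    tokens = np.concatenate([encode_frame(f) for f in context_frames])
    attended = self_attention(tokens)
    return attended[-N_TOKENS:]  # last frame's positions as the prediction

context = [rng.standard_normal((8, 8)) for _ in range(3)]
pred = predict_next_frame_tokens(context)
print(pred.shape)  # → (16, 32)
```

The point of the sketch is the data flow, not the maths: one or more context frames go in, a fixed-length sequence of compressed feature tokens comes out, and a decoder (omitted here) would turn those tokens back into pixels.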
What does Transframer do?
Developed by DeepMind, Transframer unifies a range of image modelling and vision tasks: given a single image together with one or more annotated context frames, it can generate videos or image features.
Transframer performs well across a variety of video generation benchmarks. The research team claims it is a state-of-the-art model, the strongest and most competitive on few-shot view synthesis, and that it can generate coherent 30-second videos from a single image.
The model also showed promising results on eight tasks in total, including semantic segmentation, image classification, and optical flow prediction, with no task-specific architectural components.
Transframer can also be applied to tasks that require learning conditional structure from text or a single image, spanning video prediction and generation, novel view synthesis, and multi-task vision.
Backed by Google, DeepMind has been researching AI since 2010, focusing on building computer models that can learn to solve problems on their own.