Steven Spielberg’s 2002 sci-fi hit Minority Report featured a ‘pre-crime’ police department that prevented crimes before they were committed, thanks to three clairvoyant humans called precogs. The precogs were genetically engineered to foresee future crimes, which the police could watch via a video projection from their minds, a concept then beyond imagination.
We may be far from getting there yet, but we are surely advancing on the path of generating images from the human brain.
Two researchers in Japan, Yu Takagi and Shinji Nishimoto, recently submitted a paper in which diffusion models (DMs) such as Stable Diffusion were used to generate high-resolution images from human brain activity. The study reconstructs images from fMRI (functional magnetic resonance imaging) recordings, with the goal of interpreting the connection between computer vision models and the human visual system. By reconstructing visual experiences from brain activity, the researchers hope to shed light on how the brain processes visual information.
fMRI, used here for image reconstruction, measures brain activity by detecting changes associated with blood flow, relying on the coupling between cerebral blood flow and neuronal activation. In the proposed paper, high-resolution images were reconstructed with high fidelity without any additional training or fine-tuning of complex deep-learning models.
Each component of the LDM is mapped to a specific brain region.
There have been previous attempts to reconstruct visual images from fMRI; newer studies use deep generative models trained on large numbers of naturalistic images. These methods have a limitation, though: training and fine-tuning generative models such as GANs (generative adversarial networks) on the datasets used in fMRI experiments is challenging because sample sizes in neuroscience are small. DMs and LDMs (latent diffusion models), by contrast, can generate high-resolution images with high semantic fidelity from text conditioning, and with high computational efficiency.
Latent Diffusion Model
An LDM is a model that learns to create images by transforming a simple noise pattern into a complex image. It is trained on a dataset of images, from which it learns to create new images that look similar to the training data. Once trained, the model can create an image by starting with random noise and gradually transforming it into an image that looks like it belongs in the dataset.
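The noise-to-image process described above can be sketched as an iterative refinement loop. This is a deliberately toy illustration, not Stable Diffusion itself: a real LDM uses a trained neural network to predict the clean latent at each step, whereas here a stand-in "denoiser" simply nudges the sample toward a known target image to show the shape of the loop.

```python
import numpy as np

# Toy sketch of the iterative denoising idea behind diffusion models.
# A real LDM predicts the clean image with a trained U-Net; here the
# "prediction" is a fixed target image, purely for illustration.

rng = np.random.default_rng(0)
target = rng.random((8, 8))          # stand-in for a "clean" image
x = rng.standard_normal((8, 8))      # start from pure Gaussian noise

steps = 50
for t in range(steps):
    predicted_clean = target                          # a real model predicts this with a network
    alpha = (t + 1) / steps                           # schedule: trust the prediction more over time
    x = (1 - alpha) * x + alpha * predicted_clean     # blend the noisy sample toward the prediction

print(float(np.abs(x - target).max()))  # after the final step, x matches the target
```

The key point is that generation is gradual: each step removes a little noise, so the sample drifts from randomness toward a coherent image.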
In the proposed paper, each component of an LDM (Stable Diffusion) is quantitatively interpreted from a neuroscience perspective by mapping specific components to distinct brain regions.
The paper uses an encoder-decoder model, in which one neural network (the encoder) transforms the input data into a fixed-length representation and another neural network (the decoder) generates the output from this encoding.
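The encoder-decoder pipeline can be sketched with simple linear maps. This is an assumption-laden toy, not the paper's actual networks: real systems learn deep, non-linear encoders and decoders, while here a random projection and its pseudo-inverse merely show how an input is compressed to a fixed-length code and then reconstructed from it.

```python
import numpy as np

# Minimal encoder-decoder sketch: the encoder compresses the input to a
# fixed-length code; the decoder generates output from that code.
# Linear maps stand in for the learned deep networks of a real model.

rng = np.random.default_rng(1)
x = rng.random(16)                     # input "image", flattened to 16 values

W_enc = rng.standard_normal((4, 16))   # encoder: 16 -> 4 (fixed-length latent)
code = W_enc @ x                       # latent representation of the input

W_dec = np.linalg.pinv(W_enc)          # decoder: 4 -> 16 (pseudo-inverse for this sketch)
x_hat = W_dec @ code                   # reconstruction generated from the latent

print(code.shape, x_hat.shape)
```

Because the code is much smaller than the input, reconstruction is lossy; that compression is exactly what makes the latent space of an LDM compact enough to work with.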
Image Source: biorxiv.org. Row 1: presented images; Row 2: images reconstructed from fMRI signals.
The research also measured the prediction accuracy of encoding models for three types of latent representations associated with the diffusion model. Latent representations are compressed, abstract features of the data that capture the most relevant and useful information inferred from the raw data for a particular task. The three are: z, a latent representation of the original image; c, a latent representation of the image’s text annotation; and zc, a noise-added latent representation of z after the reverse diffusion process with cross-attention to c.
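An encoding model of the kind measured here is, at its core, a regression from latent features to voxel responses. The sketch below uses synthetic data and ordinary least squares; the dimensions, noise level, and fitting method are all illustrative assumptions, not the paper's setup, but the structure (features in, predicted fMRI voxels out, accuracy scored by correlation) is the same.

```python
import numpy as np

# Toy "encoding model": a linear map predicting fMRI voxel responses
# from a latent representation (z, c, or zc in the paper's terms).
# All data here is synthetic; sizes and noise are illustrative only.

rng = np.random.default_rng(2)
n_images, latent_dim, n_voxels = 100, 10, 50

Z = rng.standard_normal((n_images, latent_dim))       # latent features per stimulus image
true_W = rng.standard_normal((latent_dim, n_voxels))  # hidden ground-truth mapping
Y = Z @ true_W + 0.01 * rng.standard_normal((n_images, n_voxels))  # simulated voxel responses

W_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)         # fit the encoding weights
pred = Z @ W_hat                                      # predicted voxel responses

# Score prediction accuracy as the correlation between predicted and measured responses
r = float(np.corrcoef(pred.ravel(), Y.ravel())[0, 1])
print(round(r, 3))
```

Comparing such accuracy scores across z, c, and zc, and across brain regions, is what lets the authors say which parts of the diffusion model line up with which parts of the visual system.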
With the announcement of this paper, people have been quick to call the model the next mind reader. However, it is not trained to interpret thoughts or words. The model is an AI extension of previous brain-mapping studies using fMRI or electroencephalography (EEG), in which the imaging machine can detect only broad patterns of activity. The proposed model is still in the nascent stages of interpreting brain activity.
Brain mapping is already used in medicine to diagnose and understand a patient’s illnesses, such as identifying triggers and tumours. With focused brain readings, doctors can deliver targeted treatments. Integrating image reconstruction from brain activity using LDMs into this existing brain-mapping framework could bring advancements to the medical field.
If the proposed model matures, future refinements could see it extensively implemented in certain jobs. In crime investigation, for example, eyewitness testimony is influenced by the mental state and surroundings of the witness, which can cloud the description of a suspect. With this model, capturing an eyewitness’s or victim’s recollection of the suspect could become simpler. However, implementing such a technology would also bring the ethics of mind reading into focus.