Last month, Elon Musk’s Neuralink demonstrated that brain activity can be monitored from a phone. The demonstration sparked speculation about what the technology could hold for future generations.
Decoding brain signals has far-reaching implications in medicine: a disabled person could be assisted, a non-verbal person’s feelings could be understood, and more. So, can we know what someone is thinking? And if we can, will the output take the form of text? Investigating along similar lines, researchers from ETH Zurich, Switzerland, have proposed a new data-driven model that directly classifies an fMRI scan, mapping it to the corresponding word within a fixed vocabulary.
They also address brain decoding on unseen subjects. The new model, the researchers write, leverages deep learning to decode brain activity, in the form of fMRI scans, into text.
Overview Of Brain2Word Method
In this work, the researchers try to map brain activity, in the form of fMRI scans, to the text presented to subjects during scanning. They consider two types of decoders: classical regression-based decoders, and classification-based decoders that learn to map brain activity to a word within a bounded vocabulary. Shown above is the architecture of the improved decoder.
But collecting fMRI data is not a straightforward process. There are inconsistencies between scan sessions, and scanning can be expensive and slow. The authors lament that it takes approximately 4 hours just to obtain the 180 brain scans, which still have to be processed. Most previous work on brain decoding has considered the scenario where the model is trained with data from the same subject that is being evaluated. The authors argue that this scenario is not suitable for real-life applications and that it, in fact, limits our ability to decode brain activity.
In the classification-based decoder, the regression layer of size 300 × 1 is turned into a non-linear layer, and an additional softmax layer is added on top of it.
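This head swap can be sketched in NumPy. The layer sizes (a 200-dimensional latent vector, a 300-unit non-linear layer, a 180-word vocabulary) come from the paper, but the weights below are random placeholders, not trained parameters:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# 200 x 1 latent vector, as produced by the decoder's hidden layers.
latent = rng.standard_normal(200)

# Placeholder weights: the 300 x 1 regression layer becomes a non-linear
# hidden layer, and a softmax layer over the 180-word vocabulary sits on top.
W_hidden = rng.standard_normal((300, 200)) * 0.01
W_out = rng.standard_normal((180, 300)) * 0.01

pre = W_hidden @ latent
hidden = np.maximum(pre, 0.3 * pre)   # Leaky ReLU with alpha = 0.3
probs = softmax(W_out @ hidden)       # one probability per vocabulary word
```

With this change, the decoder's output is no longer a 300-dimensional regression target but a proper probability distribution over the vocabulary.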
The working of the model looks like this:
- The model takes a one-dimensional fMRI scan of size 65,730 × 1 voxels and generates a latent vector of size 200 × 1.
- The latent vector is used to produce either the regression or the classification target.
- The model consists of two non-linear fully connected layers that produce feature maps. Each non-linear layer has dropout of 0.4, batch normalization, and Leaky ReLU activation (α = 0.3). This simple model serves as the base model.
- The model is then turned into an autoencoder (decoder-encoder) by adding an encoder that mirrors the base model, i.e., the decoder.
- This encoder reconstructs the input brain activities (fMRI) from the latent vector and a reconstruction term is added to the loss function.
- The fMRI classification decoder is then used to transform a brain scan into a probability vector over a vocabulary of 180 words. The top-5 predictions are selected and used as embeddings and anchor points for the language generation model.
- The GPT-2 model then generates text.
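The final two steps, selecting anchor words and handing them to a language model, can be sketched as follows. The vocabulary words and probability vector here are placeholders standing in for the decoder's real output, and the GPT-2 call is shown only as a comment since it depends on an external model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder 180-word vocabulary and decoded probability vector; in the
# actual pipeline these come from the fMRI classification decoder.
vocab = [f"word_{i}" for i in range(180)]
probs = rng.dirichlet(np.ones(180))

# Select the top-5 predictions as anchor words for language generation.
top5_idx = np.argsort(probs)[-5:][::-1]
anchors = [vocab[i] for i in top5_idx]

# A real system would feed the anchors' embeddings to GPT-2 to condition
# the generated text, e.g. (hypothetical, via Hugging Face transformers):
#   generator = pipeline("text-generation", model="gpt2")
#   text = generator(" ".join(anchors))[0]["generated_text"]
```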
According to the authors, the experiments showed that the output of fMRI decoding could guide language generation with great fluency. That said, the authors also admit that fMRI-to-word decoding still needs improvement.
For instance, to account for the delay in fMRI scans due to blood flow, it would be desirable to have a measure of certainty for the decoded word, one that triggers language generation when the decoder is certain and halts it otherwise. It would also be necessary to record expressions such as "positive", "negative", "happiness", "nature", etc.
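One simple way to realise such a certainty gate is to threshold the decoder's top softmax probability. This is a sketch of the idea only; the threshold value is an assumption, not a figure from the paper:

```python
import numpy as np

def should_generate(probs, threshold=0.5):
    # Trigger language generation only when the decoder's top prediction
    # is confident enough; halt generation otherwise.
    return float(np.max(probs)) >= threshold

# Peaked distribution: the decoder is fairly certain about one word.
confident = np.array([0.7] + [0.3 / 179] * 179)
# Flat distribution: the decoder is maximally uncertain.
uncertain = np.full(180, 1.0 / 180)
```

A more principled gate might use the entropy of the full distribution rather than just its maximum, but the thresholding idea is the same.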
The contributions of this work by ETH Zurich researchers can be summarised as follows:
- Introduced a model that decodes fMRI scans into words and outperforms existing models by a large margin.
- The model successfully generalises to unseen subjects.
- Introduced a strategy for conditioning language generation on the semantic content of fMRI scans.
- The work can lead to a real system for translating brain activity into coherent text.
Read the original paper here.