Recently, a team of researchers from Facebook AI and Tel Aviv University proposed an AI system that solves Raven’s Progressive Matrices, a multiple-choice intelligence test. The system is a neural network model that combines multiple advances in generative models, including employing multiple pathways through the same network.
Raven’s Progressive Matrices, also known as Raven’s Matrices, are a multiple-choice intelligence test. The test measures abstract reasoning and is regarded as a non-verbal estimate of fluid intelligence.
In this test, a person tries to fill in the missing cell of a 3×3 grid of abstract images. According to the researchers, previous work on the problem has focused entirely on choosing the right answer from the given choices. In this research, by contrast, the researchers focused on generating a correct answer from the grid alone, without seeing the choices.
Behind the Model
As mentioned above, the neural network model combines several advances in generative models. Besides employing multiple pathways within the same network, it uses the ‘reparameterisation’ trick along two pathways to make their encodings compatible, applies variational losses dynamically, and relies on a complex perceptual loss coupled with a selective backpropagation procedure.
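The reparameterisation trick itself is standard and fits in a few lines. The framework-free Python below is a minimal sketch under stated assumptions: the function names, the made-up `(mu, logvar)` values, and the premise that both the VAE encoder and the CEN emit such pairs are illustrative, not taken from the researchers' code. Sampling both codes through the same step, and pulling both towards the same N(0, I) prior, is one plausible reading of what making the two encodings compatible means.

```python
import math
import random
from typing import List


def reparameterise(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(logvar / 2).

    All randomness sits in eps, so in an autodiff framework z stays
    differentiable with respect to mu and logvar. Plain Python here only
    shows the arithmetic.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


# Made-up (mu, logvar) pairs for one answer image, as they might come from
# two pathways: the VAE encoder E (seeing the image itself) and the CEN
# (seeing only the eight context images). Both are assumptions.
vae_mu, vae_logvar = [0.2, -0.1, 0.5], [-1.0, -1.2, -0.8]
cen_mu, cen_logvar = [0.25, -0.05, 0.45], [-1.1, -1.0, -0.9]

rng = random.Random(0)
z_vae = reparameterise(vae_mu, vae_logvar, rng)
z_cen = reparameterise(cen_mu, cen_logvar, rng)

# Same sampling step and same prior for both codes, so a single
# generator G could decode either of them.
kl_total = kl_to_standard_normal(vae_mu, vae_logvar) + kl_to_standard_normal(cen_mu, cen_logvar)
print(z_vae, z_cen, round(kl_total, 4))
```

In a real model `mu` and `logvar` would be tensors produced by the networks, and the KL term would be weighted and added to the training loss.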
In this research, the researchers considered the task of generating a correct answer to a Raven’s Progressive Matrices (RPM) type of intelligence test. Each query consists of eight images placed on a 3×3 grid, and the task is to create the missing ninth image so that it matches the patterns in the rows and columns of the grid.
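To make the generation setting concrete, here is a deliberately tiny toy version of an RPM query. It is only an illustration under heavy assumptions: each panel is reduced to a single hypothetical attribute (its number of shapes), and the hidden rule is assumed to be an arithmetic progression along rows and columns. The real model works on pixels and must discover the rule itself; the point is that the answer is produced from the grid alone, with no answer choices involved.

```python
from typing import List, Optional

# Toy RPM query: each panel is reduced to one hypothetical attribute,
# the number of shapes it contains. None marks the missing ninth panel.
Grid = List[List[Optional[int]]]


def generate_missing(grid: Grid) -> int:
    """Produce the ninth panel without looking at any answer choices.

    Assumes (for this toy only) that every row and column is an
    arithmetic progression.
    """
    a, b, missing = grid[2]
    if missing is not None:
        raise ValueError("expected the bottom-right panel to be missing")
    answer = b + (b - a)  # continue the step of the third row
    top, middle = grid[0][2], grid[1][2]
    if answer - middle != middle - top:  # the third column must agree
        raise ValueError("row and column rules disagree")
    return answer


query: Grid = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, None],
]
print(generate_missing(query))
```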
The neural net model recognises the correct answer out of the eight possible choices by encoding each image and aggregating these encodings along rows and columns. The architecture of the AI model is mainly composed of three different pathways:
- Reconstruction: The reconstruction pathway provides supervision that is easier for the network to exploit at the start of training.
- Recognition: The recognition pathway shapes the representation in a way that makes the semantic information more explicit.
- Generation: The generation pathway relies on the visual embedding learned through the reconstruction pathway and on the semantic embedding shaped by the recognition pathway, and maps the semantic representation of a given query to an image.
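The recognition step, which encodes each image and aggregates the encodings along rows and columns to pick one of eight choices, can be illustrated with a hand-made toy. This is a sketch, not the researchers' architecture: the identity "encoder", the two features per panel (shape count and shade), elementwise summation as the aggregator, and the assumed rule that every row and column contains the same three values are all simplifications chosen for illustration.

```python
from typing import List, Sequence

Vector = List[int]


def encode(panel: Sequence[int]) -> Vector:
    # Hypothetical encoder: identity on hand-made features
    # [shape count, shade]. The real model learns this from pixels.
    return list(panel)


def aggregate(encodings: List[Vector]) -> Vector:
    # Elementwise sum over the three encodings of a row or column.
    return [sum(vals) for vals in zip(*encodings)]


def sq_dist(u: Vector, v: Vector) -> int:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def score(context: List[Vector], candidate: Vector) -> int:
    """Lower is better. Toy rule: every row and every column holds the
    same three values, so their aggregates should all be equal."""
    cells = [encode(p) for p in context] + [encode(candidate)]
    grid = [cells[0:3], cells[3:6], cells[6:9]]
    rows = [aggregate(row) for row in grid]
    cols = [aggregate([grid[r][c] for r in range(3)]) for c in range(3)]
    # Compare the completed third row/column with the two complete ones.
    row_err = sq_dist(rows[2], rows[0]) + sq_dist(rows[2], rows[1])
    col_err = sq_dist(cols[2], cols[0]) + sq_dist(cols[2], cols[1])
    return row_err + col_err


def choose(context: List[Vector], candidates: List[Vector]) -> int:
    # Index of the candidate that best completes the rows and columns.
    return min(range(len(candidates)), key=lambda i: score(context, candidates[i]))


context = [
    [1, 10], [2, 20], [3, 30],
    [2, 30], [3, 10], [1, 20],
    [3, 20], [1, 30],          # ninth panel missing
]
candidates = [[1, 10], [2, 20], [3, 30], [2, 10],
              [2, 30], [3, 10], [1, 20], [3, 20]]
print(choose(context, candidates))
```

Summation is used here because it is order-invariant within a row or column; a learned model would replace both `encode` and `aggregate` with trained networks.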
How It Works
The method has four key components:
- An encoder (E)
- A generator (G), trained together with the encoder as a variational autoencoder (VAE) on the images
- A Context Embedding Network (CEN), which encodes the context images and produces the embedding for the generated answer
- A discriminator (D), which provides an adversarial training signal for the generator
The VAE pathway consists of the encoder and the generator, and it autoencodes the choice images one image at a time. The CEN is composed of multiple sub-modules, which are trained together to provide input to the generator.
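How the four components connect can be sketched with trivial stand-ins. Everything below is hypothetical: E and G are fixed linear maps chosen so that G exactly inverts E, the CEN simply averages the context codes, and D is a logistic function of mean intensity. Reparameterised sampling and the KL term are omitted for brevity. The sketch only shows the wiring described above: E and G autoencode each choice image on its own, the CEN turns the context into a code for the same G, and D scores G's output.

```python
import math
from typing import List

Image = List[float]
Code = List[float]


# Hypothetical toy stand-ins for the four learned components.
def encoder(img: Image) -> Code:          # E: image -> code
    return [0.5 * x for x in img]


def generator(code: Code) -> Image:       # G: code -> image
    return [2.0 * z for z in code]


def cen(context: List[Image]) -> Code:    # CEN: 8 context images -> answer code
    # Placeholder rule (an assumption): average the context codes.
    codes = [encoder(img) for img in context]
    return [sum(vals) / len(codes) for vals in zip(*codes)]


def discriminator(img: Image) -> float:   # D: image -> P(image is real)
    return 1.0 / (1.0 + math.exp(-sum(img) / len(img)))


def mse(a: Image, b: Image) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


context = [[float(i), float(i + 1)] for i in range(8)]
choices = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# VAE pathway: each choice image is autoencoded on its own.
recon_losses = [mse(generator(encoder(img)), img) for img in choices]

# Generation pathway: the CEN code is decoded by the same generator G.
answer = generator(cen(context))

# Adversarial signal for G (non-saturating form): -log D(G(.)).
adv_loss = -math.log(discriminator(answer))
print(recon_losses, answer, round(adv_loss, 4))
```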
Wrapping Up
The researchers stated, “In problems in which the solution space is complex enough, the ability to generate a correct answer is the ultimate test of understanding the question since one cannot extract hints from any of the potential answers. Our work has been the first to address this task in the context of RPMs.”
The researchers attribute the model’s success to several crucial techniques: applying the reparameterisation trick selectively and multiple times, reusing the same networks both for encoding and for providing a loss signal, selective backpropagation, and an adaptive variational loss.
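Reusing an encoder to provide a loss signal is the idea behind a perceptual loss: instead of comparing pixels, the generated and target images are both passed through the same network and their activations are compared. The sketch below assumes a hypothetical two-layer toy encoder and a plain layer-wise mean squared error; it is a generic perceptual loss, not the exact loss from the paper, and it does not model selective backpropagation (which, in an autodiff framework, would control which modules receive gradients from this loss).

```python
from typing import List

Image = List[float]


def encoder_features(img: Image) -> List[List[float]]:
    """Hypothetical two-layer toy encoder returning every layer's activations."""
    h1 = [max(0.0, 2.0 * x - 1.0) for x in img]  # layer 1: affine + ReLU
    h2 = [sum(h1) / len(h1)]                     # layer 2: global average
    return [h1, h2]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def perceptual_loss(generated: Image, target: Image) -> float:
    # Reuse the same encoder as a feature extractor and compare
    # activations layer by layer instead of raw pixels.
    return sum(mse(f_g, f_t)
               for f_g, f_t in zip(encoder_features(generated), encoder_features(target)))


target = [0.9, 0.1, 0.8, 0.2]
close = [0.85, 0.15, 0.75, 0.25]
far = [0.1, 0.9, 0.2, 0.8]
print(perceptual_loss(close, target), perceptual_loss(far, target))
```

Note that `far` matches `target` exactly at the second layer (both average to the same value) yet is still penalised at the first layer, which is why such losses usually sum over several layers.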
According to them, the algorithm is not only capable of generating a set of possible answers but is also competitive with state-of-the-art methods on multiple-choice tests. They also claimed that the model could be used to develop an automatic tutoring system that adjusts to the proficiency of each student.