To facilitate learning and the efficient usage of cerebral resources, we humans have incorporated the strategy of making connections between abstract concepts. If you have the knowledge of an apple being red, you won’t wonder what a red rose is. These associations though look trivial, they are the product of millions of years of evolution and strategies picked up through various strategies embedded in language.
The visual cue to audio is an effective way to grasp more knowledge about a concept. No wonder, today, machines are being taught to mimic this ability in humans to carry out everyday tasks.
Learning to group objects into concepts is an essential human cognitive process. To facilitate learning, metaconcepts have been developed to describe the abstract relations between concepts.
Learning both concepts and metaconcepts involves categorization at various levels, from concrete visual attributes such as red and cube to abstract relations between concepts, such as synonym and hypernym.
Meta-learning was introduced to make machine learning models to learn new skills and adapt to the ever-changing environments in the presence of finite training precedents.
To bring the combined benefits of learning visual concepts and metaconcepts in machines, researchers from MIT in collaboration with IBM, have published the work titled, “Visual Concept-Metaconcept Learning”, that demonstrates it well.
They also propose the visual concept-metaconcept learner (VCML, see) for joint learning of visual concepts (in this case, red and cube) and metaconcepts (e.g., if 2 different colour names describe the same property of objects).
As shown above, the model learns concepts and metaconcepts from images and two types of questions. The learned knowledge helps to generalize the unseen visual concept compositions, or to concepts with limited visual data and metaconcept generalization between unseen pairs of concept.
Overview Of Metaconcept Learning
The Visual Concept-Metaconcept Learner. The model comprises of three modules:
- A perception module for extracting object-based visual representations,
- A semantic parsing module for recovering latent programs from natural language, and
- A neuro-symbolic reasoning module that executes the program to answer the question.
The perception module object-based representation of the scene is done by using a Mask R-CNN.
Knowledge about metaconcepts empowers visual concept learning from limited, noisy, and even biased data.
Visual representations provide grounding cues for predicting relations between unseen pairs of concepts.
Metaconcepts enable concept learning from limited, noisy, and even biased examples, with generalization to novel compositions of attributes at test time.
To classify whether two concepts (e.g., red and cube) are related by a metaconcept (e.g. synonym), first, several probabilities between them are computed.
Mathematically, they are defined via two helper functions g1 and g2:
g1(a, b) = logit(Pr(a | b)),
g2(a, b) = ln Pr(a, b)/[ Pr(a) Pr(b)]
where logit(·) is the logit function.
These values are then fed into the perceptron to predict the relation
The above illustration is of a concept-metaconcept embedding space. Each concept or object is embedded as a high-dimensional vector, which is associated with a half-space supported by this vector and each metaconcept is associated with a multi-layer perceptron as a classifier.
With this work, the authors have demonstrated the following:
- Showing that the metaconcept synonym enables the model to learn a novel concept without any visual examples (i.e., zero-shot learning).
- Showing how the metaconcept same_kind supports learning from biased visual data.
- Evaluating the performance of few-shot learning with the support of the metaconcept hypernym.
- Providing extra results to demonstrate that metaconcepts can improve the overall data-efficiency of visual concept learning.
Beyond just learning from questions regarding visual scenes (e.g., is there any red cubes?) this model learns from questions about metaconcepts (e.g., do red and yellow describe the same property of objects?)
This model, claims that the authors can successfully categorize objects with new combinations of visual attributes, or objects with attributes with limited training data; it can also predict relations between unseen pairs of concepts. This work presents a systematic evaluation on both synthetic and real-world images, with a focus on learning efficiency and strong generalization.