Recently, Yoshua Bengio, one of the world’s leading experts in artificial intelligence and a pioneer in deep learning, and his team proposed the World Scope (WS). World Scope is a lens through which the researchers view the progress that has been made so far in natural language processing (NLP).
In the past few years, natural language processing has seen breakthroughs in representational theories, modelling techniques and data collection paradigms, among others. These were made possible by improvements in hardware and by data collection at an immense scale.
The motive behind this research is to create a frame that will serve as a roadmap for truly contextual language understanding, and to provide a bird's-eye view of the direction of NLP and of the relationship of language to the broader AI, cognitive science and linguistics communities. For this, the researchers considered work on the contextual foundations of language: grounding, embodiment, and social interaction.
Behind World Scope
The research proposes five levels of World Scope:
- WS1: Corpus
- WS2: Internet
- WS3: Perception
- WS4: Embodiment
- WS5: Social
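Since the researchers describe the scopes as successively larger, the five levels can be pictured as an ordered enumeration in which each level also covers the ones below it. The following is only an illustrative sketch: the names `WorldScope` and `scopes_covered` are hypothetical and do not come from the paper.

```python
from enum import IntEnum


class WorldScope(IntEnum):
    """The five levels, ordered from narrowest to broadest (illustrative only)."""
    CORPUS = 1
    INTERNET = 2
    PERCEPTION = 3
    EMBODIMENT = 4
    SOCIAL = 5


def scopes_covered(level):
    """Return every scope at or below the given level, narrowest first."""
    return [scope for scope in WorldScope if scope <= level]


# A model working at WS3 is assumed here to also draw on WS1 and WS2.
print([scope.name for scope in scopes_covered(WorldScope.PERCEPTION)])
# prints ['CORPUS', 'INTERNET', 'PERCEPTION']
```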
WS1: Corpora and Representations
This level covers earlier work in NLP. The researchers cite a number of previous studies that rely on corpora. According to them, the most representative example for this level is the Penn Treebank, a sterilised subset of naturally generated language, processed and annotated for the purpose of studying representations.
WS2: The Written World
Most current work in NLP falls within the second level of World Scope. This level moves beyond curated corpora to unstructured, unlabeled, multi-domain, and multilingual data. The researchers stated, “We are no longer constrained to a single author or source, and the temptation for NLP is to believe everything that needs knowing can be learned from the written world.”
This shift has led to substantial advances in performance on existing and novel community benchmarks. These advances were achieved through the transfer learning enabled by representations in deep models. According to the researchers, the current models are the next step in modelling lexical distributions. Modern approaches to learning dense representations also allow us to better estimate these distributions from massive corpora.
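To make "modelling lexical distributions" concrete, here is a minimal count-based sketch that estimates which words follow a given word. It is a simple bigram estimate, not the dense-representation approach the researchers refer to, and the function name `bigram_distribution` and the toy corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def bigram_distribution(sentences):
    """Estimate P(next word | previous word) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Count each adjacent (previous, next) pair of tokens.
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalise the counts for each previous word into probabilities.
    return {
        prev: {word: n / sum(following.values()) for word, n in following.items()}
        for prev, following in counts.items()
    }


toy_corpus = ["the cat sat on the mat", "the cat ate the fish"]
dist = bigram_distribution(toy_corpus)
print(dist["the"])
```

With a larger corpus the same counts become more reliable, which is one reason the scale of the data matters at this level.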
WS3: The World of Sights and Sounds
The third level of World Scope concerns learning natural language through sights and sounds. According to the researchers, learning language requires perception, which includes auditory, tactile, and visual input. Auditory input is necessary for understanding sarcasm, stress, and meaning implied through prosody, while tactile senses give meaning, both physical and abstract, to concepts like heavy and soft. Visual perception is a signal for modelling the vast range of experiences in the world that cannot be documented by text alone.
WS4: Embodiment and Action
The fourth level deals with embodiment and action. According to the researchers, an embodied agent must translate from language to control and action in any form of world, whether real, virtual, a grid world, or a vision-and-language navigation setting.
The researchers said, “Control is where people first learn abstraction and simple examples of post-conditions through trial and error.” In addition to learning the basic physical properties of the world from interaction, WS4 also allows the agent to construct rich pre-linguistic representations.
WS5: The Social World
The fifth level of WS concerns interpersonal communication. According to the researchers, interpersonal communication in service of real-world cooperation is the prototypical use of language, and the ability to facilitate such cooperation remains the final test of a learned agent.
The researchers said, “Understanding that what one says can change what people do allows language to take on its most active role. This is the ultimate goal for natural language generation: a language that does something to the world.”
How It Helps
According to the researchers, although language processing models trained on text are highly effective, these systems still make errors that arise from a failure to relate language to the physical world it describes and to the social interactions it facilitates.
NLP researchers have consistently recognised the limitations of corpora in terms of coverage of language and experience, which is why the researchers drew on previous work in NLP, cognitive science, and linguistics to provide a roadmap towards addressing these gaps.
The researchers said, “We posit that the universes of knowledge and experience available to NLP models can be defined by successively larger world scopes: from a single corpus to a fully embodied and social context.”
According to the researchers, the proposed World Scopes are steep steps, and it is possible that WS5 is AI-complete. WS1, WS2, WS3, and WS4 lend extra depth to the interpretation of language through context, because they expand the factorisations of information available to define meaning. WS5 implies a persistent agent experiencing time as well as a personalised set of experiences.
Read the paper here.