In 1954, the success of the Georgetown experiment, in which scientists used a machine to translate a small set of sentences from Russian into English, marked a giant stride for computational linguistics towards building an intelligent machine capable of recognising and translating speech. The need for fast translation had already been evident at the Nuremberg trials a few years earlier, although the interpreting there was done by people working with IBM-supplied equipment, not by machines.
Nonetheless, machine translation fell far short of the forecasts of the time, held back by sluggish computing hardware and a scarcity of data to train on.
Today, more than six decades later, machines have moved from statistical models to neural models that can perform complicated tasks like speech recognition and sentiment analysis with great accuracy.
Machines with NLP can now analyse a wide variety of data (documents, text, voice), and the discipline has a range of practical uses, from sentiment analysis to text categorization. NLP can scour documents and classify them by topic, even without a programmer defining in advance which topics to look for.
The field of NLP is advancing at a rapid pace thanks to the constant effort of tech giants like Google, Microsoft, Facebook and Amazon. One thing these tech giants have in common is their willingness to open-source their innovations. Their belief in accelerating innovation through transparency has started to bear fruit in the form of diverse real-world applications, from HomePods to chatbots.
Here’s a look at the NLP roadmaps of these mega-firms:
Open-domain question answering (QA) is a benchmark task in natural language understanding (NLU).
Google’s AI researchers recently released a paper introducing Natural Questions (NQ), a new dataset for QA research, along with methods for QA system evaluation.
In contrast to tasks where it is relatively easy to gather naturally occurring examples, defining a suitable QA task and developing a methodology for annotation and evaluation are challenging.
Then there is Bidirectional Encoder Representations from Transformers, or BERT, which was open-sourced last year and offers a new way to tackle the intricacies of language understanding.
Pre-training on a binarised next-sentence prediction task helps the model with common NLP tasks like question answering and natural language inference, which depend on the relationship between two sentences.
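To make the binarised prediction idea concrete, here is a minimal sketch of how training pairs for such a task could be assembled. It is not Google's code: the function name `make_nsp_pairs` is hypothetical, and negatives are drawn from the same document for simplicity, whereas a real pipeline would more likely sample them from a different document.

```python
import random


def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, is_next) examples from an ordered document.

    Hypothetical helper for illustration only. Assumes the sentences are
    distinct and that there are at least three of them.
    """
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to sample negatives")
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: B really follows A in the document.
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # Negative example: B is any sentence other than A or its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), 0))
    return pairs


document = [
    "The cat sat on the mat.",
    "It purred quietly.",
    "Outside, it began to rain.",
    "The street emptied quickly.",
]
pairs = make_nsp_pairs(document, random.Random(0))
```

Each pair carries a binary label, so a model trained on them has to learn whether two sentences belong together, which is the kind of signal that question answering and inference tasks rely on.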
Along with innovation in its own backyard, Google is also backing NLP startups like Armorblox. A cybersecurity startup, Armorblox aims to tackle data leaks via online attacks like email spear phishing.
Armorblox developed an NLP engine that derives insights from enterprise communications and data. It offers policy recommendations by learning over time what’s mission-critical for a given organization, and in the event of a potential or attempted breach, it automatically sends alerts to the relevant people and teams.
By applying NLU, Armorblox is able to address a whole new layer of security that has been inaccessible to other security solutions: the content and context of communications. This has been the biggest challenge and attack vector because hackers know that they can exploit this weakness.
Engineers developing NLP algorithms often turn to deep-learning platforms, such as Facebook's PyTorch, to build their solutions.
Facebook AI Research is open-sourcing PyText, a natural-language-processing (NLP) modelling framework that is used in the Portal video-calling device and M Suggestions in Facebook Messenger.
PyText builds on top of PyTorch by providing a set of interfaces and models specifically tuned for NLP.
PyText addresses a common problem for NLP projects: the tradeoff between rapid experimentation and scalability in production. Researchers experiment with new ideas, rapidly tweaking models to achieve performance goals.
PyText can utilize multiple GPUs for distributed training and can train multiple models at once, reducing the overall training time. The PyText code also comes with pre-trained models for several common NLP tasks, including text classification, named-entity recognition, and joint intent-determination and slot-filling, which is a staple of chatbot development.
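For readers unfamiliar with joint intent-determination and slot-filling, the sketch below shows what the output of such a model typically looks like and how slot values can be read off it. This is plain Python, not the PyText API; the BIO tagging scheme, the `extract_slots` helper, and the intent and slot names are assumptions chosen for illustration.

```python
def extract_slots(tokens, tags):
    """Collect (slot_name, text) pairs from BIO tags such as B-city, I-city, O."""
    slots = []
    current_name, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close any slot that was still open.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the slot that is currently open.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag: close the open slot, skip the token.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_tokens)))
    return slots


# Hypothetical model output: one intent per utterance, one tag per token.
prediction = {
    "intent": "book_flight",
    "tokens": ["book", "a", "flight", "to", "new", "york", "tomorrow"],
    "tags": ["O", "O", "O", "O", "B-destination", "I-destination", "B-date"],
}
slots = extract_slots(prediction["tokens"], prediction["tags"])
```

A chatbot would then act on the intent and fill in the parameters of that action from the extracted slots.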
Microsoft’s NLP group focuses on developing efficient algorithms to process text and to make its information accessible to computer applications.
This group addresses natural language problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input.
For example, the grammar checkers in Microsoft Office for English, French, German, and Spanish are among the byproducts of Microsoft’s NLP enhancements.
- Text Analytics API: A cloud-based service that provides advanced natural language processing over raw text and includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity linking.
- Language Understanding (LUIS): A machine learning-based service to build natural language into apps, bots, and IoT devices. Quickly create enterprise-ready, custom models that continuously improve.
Designed to identify valuable information in conversations, LUIS interprets user goals (intents) and distils valuable information from sentences (entities), for a high quality, nuanced language model. LUIS integrates seamlessly with the Azure Bot Service, making it easy to create a sophisticated bot.
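The split between intents and entities can be illustrated with a small sketch that reads a prediction result. The JSON below is a hypothetical response, loosely modelled on the shape of a LUIS prediction; the field names, the `BookFlight` intent, the `Location` entity type, the scores, and the inclusive `endIndex` are all assumptions for illustration, not output captured from the service.

```python
import json

# Hypothetical response; values are illustrative only.
raw_response = """
{
  "query": "Book me a flight to Cairo",
  "topScoringIntent": {"intent": "BookFlight", "score": 0.98},
  "entities": [
    {"type": "Location", "startIndex": 20, "endIndex": 24, "score": 0.95}
  ]
}
"""


def summarize_prediction(response):
    """Return the top intent and a mapping of entity type to matched text."""
    query = response["query"]
    intent = response["topScoringIntent"]["intent"]
    entities = {
        # endIndex is treated as inclusive here, hence the + 1.
        entity["type"]: query[entity["startIndex"]: entity["endIndex"] + 1]
        for entity in response["entities"]
    }
    return intent, entities


intent, entities = summarize_prediction(json.loads(raw_response))
```

An application would branch on the intent (what the user wants) and pass the entities (the details of the request) to whatever code fulfils it.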
Under the hood of LUIS:
"query": "Book me a flight to Cairo",
Last year, Microsoft also presented a novel, fully data-driven, and knowledge-grounded neural conversation model aimed at producing more contentful responses.
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience required.
The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech, and automatically organizes a collection of text files by topic.
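A few of these capabilities can be chained in a short sketch. The helper name `analyze_text` is hypothetical; it expects a client object exposing the `detect_dominant_language`, `detect_sentiment`, and `detect_key_phrases` operations that the boto3 Comprehend client provides, and it assumes the response fields shown in the code. Running it against the real service requires boto3, AWS credentials, and a configured region.

```python
def analyze_text(client, text):
    """Detect language, sentiment and key phrases using a Comprehend-style client."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Keep the language the service is most confident about.
    language = max(languages, key=lambda item: item["Score"])["LanguageCode"]
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language)["Sentiment"]
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language)["KeyPhrases"]
    return {
        "language": language,
        "sentiment": sentiment,
        "key_phrases": [phrase["Text"] for phrase in phrases],
    }


# Usage against the real service (not executed here):
#   import boto3
#   client = boto3.client("comprehend")
#   print(analyze_text(client, "The new phone is fantastic."))
```

Passing the client in as an argument keeps the function easy to test with a stand-in object and leaves credential handling to the caller.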
Amazon Comprehend Medical, a variant of Comprehend, identifies relationships among extracted medication, test, treatment, and procedure information for easier analysis. For example, the service identifies a particular dosage, strength, and frequency related to a specific medication from unstructured clinical notes.
Like its contemporaries, Amazon has also backed NLP startups. US-based mobile marketing firm Vibes has launched a tool called Conversational Analytics, which uses natural language processing (NLP) to extract consumer insights from unstructured messaging content.
Powered by Amazon Comprehend, this new tool uses machine learning to find insights and relationships in the messaging content exchanged between brands and consumers, and provides a more comprehensive understanding of the purchasing journey.
The Potency Of NLP
The potential for governments to employ these advances in language processing is vast. Governments collect massive numbers of unstructured records. Fragmented information in deeply non-mathematical formats, at volumes far too large for human assessment, is now available for the kind of deep-pattern analysis once reserved for numerical databases. NLP can open the public sector to new insights, better-tailored services, and faster responses to information.
According to an online financial portal, the market for artificial intelligence in the supply chain alone is expected to reach USD 10,110.2 million by 2025, up from USD 730.6 million in 2018, at a CAGR of 45.55% during the forecast period.
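The quoted growth rate can be checked with the standard compound annual growth rate formula. The sketch below assumes both market figures are in USD million and that the forecast period runs the seven years from 2018 to 2025; the function names are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at rate for the given years."""
    return start_value * (1 + rate) ** years


start_2018 = 730.6     # USD million (assumed unit)
end_2025 = 10110.2     # USD million
years = 2025 - 2018

implied_rate = cagr(start_2018, end_2025, years)
projected_2025 = project(start_2018, 0.4555, years)

print(f"Implied CAGR: {implied_rate:.2%}")
print(f"2018 figure grown at 45.55% for {years} years: {projected_2025:,.1f} million")
```

The two figures are only consistent with the stated CAGR when they share the same unit; a 2018 base in billions would imply a shrinking market, not a growing one.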
Market growth can be attributed to the increasing adoption of deep learning and NLP technologies for automotive, retail, and manufacturing applications in APAC.
Some of the key players in this space are Intel (US), NVIDIA (US), Xilinx (US), Samsung Electronics (South Korea), Micron Technology (US), IBM (US), Microsoft (US), and Amazon (US).
The biggest challenge for NLP models, however, has been the lack of training data.
Small training sets keep many NLP models from handling both context-dependent and context-free tasks in real time.
The next big challenge for these NLP models is to reach a human-level understanding of language, a goal that has been pursued since the times of Leibniz and Descartes.