Humans are complicated, and every person has a distinct personality. A single tone and vocabulary does not suit every conversation: people routinely adjust how they speak depending on whether they are talking to friends or to strangers. Machines, until now, have not been able to adopt different personas for different people.
Researchers at top-tier organisations are very active in natural language processing (NLP). Because NLP is a complex field, they are exploring many ways to make it easier to implement. Machine-human interaction is one such task, and it has prompted a great deal of research and innovation.
Recently, Microsoft unveiled Icecaps 0.1, an open-source natural language processing repository. Icecaps stands for Intelligent Conversation Engine: Code and Pre-trained Systems. The toolkit lets researchers improve the way chatbots communicate with humans, and it brings in other NLP features with an emphasis on conversational modelling.
Icecaps provides a range of tools drawn from existing conversation modelling research by Microsoft's in-house researchers and from the wider NLP literature. These include personalisation embeddings, maximum mutual information-based decoding, knowledge grounding, and an approach for enforcing more structure on shared feature representations to encourage more diverse and relevant responses.
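To give a feel for what maximum mutual information (MMI) decoding does, here is a minimal sketch of one common formulation, in which each candidate reply T for a source S is scored as log p(T|S) - lambda * log p(T). This is not Icecaps code: the function name `mmi_rerank`, the weight `lam` and the log-probabilities are all assumptions made up for illustration, and Icecaps may use a different MMI variant.

```python
def mmi_rerank(candidates, lam=0.5):
    """Rerank candidate replies by log p(T|S) - lam * log p(T).

    candidates: list of (text, log_p_given_source, log_p_unconditional).
    Subtracting the unconditional term penalises replies that are
    likely regardless of the input, i.e. bland, generic ones.
    """
    return [
        text
        for text, lp_cond, lp_lm in sorted(
            candidates,
            key=lambda c: c[1] - lam * c[2],
            reverse=True,
        )
    ]


# Hypothetical scores: the generic reply is slightly more likely given
# the source, but far more likely in general.
candidates = [
    ("I don't know.", -2.0, -1.0),
    ("The match starts at eight tonight.", -2.5, -6.0),
]

ranked = mmi_rerank(candidates, lam=0.5)
```

With `lam=0.0` the ranking falls back to plain likelihood and the generic reply comes first; with `lam=0.5` the more specific reply is promoted.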
Behind the Architecture
At its core is a flexible multi-task learning paradigm, which in conversational modelling is used to combine general conversational data with unpaired utterances. Icecaps enables multi-task learning by representing most models as chains of components, allowing researchers and developers to build arbitrarily complex configurations of models with shared components. It also implements SpaceFusion, a specialised multi-task learning paradigm which adds regularisation terms to shape the latent space shared among tasks.
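The component-chaining idea can be illustrated with a toy sketch in plain Python. The `Component` and `Chain` classes below are hypothetical and are not the Icecaps API; the lambdas stand in for neural encoders and decoders. The point is only that two tasks, one for paired conversational data and one for unpaired utterances, can reuse the very same encoder object.

```python
class Component:
    """A named processing step; `calls` counts how often it is used."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


class Chain:
    """Runs components in order, feeding each output to the next."""

    def __init__(self, *components):
        self.components = components

    def __call__(self, x):
        for component in self.components:
            x = component(x)
        return x


# Toy stand-ins for neural modules (assumptions for illustration).
shared_encoder = Component("encoder", lambda text: text.lower().split())
reply_decoder = Component(
    "reply_decoder", lambda tokens: "reply to: " + " ".join(tokens)
)
autoencoder_decoder = Component(
    "autoencoder_decoder", lambda tokens: " ".join(tokens)
)

# Two tasks built from chains that share the same encoder instance.
conversation_task = Chain(shared_encoder, reply_decoder)
autoencoding_task = Chain(shared_encoder, autoencoder_decoder)

reply = conversation_task("Hello there")
echo = autoencoding_task("Hello there")
```

Because both chains hold a reference to the same `shared_encoder`, anything that updates it in one task (in a real system, gradient updates) is seen by the other.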
In order to achieve personalisation in conversational scenarios, Icecaps allows researchers and developers to train multi-persona conversation systems on multi-speaker data using personality embeddings. To produce more informed responses, this toolkit also implements an approach to knowledge-grounded conversation that combines machine reading comprehension and response generation modules.
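One common way to condition a decoder on a speaker, sketched here under stated assumptions, is to learn one vector per speaker and append it to every token embedding fed to the decoder. The table sizes, speaker names and the helper `decoder_inputs` are hypothetical, the random vectors stand in for learned parameters, and the way Icecaps wires personality embeddings into its models may differ.

```python
import random


def make_embedding_table(keys, dim, seed):
    """Build a lookup table of random vectors (stand-ins for learned ones)."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


TOKEN_DIM = 4
PERSONA_DIM = 2

token_table = make_embedding_table(["hi", "there", "hello"], TOKEN_DIM, seed=0)
persona_table = make_embedding_table(
    ["speaker_a", "speaker_b"], PERSONA_DIM, seed=1
)


def decoder_inputs(tokens, speaker):
    """Append the speaker's vector to each token embedding."""
    persona = persona_table[speaker]
    return [token_table[tok] + persona for tok in tokens]


inputs_a = decoder_inputs(["hi", "there"], "speaker_a")
inputs_b = decoder_inputs(["hi", "there"], "speaker_b")
```

The token part of each input is identical for both speakers; only the appended persona part differs, which is what lets a single model produce speaker-specific responses.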
The toolkit is built on top of TensorFlow and is intended mainly for Python environments. It aims to make it easy for users to construct sophisticated training configurations using multi-task learning. An Anaconda environment with Python 3.7 is recommended.
Here are some of the key features that Icecaps 0.1 provides for building and customising conversational systems:
- Icecaps’ design is based on a component-chaining architecture, where models are represented as chains of components (e.g. encoders and decoders). This enables complex multi-task learning environments with components shared between tasks.
- The toolkit includes recent developments in conversational modelling, such as SpaceFusion, personalisation embeddings and MRC-based knowledge grounding models.
- Icecaps provides customised decoding tools which allow the users to employ maximum mutual information, token filtering, etc. to improve response quality as well as diversity.
- Various data processing tools such as byte pair encoding and fixed-length multi-turn context extraction are provided in order to easily convert the text datasets into binarised TFRecords.
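As an illustration of the fixed-length multi-turn context extraction mentioned in the list above, the following sketch turns a dialogue into (context, response) pairs where every context has the same number of turns. The function `extract_examples`, the empty-string padding and the sample dialogue are assumptions for illustration; the Icecaps tool itself may behave differently, and writing the result to TFRecords is left out.

```python
def extract_examples(turns, context_length, pad=""):
    """Return (context, response) pairs with exactly `context_length` turns.

    Contexts near the start of the dialogue are left-padded with `pad`
    so that every example has the same shape.
    """
    examples = []
    for i in range(1, len(turns)):
        context = turns[max(0, i - context_length):i]
        padding = [pad] * (context_length - len(context))
        examples.append((padding + context, turns[i]))
    return examples


dialogue = [
    "Hi!",
    "Hello, how are you?",
    "Fine, thanks.",
    "Glad to hear it.",
]

examples = extract_examples(dialogue, context_length=2)
```

Each turn after the first becomes a response, paired with the two turns that precede it.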
Version 0.2 is yet to be released. It will include new models such as stochastic answer networks and personalised transformers, lexical and contextual embedding generators, and an interactive GUI-based decoding session with improved flexibility. It will also add new data processing features, including functionality for processing tree-structured JSON data.
Icecaps is designed to make it intuitive to build complex dialogue systems that can communicate naturally with humans. By modelling human behaviour and persona, this conversational modelling toolkit aims to bridge the communication gap between machines and humans and to respond in a persona suited to the person it is talking to.
Read the paper here.