Last updated March 16, 2021

Complete Guide to SDNet: Contextualized Attention-based Deep Network for Conversational Question-Answering

SDNet is a contextualized attention-based deep neural network that achieved state-of-the-art results in the challenging task of conversational question answering. It makes use of inter-attention and self-attention along with recurrent bidirectional LSTM layers.

Published on March 16, 2021
by Pavan Kandru

Conversational Question Answering is an exciting task that requires the model to read a passage and answer questions in a dialogue. It is different from Machine Reading Comprehension, where the model reads a passage and answers questions in a stateless manner, i.e. it doesn’t use information from previous questions and answers. This new task expects the model to comprehend the passage, understand the context of the conversation, and perform coreference resolution.

Chenguang Zhu, Michael Zeng and Xuedong Huang, researchers at Microsoft, introduced this model in a paper published on 2nd January 2019.

Architecture

The SDNet model is built upon Machine Reading Comprehension (MRC) models.

Let us look at the model’s innovative architecture step by step.

Inputs

The model takes the passage as input, from which the context (C) is learned. It also takes the current question as input, along with the previous question-answer pairs, which it needs to understand the context of the dialogue.

Each question is represented by:

$$Q_k = \{Q_{k-N}; A_{k-N}; \ldots; Q_{k-1}; A_{k-1}; Q_k\}$$

The N previous question (Q) and answer (A) pairs are taken into consideration while answering the current question.

All the questions on one passage are treated as a batch by the model.
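The way the current question is extended with dialogue history can be sketched in a few lines of Python. This is only an illustration of the definition of Qk above: the function name `build_question_input`, the parameter `n_prev` (standing in for N) and the toy dialogue are assumptions, not code from the SDNet repository.

```python
def build_question_input(history, current_question, n_prev=2):
    """Prepend the last n_prev (question, answer) pairs to the current question."""
    # history[-0:] would return the whole list, so handle n_prev == 0 explicitly
    recent = history[-n_prev:] if n_prev > 0 else []
    parts = []
    for question, answer in recent:
        parts.append(question)
        parts.append(answer)
    parts.append(current_question)
    return parts


# Toy dialogue history (hypothetical data)
history = [
    ("Who wrote the book?", "Jane"),
    ("When was it published?", "1999"),
    ("Where?", "London"),
]
augmented = build_question_input(history, "Was it popular?", n_prev=2)
print(augmented)
# ['When was it published?', '1999', 'Where?', 'London', 'Was it popular?']
```

With `n_prev=0` the function returns only the current question, which corresponds to the stateless setting of ordinary machine reading comprehension.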

Embeddings

The model uses both GloVe and BERT representations of each word or token in the context and question. GloVe embeddings are used as a straightforward 300-dimensional vector lookup. BERT representations for each word are calculated from its byte pair encoding (BPE) tokens: each word is broken down into s BPE tokens, and each token has L hidden vectors, one for each layer of BERT. These are combined in a weighted sum to get a single vector for each word.

Here α is the weight, which is learned.
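A minimal sketch of this weighted layer sum in plain Python follows. It assumes that the s BPE vectors of a word are averaged within each layer before the layers are combined with the learned weights α; the function name and the toy numbers are purely illustrative.

```python
def word_vector_from_bert(hidden, alpha):
    """Combine BERT hidden states into one vector for a word.

    hidden[l][t] is the hidden vector of BPE token t at layer l;
    alpha[l] is the (learned) weight of layer l.
    """
    num_tokens = len(hidden[0])
    dim = len(hidden[0][0])
    word_vec = [0.0] * dim
    for layer_vectors, weight in zip(hidden, alpha):
        for d in range(dim):
            # Assumption: average over the word's BPE tokens within the layer
            layer_avg = sum(token_vec[d] for token_vec in layer_vectors) / num_tokens
            word_vec[d] += weight * layer_avg
    return word_vec


# A word split into s = 2 BPE tokens, L = 2 layers, 2-dimensional hidden vectors
hidden = [
    [[1.0, 0.0], [3.0, 2.0]],  # layer 1
    [[0.0, 4.0], [2.0, 0.0]],  # layer 2
]
alpha = [0.5, 0.5]  # in the real model these weights are trained
bert_vec = word_vector_from_bert(hidden, alpha)
print(bert_vec)  # [1.5, 1.5]
```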

Context Layers

We now have each word of the context (passage) vectorized using different techniques. We need to combine these vectors and feed them into the context layers. The input to the context layers is one vector w per word, combining the following:

f is a feature vector representing the POS tag, the NER tag and exact matching with the question.
h is the word-level inter-attention defined below.
BERT is the vector described in the Embeddings section.
GloVe is the 300-dimensional vector of the word.

Word-level inter-attention is one of the inputs to the context layers. It is calculated from question to context using the word embeddings of the question (Q) and the context (C).
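The idea of attending from question to context at the word level can be sketched as follows. Each context word scores every question word, the scores are normalized with a softmax, and the question embeddings are averaged with those weights. The plain dot-product score used here is an assumption made for brevity, and the helper names are hypothetical; the actual model may transform the embeddings before scoring.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_level_attention(context_emb, question_emb):
    """For each context word, return an attention-weighted sum of question word embeddings."""
    dim = len(question_emb[0])
    attended = []
    for c in context_emb:
        # Assumption: raw dot product as the similarity score
        weights = softmax([dot(c, q) for q in question_emb])
        attended.append(
            [sum(w * q[d] for w, q in zip(weights, question_emb)) for d in range(dim)]
        )
    return attended


# Toy embeddings: two context words, two question words, 2 dimensions
context_emb = [[1.0, 0.0], [0.0, 1.0]]
question_emb = [[5.0, 0.0], [0.0, 5.0]]
attended = word_level_attention(context_emb, question_emb)
print(attended)
```

Each context word ends up with a vector dominated by the question word it is most similar to, which is what lets the context layers see question information from the very first layer.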

The context layers (left part in the image) contain K bidirectional LSTMs to develop a context-based understanding of the passage. Let the output of these RNNs be

A multi-level attention (MLA) block is used to calculate attention from question to context. The attention score for each token in the context is calculated using all the previous RNN layer outputs.

Note that the query of the attention is the vector representation of each token in the passage, whereas the key-value pairs are the corresponding vector representations of the question.

A shortcut connection is added from the RNN outputs to the MLA output, and their concatenation is passed through one more bidirectional LSTM. Traditional self-attention is then applied to the output of this RNN layer. One more RNN layer is added on top of the self-attention layer to get the final output (uC) of the context layers.

Question Layers

The question layers are very similar to the context layers. They consist of the following steps.

GloVe and BERT embeddings are concatenated.
K RNN layers develop a contextualized understanding of the question.
One more RNN layer generates a higher-level understanding.
Self-attention on the output of this RNN generates the final question representation.

These n vectors representing the questions are further compressed into one vector as shown below.

$$u^Q = \sum_i \beta_i u_i^Q$$

where $\beta_i \propto \exp(w^T u_i^Q)$ and $w$ is a parameterized vector.
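This pooling step is a softmax-weighted sum, and a small pure-Python sketch makes it concrete. The function name `pool_question` and the toy vectors are illustrative; in the model, w is learned during training.

```python
import math


def pool_question(vectors, w):
    """Compress per-word question vectors into a single vector.

    beta_i is proportional to exp(w . u_i); the result is sum_i beta_i * u_i.
    """
    scores = [sum(wi * ui for wi, ui in zip(w, u)) for u in vectors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(b * u[d] for b, u in zip(betas, vectors)) for d in range(dim)]
    return pooled, betas


# Two question word vectors; with w = 0 every word gets the same weight
question_vectors = [[1.0, 2.0], [3.0, 4.0]]
pooled, betas = pool_question(question_vectors, [0.0, 0.0])
print(pooled, betas)  # [2.0, 3.0] [0.5, 0.5]
```

A non-zero w shifts the weights towards the words whose vectors align with it, so the model can learn which question words matter most.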

Output Layer

This model’s outputs can be Yes/No or the span of the passage that answers the question.

To get the span, we need the probability of the answer starting at each word in the context.

This probability is used along with the outputs from the question and context layers to generate the probability of the answer ending at each word of the passage.

If the result is a yes/no answer to the question, we need to generate the corresponding probabilities.
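Once start and end probabilities are available, a span still has to be chosen from them. A common decoding rule, shown here as an assumption rather than as SDNet's exact procedure, is to pick the pair (start, end) with start ≤ end that maximizes the product of the two probabilities, optionally limiting the span length. The name `best_span` and the parameter `max_len` are hypothetical.

```python
def best_span(start_probs, end_probs, max_len=15):
    """Return the (start, end) pair with start <= end maximizing P_start * P_end."""
    best = (0, 0)
    best_score = -1.0
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within max_len tokens
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score


# Toy probabilities over a three-word passage
start_probs = [0.1, 0.7, 0.2]
end_probs = [0.6, 0.1, 0.3]
span, score = best_span(start_probs, end_probs)
print(span)  # (1, 2)
```

Note that the most likely end word on its own (index 0) is rejected here because it would precede the most likely start word.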

All the W’s newly introduced in the calculation of probabilities can be learned during training.

In Action

Microsoft has open-sourced the code for the SDNet model. It is available at https://github.com/microsoft/SDNet.

We can clone this repository with

!git clone https://github.com/microsoft/SDNet.git

Let’s use data from https://stanfordnlp.github.io/coqa/ to train and test the model.

We need to download the data, the BERT model and the GloVe embeddings to train this model. The following commands fetch these files.

 !wget https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json
 !wget https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json
 !wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz
 !tar -xf bert-base-uncased.tar.gz
 !wget http://nlp.stanford.edu/data/glove.840B.300d.zip
 !cp /content/SDNet/conf /content/coqa/

We need to arrange these files into the following directory structure.

Training the model is now done with a single command:

!python SDNet/main.py train coqa/conf

This command will train using train data and predict results for dev data. But let’s see how to only predict without training the model. To do this, we need a pretrained model, test data and a config file. We have saved all of them from the previous step.

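 import os
 import torch
 from SDNet.Models.SDNetTrainer import SDNetTrainer
 from SDNet.Utils.Arguments import Arguments

 conf_file = 'coqa/conf'  # the config file used for training above
 # model_path (the saved model) and test_data (the data to predict on) are not
 # defined in this snippet; set them to your own values before running.
 conf_args = Arguments(conf_file)
 opt = conf_args.readArguments()
 opt['cuda'] = torch.cuda.is_available()
 opt['confFile'] = conf_file
 opt['datadir'] = os.path.dirname(conf_file)  # conf_file specifies where the data folder is
 # In SDNet/main.py the overrides come from command-line arguments (cmdline_args).
 # In a notebook there are none, so a plain dict is used here (an assumption).
 overrides = {}
 for key, val in overrides.items():
     if val is not None and key not in ['command', 'conf_file']:
         opt[key] = val
 print(opt)
 model = SDNetTrainer(opt)
 predictions, confidence, pred_json = model.official(model_path, test_data)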

Following are the validation F1 scores obtained by SDNet model using various settings on CoQA dataset.

The code mentioned above is available here.

Conclusion

Many applications employ chatbots to interact with human customers, but these chatbots are limited in their ability to maintain a coherent dialogue. Models like SDNet can help improve such chatbots, as they handle coreference resolution and context understanding to a good extent.


Pavan Kandru

AI enthusiast with a flair for NLP. I love playing with exotic data.
