Guide to Abstract Meaning Representation (AMR) to Text with TensorFlow

Understanding and making sense of human language is the central objective of Natural Language Processing. Because of this, the task of representing text is extremely important to NLP practitioners. There are many formats for representing text, and different techniques preserve different levels and types of information; the task at hand heavily influences the choice of representation. Abstract Meaning Representation (AMR) is one such representation, with several applications across the NLP domain.

AMR is a graph-based representation that aims to preserve semantic relations. AMR graphs are rooted, labelled, directed, acyclic graphs that span whole sentences, and they are intended to abstract away from syntactic representations. In this sense, sentences that are similar in meaning should be assigned the same AMR, even if they are not identically worded.

Example:

“China is expanding its military power to attempt to join the ranks of the superpowers.”

AMR Graph

Each node represents a concept, and each edge represents the type of relationship between these concepts.
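
In PENMAN notation, a rough, hand-written AMR for this sentence might look like the sketch below (an illustration only, not an official annotation). Note how the variable c (China) is re-used as the agent of several events; such re-entrancies are what make the structure a graph rather than a tree.

 (e / expand-01
    :ARG0 (c / country :name (n / name :op1 "China"))
    :ARG1 (p / power :mod (m / military) :poss c)
    :purpose (a / attempt-01
       :ARG0 c
       :ARG1 (j / join-01
          :ARG0 c
          :ARG1 (r / rank :poss (s / superpower)))))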

In this post, let’s explore a graph-to-sequence architecture for generating natural text from an AMR graph.

AMR to Text Translation

We are often interested in the natural language information present in the AMR graph.

But extracting this language is not trivial, as tense and other functional information are abstracted away during AMR graph construction. Statistical approaches have been used to solve this problem, but they require many hand-written rules and do not generalise easily.

Deep learning approaches, seq2seq models in particular, have demonstrated great results in translating AMR graphs. To serialise the graph input, depth-first traversal is used to generate the input sequence. But converting the graph input into a sequence adds complexity to the pipeline, and the model itself struggles to recover the original connections after serialisation. The following architecture uses graph state LSTMs to accept the graph input directly.
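
The serialisation step itself is straightforward. Here is a minimal Python sketch, assuming the graph is stored as an adjacency list mapping each node to its (edge label, child) pairs; the names are illustrative only.

 # a minimal sketch: linearise a graph into a token sequence via depth-first traversal
 def dfs_serialize(graph, root):
     tokens, visited = [], set()

     def visit(node):
         visited.add(node)
         tokens.append(node)
         for label, child in graph.get(node, []):
             tokens.append(label)
             if child not in visited:      # re-entrant nodes are only expanded once
                 visit(child)

     visit(root)
     return tokens

 # e.g. dfs_serialize({"expand-01": [(":ARG0", "China"), (":ARG1", "power")]}, "expand-01")
 # -> ['expand-01', ':ARG0', 'China', ':ARG1', 'power']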

Graph State LSTM

The AMR graph is represented as a set of hidden state vectors g = { h_j }, where each vector in this set represents a node. Edges of the graph are represented as triples (i, j, l), where i, j and l are the source, target and label of an edge.
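
As a toy illustration (purely for intuition, not code from the repository), this graph encoding could be set up as follows.

 import numpy as np

 num_nodes, hidden_dim = 4, 300
 node_hidden = np.zeros((num_nodes, hidden_dim), dtype=np.float32)   # g = {h_j}, one vector per node
 edges = [(0, 1, "ARG0"), (0, 2, "ARG1"), (2, 3, "mod")]             # (i, j, l) triples: source, target, label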

We use LSTM cells to connect these nodes and let information be exchanged between neighboring nodes of the graph. These exchanges occur recurrently, so information propagates through the entire graph, which helps retain non-local relationships between concepts. The number of time steps needed depends on the size of the graph.

State Transitions through time

Each node is represented as the following LSTM cell.

LSTM cell

Embedding of an edge = [embedding of source node; embedding of target node; embedding of label]

Pretrained GloVe vectors are used to embed the tokens.
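
As a rough sketch (with hypothetical tensor names, not the repository's exact code), this concatenation could look like the following.

 import tensorflow as tf

 # hypothetical names: an edge representation is the concatenation of the source-node,
 # target-node and edge-label embeddings; the word table is GloVe-initialised
 source_emb = tf.nn.embedding_lookup(word_embedding, edge_source_ids)      # [num_edges, word_dim]
 target_emb = tf.nn.embedding_lookup(word_embedding, edge_target_ids)      # [num_edges, word_dim]
 label_emb = tf.nn.embedding_lookup(edge_label_embedding, edge_label_ids)  # [num_edges, label_dim]
 edge_representation = tf.concat([source_emb, target_emb, label_emb], axis=-1)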

For each node, the representations of its incoming neighbour nodes and edges are collected and summed:

 # look up embeddings for the labels of the incoming edges
 passage_in_neighbor_edge_representations = tf.nn.embedding_lookup(self.edge_embedding,
                     self.passage_in_neighbor_edges)
 # gather representations of the incoming neighbour nodes
 # [batch_size, passage_len, passage_neighbors_size_max, input_dim]
 passage_in_neighbor_node_representations = collect_neighbor_node_representations(
         passage_node_representation, self.passage_in_neighbor_indices)
 # concatenate neighbour-node and edge-label representations along the last axis
 passage_in_neighbor_representations = tf.concat(
         [passage_in_neighbor_node_representations, passage_in_neighbor_edge_representations], 3)
 # zero out padded neighbour slots using the mask
 passage_in_neighbor_representations = tf.multiply(passage_in_neighbor_representations,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # sum over the neighbour dimension -> [batch_size, passage_len, input_dim + edge_dim]
 passage_in_neighbor_representations = tf.reduce_sum(passage_in_neighbor_representations, axis=2)

Similarly, passage_out_neighbor_representations are also calculated.

We also need to add connections from the previous hidden states of the graph to the current state. For this, the hidden states are calculated iteratively at each time step.

 # gather the previous-step hidden states of the incoming neighbours
 passage_in_edge_prev_hidden = collect_neighbor_node_representations(passage_node_hidden,
                         self.passage_in_neighbor_indices)
 # zero out padded neighbour slots
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # sum over the neighbour dimension -> [batch_size, node_len, neighbor_vector_dim]
 passage_in_edge_prev_hidden = tf.reduce_sum(passage_in_edge_prev_hidden, axis=2)
 # zero out padded nodes
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_nodes_mask, axis=-1))
 # flatten to [-1, neighbor_vector_dim] for the gate matmuls below
 passage_in_edge_prev_hidden = tf.reshape(passage_in_edge_prev_hidden,
         [-1, options.neighbor_vector_dim])
Now we can implement the LSTM structure displayed above.
 ## input gate
 passage_edge_ingate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_ingate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_ingate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_ingate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_ingate)
                           + b_ingate)
 passage_edge_ingate = tf.reshape(passage_edge_ingate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## forget gate
 passage_edge_forgetgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_forgetgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_forgetgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_forgetgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_forgetgate)
                           + b_forgetgate)
 passage_edge_forgetgate = tf.reshape(passage_edge_forgetgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## output gate
 passage_edge_outgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_outgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_outgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_outgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_outgate)
                           + b_outgate)
 passage_edge_outgate = tf.reshape(passage_edge_outgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## cell input
 passage_edge_cell_input = tf.tanh(tf.matmul(passage_in_neighbor_representations, w_in_cell)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_cell)
                           + tf.matmul(passage_out_neighbor_representations, w_out_cell)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_cell)
                           + b_cell)
 passage_edge_cell_input = tf.reshape(passage_edge_cell_input,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 passage_edge_cell = passage_edge_forgetgate * passage_node_cell + passage_edge_ingate * passage_edge_cell_input
 passage_edge_hidden = passage_edge_outgate * tf.tanh(passage_edge_cell)
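
The updates above define a single graph-state transition. Applying them repeatedly lets information travel one hop further through the graph at each time step. Below is a minimal sketch of that outer loop, assuming a hypothetical helper graph_state_step that wraps the gate computations shown above and a hypothetical hyper-parameter num_graph_steps.

 # a minimal sketch with hypothetical names: graph_state_step is assumed to wrap the
 # gate computations above and return updated hidden and cell states for every node
 node_hidden = tf.zeros([batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 node_cell = tf.zeros_like(node_hidden)
 for _ in range(num_graph_steps):          # more steps -> longer-range information flow
     node_hidden, node_cell = graph_state_step(node_hidden, node_cell)
 encoder_states = node_hidden              # later consumed by the attention / copy decoder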

Decoder

Now that we have successfully encoded the AMR graph using hidden state vectors, we need a decoder to convert these vectors into a natural language sequence.

Before looking at the decoder in detail, we need to address the sparsity problem caused by open-class tokens such as people’s names, dates, etc.

This problem is solved by using a copy mechanism with an attention layer during decoding. The final vocabulary probability is an interpolation between the probabilities generated by the decoder and the probabilities from the attention layer.

P_final = θ_t · P_vocabulary + (1 − θ_t) · P_attention
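
As a toy numeric illustration of this interpolation (made-up numbers, assuming both distributions are defined over the same three ids):

 import numpy as np

 p_vocab = np.array([0.6, 0.3, 0.1])      # decoder's generation distribution
 p_attn = np.array([0.1, 0.1, 0.8])       # attention (copy) distribution over the same ids
 theta_t = 0.7                            # switch leaning towards generation
 p_final = theta_t * p_vocab + (1 - theta_t) * p_attn   # -> [0.45, 0.24, 0.31]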

Attention is implemented in the following manner.

The encoder_states are the hidden states of the graph state LSTM.

The encoder_features are calculated from the encoder_states as shown below.

 encoder_features = tf.expand_dims(encoder_states, axis=2) # now is shape [batch_size, passage_len, 1, encoder_dim]
 W_h = variable_scope.get_variable("W_h", [1, 1, encoder_dim, options.attention_vec_size])
 self.W_h = W_h
 encoder_features = nn_ops.conv2d(encoder_features, W_h, [1, 1, 1, 1], "SAME") # [batch_size, passage_len, 1, attention_vec_size]
 encoder_features = tf.reshape(encoder_features, [batch_size, passage_len, options.attention_vec_size]) 

A context vector is calculated using attention.

context_vector = tf.reduce_sum(tf.expand_dims(attn_dist, axis=-1) * encoder_states, axis=1)

A coverage vector is calculated by accumulating the attention distributions from all previous time steps.

coverage += attn_dist

The attention distribution is calculated using the decoder state, encoder features and coverage vector.

 state_features = linear(decoder_state, attention_vec_size, True) # [batch_size, attention_vec_size]
 state_features = tf.expand_dims(state_features, 1) # [batch_size, 1, attention_vec_size]
 all_features = encoder_features + state_features # [batch_size, passage_len, attention_vec_size]
 coverage_features = tf.expand_dims(coverage, axis=-1) * w_c # coverage_features: [batch_size, passage_len, 1]
 all_features += coverage_features
 e = tf.reduce_sum(v * tf.tanh(all_features), axis=-1) # [batch_size, passage_len]
 attn_dist = nn_ops.softmax(e) # [batch_size, passage_len]
 attn_dist *= node_mask 

Now let’s see how the copy mechanism is implemented.

θ_t (p_gen in the code below) acts as a switch that controls whether a word is generated from the vocabulary or copied directly from the input graph. It is calculated using the context vector, the decoder state, the input to the decoder cell and the output of the decoder.

 p_gen = linear([context_t, state_t.c, state_t.h, x], 1, True) # [batch_size, 1]
 p_gen = tf.sigmoid(p_gen)
 # Each entry on the list output_t corresponds to one decoder step
 vocab_score_t = tf.nn.xw_plus_b(output_t, w, b) # apply the linear layer
 vocab_score_t = tf.nn.softmax(vocab_score_t)
 # For pointer-generator model, calc final distribution from copy distribution and vocabulary distribution
 vocab_score_t = self.merge_prob_dist_for_one_step(vocab_score_t, attn_dist, p_gen, node_idxs, node_mask)
 vocab_score_t = _clip_and_normalize(vocab_score_t, 1e-6) 
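
The interpolation itself happens inside merge_prob_dist_for_one_step. A rough sketch of how such a merge could be implemented (hypothetical code, not the repository’s exact implementation) is shown below: the attention mass over graph nodes is scattered onto the vocabulary ids of the node tokens and then mixed with the generation distribution using p_gen.

 # a rough, hypothetical sketch of merging the copy and vocabulary distributions
 def merge_prob_dist_sketch(vocab_dist, attn_dist, p_gen, node_idxs, node_mask, vocab_size):
     # vocab_dist: [batch_size, vocab_size]; attn_dist, node_idxs, node_mask: [batch_size, passage_len]
     attn_dist = attn_dist * node_mask
     batch_size = tf.shape(vocab_dist)[0]
     passage_len = tf.shape(node_idxs)[1]
     batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, passage_len])
     indices = tf.stack([batch_nums, node_idxs], axis=2)      # [batch_size, passage_len, 2]
     # nodes mapping to the same vocabulary id have their attention mass summed
     copy_dist = tf.scatter_nd(indices, attn_dist, tf.stack([batch_size, vocab_size]))
     return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist    # p_gen: [batch_size, 1], broadcasts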

The inputs to the decoder cell are:

  • The state of the decoder at the previous time step
  • The output of the decoder at the previous time step
  • The context vector from the previous time step
  • The coverage vector from the previous time step
  • The attention distribution

The output of the decoder is the vocab_score.
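
As a rough illustration (with hypothetical tensor names, not the repository’s exact code), one decoder step might assemble these inputs as follows: the embedding of the word produced at the previous step and the previous context vector are concatenated and fed to an LSTM cell together with the previous state.

 # a rough sketch with hypothetical names
 prev_word_emb = tf.nn.embedding_lookup(word_embedding, prev_word_ids)   # words from the previous step
 x = tf.concat([prev_word_emb, context_t_1], axis=-1)                    # previous context vector
 cell = tf.nn.rnn_cell.LSTMCell(hidden_dim)
 cell_output, state_t = cell(x, state_t_1)   # state_t_1 is the LSTMStateTuple from the previous step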

The code repository containing the snippets shown above is available here: https://github.com/freesunshine0316/neural-graph-to-seq-mp

Here’s a link to a quick run of the above repository: https://colab.research.google.com/drive/1OKLdTxeVaizNbhCvNO5TmnA9z0c6x001?usp=sharing

Conclusion

Outside of formal study, AMRs have several applications in the NLP domain. They can be used for abstractive text summarisation, semantic parsing and information extraction. Robust architectures for reconstructing natural language from these representations will help advance these applications.

Pavan Kandru
AI enthusiast with a flair for NLP. I love playing with exotic data.
