Active Hackathon

Guide to Abstract Meaning Representation(AMR) to text with TensorFlow

AMR is a graph-based representation that aims to preserve semantic relations. AMR graphs are rooted, labelled, directed, acyclic graphs, comprising whole sentences. They are intended to abstract away from syntactic representations.

Understanding and making sense of human languages to generate value is the vital objective of Natural Language Processing. Owing to this requirement, the task of Representing text becomes extremely important to NLP practitioners. There are innumerable formats to represent text. Different Techniques preserve different levels and types of information. The task at hand heavily influences the choice of a representation for storing data. Abstract Meaning Representation is one such representation that has several applications in the NLP domain.

AMR is a graph-based representation that aims to preserve semantic relations. AMR graphs are rooted, labelled, directed, acyclic graphs, comprising whole sentences. They are intended to abstract away from syntactic representations. In this sense, sentences similar in meaning should be assigned the same AMR, even if they are not identically worded.


Sign up for your weekly dose of what's up in emerging technology.


China is expanding its military power to attempt to join the ranks of the superpowers” 

AMR Graph

Each node represents a concept, and each edge represents the type of relationship between these concepts.

In this post, let’s explore a graph to sequence-based architecture to generate natural text using an AMR graph.

AMR to Text Translation

We are often interested in the natural language information present in the AMR graph.

But extracting this language is not a trivial task as the tenses and functional information are abstracted away during AMR graph construction. Some statistical approaches are used to solve this problem, but they require too many rules and are not easily generalizable.

Deep Learning, seq2seq models, in particular, demonstrated great results in translating AMR graphs. To serialise the graph, input  Depth, First Traversal is used, and the input  sequence is generated. But the task of converting graph input to sequence input adds more complexity to the algorithms. Moreover, the model itself struggles to understand the original connections after serialization. The following architecture uses graph state LSTMs to accept the graph input directly.

Graph State LSTM

AMR graph is represented as a set of hidden state vectors g = { hj } each vector in this set represents a node. Edges of the graph are represented as triples (i,j,l) i,j,l are a source, target and label of an edge.

We will use LSTM cells to connect these nodes and let the information exchange happen between the graph’s neighboring nodes. These exchanges between neighboring nodes occur recurrently, and information propagates through the entire graph. This helps in retaining nonlocal relationships between concepts. Number of time steps needed depends on the size of the graph.

State Transitions through time

Each node is represented as the following LSTM cell.

LSTM cell

Embedding of an edge=[embedding of source node; embedding of target node; embedding of label]

Pretrained Glove vectors are used for embedding the tokens.

For each node Incoming Neighbour Nodes+Edges 

 passage_in_neighbor_edge_representations = tf.nn.embedding_lookup(self.edge_embedding,
 # [batch_size, passage_len, passage_neighbors_size_max, input_dim]
 passage_in_neighbor_node_representations = collect_neighbor_node_representations(
         passage_node_representation, self.passage_in_neighbor_indices)
 passage_in_neighbor_representations = tf.concat(
         [passage_in_neighbor_node_representations, passage_in_neighbor_edge_representations], 3)
 passage_in_neighbor_representations = tf.multiply(passage_in_neighbor_representations,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # [batch_size, passage_len, input_dim + edge_dim]
 passage_in_neighbor_representations = tf.reduce_sum(passage_in_neighbor_representations, axis=2) 

Similarly, passage_out_neighbor_representations are also calculated.

We also need to add connections from previous hidden states of the graph to the current state. For this we need to calculate hidden states in each time step iteratively.

 passage_in_edge_prev_hidden = collect_neighbor_node_representations(passage_node_hidden,
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # [batch_size, node_len, neighbor_vector_dim]
 passage_in_edge_prev_hidden = tf.reduce_sum(passage_in_edge_prev_hidden, axis=2)
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_nodes_mask, axis=-1))
 passage_in_edge_prev_hidden = tf.reshape(passage_in_edge_prev_hidden,
         [-1, options.neighbor_vector_dim])
Now we can implement the LSTM structure displayed above.
## ig
 passage_edge_ingate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_ingate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_ingate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_ingate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_ingate)
                           + b_ingate)
 passage_edge_ingate = tf.reshape(passage_edge_ingate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## fg
 passage_edge_forgetgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_forgetgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_forgetgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_forgetgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_forgetgate)
                           + b_forgetgate)
 passage_edge_forgetgate = tf.reshape(passage_edge_forgetgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## og
 passage_edge_outgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_outgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_outgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_outgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_outgate)
                           + b_outgate)
 passage_edge_outgate = tf.reshape(passage_edge_outgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## input
 passage_edge_cell_input = tf.tanh(tf.matmul(passage_in_neighbor_representations, w_in_cell)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_cell)
                           + tf.matmul(passage_out_neighbor_representations, w_out_cell)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_cell)
                           + b_cell)
 passage_edge_cell_input = tf.reshape(passage_edge_cell_input,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 passage_edge_cell = passage_edge_forgetgate * passage_node_cell + passage_edge_ingate * passage_edge_cell_input
 passage_edge_hidden = passage_edge_outgate * tf.tanh(passage_edge_cell)


Now that we have successfully encoded the AMR graph using hidden state vectors, we need a decoder to convert these vectors into a natural language sequence.

Before looking at the decoder in detail, we need to address the sparsity problem that occurs due to open class tokens like people, dates etc.

This problem is solved by using a copy machine with an attention layer during decoding. The final Probability of vocabulary is the interpolation between the probabilities generated by the decoder and probabilities from the attention layer.

Pfinal = ፀtPvocabulary+(1- ፀt)Pattention

Attention is implemented in the following manner.

Encoder_states are the hidden states of graph state LSTM

Encoder_features are calculated using encoder_states as shown below.

 encoder_features = tf.expand_dims(encoder_states, axis=2) # now is shape [batch_size, passage_len, 1, encoder_dim]
 W_h = variable_scope.get_variable("W_h", [1, 1, encoder_dim, options.attention_vec_size])
 self.W_h = W_h
 encoder_features = nn_ops.conv2d(encoder_features, W_h, [1, 1, 1, 1], "SAME") # [batch_size, passage_len, 1, attention_vec_size]
 encoder_features = tf.reshape(encoder_features, [batch_size, passage_len, options.attention_vec_size]) 

A context vector is calculated using attention.

context_vector = tf.reduce_sum(tf.expand_dims(attn_dist, axis=-1) * encoder_states, axis=1)

A Coverage Vector is calculated by accumulating the attentions from all the previous time steps.

coverage += attn_dist

Attention distribution is calculated using the decoder state, encoder features and  coverage vector.

 state_features = linear(decoder_state, attention_vec_size, True) # [batch_size, attention_vec_size]
 state_features = tf.expand_dims(state_features, 1) # [batch_size, 1, attention_vec_size]
 all_features = encoder_features + state_features # [batch_size, passage_len, attention_vec_size]
 coverage_features = tf.expand_dims(coverage, axis=-1) * w_c # coverage_features: [batch_size, passage_len, 1]
 all_features += coverage_features
 e = tf.reduce_sum(v * tf.tanh(all_features), axis=-1) # [batch_size, passage_len]
 attn_dist = nn_ops.softmax(e) # [batch_size, passage_len]
 attn_dist *= node_mask 

Now let’s see how the copy mechanism is implemented.

The ፀt is a switch for controlling generating a word from the vocabulary or directly copying it from the input graph. It is calculated using the context vector, decoder state, input to decoder cell and output of the decoder.

 p_gen = linear([context_t, state_t.c, state_t.h, x], 1, True) # [batch_size, 1]
 p_gen = tf.sigmoid(p_gen)
 # Each entry on the list output_t corresponds to one decoder step
 vocab_score_t = tf.nn.xw_plus_b(output_t, w, b) # apply the linear layer
 vocab_score_t = tf.nn.softmax(vocab_score_t)
 # For pointer-generator model, calc final distribution from copy distribution and vocabulary distribution
 vocab_score_t = self.merge_prob_dist_for_one_step(vocab_score_t, attn_dist, p_gen, node_idxs, node_mask)
 vocab_score_t = _clip_and_normalize(vocab_score_t, 1e-6) 

Inputs to the decoder cell are

  • State of the decoder during previous time step.
  • Output of the decoder during previous time step
  • Context vector during previous time step
  • Coverage vector during the previous time step
  • Attention Distribution.

The output of the decoder is the vocab_score.

Code repository consisting of snippets shown above is present here(

Here’s a link to a quick run of the above repository: (


Outside of formal study, AMRs have several applications in the NLP domain. They can be used to do Abstractive Text summarisation, Semantic Parsing and Information Extraction. Robust architectures to reconstruct natural language from these representations will help the advancement of these applications.

More Great AIM Stories

Pavan Kandru
AI enthusiast with a flair for NLP. I love playing with exotic data.

Our Upcoming Events

Conference, Virtual
Genpact Analytics Career Day
3rd Sep

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
21st Apr, 2023

Conference, in-person (Bangalore)
MachineCon 2023
23rd Jun, 2023

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM

Council Post: How to Evolve with Changing Workforce

The demand for digital roles is growing rapidly, and scouting for talent is becoming more and more difficult. If organisations do not change their ways to adapt and alter their strategy, it could have a significant business impact.

All Tech Giants: On your Mark, Get Set – Slow!

In September 2021, the FTC published a report on M&As of five top companies in the US that have escaped the antitrust laws. These were Alphabet/Google, Amazon, Apple, Facebook, and Microsoft.

The Digital Transformation Journey of Vedanta

In the current digital ecosystem, the evolving technologies can be seen both as an opportunity to gain new insights as well as a disruption by others, says Vineet Jaiswal, chief digital and technology officer at Vedanta Resources Limited

BlenderBot — Public, Yet Not Too Public

As a footnote, Meta cites access will be granted to academic researchers and people affiliated to government organisations, civil society groups, academia and global industry research labs.