Guide to Abstract Meaning Representation(AMR) to text with TensorFlow

AMR is a graph-based representation that aims to preserve semantic relations. AMR graphs are rooted, labelled, directed, acyclic graphs, comprising whole sentences. They are intended to abstract away from syntactic representations.

Understanding and making sense of human languages to generate value is the vital objective of Natural Language Processing. Owing to this requirement, the task of Representing text becomes extremely important to NLP practitioners. There are innumerable formats to represent text. Different Techniques preserve different levels and types of information. The task at hand heavily influences the choice of a representation for storing data. Abstract Meaning Representation is one such representation that has several applications in the NLP domain.

AMR is a graph-based representation that aims to preserve semantic relations. AMR graphs are rooted, labelled, directed, acyclic graphs, comprising whole sentences. They are intended to abstract away from syntactic representations. In this sense, sentences similar in meaning should be assigned the same AMR, even if they are not identically worded.



Sign up for your weekly dose of what's up in emerging technology.

China is expanding its military power to attempt to join the ranks of the superpowers” 

AMR Graph

Each node represents a concept, and each edge represents the type of relationship between these concepts.

In this post, let’s explore a graph to sequence-based architecture to generate natural text using an AMR graph.

AMR to Text Translation

We are often interested in the natural language information present in the AMR graph.

But extracting this language is not a trivial task as the tenses and functional information are abstracted away during AMR graph construction. Some statistical approaches are used to solve this problem, but they require too many rules and are not easily generalizable.

Deep Learning, seq2seq models, in particular, demonstrated great results in translating AMR graphs. To serialise the graph, input  Depth, First Traversal is used, and the input  sequence is generated. But the task of converting graph input to sequence input adds more complexity to the algorithms. Moreover, the model itself struggles to understand the original connections after serialization. The following architecture uses graph state LSTMs to accept the graph input directly.

Graph State LSTM

AMR graph is represented as a set of hidden state vectors g = { hj } each vector in this set represents a node. Edges of the graph are represented as triples (i,j,l) i,j,l are a source, target and label of an edge.

We will use LSTM cells to connect these nodes and let the information exchange happen between the graph’s neighboring nodes. These exchanges between neighboring nodes occur recurrently, and information propagates through the entire graph. This helps in retaining nonlocal relationships between concepts. Number of time steps needed depends on the size of the graph.

State Transitions through time

Each node is represented as the following LSTM cell.

LSTM cell

Embedding of an edge=[embedding of source node; embedding of target node; embedding of label]

Pretrained Glove vectors are used for embedding the tokens.

For each node Incoming Neighbour Nodes+Edges 

 passage_in_neighbor_edge_representations = tf.nn.embedding_lookup(self.edge_embedding,
 # [batch_size, passage_len, passage_neighbors_size_max, input_dim]
 passage_in_neighbor_node_representations = collect_neighbor_node_representations(
         passage_node_representation, self.passage_in_neighbor_indices)
 passage_in_neighbor_representations = tf.concat(
         [passage_in_neighbor_node_representations, passage_in_neighbor_edge_representations], 3)
 passage_in_neighbor_representations = tf.multiply(passage_in_neighbor_representations,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # [batch_size, passage_len, input_dim + edge_dim]
 passage_in_neighbor_representations = tf.reduce_sum(passage_in_neighbor_representations, axis=2) 

Similarly, passage_out_neighbor_representations are also calculated.

We also need to add connections from previous hidden states of the graph to the current state. For this we need to calculate hidden states in each time step iteratively.

 passage_in_edge_prev_hidden = collect_neighbor_node_representations(passage_node_hidden,
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_in_neighbor_mask, axis=-1))
 # [batch_size, node_len, neighbor_vector_dim]
 passage_in_edge_prev_hidden = tf.reduce_sum(passage_in_edge_prev_hidden, axis=2)
 passage_in_edge_prev_hidden = tf.multiply(passage_in_edge_prev_hidden,
         tf.expand_dims(self.passage_nodes_mask, axis=-1))
 passage_in_edge_prev_hidden = tf.reshape(passage_in_edge_prev_hidden,
         [-1, options.neighbor_vector_dim])
Now we can implement the LSTM structure displayed above.
## ig
 passage_edge_ingate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_ingate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_ingate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_ingate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_ingate)
                           + b_ingate)
 passage_edge_ingate = tf.reshape(passage_edge_ingate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## fg
 passage_edge_forgetgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_forgetgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_forgetgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_forgetgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_forgetgate)
                           + b_forgetgate)
 passage_edge_forgetgate = tf.reshape(passage_edge_forgetgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## og
 passage_edge_outgate = tf.sigmoid(tf.matmul(passage_in_neighbor_representations, w_in_outgate)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_outgate)
                           + tf.matmul(passage_out_neighbor_representations, w_out_outgate)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_outgate)
                           + b_outgate)
 passage_edge_outgate = tf.reshape(passage_edge_outgate,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 ## input
 passage_edge_cell_input = tf.tanh(tf.matmul(passage_in_neighbor_representations, w_in_cell)
                           + tf.matmul(passage_in_edge_prev_hidden, u_in_cell)
                           + tf.matmul(passage_out_neighbor_representations, w_out_cell)
                           + tf.matmul(passage_out_edge_prev_hidden, u_out_cell)
                           + b_cell)
 passage_edge_cell_input = tf.reshape(passage_edge_cell_input,
         [batch_size, passage_nodes_size_max, options.neighbor_vector_dim])
 passage_edge_cell = passage_edge_forgetgate * passage_node_cell + passage_edge_ingate * passage_edge_cell_input
 passage_edge_hidden = passage_edge_outgate * tf.tanh(passage_edge_cell)


Now that we have successfully encoded the AMR graph using hidden state vectors, we need a decoder to convert these vectors into a natural language sequence.

Before looking at the decoder in detail, we need to address the sparsity problem that occurs due to open class tokens like people, dates etc.

This problem is solved by using a copy machine with an attention layer during decoding. The final Probability of vocabulary is the interpolation between the probabilities generated by the decoder and probabilities from the attention layer.

Pfinal = ፀtPvocabulary+(1- ፀt)Pattention

Attention is implemented in the following manner.

Encoder_states are the hidden states of graph state LSTM

Encoder_features are calculated using encoder_states as shown below.

 encoder_features = tf.expand_dims(encoder_states, axis=2) # now is shape [batch_size, passage_len, 1, encoder_dim]
 W_h = variable_scope.get_variable("W_h", [1, 1, encoder_dim, options.attention_vec_size])
 self.W_h = W_h
 encoder_features = nn_ops.conv2d(encoder_features, W_h, [1, 1, 1, 1], "SAME") # [batch_size, passage_len, 1, attention_vec_size]
 encoder_features = tf.reshape(encoder_features, [batch_size, passage_len, options.attention_vec_size]) 

A context vector is calculated using attention.

context_vector = tf.reduce_sum(tf.expand_dims(attn_dist, axis=-1) * encoder_states, axis=1)

A Coverage Vector is calculated by accumulating the attentions from all the previous time steps.

coverage += attn_dist

Attention distribution is calculated using the decoder state, encoder features and  coverage vector.

 state_features = linear(decoder_state, attention_vec_size, True) # [batch_size, attention_vec_size]
 state_features = tf.expand_dims(state_features, 1) # [batch_size, 1, attention_vec_size]
 all_features = encoder_features + state_features # [batch_size, passage_len, attention_vec_size]
 coverage_features = tf.expand_dims(coverage, axis=-1) * w_c # coverage_features: [batch_size, passage_len, 1]
 all_features += coverage_features
 e = tf.reduce_sum(v * tf.tanh(all_features), axis=-1) # [batch_size, passage_len]
 attn_dist = nn_ops.softmax(e) # [batch_size, passage_len]
 attn_dist *= node_mask 

Now let’s see how the copy mechanism is implemented.

The ፀt is a switch for controlling generating a word from the vocabulary or directly copying it from the input graph. It is calculated using the context vector, decoder state, input to decoder cell and output of the decoder.

 p_gen = linear([context_t, state_t.c, state_t.h, x], 1, True) # [batch_size, 1]
 p_gen = tf.sigmoid(p_gen)
 # Each entry on the list output_t corresponds to one decoder step
 vocab_score_t = tf.nn.xw_plus_b(output_t, w, b) # apply the linear layer
 vocab_score_t = tf.nn.softmax(vocab_score_t)
 # For pointer-generator model, calc final distribution from copy distribution and vocabulary distribution
 vocab_score_t = self.merge_prob_dist_for_one_step(vocab_score_t, attn_dist, p_gen, node_idxs, node_mask)
 vocab_score_t = _clip_and_normalize(vocab_score_t, 1e-6) 

Inputs to the decoder cell are

  • State of the decoder during previous time step.
  • Output of the decoder during previous time step
  • Context vector during previous time step
  • Coverage vector during the previous time step
  • Attention Distribution.

The output of the decoder is the vocab_score.

Code repository consisting of snippets shown above is present here(

Here’s a link to a quick run of the above repository: (


Outside of formal study, AMRs have several applications in the NLP domain. They can be used to do Abstractive Text summarisation, Semantic Parsing and Information Extraction. Robust architectures to reconstruct natural language from these representations will help the advancement of these applications.

More Great AIM Stories

Pavan Kandru
AI enthusiast with a flair for NLP. I love playing with exotic data.

Our Upcoming Events

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan, 2023

Conference, in-person (Bangalore)
Rising 2023 | Women in Tech Conference
16-17th Mar, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
27-28th Apr, 2023

Conference, in-person (Bangalore)
MachineCon 2023
23rd Jun, 2023

Conference, in-person (Bangalore)
Cypher 2023
20-22nd Sep, 2023

3 Ways to Join our Community

Whatsapp group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our newsletter

Get the latest updates from AIM