Graph Transformer Network: A New Framework For Language & Speech Processing

Last year, Facebook open-sourced the graph transformer network (GTN), a framework for automatic differentiation with weighted finite-state transducers (WFSTs). To put things in perspective, GTN is to WFSTs what PyTorch is to tensors. GTN can be used to train graph-based machine learning models effectively.

What are WFSTs?

WFSTs are a widely used tool in speech recognition, natural language processing, and handwriting recognition. The WFST data structure makes it easy to combine different sources of information. For example, a standard speech recogniser couples an acoustic model, which predicts the letters in a speech snippet, with a language model, which estimates the likelihood of one word following another. Each of these models can be represented as a WFST, trained separately, and then combined to output the most likely transcription by composing their constraints.
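To make this concrete, here is a minimal, self-contained sketch of the idea (an illustrative data structure, not the GTN API): a tiny transducer that maps a phone sequence to a word, scored in the tropical semiring, where weights add along a path and the cheapest path wins.

```python
# A minimal sketch of a weighted finite-state transducer (illustrative
# only, not the GTN API): states are integers, and each arc maps an
# input label to an output label with a weight (e.g. a negative log-prob).
from math import inf

class WFST:
    def __init__(self, start, accept):
        self.start, self.accept = start, accept
        self.arcs = []  # (src, dst, in_label, out_label, weight)

    def add_arc(self, src, dst, ilabel, olabel, weight=0.0):
        self.arcs.append((src, dst, ilabel, olabel, weight))

    def best_path(self, inputs):
        """Viterbi-style score: cheapest output sequence for `inputs`."""
        # beam maps a state to (cost, outputs) after consuming a prefix
        beam = {self.start: (0.0, [])}
        for sym in inputs:
            nxt = {}
            for (src, dst, il, ol, w) in self.arcs:
                if il == sym and src in beam:
                    cost = beam[src][0] + w
                    if dst not in nxt or cost < nxt[dst][0]:
                        nxt[dst] = (cost, beam[src][1] + [ol])
            beam = nxt
        return beam.get(self.accept, (inf, None))

# Toy lexicon transducer: maps the phones "hh ay" to the word "hi"
t = WFST(start=0, accept=2)
t.add_arc(0, 1, "hh", "hi", 0.5)  # emit the word on the first phone
t.add_arc(1, 2, "ay", "-", 0.3)   # epsilon-like placeholder output
cost, outputs = t.best_path(["hh", "ay"])
print(cost, outputs)  # 0.8 ['hi', '-']
```

A real recogniser composes several such machines (acoustic, lexicon, language model); the sketch only shows why a single graph can carry both labels and scores.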

However, this approach has a drawback. Combining learned models with WFSTs only at inference time leads to issues such as exposure bias and label bias, which arise primarily from practical considerations. There has been little prior work on making the training of WFSTs tractable, and no existing implementation supports automatic differentiation through them in a high-level, efficient manner.



GTN for WFSTs

To remedy this, the researchers developed a framework for automatic differentiation through operations on WFSTs. This framework can be leveraged to design and experiment with both existing and novel learning methods. Differentiable WFSTs allow the model to learn from training data while incorporating prior knowledge in the best possible way.

Previously, developers had to hardcode the graph structure in the software. However, with GTN, researchers can dynamically use WFSTs at training time and the whole system can learn and improve from the data more efficiently.


(Image credit: Facebook blog)

In a recent talk, Awni Hannun, an AI researcher at Facebook and one of the authors of this study, said: “GTNs are a way to perform operations on the graphs coupled with automatic differentiation. In the case of GTNs, instead of tensors, you have graphs, and instead of matrix multiplication, convolution, and pointwise operations, you have different interesting graph operations. Just like the operations you can do on tensors, these graphs are differentiable; it means that you can differentiate the output of the operations with respect to the input.”

With GTN, researchers can easily construct WFSTs, visualise them, and perform operations on them. For example, calling the function gtn.backward computes gradients with respect to any graph that participated in the computation.
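The flavour of this can be illustrated with a self-contained sketch (again illustrative, not the GTN API itself): in the log semiring, the forward score of a graph is a log-sum-exp over its accepting paths, and the gradient of that score with respect to each path's weight is exactly that path's posterior probability.

```python
# Illustrative sketch (not the GTN API): differentiate the forward
# score of a tiny graph with two accepting paths with respect to its
# path weights. The gradient of logsumexp is the softmax, i.e. each
# path's posterior -- the quantity a backward pass through a WFST
# computes for every arc.
from math import exp, log

def forward_score(path_weights):
    # logsumexp over the total weight of each accepting path
    m = max(path_weights)
    return m + log(sum(exp(w - m) for w in path_weights))

def backward(path_weights):
    # gradient of the forward score w.r.t. each path's weight
    score = forward_score(path_weights)
    return [exp(w - score) for w in path_weights]

# Two accepting paths with total weights 1.0 and 2.0
paths = [1.0, 2.0]
score = forward_score(paths)
grads = backward(paths)
print(score)  # ~2.3133
print(grads)  # posteriors; they sum to 1, and the heavier path dominates
```

Real graphs share arcs between paths, so the backward pass runs a dynamic program over the graph rather than enumerating paths, but the gradient it returns has the same meaning.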

GTN’s programming style is comparable to popular frameworks such as PyTorch. For example, the style, autograd API, and autograd implementation are based on similar design principles. The main difference is that tensors are replaced with WFSTs.

Replacing tensors with graphs lets researchers encode more useful prior information about a task into the learning algorithm. For example, GTN allows the pronunciations of a word to be encoded as a graph and incorporated into the learning algorithm.
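As a hypothetical sketch (the phone labels and function names are illustrative, not the GTN API), two pronunciations of the word "data" can be encoded as branches of a single acceptor graph, so a learning algorithm can score either variant:

```python
# Hypothetical pronunciation graph for "data" with two variants,
# "d ey t ah" and "d ae t ah", encoded as one deterministic acceptor.
# (Illustrative data structure only, not the GTN API.)
arcs = {  # (state, phone) -> next state
    (0, "d"): 1,
    (1, "ey"): 2, (1, "ae"): 2,  # branch: either vowel is valid
    (2, "t"): 3,
    (3, "ah"): 4,                # state 4 is the accepting state
}

def accepts(phones, start=0, accept=4):
    state = start
    for p in phones:
        state = arcs.get((state, p))
        if state is None:
            return False
    return state == accept

print(accepts(["d", "ey", "t", "ah"]))  # True
print(accepts(["d", "ae", "t", "ah"]))  # True
print(accepts(["d", "ih", "t", "ah"]))  # False
```

Because the graph is just data, adding a third pronunciation means adding arcs, not rewriting the training code, which is the kind of prior knowledge the passage above refers to.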

Wrapping up

GTNs give the freedom to experiment with a larger design space of structured learning algorithms by separating graphs from operations on graphs. Such creative freedom goes a long way in developing newer and better algorithms.

By using these graphs during the training, the whole system learns and improves from data. In the future, the structure of WFSTs combined with learning from the data can help machine learning models be more accurate, modular, and lightweight.

Read the full paper here.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world, with a special interest in analysing its long-term impact on individuals and societies.
