Facebook’s New Open Source Framework For Training Graph-Based ML Models

GTN Facebook

Facebook recently open-sourced the graph transformer networks (GTN) framework for effectively training graph-based learning models. In this case, GTN will be used in automatic differentiation of weighted finite-state transducers (WFSTs), which is an expressive and powerful graph. With GTN, researchers can easily construct WFSTs, visualize them, and perform operations on them

This framework enables the separation of graphs from operations on them that helps in exploring new structured loss functions and which in turn makes the encoding of prior knowledge on learning algorithms easier. Further, in a paper published by Awni Hannun, Vineel Pratap, Jacob Kahn & Wei-Ning Hsu of the Facebook AI Research, in this regard, proposed a convolutional WFST layer to be used in the interior of a deep neural network for mapping lower-level to higher-level representations.

GTN is written in C++ and has bindings to Python. GTN can be used to express and design sequence-level loss functions. With this framework, Facebook aims to make experimentation with structures in learning algorithms much simpler.


Sign up for your weekly dose of what's up in emerging technology.

How Does GTN Function

The use of WFST data structure is prevalent among speech recognition, natural language processing, and handwriting recognition applications. WFST, especially in the speech recognition systems, provides a common and natural representation for the hidden Markov models (HMM), context-dependency, grammar, pronunciation dictionaries, and weighted determinization algorithms to optimise time and space requirement. One of the most popular WFST-based products is the Kaldi toolkit for speech recognition which is trained to decode speeches. Kaldi heavily relies on OpenFST, which is an open-source WFST toolkit. 

To understand the importance of GTN framework for a WFST graph, we consider a general speech recogniser. A speech recogniser consists of an acoustic model that predicts the letters in the speech, its language model, and also identifies the word that may follow. These models are represented as WFSTs and are trained separately before combining to output the most likely transcription. It is, at this juncture, that the GTN library steps in to train the different models, which in turn provides better results. 

Before GTN, the use of the individual graphs at the training time was implicit, and the graph structure needed to be hard-coded in the software. Using GTN, however, researchers can use WFSTs dynamically at the training time, which in turn makes the whole system more efficient. It also helps researchers in constructing WFSTs and visualising them to perform functions on them. The gradients can be computed with respect to any participating graph by a simple call gtn.backward.

GTN’s programming style is similar to other popular frameworks such as PyTorch, in terms of the autograd API, imperative style, and autograd implementation. The only difference lies in replacing the tensors with WFSTs.

Wrapping Up

GTN allows separating the graphs from the operations on the graph, which gives greater freedom to experiment with a larger base of structures learning algorithms. 

GTN is similar to PyTorch, which is an automatic differential framework for tensors. In the case of PyTorch, which is also called a ‘define-by-run framework’, the autograd package provides automatic differentiation for operations on tensors. Every iteration is different which allows dynamic modifications between epochs.

Despite being similar on many counts, GTN provides an edge over PyTorch or any other tensor-based framework. GTN tools can easily experiment to develop a better algorithm. In the case of GTN, the graph structure is suited for encoding suggestive prior knowledge, which can teach the whole system and help in improving the data. 

Further, it is expected that in future, the structure of WFSTs, along with learning from the data, can make machine learning models lighter and more accurate. The use of WFST as a substitute for the tensor-based layer in a deep architecture is an interesting proposition. The paper (referenced above) concludes that WFSTs may be more effective than traditional layers for discovering significant representations from data.

More Great AIM Stories

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.

Our Upcoming Events

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan, 2023

Conference, in-person (Bangalore)
Rising 2023 | Women in Tech Conference
16-17th Mar, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
27-28th Apr, 2023

Conference, in-person (Bangalore)
MachineCon 2023
23rd Jun, 2023

Conference, in-person (Bangalore)
Cypher 2023
20-22nd Sep, 2023

3 Ways to Join our Community

Whatsapp group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our newsletter

Get the latest updates from AIM