Team Of Software Engineers At Facebook Releases “Neural Network Compiler” For PyTorch 1.10

A team of software engineers at Facebook, led by Software Engineer Bertrand Maher, recently released a JIT compiler for CPUs based on LLVM, called NNC, for “Neural Network Compiler.” The results were derived from the pyhpc-benchmarks suite.

The benchmark suite is a wonderland for an ML compiler: it offers many opportunities for loop fusion, which can yield astonishing speedups. While the original benchmark forces single-threaded execution (where NNC still achieves 23x over NumPy), NNC really shines once the threading cap is lifted. On a 96-core development box, it saw 150x over PyTorch 1.9 and 300x over NumPy.
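
Loop fusion is the key trick behind those speedups: instead of materializing a temporary array for every operator, a fusing compiler emits a single loop that reads each input once. The NumPy sketch below is purely illustrative (the function names and the expression are made up, not NNC's actual output); it only shows the transformation a fuser performs conceptually.

```python
import numpy as np

def unfused(u, v, w):
    # Eager execution: each operator allocates and writes a full
    # temporary array, so memory is traversed three times.
    t1 = u * v
    t2 = t1 + w
    return np.tanh(t2)

def fused(u, v, w):
    # What a fusing compiler conceptually generates instead: one loop,
    # one pass over memory, no intermediate arrays.
    out = np.empty_like(u)
    for i in range(u.size):
        out.flat[i] = np.tanh(u.flat[i] * v.flat[i] + w.flat[i])
    return out

u, v, w = (np.random.rand(1024) for _ in range(3))
assert np.allclose(unfused(u, v, w), fused(u, v, w))
```

In pure Python the fused loop is of course slower; the point is that a compiler emitting this loop as native code avoids the memory traffic of the temporaries, which is where the large CPU speedups come from.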

Tremendous amounts of time and resources go into developing Python frontends to high-performance backends, but these are usually tailored towards deep learning. The development team wanted to see whether geophysical modelling could profit from those advances by applying these libraries to Veros, a pure-Python ocean simulator.

Image Source: Bertrand Maher

By assembling benchmark results, the team learned the following (one’s mileage may vary):

  • The performance of Jax seems very competitive, both on GPU and CPU. It is consistently among the top implementations on the CPU and shows the best performance on GPU.
  • Jax’s performance on GPU seems to be quite hardware dependent. Jax’s performance is significantly better (relatively speaking) on a Tesla P100 than a Tesla K80.
  • Numba is a great choice on CPU if you don’t mind writing explicit for-loops (which can be more readable than a vectorized implementation), being slightly faster than Jax with little effort.
  • If you have embarrassingly parallel workloads, speedups of > 1000x are easy to achieve on high-end GPUs.
  • Tensorflow is not great for applications like this, since it lacks tools to apply partial updates to tensors (in the sense of tensor[2:-2] = 0.).
  • Don’t bother using Pytorch or vanilla Tensorflow on the CPU. Tensorflow with XLA (experimental_compile) is great though!
  • Reaching Fortran performance on CPU with vectorized implementations is hard.
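
The partial-update point is worth spelling out. In NumPy (and PyTorch) a slice assignment like tensor[2:-2] = 0. is a one-liner; TensorFlow has no direct equivalent, while JAX arrays are immutable and offer a functional .at[...].set(...) idiom instead. A quick sketch (the array contents are made up for illustration):

```python
import numpy as np

x = np.arange(8.0)   # [0, 1, 2, 3, 4, 5, 6, 7]
x[2:-2] = 0.0        # in-place partial update, trivial in NumPy
# x is now [0, 1, 0, 0, 0, 0, 6, 7]

# JAX's functional equivalent (jax.numpy arrays are immutable) would be:
#   x = x.at[2:-2].set(0.0)
```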

Veros, the versatile ocean simulator, aims to be a powerful tool that makes high-performance ocean modelling approachable and fun. It supports a NumPy backend for small-scale problems and a high-performance JAX backend with CPU and GPU support. It is fully parallelized via MPI and supports distributed execution. Veros is currently being developed at the Niels Bohr Institute, University of Copenhagen.

Victor Dey