Vector processing was conventionally reserved for the world’s most powerful supercomputers, but recent advances in vector processors have made it accessible for building smaller, less expensive supercomputers for AI and ML tasks. Not only does it help in building supercomputers that use less computing power, it also frees up scalar processors for other critical workloads, making it a preferred choice for computing engines.
To that end, Japanese IT giant NEC has developed an AI platform, SX-Aurora TSUBASA, which makes NEC’s vector processors applicable to AI and ML workloads. NEC is renowned for its vector processors, which are preferred for handling massive amounts of statistical data and complex ML applications.
While the performance results of SX-Aurora TSUBASA look unfavourable when compared with NVIDIA’s A100, the company still believes that its state-of-the-art vector engine processing can turn out to be more productive on massive datasets that require large onboard memory.
In fact, Robbert Emery, director of technology marketing & business development at NEC, wrote in a blog post about how vector processors with advanced pipelines could help solve complex real-world challenges that were earlier addressed only by hyper-scale cloud solutions. “Vector processing, when paired with middleware optimised for parallel pipelining, lowers the entry barriers for new AI and ML applications,” stated Emery.
SX-Aurora TSUBASA vs NVIDIA DGX A100
According to NEC, the SX-Aurora TSUBASA is the company’s flagship product, which brings together the scalar and vector processing performance of supercomputers along with a large memory subsystem. Not only is it compatible with both Python and TensorFlow, it also offers multiple hardware configurations to run typical supercomputing workloads on desktops as well as laptops with FHFL cards.
When the performance numbers of NEC’s supercomputer are compared with those of NVIDIA’s latest DGX A100, NEC’s system looks less effective as well as less precise. However, the company firmly believes that its vector engine processors can still compete with NVIDIA’s scalar engine by providing high performance with low power consumption. NEC claims that, unlike NVIDIA’s GPUs, its vector engine processor is a CPU in its own right and can run complete applications, removing the bottleneck between the host and the accelerator.
Each A100 GPU in NVIDIA’s DGX A100 comes with 40GB of onboard memory, which leaves SX-Aurora TSUBASA, with 48GB of onboard memory, better positioned for massive datasets that need the extra capacity. That is also 4X the memory NVIDIA deploys on its V100 accelerator card. Also, according to the company, the vector engine processor has ten cores running at 1.6GHz and processes data along with the instructions, which in turn reduces the instruction cycles spent on decoding.
Further, like NVIDIA’s GPUs, NEC’s vector engine processor uses the second generation of high bandwidth memory, but with six modules and one processor combined using multi-chip packaging technology. This gives it fast access to data, with an aggregate memory bandwidth of 1.2 TB/s. NVIDIA’s A100, on the other hand, offers a higher memory bandwidth of 1.6 TB/s. The smaller chip size of NEC’s vector engine is another reason it can fit more high bandwidth memory onto the interposer than NVIDIA can.
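To put the capacity and bandwidth figures side by side, here is a rough back-of-the-envelope sketch in Python. It estimates how long each device would take to read its entire onboard memory once at the quoted peak bandwidth. The numbers are the ones quoted in this article, and the calculation assumes 1 TB = 1000 GB and ideal, sustained peak bandwidth, which real workloads rarely achieve. The function name `full_sweep_seconds` is purely illustrative.

```python
# Per-device figures as quoted in the article (not independently measured).
devices = {
    "NEC vector engine": {"memory_gb": 48, "bandwidth_tb_s": 1.2},
    "NVIDIA A100": {"memory_gb": 40, "bandwidth_tb_s": 1.6},
}

def full_sweep_seconds(memory_gb, bandwidth_tb_s):
    """Idealised time to stream the whole onboard memory once.

    Assumes 1 TB = 1000 GB and that peak bandwidth is sustained.
    """
    bandwidth_gb_s = bandwidth_tb_s * 1000
    return memory_gb / bandwidth_gb_s

sweep_times = {
    name: full_sweep_seconds(d["memory_gb"], d["bandwidth_tb_s"])
    for name, d in devices.items()
}

for name, seconds in sweep_times.items():
    print(f"{name}: about {seconds * 1000:.0f} ms per full memory sweep")
```

Under these assumptions the A100 sweeps its smaller memory faster, while the vector engine holds more data on the card; which of the two matters more depends on the workload.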
However, whereas NVIDIA’s system has a total GPU memory of 320GB and delivers 312 TFLOPS of deep learning performance, NEC’s AI platform comes in a top configuration of 384GB with a maximum vector engine performance of 157.28 TFLOPS. This, in turn, makes NVIDIA the preferred choice for deep learning training.
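The trade-off in these headline numbers is easier to see as ratios. The short Python sketch below simply divides the figures quoted above; it does not account for differences in numeric precision or in how each vendor measures TFLOPS, so it should be read as illustrative arithmetic rather than a benchmark.

```python
# System-level figures as quoted in the article (not independently measured).
specs = {
    "NEC SX-Aurora TSUBASA": {"memory_gb": 384, "tflops": 157.28},
    "NVIDIA DGX A100": {"memory_gb": 320, "tflops": 312.0},
}

def ratio(metric, a="NEC SX-Aurora TSUBASA", b="NVIDIA DGX A100"):
    """Return system a's value for `metric` divided by system b's."""
    return specs[a][metric] / specs[b][metric]

memory_ratio = ratio("memory_gb")  # 384 / 320
compute_ratio = ratio("tflops")    # 157.28 / 312

print(f"Memory:  NEC offers {memory_ratio:.2f}x NVIDIA's total")
print(f"Compute: NEC offers {compute_ratio:.2f}x NVIDIA's quoted TFLOPS")
```

On the quoted figures, NEC’s top configuration carries about 20 percent more memory while delivering roughly half the quoted TFLOPS.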
Although NVIDIA’s DGX A100 wins hands down on precision, NEC claims to charge far less for a vector engine card than NVIDIA charges for its accelerator cards.
On the software side, NEC’s vector engine processor combines single instruction, multiple data (SIMD) pipelining with NEC’s framework for vectorised and distributed data analytics as middleware. Such a combination not only allows this AI platform to accelerate data analysis operations on top of the Apache Spark MLlib and DataFrame frameworks but also enables it to support standard programming languages.
Although all computing hardware architectures, including NVIDIA’s, use SIMD or vectors, none of them has been a pure vector design. For highly data-intensive applications, a CPU paired with a GPU isn’t enough to derive accurate results; massive SIMD is required to deliver high performance. This is where SX-Aurora TSUBASA, in which one instruction controls many operations, comes in handy.
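The idea of one instruction controlling many operations can be illustrated with a toy model in plain Python. This is only a sketch: it counts how many "instructions" would be issued under each scheme, and the `lanes` parameter is a hypothetical vector width chosen for illustration, not a specification of NEC’s hardware. Python itself still performs the additions one at a time; the point is the difference in instruction count.

```python
def scalar_add(a, b):
    """Add element by element: one issued instruction per element."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued

def simd_add(a, b, lanes=8):
    """Add in chunks of `lanes` elements: one issued instruction per chunk.

    `lanes` is a hypothetical vector width used only for this illustration.
    """
    out, issued = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return out, issued

a = list(range(20))
b = list(range(20, 40))

scalar_result, scalar_issued = scalar_add(a, b)
simd_result, simd_issued = simd_add(a, b, lanes=8)

print(f"Scalar: {scalar_issued} instructions issued")
print(f"SIMD:   {simd_issued} instructions issued")
```

Both functions produce the same sums, but the SIMD-style version issues far fewer instructions, which is the property that reduces fetch and decode work on real vector hardware.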
According to the company, this unique architecture drastically speeds up workloads and reduces power consumption for memory-intensive applications. This could be ideal for many uses, such as large-scale recommendation engines, high-throughput processing of large transactions, fraud detection in the BFSI sector, and authentication.
With artificial intelligence and machine learning steering computing in this complex environment, more and more applications are going to emerge in the future. NEC believes that the industry must rethink the whole concept of leveraging multiple processors to gain system performance. NEC claims that a vector engine with one core processor can be the preferred choice for solving complex business problems in future scenarios.