In 1999, chip company NVIDIA launched the GeForce 256, widely marketed as the ‘world’s first GPU’. The single-chip processor, with integrated transform, lighting, and rendering engines, could process a minimum of 10 million polygons per second.
Since then, researchers and scientists have leveraged the floating-point performance of GPUs such as the GeForce 256 for general-purpose computing, and interest in GPUs and their real-world applications has soared.
In 2003, a team of researchers from Stanford University, led by Ian Buck, introduced Brook, a widely adopted programming model that extended C with data-parallel constructs. In an earlier interview, Buck said, “At the time, a lot of the GPU development was driven by the need for more realism, which meant programs were being written that could run at every pixel to improve the game.”
However, Brook had one major drawback: it was confined to a constrained streaming programming model. In 2006, Buck, who by then was working at NVIDIA, led the launch of CUDA, touted as the world’s first solution for general computing on GPUs. Since then, the CUDA ecosystem has grown substantially. Today, the CUDA toolkit includes libraries, debugging and optimization tools, programming guides, API references, code samples and documentation. CUDA has emerged as a market differentiator for NVIDIA.
What is CUDA
CUDA is a parallel computing platform developed by NVIDIA for general computing on GPUs. GPGPU (General-Purpose computing on GPUs) refers to the use of GPUs to assist with tasks generally handled by CPUs. GPGPU allows information to flow in both directions: from CPU to GPU and back. Such bidirectional processing can improve efficiency in a wide variety of tasks, especially those involving images and video. CUDA is NVIDIA’s proprietary framework for GPGPU and a leading one in the field. It accelerates compute-intensive applications by running the parallelisable part of the computation on GPUs.
CUDA stands for Compute Unified Device Architecture. It is a software layer that gives direct access to the GPU’s virtual instruction set and parallel computational elements for the execution of compute kernels.
It works with programming languages such as C, C++ and Fortran, which makes it easier for specialists in parallel programming to use GPU resources. CUDA has applications in a wide range of fields, including bioinformatics, life sciences, computer vision, electrodynamics, computational chemistry, finance and medical imaging.
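To make the ideas of a compute kernel and two-way CPU–GPU data flow concrete, here is a minimal sketch in CUDA C++ that adds two vectors. The names `vectorAdd`, `check` and `addOnGpu` are hypothetical helpers chosen for illustration, and the block size of 256 threads is an arbitrary assumption; the sketch also assumes both inputs are non-empty and of equal length, and that a CUDA-capable GPU and the `nvcc` compiler are available.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Compute kernel: each GPU thread adds one pair of elements.
__global__ void vectorAdd(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last, partial block
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: copy inputs to the GPU, run the kernel, copy the result back.
// Assumes a and b are non-empty and have the same length.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> out(n);

    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc(dA)");
    check(cudaMalloc(&dB, bytes), "cudaMalloc(dB)");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc(dOut)");

    // CPU -> GPU
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    // Launch enough blocks of 256 threads to cover all n elements.
    const int threads = 256;  // assumed block size, not a tuned value
    const int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(dA, dB, dOut, n);
    check(cudaGetLastError(), "kernel launch");

    // GPU -> CPU (this copy waits for the kernel to finish)
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost), "copy out");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    std::vector<float> out = addOnGpu(a, b);

    // Sanity check: every element should be 1.0 + 2.0.
    for (int i = 0; i < n; ++i) {
        assert(out[i] == 3.0f);
    }
    std::printf("added %d elements on the GPU\n", n);
    return 0;
}
```

If saved as `vector_add.cu` (a file name assumed here), the sketch could be built with `nvcc vector_add.cu -o vector_add`. Only the element-wise addition runs on the GPU; allocation, copying and verification stay on the CPU, which is the division of labour the section describes.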
CUDA vs competitors
In a recent interview with CNBC, KeyBanc analyst John Vinh said that CUDA gives NVIDIA an edge. He said, “CUDA software and developer base represents one of the deepest competitive moats in semis, while the emerging software revenue opportunity could re-rate NVIDIA further.”
Watch how @CNBCTechCheck and John Vinh of @Key_B2B view #CUDA as NVIDIA's "competitive moat" in machine learning and #AI computing, enabling a 2.5 million developer community. https://t.co/PpRPfFzbPU
— Dr. Jochen Papenbrock (@JoPapenbrock) June 10, 2021
One of CUDA’s strongest competitors is OpenCL. The latter was launched in 2009 by Apple and the Khronos Group to offer a standard for heterogeneous computing. Unlike CUDA, OpenCL can be used to program CPUs, GPUs, and other devices from different vendors.
Although OpenCL offers a portable language for GPU programming, its generality can come at a cost in performance. NVIDIA, by contrast, has a dedicated team of experts keeping CUDA at the cutting edge, and even CUDA’s documentation is a cut above OpenCL’s.
The general consensus is that CUDA performs better when transferring data to and from the GPU. CUDA’s kernel execution is also generally reported to be faster than OpenCL’s, even when the two implementations run nearly identical code. These advantages make CUDA a popular choice for applications where high performance is important.
Apple’s Metal is another worthy rival of CUDA, and is poised to emerge as a top player on the GPGPU front. Metal combines functionality similar to OpenCL and OpenGL in a single low-level API; it is very efficient and can provide large performance benefits. Unlike OpenCL, Metal has its own consistent development team rolling out timely updates. However, it is limited to Apple’s operating systems, which is a significant disadvantage.