Recently, researchers at NumPy released its first-ever review paper “Array programming with NumPy” that is based on how a few fundamental array concepts lead to a simple and powerful programming paradigm for organising, exploring and analysing scientific data. The entire review paper has been published by the developers a few days ago after a gap of almost fifteen years since its inception.
Created in 2005, NumPy is an open-source project that aims to enable numerical computing with Python. In the current scenario, almost every scientist working in Python draws on the power of NumPy. The library adds support for large, multi-dimensional arrays as well as matrices, and brings the computational power of languages like C and Fortran to Python.
From capturing the first image of a black hole to sport analytics, NumPy plays a significant role in research analysis pipelines including fields such as physics, astronomy, geoscience, biology, psychology, economy, engineering, finance and more.
Role of NumPy in Array Programming
Array programming provides a powerful, expressive as well as compact syntax for performing functions like accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is said to be the primary array programming library for the popular Python language.
According to the researchers, this library is the foundation upon which the scientific Python ecosystem is constructed, and it is the pervasiveness of the library that several projects, targeting audiences with specialised needs, have developed their own NumPy-like interfaces and array objects.
NumPy increasingly acts as an interoperability layer between such array computation libraries. Together with its API, the library provides a flexible framework to support the next decade of scientific and industrial analysis. The inherent simplicity of this library led the NumPy array to be the de facto exchange format for array data in Python.
The NumPy Array
The NumPy array is said to be a data structure, which can store and access multidimensional arrays in an efficient manner. It incorporates various fundamental array concepts. Also known as tensors, NumPy array consists of a pointer to memory, along with metadata used to interpret the data stored, notably the data type, shape and strides.
The data type defines the nature of elements stored in an array and the shape of an array explains the number of elements along each axis, and the number of axes is the dimensionality of the array. At the same time, the strides are necessary to interpret computer memory, which stores elements linearly, as multidimensional arrays.
NumPy has the capability to store arrays in either C or Fortran memory order, iterating first over either rows or columns. This allows external libraries written in those languages to access NumPy array data in memory directly. The library provides in-memory, multidimensional, homogeneously typed arrays on CPUs and runs on machines, ranging from embedded devices to the world’s largest supercomputers.
NumPy and Other Libraries
Even though NumPy is not part of Python’s standard library, yet the Python language has added new features and special syntax so that NumPy would have a more concise and easier-to-read array notation.
Popular libraries like SciPy and Matplotlib are tightly coupled with NumPy for various development and uses. According to the researchers, the combination of NumPy, SciPy and Matplotlib, together with an advanced interactive environment such as IPython or Jupyter, provides a solid foundation for array programming in Python.
They claimed that this library is the base of the scientific Python ecosystem. “The interactive environment created by the array programming foundation and the surrounding ecosystem of tools — inside of IPython or Jupyter — is ideally suited to exploratory data analysis.”
With the help of NumPy’s high-level API, users can leverage highly parallel code execution on multiple systems with millions of cores, all with minimal code changes. The library is no longer merely the foundational array library underlying the scientific Python ecosystem, but it has become the standard API for tensor computation and a central coordinating mechanism between array types and technologies in Python.
NumPy is poised to embrace such a changing landscape, and to continue playing a leading part in interactive scientific computation, although to do so will require sustained funding from government, academia and industry, concluded the researchers.
Read the paper here.