Python’s increasing popularity in scientific and high-performance computing

Python is often used as a glueing layer that relies on compiled optimised packages that it strings together to perform the target computations.
Python is an experiment on how much freedom programmers need. Too much freedom and nobody can read another’s code; too little and expressiveness is endangered.

– Guido van Rossum, creator of Python programming language

Last year, Python was named the most popular programming language. Its growing popularity can be attributed to the rise of data science and the machine learning ecosystem, along with software libraries such as Pandas, TensorFlow, PyTorch and NumPy. The fact that it is easy to learn also helps Python gain favour among programmers.

That said, Python is much slower than compiled languages like Rust or Fortran. This is mainly because Python is an interpreted language, which means that each instruction carries a significant overhead. That overhead slows down massive computations and, at first glance, makes Python unsuitable for scientific and high-performance computing. In this article, we explore why this isn't necessarily gospel truth and why Python is nonetheless being preferred for these tasks.

Python as a glueing layer

In the case of languages like C, C++ or Fortran, the source code is first compiled to an executable format before it can be run. With Python, there is no separate step that compiles the program to machine code: the standard interpreter translates the source to bytecode and executes it on the fly, one instruction at a time. The main advantage of an interpreted language like Python is flexibility: variables do not need to be declared in advance, and the program can adapt on the fly.
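To make that flexibility concrete, here is a minimal sketch using only the standard library. The function name `add` is made up for illustration, and the exact instruction names printed at the end vary between Python versions.

```python
import dis


def add(a, b):
    # No type declarations: the same function works for any
    # operands that support "+".
    return a + b


int_sum = add(2, 3)            # integers
str_sum = add("glue", "ing")   # strings
list_sum = add([1], [2])       # lists

# The interpreter runs the function as a sequence of bytecode
# instructions; each one is dispatched at run time, which is
# where the per-instruction overhead comes from.
instructions = [ins.opname for ins in dis.get_instructions(add)]
print(int_sum, str_sum, list_sum)
print(instructions)
```

The same `+` is resolved afresh on every call depending on the operand types, which is convenient for the programmer but costs time inside tight numerical loops.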

The main disadvantage, as discussed earlier, is the slower execution of numerically intensive programs, which on its own would make Python a poor fit for scientific computing. The workaround is that time-intensive subroutines can be compiled in C or Fortran and then imported into Python in such a way that they appear to behave like normal Python functions.
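A small illustration of this idea, assuming the standard CPython interpreter, where the `math` module is itself written in C. The helper `python_total` and the sample data are hypothetical and exist only for this sketch.

```python
import math


def python_total(values):
    # Pure-Python loop: every addition goes through the interpreter.
    total = 0.0
    for v in values:
        total += v
    return total


values = [float(i) for i in range(1, 10_001)]

loop_total = python_total(values)
# math.fsum is compiled C code in CPython, yet it is imported and
# called exactly like the Python function above.
compiled_total = math.fsum(values)

# Both kinds of function can be stored and passed around the same way.
summers = {"pure Python": python_total, "compiled (math.fsum)": math.fsum}
totals = {name: fn(values) for name, fn in summers.items()}
print(totals)
```

From the caller's point of view there is no visible difference between the two, which is what lets compiled routines slot into ordinary Python programs.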

Many common mathematical and numerical routines are pre-compiled to run very fast. They are grouped into two packages that can be added to Python in a transparent manner. Python is often used as a glueing layer that relies on compiled, optimised packages, which it strings together to perform the target computations. The most widespread package in scientific computing is NumPy (Numerical Python). The NumPy package offers basic routines for manipulating large arrays and matrices of numeric data. This manipulation is not done in plain Python; instead, behind the scenes, all the heavy lifting is done by compiled C/C++ or Fortran routines.
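As a rough sketch of what this looks like in practice, the snippet below computes the same dot product twice: once with an explicit Python loop and once with a single NumPy call. It assumes NumPy is installed; `dot_python` and the sample arrays are invented for this example.

```python
import numpy as np


def dot_python(a, b):
    # Plain-Python version: one interpreted multiply-add per element.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


a = np.arange(1_000, dtype=np.float64)   # 0.0, 1.0, ..., 999.0
b = np.ones(1_000, dtype=np.float64)

loop_result = dot_python(a, b)
# One call; the loop over the elements runs inside compiled code.
numpy_result = float(np.dot(a, b))

# Whole-array arithmetic without writing a Python loop at all.
scaled = a * 2.0 + 1.0

print(loop_result, numpy_result, scaled[:3])
```

Both versions return the same number; the difference is where the loop runs. How much faster the NumPy call is depends on the machine and the array size, so it is worth measuring with `timeit` rather than assuming a fixed speed-up.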

Further, the SciPy (Scientific Python) package extends the functionality of NumPy with a collection of algorithms for minimisation, Fourier transforms, regression and other applied mathematics techniques. The popularity of both packages is soaring in the scientific community. They have also made Python comparable to, if not better than, expensive commercial packages like MATLAB.
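The following sketch touches two of those areas, minimisation and Fourier transforms. It assumes NumPy and a reasonably recent SciPy (one that ships the `scipy.fft` module) are installed; the function `parabola` and the 5 Hz test signal are made up for illustration.

```python
import numpy as np
from scipy import fft, optimize


def parabola(x):
    # Toy objective with its minimum at x = 3.
    return (x - 3.0) ** 2 + 1.0


# Minimisation: find the x that minimises the objective.
result = optimize.minimize_scalar(parabola)

# Fourier transform: recover the frequency of a sampled sine wave.
sample_rate = 100                           # samples per second
t = np.arange(sample_rate) / sample_rate    # one second of samples
signal = np.sin(2 * np.pi * 5.0 * t)        # 5 Hz sine

spectrum = np.abs(fft.rfft(signal))
freqs = fft.rfftfreq(signal.size, d=1.0 / sample_rate)
peak_hz = float(freqs[np.argmax(spectrum)])

print(result.x, peak_hz)
```

In both cases the Python code only describes the problem; the numerical work is handed to SciPy's compiled routines.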

Credit: The COOP Blog

Python for HPC

A team of researchers from Imperial College London demonstrated the viability of Python as a platform for productive, portable and performant HPC applications at petascale. Freddie Witherden, one of the members of this team, said that Python was a ‘first-class language’ for HPC. He gave three reasons for this: an increased emphasis on application performance, as well as developer and user productivity, in HPC codes; the growing tendency of HPC applications to rely on third-party APIs; and the increased use of code generation to address performance bottlenecks. He said that Python could address these factors and that it puts the highest levels of performance on HPC hardware within researchers’ reach. The team was nominated for the prestigious ACM Gordon Bell Prize.

Expert opinion

“This is far from the truth, and the answer depends on the layer of the software stack referred to. In particular, the choice of programming language for end-users is very different from the one for those implementing the underlying systems, libraries, compilers, and runtimes. For the former, Python is popular as most end-user programming models are Python-based (e.g. TensorFlow, PyTorch), Python packages for high-performance and scientific computing are widely available, and Python offers high programmer productivity. However, for the underlying programming model implementations, libraries, compilers, and runtimes, C, C++, and CUDA are still the languages of choice as they deliver performance. Ultimately, all the performant Python packages themselves internally map to libraries written in C, C++, or CUDA. High performance is ultimately derived from those optimised libraries or, in some cases, from just-in-time compilers and code generators. So, C and C++ are still the languages of choice to implement the underlying libraries, compilers, code generators, or runtimes,” said Uday Bondhugula, Founder and CTO, Polymage Labs.

“Python is indeed the go-to language in scientific and high-performance computing. Because it is simple, scalable, versatile, efficient and platform-agnostic, it is gaining popularity among programmers, data scientists, ML engineers, and data analysts. It includes hundreds of publicly available libraries and frameworks and feature-rich packages for data manipulation (Pandas) and machine learning (scikit-learn). In addition, Python has been utilised in several enterprise AI frameworks (TensorFlow, PyTorch, etc.). Due to the abundance of open-source tools, academia appears to be moving away from R (and similar platforms), allowing Python to emerge as the preferred language among enthusiasts. Python also scales to enterprise needs by using packages like Dask, PySpark, Koalas and others,” said Pavan Nanjundaiah, Senior Director, Tredence Studio.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
