
Why I/O Profiling Is Crucial For Software And Data-Driven Companies


In computer architecture, the combination of the CPU and main memory, which the CPU can read from or write to directly using dedicated instructions, is considered the central component of a system. Any transfer of data into or out of that CPU/memory combination, for instance reading data from a disk drive, counts as I/O.

Applications today are changing constantly, and research applications in particular are growing more sophisticated. GPUs and many-core CPUs, for example, have enabled far more demanding machine learning and AI applications.

Larger, more complex workflows look like a compute problem, but they ultimately present very different I/O patterns to the file system, and the file systems at many enterprises are not structured to cope with these changes. On top of everything else an application developer has to worry about, matching in-memory block sizes to the file system's block size for maximum efficiency can be very challenging.
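
As a rough sketch of what that block matching can look like in practice, an application can query the file system's preferred block size and issue reads in aligned multiples of it. The file name and chunk multiplier below are illustrative assumptions:

```python
# A minimal sketch, assuming a Unix-like system. The input file and the
# chunk multiplier are hypothetical placeholders.
import os

path = "large_input.dat"          # hypothetical input file
block = os.statvfs(path).f_bsize  # file system's preferred I/O block size
chunk = block * 256               # read in large, block-aligned chunks

total = 0
with open(path, "rb", buffering=0) as f:  # unbuffered: requested sizes reach the OS
    while data := f.read(chunk):
        total += len(data)

print(f"Read {total} bytes in {chunk}-byte aligned chunks")
```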

Furthermore, it is even more difficult to do that when running on, say, 100 nodes, or when running several different models and algorithms within the same application.

Bad I/O can come from a number of places, such as third-party tools and libraries, particularly legacy code. Compute infrastructure and data storage have changed a lot over the last few decades, and what counted as good software practice ten years ago may not be relevant now. There are still applications that assume storage is fast and local, and that assumption causes a lot of problems.

Similarly, other misunderstandings about IT infrastructure can cause problems. An application may run well in one environment but poorly in another, particularly when working with a partner organisation that has a different IT setup. If a company upgrades its IT infrastructure, brings in a new cluster or moves to a hybrid cloud, its old assumptions may no longer hold. Continuously monitoring, and occasionally profiling, your applications is therefore important for making sure you do not fall into any of these traps.

Value In High-Performance Computing

Companies are looking to solve I/O-related challenges. Recently, for example, Altair acquired the I/O profiling firm Ellexus, whose tools analyse input/output (I/O) patterns in real time to locate system bottlenecks and rogue applications. The solution protects shared storage by finding bad I/O patterns quickly, and it also keeps tabs on cloud resource usage. I/O performance matters greatly to software vendors, high-performance computing organisations and anywhere big data meets big compute.

Altair made this strategic acquisition because the HPC community is interested in analysing I/O patterns properly. In high-performance computing systems, parallel I/O architectures usually have complex hierarchies with multiple layers that collectively constitute an I/O stack: high-level I/O libraries, I/O middleware and parallel file systems such as PVFS and Lustre. It is therefore important to understand the complex I/O hierarchies of high-performance storage systems, including storage caches, HDDs and SSDs, in order to build compiler and runtime technology that targets I/O-intensive HPC applications.
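
To make the middleware layer concrete, here is a minimal sketch of parallel I/O through MPI-IO using mpi4py. It assumes a working MPI installation; the file name and array size are illustrative, and each rank writes a non-overlapping region of one shared file:

```python
# A minimal sketch of the I/O middleware layer via mpi4py's MPI-IO bindings.
# The output file name and array size are hypothetical placeholders.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data = np.full(1024, rank, dtype=np.int32)   # this rank's slice of the dataset
fh = MPI.File.Open(comm, "parallel_output.dat",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(rank * data.nbytes, data)    # collective, offset-based write
fh.Close()
```

Launched with something like mpiexec -n 4 python write_parallel.py, the collective Write_at_all call gives the MPI-IO layer room to aggregate and align requests before they reach a parallel file system such as Lustre.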

A lot of companies are looking to containerise their applications, usually with Singularity containers in the HPC community, to make sure they have the right files to run those applications. Following on from that is application correctness, where companies can determine why an application works for one team but not for another. Perhaps a team has the wrong version of Python installed, or maybe a config file is missing. All of this can be picked up quickly and easily by looking at what is required to run the application.
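
Such checks can be surprisingly simple once the dependencies are known. A minimal sketch, in which the required Python version and config file path are hypothetical placeholders:

```python
# A minimal environment sanity check. The version requirement and config
# path below are hypothetical, not taken from any specific application.
import sys
from pathlib import Path

REQUIRED = (3, 8)             # hypothetical minimum Python version
CONFIG = Path("config.yaml")  # hypothetical config file the app expects

if sys.version_info < REQUIRED:
    sys.exit(f"Need Python {REQUIRED[0]}.{REQUIRED[1]}+, "
             f"found {sys.version.split()[0]}")
if not CONFIG.exists():
    sys.exit(f"Missing config file: {CONFIG}")
print("Environment looks consistent")
```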

Detecting Dependencies

Detecting dependencies is the number one thing you want to do with I/O profiling. Long before a company starts to look at I/O patterns and performance, it needs to know which files are accessed across which network locations. Companies need this information for both containerisation and migration.
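
On Linux, even without a dedicated profiler, a first pass at dependency detection can be assembled from standard tooling. A minimal sketch that wraps strace to list every file an application successfully opens; the target command is a hypothetical placeholder:

```python
# A minimal sketch, assuming Linux with strace installed. The application
# command is a placeholder; strace writes its trace to stderr by default.
import re
import subprocess

cmd = ["./my_app", "--input", "data.csv"]   # hypothetical application
result = subprocess.run(["strace", "-f", "-e", "trace=open,openat"] + cmd,
                        stderr=subprocess.PIPE, text=True)

opened = set()
for line in result.stderr.splitlines():
    m = re.search(r'open(?:at)?\(.*?"([^"]+)"', line)
    if m and "= -1" not in line:            # keep only successful opens
        opened.add(m.group(1))

for path in sorted(opened):
    print(path)
```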

The vast majority of applications are not exascale and do not get rewritten and re-optimised for every machine that comes along; many codebases have evolved over 10 to 15 years. It therefore gets tricky for companies to adopt a new file system and optimise an application again, not just for compute-side attributes but also for the file system attributes that shape its I/O patterns.

It is important to keep on top of application file-system dependencies and I/O patterns. There are tools that help users understand the way they access data, so that they can run and maintain distributed, scientific and HPC applications. Only by understanding what you are running can you be fast, agile and cloud-ready.

Once a company has got to the point where its application is correct, it can look at performance and efficiency: understanding the resources in use, making sure the application has the right type of storage, and checking that its temporary files are in the right place.
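
The temporary-file check in particular is easy to automate. A minimal sketch that reports where the current environment places temporary files and how much space sits behind that location:

```python
# A minimal sketch using only the standard library: find the temp
# directory and the capacity of the storage behind it.
import shutil
import tempfile

tmp = tempfile.gettempdir()
usage = shutil.disk_usage(tmp)
print(f"Temp directory: {tmp}")
print(f"Free: {usage.free / 2**30:.1f} GiB of {usage.total / 2**30:.1f} GiB")
```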

Profiling I/O is also important so that you know how the application is performing over time and how the performance of the storage is affecting it. Companies can then move on from profiling to monitoring I/O, a passive activity that makes sure they understand what is going on in a broader sense.
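
A simple recurring probe is one way to watch storage performance over time. A minimal sketch that measures sustained write throughput to a given path; the file name and sizes are illustrative, and running it periodically builds the kind of history that monitoring relies on:

```python
# A minimal storage throughput probe; the scratch file name and data
# volumes are hypothetical placeholders.
import os
import time

path = "scratch_probe.dat"       # hypothetical file on the storage under test
payload = os.urandom(4 * 2**20)  # 4 MiB per write
n_writes = 64                    # ~256 MiB total

start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(n_writes):
        f.write(payload)
    f.flush()
    os.fsync(f.fileno())         # force data to the device before stopping the clock
elapsed = time.perf_counter() - start

print(f"Sustained write: {n_writes * len(payload) / elapsed / 2**20:.1f} MiB/s")
os.remove(path)
```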


Vishal Chawla

Vishal Chawla is a senior tech journalist at Analytics India Magazine and writes about AI, data analytics, cybersecurity, cloud computing, and blockchain. Vishal also hosts AIM's video podcast called Simulated Reality- featuring tech leaders, AI experts, and innovative startups of India.