How Machine Learning Is Helping Data Centers Chill Out

Pipelines at a Google data center

A typical warm day at the South Pole is 20 degrees below zero, yet, ironically, the data centers run by the IceCube Neutrino Observatory can still overheat. A normal day at any data center involves troubleshooting, racking and stacking, and with such enormous data inflows the work becomes tedious, so staff sometimes fail to respond in real time. The technicians aren't to blame either, because a typical UPS is reactive: it either functions flawlessly or fails altogether. Machine learning models, on the other hand, are proactive and can be remarkably effective at forecasting failures.

Data centers are modern-day engineering marvels. There is no ready-made blueprint to follow, so data center managers build customized devices and come up with original solutions to unforeseen problems.

Data Centers: Think Big

Anyone who has worked on a personal computer has probably struggled with a malfunctioning fan or other cooling issues, and a household desktop handles only a modest amount of data. Now imagine thousands of these machines working in parallel. Think about the heat generated every minute and the unwanted power fluctuations. Although cloud storage abstracts away the physical hardware through virtualization, the scale at which these storekeepers operate is still colossal.

Data abundance leads to data accumulation, and data centers work round the clock to manage vast volumes of incoming data as well as previously stored data.

Data Centers Generate Data

Data centers also generate data: server data, power outage reports for particular systems and much more. Cloud TPUs are designed to run heavy machine learning models, and engineers are now harnessing the same capability to recognize patterns and predict outages. Experts observe that the future of data storage is software-defined. Disaggregation and server virtualization are already a reality, with individual off-the-shelf devices emulating multiple servers through virtualization.

That said, these virtual instances originate from a hypervisor running on physical hardware somewhere, so no matter how much virtualization we adopt, hardware systems still need maintenance and cooling systems need to get smarter.

Smarter Data Centers

Creating smarter data centers becomes increasingly important as more companies adopt hybrid environments that include the cloud, colocation facilities and in-house data centers, and that will increasingly include edge sites, Jennifer Cooke, research director of IDC's Cloud to Edge Data center Trends service, told a leading online publication that covers data centers.

Outside air temperature, the data center's power load and the air pressure behind the servers, where the hot air exits, are among the factors considered when designing a cooling system. So where do machine learning models fit in?

Machine Learning To The Rescue

A typical rack may draw 10 kW, but its load can shoot up to 15 kW. ML models can predict such spikes one hour into the future, giving operators much-needed breathing room to detect and resolve problems before they become catastrophic outages.
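To make the one-hour-ahead idea concrete, here is a minimal sketch that fits a least-squares trend line to recent rack power readings and extrapolates it 60 minutes forward, raising a flag if the forecast reaches a limit. It assumes evenly spaced readings; the function name, the 15 kW limit (borrowed from the rack example above) and the linear-trend model are illustrative stand-ins. Production systems would use richer models that account for workload schedules and seasonality.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Forecast:
    predicted_kw: float  # extrapolated rack power at the forecast horizon
    alert: bool          # True if the forecast meets or exceeds the limit


def forecast_rack_power(
    readings_kw: Sequence[float],
    minutes_between: float,
    horizon_min: float = 60.0,
    limit_kw: float = 15.0,
) -> Forecast:
    """Hypothetical helper: fit a least-squares trend line to evenly spaced
    readings and extrapolate it horizon_min minutes past the latest reading."""
    n = len(readings_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    if minutes_between <= 0:
        raise ValueError("minutes_between must be positive")
    xs = [i * minutes_between for i in range(n)]  # minutes since the first reading
    mean_x = sum(xs) / n
    mean_y = sum(readings_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings_kw))
    slope = sxy / sxx  # trend in kW per minute
    intercept = mean_y - slope * mean_x
    predicted = intercept + slope * (xs[-1] + horizon_min)
    return Forecast(predicted_kw=predicted, alert=predicted >= limit_kw)


if __name__ == "__main__":
    # Hypothetical readings taken every 10 minutes on a rack ramping up from 10 kW.
    ramp = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
    print(forecast_rack_power(ramp, minutes_between=10))
```

With the ramp above, the rack sits at 12.5 kW now but the trend projects it past 15 kW within the hour, which is exactly the kind of early warning that leaves room to rebalance load or boost cooling.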

For example, Google's TPU 3.0 is so power-hungry that cooling it with air is not viable, so the engineers retrofitted the infrastructure to accommodate direct-to-chip liquid cooling.

Google has deployed processors it designed in-house to improve its deep learning capabilities, and it has also started deploying machine learning software to run its data centers. Its machine learning algorithms automatically and continuously adjust cooling plant settings in real time, reducing annual power consumption. Improving efficiency and analyzing risk form the core of any data center management job. Companies with in-house data science expertise pursue their own machine learning initiatives, while others turn to vendors who have built custom software to tackle the same problems.
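Model-driven cooling adjustment can be sketched as a constrained search: predict the power draw and the resulting server inlet temperature for each candidate cooling setpoint, discard settings that would breach a safety limit, and pick the cheapest of the rest. This is a hypothetical illustration, not Google's actual system; `choose_setpoint` and the two predictor callables are assumptions, and in practice the predictors would be trained models rather than the toy functions shown here.

```python
from typing import Callable, Iterable, Optional


def choose_setpoint(
    candidates: Iterable[float],
    predict_power_kw: Callable[[float], float],
    predict_inlet_temp_c: Callable[[float], float],
    max_inlet_temp_c: float,
) -> Optional[float]:
    """Hypothetical helper: return the candidate setpoint with the lowest
    predicted power draw among those whose predicted server inlet temperature
    stays within the safety limit. Returns None if no candidate is safe."""
    best: Optional[float] = None
    best_power = float("inf")
    for setpoint in candidates:
        if predict_inlet_temp_c(setpoint) > max_inlet_temp_c:
            continue  # safety constraint first: never trade cooling for savings
        power = predict_power_kw(setpoint)
        if power < best_power:
            best, best_power = setpoint, power
    return best


if __name__ == "__main__":
    # Toy, hypothetical models: raising the chilled-water setpoint (deg C) cuts
    # chiller power but also raises the temperature of air reaching the servers.
    def power_model(sp: float) -> float:
        return 500.0 - 10.0 * sp

    def inlet_model(sp: float) -> float:
        return 12.0 + sp

    print(choose_setpoint(range(7, 15), power_model, inlet_model, max_inlet_temp_c=24.0))
```

Re-running this selection every few minutes on fresh sensor data is what would make the adjustment continuous. Checking the safety constraint before comparing power keeps the optimizer from ever choosing a cheaper but unsafe setting.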

Apart from customizing coolant circulation, these ML models can also:

  1. Analyze servers and detect anomalies, such as ghost servers running applications that are no longer in use.
  2. Support consolidation: when a company consolidates data centers and migrates applications and data to a central facility, algorithms can help determine how the move affects capacity there.
  3. Bolster cybersecurity.
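The first two items can be illustrated with simple stand-ins. The sketch below uses a rule-based heuristic (not a trained model) that flags servers whose CPU and network activity have stayed near zero for a sustained period, a common symptom of a ghost server, plus a one-line capacity check for a consolidation move. All names and thresholds (2% CPU, 5 kbps, 30 days) are hypothetical; a learned model would replace the fixed thresholds with patterns inferred from historical telemetry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerStats:
    name: str
    avg_cpu_pct: float   # mean CPU utilization over the observation window
    avg_net_kbps: float  # mean network throughput over the same window
    days_observed: int   # length of the observation window


def find_ghost_servers(
    stats: List[ServerStats],
    cpu_max_pct: float = 2.0,
    net_max_kbps: float = 5.0,
    min_days: int = 30,
) -> List[str]:
    """Hypothetical heuristic: flag servers that have stayed almost idle
    for long enough to look abandoned."""
    return [
        s.name
        for s in stats
        if s.days_observed >= min_days
        and s.avg_cpu_pct <= cpu_max_pct
        and s.avg_net_kbps <= net_max_kbps
    ]


def headroom_after_migration(
    capacity_kw: float, current_load_kw: float, migrated_load_kw: float
) -> float:
    """Remaining power capacity at the target site once migrated workloads
    arrive. A negative result means the move would exceed capacity."""
    return capacity_kw - (current_load_kw + migrated_load_kw)


if __name__ == "__main__":
    fleet = [
        ServerStats("web-01", 35.0, 800.0, 45),
        ServerStats("legacy-07", 0.5, 1.2, 60),
        ServerStats("new-03", 0.1, 0.0, 3),  # idle, but too new to judge
    ]
    print(find_ghost_servers(fleet))
    print(headroom_after_migration(1000.0, 700.0, 250.0))
```

The `min_days` guard matters: a freshly provisioned server is also idle, so requiring a long observation window avoids flagging it prematurely.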

When algorithms detect anomalies that signal an impending failure, the system alerts customers so they can troubleshoot before the equipment goes down. Incident analysis also helps determine the root cause faster.
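Here is a minimal sketch of this kind of anomaly alerting, assuming a single sensor stream such as an inlet temperature. Each new reading is compared with the mean and standard deviation of a rolling window, and readings more than a chosen number of standard deviations away trigger an alert. The class name, window size and threshold of 3 are illustrative assumptions, and a rolling z-score is a deliberately simple substitute for the learned models vendors actually use.

```python
import statistics
from collections import deque


class ZScoreMonitor:
    """Hypothetical monitor: flags readings that deviate sharply from a
    rolling window of recent values."""

    def __init__(self, window: int = 30, threshold: float = 3.0) -> None:
        self.recent = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous relative to the current window,
        then add it to the window."""
        anomalous = False
        if len(self.recent) >= 2:  # sample stdev needs at least two points
            history = list(self.recent)
            mean = statistics.mean(history)
            stdev = statistics.stdev(history)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                anomalous = True
        self.recent.append(value)
        return anomalous


if __name__ == "__main__":
    monitor = ZScoreMonitor(window=10, threshold=3.0)
    # Hypothetical inlet temperatures (deg C): stable, then a sudden jump.
    for reading in [24.0, 24.5] * 5 + [31.0]:
        if monitor.observe(reading):
            print(f"ALERT: reading {reading} deviates from recent baseline")
```

One design choice worth noting: anomalous readings are still added to the window, so a sustained shift eventually becomes the new baseline and stops alerting; excluding flagged values would keep the alert active instead. A perfectly constant window (zero standard deviation) never alerts in this sketch.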

Autonomous Data Centers

The economics of data centers are crucial for any data vendor, and optimizing power usage with state-of-the-art cooling systems is a challenge every professional in this industry faces.

Machine learning is expected to optimize every facet of future data center operations, including planning and design, managing IT workloads, ensuring uptime and controlling costs. IDC predicts that by 2022, 50 percent of IT assets in data centers will be able to run autonomously because of embedded AI functionality.

Companies now offer solutions built on machine learning models. These models analyze internal reports on the facility and help engineers plan storage space, optimize the rate of cooling, predict the next spike and address other infrastructure inefficiencies.

Schneider Electric, Maya Heat Transfer Technologies (HTT) and Nlyte Software are among the leading companies offering ingenious solutions to existing problems, including systems capable of forecasting failures.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
