Google AI Launches an Open Source Library to Store & Manipulate Large Multi-Dimensional Arrays

The library aims to address key engineering challenges in scientific computing through better management and processing of large datasets.

In a blog post published last week, Google AI introduced TensorStore, an open-source C++ and Python library for storing and manipulating n-dimensional data.

Many contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single, expansive coordinate system. One example is weather forecasting from atmospheric measurements taken over a geographical grid. Another is medical imaging, where predictions are made from multi-channel image intensity values in a 2D or 3D scan.

A single dataset of this kind can require petabytes of storage, and working with it is challenging because users may read and write data at varying scales and at unpredictable intervals.

Researchers at Google AI say TensorStore has already been used to solve key engineering challenges in scientific computing, such as managing and processing large neuroscience datasets, including petascale 3D electron microscopy data and “4D” videos of neuronal activity.

The library has also been used in the creation of PaLM, a large-scale machine learning model, where it addressed the problem of managing model parameters (checkpoints) during distributed training.

TensorStore natively supports storage systems such as Google Cloud Storage, local and network filesystems, and HTTP servers, and offers a unified API for reading and writing array formats such as zarr and N5. It provides read/writeback caching and transactions with strong atomicity, consistency, isolation, and durability (ACID) guarantees. It also supports safe, efficient access from multiple processes and machines through optimistic concurrency.
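To make the idea of optimistic concurrency concrete, here is a minimal sketch in plain Python. It is not TensorStore's implementation or API: the `VersionedStore` class, its methods, and the `increment` helper are hypothetical names invented for illustration. The sketch assumes a store that tags each value with a generation number, so a writer commits only if the generation it read is still current and otherwise re-reads and retries.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory key-value store; each key carries a generation number."""

    def __init__(self):
        self._data = {}  # key -> (generation, value)
        self._lock = threading.Lock()  # guards only the brief compare-and-swap step

    def read(self, key):
        """Return (generation, value); generation 0 means the key is absent."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_generation, value):
        """Commit only if nobody has written since `expected_generation` was read."""
        with self._lock:
            current_generation, _ = self._data.get(key, (0, None))
            if current_generation != expected_generation:
                return False  # conflict: the caller must re-read and retry
            self._data[key] = (current_generation + 1, value)
            return True


def increment(store, key, max_retries=1000):
    """Read-modify-write loop that retries on conflict instead of holding a lock."""
    for _ in range(max_retries):
        generation, value = store.read(key)
        if store.write_if_unchanged(key, generation, (value or 0) + 1):
            return
    raise RuntimeError("too many conflicting writers")


def worker(store, updates):
    for _ in range(updates):
        increment(store, "counter")


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store, 100)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_generation, final_value = store.read("counter")
print(final_generation, final_value)
```

No writer holds a lock across its whole read-modify-write cycle, yet no update is lost: a write that raced with another is rejected and retried against fresh data. This is the general pattern the term refers to; how TensorStore applies it to real storage backends is described in its own documentation.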

TensorStore also offers an asynchronous API that enables high-throughput access even to high-latency remote storage, along with a simple Python API for loading and manipulating large array data. The example in Google's blog post creates a TensorStore object representing a 56-trillion-voxel 3D image of a fly brain, then reads a small 100×100 patch of the data as a NumPy array.

[Code listing omitted. Source: Google AI Blog]

The blog claims, “No actual data is accessed or stored in memory until the specific 100×100 slice is requested; hence arbitrarily large underlying datasets can be loaded and manipulated without having to store the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations.”
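The lazy-access behaviour described in the quote can be sketched in plain Python. This is an illustration of the general technique only, not TensorStore code: `LazyChunkedArray` and `synthetic_chunk` are hypothetical names, the array is 2D rather than 3D, and the chunk loader fabricates values in place of a real storage read.

```python
class LazyChunkedArray:
    """Hypothetical 2-D array whose chunks are fetched only when a slice touches them."""

    def __init__(self, shape, chunk_shape, load_chunk):
        self.shape = shape
        self.chunk_shape = chunk_shape
        self._load_chunk = load_chunk  # callable: (chunk_row, chunk_col) -> list of rows
        self.chunks_loaded = []        # bookkeeping so the laziness is observable

    def __getitem__(self, index):
        row_slice, col_slice = index
        rows = range(*row_slice.indices(self.shape[0]))
        cols = range(*col_slice.indices(self.shape[1]))
        chunk_rows, chunk_cols = self.chunk_shape
        cache = {}  # chunks fetched for this request only
        result = []
        for r in rows:
            out_row = []
            for c in cols:
                chunk_id = (r // chunk_rows, c // chunk_cols)
                if chunk_id not in cache:
                    cache[chunk_id] = self._load_chunk(*chunk_id)
                    self.chunks_loaded.append(chunk_id)
                out_row.append(cache[chunk_id][r % chunk_rows][c % chunk_cols])
            result.append(out_row)
        return result


def synthetic_chunk(chunk_row, chunk_col, size=64):
    """Stand-in for a storage read: the value at (r, c) is r * 1_000_000 + c."""
    return [
        [(chunk_row * size + i) * 1_000_000 + (chunk_col * size + j) for j in range(size)]
        for i in range(size)
    ]


# A logical 1,000,000 x 1,000,000 array, far too large to hold in memory.
volume = LazyChunkedArray((1_000_000, 1_000_000), (64, 64), synthetic_chunk)
chunks_before_slicing = len(volume.chunks_loaded)  # nothing is fetched at construction

patch = volume[15000:15100, 15000:15100]  # request a 100 x 100 window
print(len(patch), len(patch[0]), chunks_before_slicing, len(volume.chunks_loaded))
```

Constructing the object costs nothing, and the 100×100 request touches only the four 64×64 chunks that overlap the window. The logical array has a trillion elements, but the memory used is proportional to the slice requested, which is the property the blog post highlights.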

TensorStore is available as a package on PyPI and can be installed with pip (pip install tensorstore). The official tutorials and API documentation cover usage in detail.

Bhuvana Kamath
