Google AI Launches an Open Source Library to Store & Manipulate Large Multi-Dimensional Arrays

The library aims to address key engineering challenges in scientific computing through better management and processing of large datasets.

In a blog post published last week, Google AI introduced TensorStore, an open-source C++ and Python library designed for storing and manipulating n-dimensional data.

Many contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single, expansive coordinate system. One example is forecasting the weather from atmospheric measurements taken over a geographical grid.

Another is making medical imaging predictions from the multi-channel image intensity values of a 2D or 3D scan.

In such settings, a single dataset may require petabytes of storage, and working with it can be challenging because users may read and write data at different scales and at unpredictable intervals.

Researchers at Google AI say TensorStore has already been used to manage and process large datasets in neuroscience, including petascale 3D electron microscopy data and “4D” videos of neuronal activity.

Additionally, the library was used in the creation of PaLM, a large-scale machine learning model, where it addressed the problem of managing model parameters (checkpoints) during distributed training.

The library natively supports storage systems such as Google Cloud Storage, HTTP servers, and local and network filesystems, and it offers a unified API for reading and writing diverse array formats such as zarr and N5. It also provides read/writeback caching and transactions with strong atomicity, consistency, isolation, and durability (ACID) guarantees. Furthermore, it supports safe, efficient access from multiple processes and machines via optimistic concurrency.
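To make the term "optimistic concurrency" concrete, here is a minimal, standard-library-only sketch of the general idea: writers do not hold a lock across a read-modify-write cycle; instead, each commit succeeds only if the data is unchanged since it was read, and otherwise the writer retries. The `VersionedStore` class, its method names, and the generation counter are hypothetical and used purely for illustration; this is not TensorStore's API or its actual implementation.

```python
import threading


class VersionedStore:
    """Toy in-memory key-value store that tracks a generation number per key."""

    def __init__(self):
        self._lock = threading.Lock()  # guards only the short compare-and-swap step
        self._data = {}                # key -> (generation, value)

    def read(self, key):
        """Return (generation, value); generation 0 means the key is absent."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_generation, value):
        """Commit only if no one has written since `expected_generation`."""
        with self._lock:
            current_generation, _ = self._data.get(key, (0, None))
            if current_generation != expected_generation:
                return False           # conflict: caller must re-read and retry
            self._data[key] = (current_generation + 1, value)
            return True


def increment(store, key):
    """Read-modify-write loop that retries on conflict instead of blocking."""
    while True:
        generation, value = store.read(key)
        if store.write_if_unchanged(key, generation, (value or 0) + 1):
            return


def worker(store):
    for _ in range(100):
        increment(store, "counter")


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store,)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_generation, final_value = store.read("counter")
print(final_generation, final_value)
```

Because a conflicting write is rejected rather than silently overwritten, no increment is lost even though eight threads update the same key at once.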

TensorStore also offers an asynchronous API that enables high-throughput access even to high-latency remote storage, along with a simple Python API for loading and manipulating large array data. For example, the Google AI Blog post creates a TensorStore object representing a 56-trillion-voxel 3D image of a fly brain and then reads a small 100×100 patch of the data as a NumPy array.

The blog claims, “No actual data is accessed or stored in memory until the specific 100×100 slice is requested; hence arbitrarily large underlying datasets can be loaded and manipulated without having to store the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations.”
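The behaviour described in that quote can be illustrated with a small, standard-library-only sketch of lazy indexing. This is not TensorStore code: the `LazyArray` class, the `fetch` callback, and the use of a thread pool are assumptions made for illustration. Slicing merely narrows the window a view covers, and data is fetched only when `read()` is called, which returns a future so the caller is not blocked while the fetch runs.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyArray:
    """Toy 2-D array view; values are fetched only when read() is called.

    Supports indexing with a pair of slices only, e.g. view[0:10, 5:15].
    """

    def __init__(self, shape, fetch, executor, window=None):
        self.shape = shape            # logical size of the full dataset
        self._fetch = fetch           # hypothetical callback: (row, col) -> value
        self._executor = executor
        if window is None:
            window = (range(shape[0]), range(shape[1]))
        self._window = window         # (row_range, col_range) this view covers

    def __getitem__(self, key):
        """Narrow the window; no data is touched here."""
        row_slice, col_slice = key
        rows, cols = self._window
        narrowed = (rows[row_slice], cols[col_slice])
        return LazyArray(self.shape, self._fetch, self._executor, narrowed)

    def read(self):
        """Start fetching the window in the background and return a Future."""
        rows, cols = self._window
        return self._executor.submit(
            lambda: [[self._fetch(r, c) for c in cols] for r in rows]
        )


fetched = []  # records every element that is actually touched


def fetch(row, col):
    fetched.append((row, col))
    return row * 1_000_000 + col  # stand-in for reading from remote storage


with ThreadPoolExecutor(max_workers=1) as executor:
    # A trillion logical elements, none of which are held in memory.
    dataset = LazyArray((1_000_000, 1_000_000), fetch, executor)
    patch_view = dataset[15000:15100, 15000:15100]  # still no data fetched
    touched_before_read = len(fetched)
    patch = patch_view.read().result()              # 100 x 100 nested list

print(touched_before_read, len(fetched))
```

Only the 10,000 elements inside the requested patch are ever fetched, which is why the size of the underlying dataset does not matter to the caller.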

The TensorStore package is published on PyPI and can be installed with pip install tensorstore. For usage details, see the tutorials and API documentation that Google AI has provided.

Bhuvana Kamath
