Google AI Launches an Open Source Library to Store & Manipulate Large Multi-Dimensional Arrays

The library aims to address key engineering challenges in scientific computing through better management and processing of large datasets.

In a blog article published last week, Google AI introduced TensorStore, an open-source C++ and Python library designed for storage and manipulation of n-dimensional data. The library aims to address key engineering challenges in scientific computing through better management and processing of large datasets. 

Various contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single, expansive coordinate system. An example is using air measurements over a geographical grid to make weather predictions. 

Another could be making medical imaging predictions using multi-channel image intensity values from a 2D or 3D scan. 

A single dataset in these settings might require petabytes of storage, and working with such datasets can be challenging, as users may read and write data at irregular intervals and varying scales. 

Researchers at Google AI claim that TensorStore has already been used to solve key engineering challenges in managing and processing large datasets in neuroscience, such as peta-scale 3D electron microscopy data and “4D” videos of neuronal activity. 

Additionally, the library has been used in the creation of PaLM—a large-scale machine learning model—by addressing the problem of managing model parameters (checkpoints) during distributed training. 

This library natively supports storage systems like Google Cloud Storage, HTTP servers, local and network filesystems, and more, and offers a unified API for reading and writing diverse array formats such as zarr and N5. With strong atomicity, consistency, isolation, and durability (ACID) guarantees, it also provides read/writeback caching and transactions. Furthermore, it supports safe, efficient access from multiple processes and machines via optimistic concurrency. 

TensorStore also offers an asynchronous API that enables high-throughput access even to high-latency remote storage, along with a simple Python API for loading and manipulating large array data. For example, the blog shows a TensorStore object being created to represent a 56-trillion-voxel 3D image of a fly brain, and a small 100×100 patch of the data then being accessed as a NumPy array: 

(Code snippet shown as an image in the original post. Source: Google AI Blog)

The blog claims, “No actual data is accessed or stored in memory until the specific 100×100 slice is requested; hence arbitrarily large underlying datasets can be loaded and manipulated without having to store the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations.”

To know more, Google AI has published the TensorStore package, which can be installed with a single command. For further reference, check out the tutorials and API documentation for usage details. 

Bhuvana Kamath
I am fascinated by technology and AI’s implementation in today’s dynamic world. Being a technophile, I am keen on exploring the ever-evolving trends around applied science and innovation.
