This Theoretical Breakthrough At MIT Could Significantly Increase Data Storage

A team of MIT researchers has made a breakthrough in the storage and retrieval of data in computers. New research from MIT’s CSAIL on linear-probing hash tables may result in more efficient computer data storage and retrieval. In addition, a theoretical discovery could increase data storage. Data structures, known as “linear-probing hash tables”, are a standard strategy of organising and storing data in computers using data structures. Linear-probing hash tables allow data to be stored at any point along with an array.

For instance, William Kuszmaul, a computer science PhD student at MIT, proposes that the database contains 10,000 social security numbers. “Obtain the social security number x and compute its hash function, h (x), which yields a random number between 1 and 10,000. The following step is to take that random number, h (x). Then, to navigate to that position in the array and insert the social security number x.”

Handling Deletion

The deletion of personal information like social security numbers follows a particular set of rules. It can be difficult to find something later if the hash table is left empty after deleting information. When you’re looking for something, it’s easy to have the impression that there are no other places where you can get it. To overcome this issue, Kuszmaul adds, “one can go to the location of the piece that was removed and place a small marker called a ‘tombstone,’ which shows there was once an element here, but it has since been removed.”


Sign up for your weekly dose of what's up in emerging technology.

Linear Probing Features

However, most people who utilise linear probing hash tables anticipated that when they were too crowded. Linear probing would conduct lengthy stretches of occupied locations in a “cluster” fashion. As a result, the time required to discover an empty seat increases considerably — to the point of quadratic growth — as long as the expectation is false. As a result, individuals have been educated to work with hashtables on a limited capacity basis. They are a strategy that can be costly in terms of the quantity of hardware that a business must purchase and maintain.

However, the study of Kuszmaul and his colleagues, Michael Bender of Stony Brook University, and Bradley Kuszmaul of Google has utterly disproved this long-held notion that has been used to counteract high-load issues. Instead, the linear probing hash table provides great storage capacity without losing performance. Additionally, the team developed a novel approach named “Graveyard Hash.” This hashtable is accomplished by intentionally increasing the number of tombstones in the array until it fills approximately half of the available space. These tombstones provide additional areas for future inscriptions. Kuszmaul says this method is taken in the opposite direction of how people are typically educated and “may result in the best performance in a linear probing hash table.” 

Download our Mobile App


Companies demand new technologies for more effective data storage and retrieval in computers in today’s era of information overload. According to researchers at MIT, linear probing can reintroduce older technology such as linear probing hash tables since they are more efficient. The researchers compared modern and historical storage systems. Along with the traditional method, the researchers developed a new technique called cemetery hashing, which entails artificially increasing the number of tombstones in an array until they occupy approximately half of the available locations. As a result, linear probing will improve data storage, retrieval speed and efficiency from computers.
For more information, refer to the article.

Support independent technology journalism

Get exclusive, premium content, ads-free experience & more

Rs. 299/month

Subscribe now for a 7-day free trial

More Great AIM Stories

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.

AIM Upcoming Events

Early Bird Passes expire on 3rd Feb

Conference, in-person (Bangalore)
Rising 2023 | Women in Tech Conference
16-17th Mar, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
27-28th Apr, 2023

3 Ways to Join our Community

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox