Now GitHub Wants To Preserve Code Forever, Launches Archive Program

GitHub Archive Program

In an attempt to preserve open-source code for future generations, GitHub recently launched the GitHub Archive Program. This week at the GitHub Universe event, the firm announced that it will open the Arctic Code Vault on 2 February 2020 to host open-source repositories.

In partnership with the Long Now Foundation, the Internet Archive, the Arctic World Archive, and others, GitHub aims to protect code across a wide range of storage media and formats for 1,000 years. Archived data lasts only for a limited time because the stored bits gradually degrade, a phenomenon known as bit rot. As a result, critical information is lost over the years from storage devices, whose lifespan is around 30 years.



Therefore, to help future generations gain access to the building blocks of today’s products, the consortium plans to store data on media that can last for many hundreds of years and mitigate the problem of data decay.


GitHub will use a flexible, durable archiving strategy called pace layers. The idea is to have a range of storage types that cater to different needs, from real-time to long-term.

Consequently, the program has been partitioned into hot, warm, and cold storage. Hot storage will serve real-time requirements and hold the current GitHub data, warm storage will be updated on a monthly basis to archive the repositories, and cold storage is focused on long-term preservation and will be updated every five or more years.
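To make the tiering concrete, here is a minimal sketch of how the three layers and their update cadences could be modelled. It is not GitHub's implementation: the `Tier` class, the `tiers_due` helper, and the exact intervals (0 days, 30 days, 5 years) are assumptions chosen only to mirror the cadences described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Tier:
    """One storage layer and how often it is refreshed (hypothetical model)."""
    name: str
    refresh_every: timedelta


# Assumed cadences: hot is continuous, warm is monthly, cold is every ~5 years.
TIERS = [
    Tier("hot", timedelta(0)),
    Tier("warm", timedelta(days=30)),
    Tier("cold", timedelta(days=5 * 365)),
]


def tiers_due(last_refresh: dict[str, date], today: date) -> list[str]:
    """Return the names of tiers whose refresh interval has elapsed."""
    return [
        tier.name
        for tier in TIERS
        if today - last_refresh[tier.name] >= tier.refresh_every
    ]
```

Calling `tiers_due` with the last refresh date of each tier and the current date returns the layers that are due for an update, which is the only decision this sketch models.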

All three storage layers are further divided into several types, preserving data by crawling the GitHub site and keeping copies in the Internet Archive and the Bodleian Library, among others. The standout element of this storage, however, is Project Silica, developed by the Microsoft Research team. Eventually, it aims to archive all active public repositories and safeguard the code for 10,000 years.

Cold Storage

Each repository will be stored as a single TAR file, and the data will be QR-encoded for integrity. The archive will be kept 250 meters deep in a decommissioned coal mine inside an Arctic mountain in Svalbard. Although the location is also exposed to climate change, this is not expected to affect the preserved data. Svalbard is considered one of the safest places on the planet, as it experiences only a few natural hazards. The region is also used to archive other human assets, most notably the many types of seeds held in the Svalbard Global Seed Vault.
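As a rough illustration of the one-TAR-per-repository idea, the sketch below packs a directory into a single TAR file and records a checksum that can later reveal corruption. It is an assumption-laden stand-in, not the vault's actual process: the function names `archive_repository` and `verify_archive` are hypothetical, and a SHA-256 digest is used here in place of the QR encoding the program describes.

```python
import hashlib
import tarfile
from pathlib import Path


def _sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_repository(repo_dir: Path, out_path: Path) -> str:
    """Pack repo_dir into one uncompressed TAR file and return its SHA-256."""
    with tarfile.open(out_path, "w") as tar:
        tar.add(repo_dir, arcname=repo_dir.name)
    return _sha256_of(out_path)


def verify_archive(out_path: Path, expected_sha256: str) -> bool:
    """Return True if the archive still matches the recorded checksum."""
    return _sha256_of(out_path) == expected_sha256
```

Storing the returned digest alongside the archive lets a later reader detect bit rot by re-running `verify_archive`; it detects damage but, unlike error-correcting encodings, cannot repair it.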

How Impactful Will It Be?

While GitHub’s primary intention is to preserve open-source software for future generations, it also wants to encourage other organisations to plan for long-term storage. The firm points to historical examples of lost knowledge and technology, such as the Library of Alexandria and Roman concrete, where humanity lost much useful information because no backups existed.

However, those were times when information was difficult to move, and a groundbreaking discovery or invention took years or even decades to spread. As a result, only a few people stayed informed about new technologies. In the internet era, information moves at a rapid pace, and it is easy to keep track of developments.

Besides, in today’s ever-changing landscape, technologies become irrelevant very quickly as newer ones replace them. Today’s open-source projects might not be as useful in the future as the firm envisions.


Storing physical things such as seeds for future generations is a wise idea, as there is a risk of losing something that will always be useful to humans. Technology thousands of years from now will be far more advanced than it is today. Still, preparing for the worst is not a terrible idea, and preserving software is a reasonable way to do it.

Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
