Now GitHub Wants To Preserve Code Forever, Launches Archive Program

GitHub Archive Program

In an attempt to preserve open-source code for future generations, GitHub recently launched its GitHub Archive Program. This week at the GitHub Universe event, the firm announced that it will capture a snapshot of its active public repositories on 2 February 2020 for storage in the Arctic Code Vault.

In partnership with the Long Now Foundation, the Internet Archive, the Arctic World Archive, and others, GitHub aims to protect code across a wide range of storage media and formats for 1,000 years. Data on conventional storage media gradually degrades, a phenomenon known as bit rot. As a result, critical information is lost over the years from storage devices, whose lifespan is around 30 years at best.

Therefore, to help future generations gain access to the building blocks of today’s products, the consortium plans to store data on media that can last for many hundreds of years and so mitigate the problem of data decay.



GitHub will utilise a flexible, durable archiving strategy called pace layers. The idea is to have a wide range of storage types that cater to different needs, from real-time access to long-term preservation.

Consequently, the program has been partitioned into hot, warm, and cold storage. Hot storage serves real-time requirements and holds the current GitHub data. Warm storage will be updated monthly to archive the repositories. Cold storage is focused on long-term preservation and will be updated every five or more years.
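The three tiers and their update cadences can be pictured with a minimal Python sketch. The names `Tier`, `TIERS` and `tiers_due` are hypothetical, and the intervals are rough approximations of the cadences described above (monthly as 30 days, "5+ years" as five 365-day years). This is an illustration only, not GitHub's actual tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tier:
    """One storage tier (hypothetical model, not GitHub's implementation)."""
    name: str
    refresh_every: Optional[timedelta]  # None means continuously updated


# Assumed intervals approximating the cadences described in the article.
TIERS = [
    Tier("hot", None),                      # real-time, always current
    Tier("warm", timedelta(days=30)),       # roughly monthly
    Tier("cold", timedelta(days=5 * 365)),  # roughly every five years
]


def tiers_due(last_refresh: Dict[str, date], today: date) -> List[str]:
    """Return the names of the tiers that should be refreshed today."""
    due = []
    for tier in TIERS:
        if tier.refresh_every is None:
            # The hot tier is always considered due because it is live data.
            due.append(tier.name)
            continue
        last = last_refresh.get(tier.name)
        # A tier that was never refreshed, or whose interval has elapsed, is due.
        if last is None or today - last >= tier.refresh_every:
            due.append(tier.name)
    return due
```

For example, with a warm copy taken on 1 January 2020 and a cold copy taken on 2 February 2020, calling `tiers_due` for 1 March 2020 reports the hot and warm tiers as due but not the cold one.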


Each of the three tiers is further divided into several archives. For example, the GitHub site is crawled and the results are kept in the Internet Archive and the Bodleian Library, among others. A notable part of this storage is Project Silica, developed by the Microsoft Research team. Eventually, the data of all active public repositories is to be stored with it to safeguard the code for 10,000 years.

Cold Storage

Each repository will be stored in a single TAR file and encoded as QR codes for data integrity. The data will be kept in a decommissioned coal mine, 250 metres deep inside an Arctic mountain in Svalbard. Although the region is affected by climate change, this is not expected to have any impact on the preserved data. Svalbard is considered one of the safest places on the planet for long-term storage because it faces few natural hazards. It is also home to other long-term archives, such as the Svalbard Global Seed Vault, which preserves a wide variety of seeds.
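The idea of one TAR file per repository can be sketched with Python's standard library. This is an illustration under stated assumptions, not GitHub's archival pipeline: the function names are hypothetical, and a SHA-256 digest stands in for the integrity check, since the article does not describe how the QR encoding works.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_repository(repo_dir: str, out_path: str) -> str:
    """Pack repo_dir into a single uncompressed TAR file and return its digest.

    out_path should live outside repo_dir, otherwise the archive
    would try to include itself.
    """
    repo = Path(repo_dir)
    with tarfile.open(out_path, "w") as tar:
        # Store entries under the repository's own directory name.
        tar.add(repo, arcname=repo.name)
    return sha256_of(out_path)


def verify_archive(tar_path: str, expected_digest: str) -> bool:
    """Check that the archive still matches the digest recorded at creation."""
    return sha256_of(tar_path) == expected_digest
```

Recording the digest when the archive is created and recomputing it later is one simple way to detect the kind of silent corruption the article calls bit rot.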

How Impactful Will It Be?

While GitHub’s primary intention is to preserve open-source software for future generations, it also wants to encourage other organisations to plan for long-term storage. The firm also points to historical examples of lost knowledge and technology, such as the Library of Alexandria and Roman concrete, where much useful information was lost because no backups existed.

However, those were times when information was difficult to move, and a groundbreaking discovery or invention took years or even decades to spread. As a result, only a few people stayed informed about a given technology. In the internet era, information moves at a rapid pace, so it is much easier to keep track of developments.

Besides, in today’s ever-changing landscape, technologies quickly become irrelevant as newer ones replace them. Today’s open-source projects might not be as useful to future generations as the firm envisions.


Storing physical things such as seeds for future generations is a wise idea, as there is a risk of losing something that will always be useful to humans. Technology thousands of years from now, by contrast, will likely be far more advanced than it is today. However, preparing for the worst is not a terrible idea, and preserving software is a sensible way to do it.


Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
