Inside DagsHub: The GitHub for data science and machine learning

Data science and machine learning combine complex mathematical concepts with programming tools to build algorithms that inform business decisions. Collaboration and discussion during these projects can be of great help to data scientists and machine learning practitioners. Just as GitHub exists for collaborating on open-source software development, DagsHub, a platform launched in 2019, is becoming increasingly popular as a common ground where data scientists and machine learning engineers come together to build their work.

“It is like GitHub for data science and machine learning,” is how DagsHub describes itself. It is a web platform for data version control and collaboration for data scientists and machine learning engineers and is based on open-source tools, optimised for data science and oriented towards the open-source community.

The Tel Aviv-based company was launched in 2019 by Dean Pleban and Guy Smoilovsky. To date, it has raised over three million dollars across two rounds of funding, in 2019 and 2020. Just a few weeks back, DagsHub launched DagsHub 2.0 and announced that users can now annotate data on DagsHub and hold discussions on any file on the platform.


Home for open source data science

Data science teams can find it tough to collaborate. Explaining why it started the platform, DagsHub says that the data science workflow differs from the software development workflow and that existing tools are not suited to it.

The founders add, “DagsHub was created to be a home for open-source data science, where everyone can contribute and make the research and development process transparent, inclusive and better for everyone; to help developers in the fields of machine learning and data science create and learn from each other. We believe that technology should help us focus on tackling the most interesting and important challenges in life.”

Built on DVC

Data science and machine learning projects often require versioning large files, which Git does not handle well. DagsHub says that Git and git-lfs do not version the data pipeline. This means that when a step in the data pipeline is modified, the people working on the project have no way of knowing that the later stages of the pipeline need to be re-run.
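To make the pipeline problem concrete, here is a minimal sketch of how a tool could detect which stages need to be re-run after an input changes. It is an illustration only and not how DagsHub or any specific tool is implemented: the `Stage` class, the `stale_stages` function and the file names are all hypothetical, and it assumes the stages are listed in execution order.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline step (hypothetical structure for this sketch)."""
    name: str
    deps: tuple  # paths the stage reads
    outs: tuple  # paths the stage writes


def fingerprint(content: bytes) -> str:
    """Content hash used to notice that a file has changed."""
    return hashlib.sha256(content).hexdigest()


def stale_stages(stages, recorded, files):
    """Return the names of stages that must be re-run, in pipeline order.

    stages   -- Stage objects, assumed to be listed in execution order
    recorded -- path -> hash saved when the pipeline last ran
    files    -- path -> current file content (bytes)
    """
    stale = []
    dirty_outputs = set()
    for stage in stages:
        # A stage is stale if an input changed on disk, or if an input
        # is produced by an earlier stage that is itself stale.
        changed = any(
            dep in dirty_outputs or fingerprint(files[dep]) != recorded.get(dep)
            for dep in stage.deps
        )
        if changed:
            stale.append(stage.name)
            dirty_outputs.update(stage.outs)
    return stale


# In-memory stand-ins for files on disk (made-up names and contents).
files = {
    "raw.csv": b"a,b\n1,2\n",
    "clean.csv": b"a,b\n1,2\n",
    "model.bin": b"weights-v1",
}
recorded = {path: fingerprint(data) for path, data in files.items()}
pipeline = [
    Stage("prepare", ("raw.csv",), ("clean.csv",)),
    Stage("train", ("clean.csv",), ("model.bin",)),
]

print(stale_stages(pipeline, recorded, files))  # nothing has changed yet

files["raw.csv"] = b"a,b\n1,2\n3,4\n"  # someone edits the raw data
print(stale_stages(pipeline, recorded, files))  # both stages are now stale
```

Versioning the files alone would record that `raw.csv` changed, but not that `train` depends on it through `prepare`. Tracking the dependencies between stages is what lets the change propagate to the end of the pipeline.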

According to its website, DagsHub is built on Git and DVC, an open-source command-line tool for data and pipeline versioning. A user can send someone a link to their DagsHub repo, and the recipient can explore the project and download its data and models from any past version, experiment, or branch without running any code.

Language and library agnostic

The company website lists the features that DagsHub offers for data science and machine learning projects. Some of the most important ones are:

  • Commenting – One can take notes on model architectures, discuss annotations with others, and review another team member’s contribution to a project.
  • Version everything – One can explore relationships between data versions and experiments and see the graph of the project history. When one finds the result they want, they can get the code as well as the configuration with just one command.
  • Annotations – DagsHub Annotations helps create a Label Studio instance with a single click. It is automatically synced with the datasets tracked on DagsHub Storage.
  • Language and library agnostic – It works for projects written in Python or R and using libraries such as Keras and PyTorch.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good.
