arXiv Makes All Its Research Papers Available On Kaggle To Boost Machine Learning Developments

arXiv, the largest repository of research papers, recently announced that it is offering a free and open pipeline of its dataset, comprising more than 1.7 million articles, on Kaggle. The aim is to boost developments in areas such as machine learning. The dataset includes relevant features such as article titles, authors, categories, abstracts, full-text PDFs and more.

arXiv has served the public and research communities for nearly 30 years, across subjects ranging from physics, computer science, mathematics and statistics to quantitative biology, economics and everything in between.

“Having the entire arXiv corpus on Kaggle grows the potential of arXiv articles immensely. By offering the dataset on Kaggle we go beyond what humans can learn by reading all these articles and we make the data and information behind arXiv available to the public in a machine-readable format,” said Eleonora Presani, Executive Director, arXiv.

Kaggle has long been a favourite destination for data scientists and machine learning engineers. Researchers can use Kaggle’s extensive data exploration tools and easily share their scripts and outputs with others. With arXiv’s repository of articles at hand, Kaggle users can push the limits of innovation.

The large dataset will offer researchers new connections, innovative tools and fresh perspectives that enable better discovery and innovation, believes Steinn Sigurdsson, Scientific Director, arXiv.

This is especially relevant during the current pandemic, as the world searches for ways to cure COVID-19: free resources can help researchers come up with solutions and innovations. For instance, Google’s COVID-19 Research Explorer is a tool that helps researchers sift through the CORD-19 dataset, a repository of 190,000+ scientific articles on COVID-19.

arXiv hopes that the release of the machine-readable dataset will inspire the creation of similar tools in the future. 

The dataset on Kaggle will be updated weekly.

Srishti Deoras

Srishti currently works as Associate Editor at Analytics India Magazine. When not covering analytics news or editing and writing articles, she can be found reading or capturing thoughts in pictures.