arXiv Makes All Its Research Papers Available On Kaggle To Boost Machine Learning Developments

Srishti Deoras

arXiv, the largest repository of research papers, recently announced a free and open pipeline of its dataset, comprising more than 1.7 million articles, on Kaggle. The release aims to boost developments in areas such as machine learning. The dataset includes relevant features such as article titles, authors, categories, abstracts, full-text PDFs and more.
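As a rough illustration of what a machine-readable format makes possible, here is a minimal Python sketch that counts subject categories across article records. It assumes the metadata is stored as one JSON object per line, with `title`, `authors`, `categories` and `abstract` fields and categories written as a space-separated string. The field names, the sample records and the helper functions `iter_records` and `count_categories` are illustrative assumptions, not details confirmed by the announcement.

```python
import io
import json
from collections import Counter


def iter_records(lines):
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_categories(records):
    """Count how often each category label appears across records."""
    counts = Counter()
    for record in records:
        # Assumption: categories are stored as a space-separated string.
        counts.update(record.get("categories", "").split())
    return counts


# Hypothetical sample records standing in for the real metadata file.
SAMPLE = io.StringIO(
    '{"title": "Example paper A", "authors": "A. Author", '
    '"categories": "cs.LG stat.ML", "abstract": "..."}\n'
    '{"title": "Example paper B", "authors": "B. Author", '
    '"categories": "cs.LG", "abstract": "..."}\n'
)

category_counts = count_categories(iter_records(SAMPLE))
print(category_counts.most_common(1))
```

Because the records are read one line at a time, the same loop could in principle be pointed at an open file handle instead of the in-memory sample, without loading the whole dataset at once.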

arXiv has served the public and research communities for nearly 30 years, covering subjects such as physics, computer science, math, statistics, quantitative biology and economics.

“Having the entire arXiv corpus on Kaggle grows the potential of arXiv articles immensely. By offering the dataset on Kaggle we go beyond what humans can learn by reading all these articles and we make the data and information behind arXiv available to the public in a machine-readable format,” said Eleonora Presani, Executive Director, arXiv.



Kaggle has been a favourite destination for data scientists and machine learning engineers for quite some time now. Researchers can utilise Kaggle’s extensive data exploration tools and easily share their relevant scripts and output with others. With arXiv’s repository of articles, Kaggle users can push the limits of innovation.

The large dataset will offer researchers new connections, innovative tools and perspectives to enable better discovery and innovation, believes Steinn Sigurdsson, Scientific Director, arXiv.

Free resources are especially valuable during the current pandemic, when researchers around the world are working on treatments for COVID-19. For instance, Google’s COVID-19 Research Explorer is a tool that helps researchers pore through the CORD-19 dataset, a collection of 190,000+ scientific articles on COVID-19.

arXiv hopes that the release of the machine-readable dataset will inspire the creation of similar tools in the future. 

The dataset is available on Kaggle and will be updated weekly.

Copyright Analytics India Magazine Pvt Ltd
