AI Origins & Evolution

More than English: NLP Datasets have a Language Overfitting Problem

While pre-trained language models have improved markedly, large amounts of unlabelled data remain scarce for many non-English languages.

13/09/2022

AI Mysteries

Now Hugging Face Gives Away 650 NLP Datasets For Free

With Datasets, Hugging Face wants to standardise the end-user interface, versioning, and documentation, and to provide a lightweight frontend for internet-scale corpora.

16/09/2021

AI Mysteries

A Comprehensive Guide To 15 Most Important NLP Datasets

If you are just getting started with NLP, or are a researcher deeply interested in natural language processing, this comprehensive guide will help you with

05/01/2021

AI Mysteries

Datasets: A Community Library for NLP by Hugging Face

The size, variety, and number of publicly available NLP (Natural Language Processing) datasets have grown rapidly as researchers propose new goals, larger models, and unique benchmarks.

21/11/2021

AI Origins & Evolution

Google Introduces Two New Datasets For Improved Conversational NLP

Google’s published study investigates pre-trained language models in dialog, using TimeDial to probe their temporal reasoning and Disfl-QA to test how well they handle disfluencies.

14/08/2021

AI Origins & Evolution

Explained: CUAD, The Dataset For Legal NLP

NLP remains largely unexplored when it comes to complicated language such as that of legal contracts. Recently, researchers at Berkeley and Nueva School have taken

01/04/2021

AI Mysteries

Most Benchmarked Datasets for Question Answering in NLP with implementation in PyTorch, Keras, and TensorFlow

Question answering is a field within natural language processing concerned with building systems that automatically answer questions posed by humans in natural language.

24/11/2020

AI Mysteries

Deep Dive into Datasets for Machine Translation in NLP Using TensorFlow and PyTorch

Machine translation has recently moved towards large-scale empirical techniques, which have produced substantial improvements in translation quality. Machine translation is the task of automatically converting text in one natural language into another while preserving the meaning of the input text.

21/11/2020

AI Mysteries

Datasets for Language Modelling in NLP using TensorFlow and PyTorch

Language modelling has recently gained momentum in natural language processing, so it is essential to develop new models and strategies for training language models faster and better. However, the complexity of language means we have to deal with certain problems in the dataset: as a dataset grows, so does the average number of times a word appears in it.

19/11/2020

AI Mysteries

Complete Guide to NLP Profiler: A Python Tool for Profiling Textual Datasets

NLP Profiler is a simple NLP library for profiling textual datasets with one or more text columns.

09/09/2020

AI Mysteries

Top NLP Libraries & Datasets For Indian Languages

Natural language processing has the potential to broaden online access for Indian citizens, thanks to significant advancements in high-performance GPU machines, high-speed internet

07/02/2020

AI Mysteries

10 NLP Open-Source Datasets To Start Your First NLP Project

There has been significant growth in natural language processing (NLP) over the last few years. The demand for advanced text recognition, sentiment analysis, speech recognition,

16/09/2019

AI News & Update

Telugu LLM Labs is Here with Native and Romanised Dataset

The release includes a romanised pre-training dataset and a supervised fine-tuning dataset in both native and romanised Telugu scripts.

31/01/2024

AI Mysteries

10 Brilliant Datasets Based on ChatGPT Outputs

Since the release of GPT-4, AI researchers have been using the model’s outputs to train their own language models and to build datasets for benchmarking.

25/07/2023

Innovation in AI

The Relevance of NLP Engineers in a ChatGPT-Crazy World

In the current era of LLMs gone wild, NLP engineers are in the thick of things and will likely be among the first to feel the repercussions of this shift.

12/04/2023

AI Origins & Evolution

The Judgment Is Out For NLP

Law firms lose up to 40% of a deal’s value due to inefficient contracting

06/01/2023

AI Origins & Evolution

Top 12 Datasets and Projects Open Sourced in 2022

Open-sourcing these projects and datasets has driven innovation and development across the developer community.

03/01/2023

Intellectual AI Discussion

[Exclusive] Indian Researcher Solves a 2,500-Year-Old Sanskrit Problem for NLP

Teaching NLP models how to combine a speaker’s intention with Panini’s rule-based grammar would be a milestone for producing human speech.

16/12/2022

AI News & Update

Meta’s ‘No Language Left Behind’ 450GB training dataset reproduced & released online

Meta and AllenNLP researchers have released mined bitext training data for Meta AI’s No Language Left Behind NLLB-200 models. The company aims to facilitate analysis

25/08/2022

AI Mysteries

How to create and deploy NLP pipelines on a Comet server with SparkNLP?

This article provides a brief overview of how to integrate a SparkNLP pipeline with a Comet server, obtain predictions from it, and monitor the model in Comet.

15/06/2022

AI Origins & Evolution

NLP gets a quantum boost

QSANN (quantum self-attention neural network) is effective and scalable on larger datasets and can be deployed on near-term quantum devices.

01/06/2022

AI Origins & Evolution

Is NLP innovating faster than other domains of AI?

Why is there such intense competition in this field, or in other words, are other AI domains lagging behind NLP in terms of innovation?

16/05/2022

AI Origins & Evolution

Top open source datasets from Amazon

Over the years, Amazon and AWS have contributed massively to the open-source community by releasing their comprehensive datasets to the public.

28/04/2022

AI News & Update

Amazon makes MASSIVE announcements around a 51-language dataset

The MASSIVE dataset and the Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.

22/04/2022

AI Origins & Evolution

Why is Meta AI’s textless NLP a breakthrough?

If fully explored, textless NLP could improve on conventional pipelines that chain automatic speech recognition with text-based natural language processing.

04/04/2022

AI Mysteries

A guide to generating realistic synthetic image datasets with Kubric

Kubric is an open-source Python framework that allows you to create photo-realistic scenes by combining the functions of PyBullet and Blender.

17/03/2022

AI Mysteries

A guide to GluonNLP: Deep Learning framework for NLP

GluonNLP is a deep learning toolkit for natural language processing. It includes cutting-edge pre-trained models, training scripts, and training logs to help with rapid prototyping and reproducible research.

01/03/2022

AI Origins & Evolution

How small datasets drive efficiency in vision models

Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition.

03/02/2022

AI Mysteries

Most Popular NLP Papers Of 2021

Natural language processing involves analysing data to extract and process meaningful information.

17/12/2021

AI Mysteries

Popular Datasets Released By Tech Firms In 2021

This article lists some of the datasets open-sourced by big tech companies in 2021

02/12/2021
