Meet The New Marathi RoBERTa

In July 2019, the Facebook research team introduced the Robustly Optimized BERT Pretraining Approach (RoBERTa), an improvement over Bidirectional Encoder Representations from Transformers (BERT), the self-supervised NLP method Facebook released in 2018.

Two researchers, Nipun Sadvilkar and Haswanth Aekula, have now pretrained the RoBERTa model on the Marathi language in a self-supervised manner, using a masked language modelling (MLM) objective. The duo unveiled the model at Hugging Face's community week.
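The masked language modelling objective mentioned above can be sketched in a few lines. This is a minimal illustration of the standard BERT/RoBERTa corruption scheme (select roughly 15% of positions; of those, replace 80% with a mask token, 10% with a random token, and keep 10% unchanged), not the authors' actual training code; the `VOCAB` placeholder and the toy whitespace tokenisation are illustrative assumptions.

```python
import random

MASK = "<mask>"
VOCAB = ["अ", "ब", "क"]  # hypothetical tiny vocabulary, for illustration only

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Apply the BERT/RoBERTa MLM corruption to a token list.

    Roughly `mask_prob` of positions are selected for prediction; of those,
    80% become MASK, 10% a random vocabulary token, 10% stay unchanged.
    Returns (corrupted_tokens, labels), where labels[i] is the original
    token at each selected position and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: token is kept unchanged, but still predicted
    return corrupted, labels

# Toy usage: whitespace "tokenisation" of a Marathi sentence
sentence = "मी मराठी भाषा शिकत आहे".split()
corrupted, labels = mask_tokens(sentence, seed=1)
print(corrupted)
print(labels)
```

During pretraining, the model sees only the corrupted sequence and is trained to recover the original tokens at the labelled positions, which is what lets it learn bidirectional context without any human annotation.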


The model is primarily aimed at tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. The duo fine-tuned the model on text classification datasets such as iNLTK and IndicNLP. Since the Marathi mC4 dataset is made up of text from Marathi newspapers, it may carry biases that propagate to all fine-tuned versions of the model, the team warned.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world, with a special interest in analysing its long-term impact on individuals and societies.
