In July 2019, Facebook AI researchers introduced the Robustly Optimized BERT Pretraining Approach (RoBERTa), an improvement over Bidirectional Encoder Representations from Transformers (BERT), the self-supervised method for NLP tasks that Google released in 2018.
Two researchers, Nipun Sadvilkar and Haswanth Aekula, have now pretrained a RoBERTa model on the Marathi language, using a masked language modelling (MLM) objective in a self-supervised manner. The duo unveiled the model at Hugging Face's community week.
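The MLM objective is self-supervised because the training labels come from the text itself: some tokens are hidden, and the model learns to predict them from the surrounding context. The sketch below illustrates only the masking step and assumes the BERT-style recipe that RoBERTa inherits: select roughly 15% of tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. Whitespace-split words stand in for RoBERTa's byte-level BPE subwords. The helper `mask_tokens` is hypothetical and is not code from the Marathi RoBERTa project.

```python
import random
from typing import List, Optional, Sequence, Tuple

# RoBERTa's mask token string. Words stand in for BPE subwords (simplifying assumption).
MASK_TOKEN = "<mask>"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Hypothetical helper: BERT-style 80/10/10 masking for an MLM objective.

    Returns the corrupted input and a label list that holds the original token
    at each selected position (None elsewhere, meaning "not predicted").
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: the model is not asked to predict this token
        labels[i] = tok  # the label comes from the text itself (self-supervision)
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_TOKEN  # 80%: hide the token
        elif r < 0.9:
            inputs[i] = rng.choice(list(vocab))  # 10%: random replacement
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    sentence = "मी रोज सकाळी शाळेत जातो".split()  # "I go to school every morning"
    corrupted, targets = mask_tokens(sentence, vocab=sentence, seed=42)
    print(corrupted)
    print(targets)
```

Calling `mask_tokens` with a fresh seed on each pass over the data yields a different masking pattern every time, which loosely mirrors the dynamic masking RoBERTa uses in place of BERT's fixed, precomputed masks. Positions whose label is `None` would simply be excluded from the prediction loss.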
The model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. The duo fine-tuned it on text classification datasets such as iNLTK and IndicNLP. The team warned that because the Marathi mC4 dataset used for pretraining is made up of text from Marathi newspapers, it may contain biases that carry over to all fine-tuned versions of the model.