In July 2019, the Facebook research team introduced the Robustly Optimized BERT Pretraining Approach (RoBERTa)–an improvement over the Bidirectional Encoder Representations from Transformers (BERT), a self-supervised method for NLP tasks released by Facebook in 2018.
Two researchers, Nipun Sadvilkar and Haswanth Aekula, have now pretrained the RoBERTa model on Marathi language using a masked language modelling (MLM) objective in a self-supervised manner. The duo unveiled the model at Hugging Face’s community week.
The model is primarily aimed at tasks that use the whole sentences (potentially masked) to make decisions, such as sequence classification, token classification or question-answer. The duo used this model to fine-tune text classification tasks like iNLTK and indicNLP. Since the Marathi mc4 dataset is made up of text from Marathi newspapers, it might involve biases that can affect all fine-tuned versions of the model, the team warned.
Join Our Telegram Group. Be part of an engaging online community. Join Here.
Subscribe to our NewsletterGet the latest updates and relevant offers by sharing your email.
I am a journalist with a postgraduate degree in computer network engineering. When not reading or writing, one can find me doodling away to my heart’s content.