[Exclusive] Indian Researcher Solves a 2,500-Year-Old Sanskrit Problem for NLP

Teaching NLP models how to combine a speaker's intention with Pāṇini’s rule-based grammar would be a milestone for producing human speech.

Published on December 16, 2022

by Mohit Pandey

Rishi Rajpopat, a Ph.D. student at the University of Cambridge, has made a major breakthrough by solving a 2,500-year-old problem that had continued to baffle Sanskrit scholars. His decoding of a rule taught by Pāṇini, the father of linguistics, now makes it possible to derive Sanskrit words, including “mantra” and “guru”, using Pāṇini’s language machine.

Rajpopat’s thesis, titled ‘In Pāṇini, We Trust: Discovering the Algorithm for Rule Conflict Resolution in the Aṣṭādhyāyī’, solves the problem of conflicting rules in Pāṇini’s system, the Aṣṭādhyāyī, which had left Sanskrit researchers unable to predict the grammatically correct results.

In an interaction with Analytics India Magazine, Rajpopat said that he switched from Economics to Pāṇinian grammar for his postgraduate degree because he had been interested in Sanskrit since school. “It’s just that the first commentator, who wrote around 150 years after Pāṇini, got it wrong, and then everyone after him ended up with the wrong interpretation, up until I wrote my thesis,” he explained.

Rajpopat said that there are 4,000 rules in Pāṇini’s grammar, using which one can derive any word and subsequently any sentence of the language. “What happens though very often is that at certain [steps] of the derivation [two or more rules] become simultaneously applicable. It was hard for me to believe that a genius would overlook something so central to the functioning of the system. It didn’t make sense to me,” he added. He explained that he was reluctant to add anything extra to the system Pāṇini had written, because if Pāṇini had wanted to add extra things, he would have.

“So, instead of saying the machine doesn’t work, I decided to look at what our understanding of the machine is and tried to see if there are any errors in that. And that’s precisely what I did. I reinterpreted rule 1.4.2 (out of the 4,000 rules),” said Rajpopat.

Teaching natural language processing models how to combine a speaker’s intention with Pāṇini’s rule-based grammar would be a milestone for producing human speech, he concluded.

What is the Pāṇini System?

The Pāṇini system consists of 4,000 rules, written around 500 BC, that work like a rule-based machine: given a base and a suffix, they generate grammatically correct words and sentences through a step-by-step process. This requires an algorithm. A “rule conflict” in the Pāṇini machine occurs when two or more rules are applicable at the same step. Pāṇini provided meta-rules to resolve such conflicts, but Sanskrit scholars could not agree on how to interpret them, and so could not reliably decide which rule to apply to get the correct output.
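To make the idea of a rule conflict concrete, here is a minimal Python sketch of a step-by-step derivation in which two rules can apply at the same step and a pluggable "meta-rule" picks between them. Everything in it is hypothetical and for illustration only: the `Rule` class, the `derive` function, the two toy rewrite rules and both resolver functions are assumptions, not actual rules from the Aṣṭādhyāyī and not Rajpopat's algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """A toy rewrite rule (hypothetical, not a real Paninian rule)."""
    number: int        # position of the rule in the toy rule list
    pattern: str       # substring the rule looks for
    replacement: str   # what the substring is rewritten to

    def applies_to(self, form: str) -> bool:
        return self.pattern in form

    def apply(self, form: str) -> str:
        # Rewrite only the first occurrence of the pattern.
        return form.replace(self.pattern, self.replacement, 1)


# A meta-rule: given the current form and the conflicting rules, pick one.
Resolver = Callable[[str, List[Rule]], Rule]


def derive(form: str, rules: List[Rule], resolve: Resolver,
           max_steps: int = 20) -> List[str]:
    """Apply rules one step at a time and return every intermediate form."""
    trace = [form]
    for _ in range(max_steps):
        applicable = [r for r in rules if r.applies_to(form)]
        if not applicable:
            break  # derivation is finished
        # A rule conflict: more than one rule is applicable at this step.
        rule = applicable[0] if len(applicable) == 1 else resolve(form, applicable)
        form = rule.apply(form)
        trace.append(form)
    return trace


def pick_lowest_number(form: str, candidates: List[Rule]) -> Rule:
    """One possible meta-rule: the earlier rule in the list wins."""
    return min(candidates, key=lambda r: r.number)


def pick_highest_number(form: str, candidates: List[Rule]) -> Rule:
    """Another possible meta-rule: the later rule in the list wins."""
    return max(candidates, key=lambda r: r.number)


TOY_RULES = [
    Rule(number=1, pattern="a", replacement="x"),
    Rule(number=2, pattern="ab", replacement="y"),
]

if __name__ == "__main__":
    # Both toy rules are applicable to "ab", so the meta-rule decides.
    print(derive("ab", TOY_RULES, pick_lowest_number))
    print(derive("ab", TOY_RULES, pick_highest_number))
```

Starting from the same input, the two resolvers end on different final forms, which is the point of the sketch: when the meta-rule is misread, the machine produces a different and possibly incorrect output even though the rules themselves are unchanged.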

Scholars then developed several additional meta-rules of their own, but Rajpopat showed that these are unnecessary and inefficient, because Pāṇini’s “language machine” is self-sufficient. He found that it could produce grammatically correct words and sentences with almost no exceptions.

Rajpopat’s PhD supervisor, Professor Vincenzo Vergiani, said that the discovery is revolutionary and comes at a time when interest in the study of Sanskrit is rising across the globe. He said that a potential practical use of the research, pursued together with colleagues from data science and computational linguistics, would be to teach human speech to a computer.

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.