Hail Dr Fill, The New AI-based Crossword King That Beat Humans


Published on June 22, 2021

by Meenal Sharma

Five million people across the world solve crossword puzzles every day, and Dr Fill ranks among the 50 best of them. What is surprising is that Dr Fill is not a middle-aged American man but a computer programme, an artificial intelligence-based algorithm. A year ago, Dr Fill was considered a primitive programme that could not solve complex puzzles. Today, after winning the 2021 American Crossword Puzzle Tournament (ACPT), it is compared to the very best human puzzle-solvers.

Dr Fill was developed by Matthew Ginsberg in collaboration with the Berkeley NLP Group, a team of undergraduate and graduate students overseen by UC Berkeley Professor Dan Klein. The team completed the final model of Dr Fill just two weeks before the tournament started.

Matthew had developed the original model nearly a decade earlier in an attempt to create a programme that, unlike him, could outmatch human crossword players. A decade later, he collaborated with UC Berkeley to turn it into a next-generation system. He did so by combining Berkeley’s neural net methods for interpreting clues with his own framework for filling in the crossword grid efficiently.

The challenge

Developing a crossword-solving programme is one of the most challenging tasks in machine learning. A developer must create an algorithm that navigates through data as the human mind does, that is, with multi-hop inference. Beyond this, the biggest challenge was developing a programme that did not rely on knowledge alone. American crosswords involve a degree of lateral thinking. The programme therefore had to understand that a question mark in a clue signals some semantic shenanigan. It also had to cope with linguistic ambiguity, since a clue could mean a thousand different things. The conclusion, thus, was that the algorithm could not look only for straightforward, evident answers.

Deciding when the programme should stop was also challenging. The developers addressed this by coding the algorithm to terminate if a full minute passed.

Dr Fill treats the puzzle as complete and terminates when any of the following criteria is met:

– When, after scanning, the algorithm cannot make any improvement to the puzzle

– If one Limited Discrepancy Search (LDS) iteration (one cycle of the algorithm) runs to completion with no change in the puzzle

– When the tournament time limit is reached
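The stopping rules above can be illustrated with a minimal sketch. It is not Dr Fill's actual code: the function name `solve_until_done`, the `improve_once` callback, and the 60-second default are assumptions made for illustration only.

```python
import time


def solve_until_done(grid, improve_once, time_limit_s=60.0, clock=time.monotonic):
    """Apply `improve_once` repeatedly until a stopping criterion is met.

    `improve_once(grid)` is a hypothetical callback that returns a new grid;
    returning a grid equal to its input means "no change this cycle".
    Returns the final grid and the reason the loop stopped.
    """
    start = clock()
    while True:
        # Criterion: the time limit has been reached.
        if clock() - start >= time_limit_s:
            return grid, "time_limit"
        new_grid = improve_once(grid)
        # Criterion: a full cycle ran without changing the puzzle.
        if new_grid == grid:
            return grid, "no_change"
        grid = new_grid


def fill_first_blank(grid):
    """Toy improvement step: replace the first '_' with 'A'."""
    return grid.replace("_", "A", 1)


final_grid, reason = solve_until_done("__", fill_first_blank)
print(final_grid, reason)
```

Passing the clock in as a parameter keeps the time-limit rule testable without waiting a real minute.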

How Dr Fill outsmarted humans

Dr Fill has a profoundly methodical approach to crosswords. While training, the algorithm worked on crosswords seven days a week.

On New York Times crossword puzzles, whose difficulty increases across the week, Dr Fill solved the puzzles from the first three days ‘fairly easily’. It also did well on Fridays and Saturdays but struggled on Thursdays and Sundays.

Dr Fill has been trained on over 47,000 puzzles that have appeared in various newspapers and magazines. Like humans, the programme relies on its past training to find connections between the old and the new. It generates hundreds of possible answers that best match the clues and ranks them by how likely they are to fit the puzzle.
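As a rough illustration of ranking candidate answers, the sketch below filters a set of scored words by the letters already in the grid and sorts the survivors by score. The function `rank_candidates`, the `?` wildcard convention, and the scores are all hypothetical; the article does not describe how Dr Fill computes its likelihoods.

```python
def rank_candidates(pattern, scored_answers, top_k=5):
    """Return the best-scoring answers that fit a slot pattern.

    pattern: slot contents such as "C?T", where "?" marks an empty cell.
    scored_answers: dict mapping candidate word -> score (higher is better).
    """
    def fits(word):
        # Same length, and every known letter in the pattern must match.
        return len(word) == len(pattern) and all(
            p in ("?", c) for p, c in zip(pattern, word)
        )

    fitting = [(word, score) for word, score in scored_answers.items() if fits(word)]
    # Highest score first; keep only the top_k candidates.
    return sorted(fitting, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up scores, for illustration only.
scores = {"CAT": 0.6, "COT": 0.3, "DOG": 0.9, "CART": 0.8}
print(rank_candidates("C?T", scores))
```

Here "DOG" scores highest overall but is rejected because it conflicts with the letters already in the slot.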

The tournament is a closed system, meaning the programme cannot ‘Google’ the gaps in its knowledge. Thus, it was trained to mimic humans’ imperfect knowledge and memory and to work within those limits.

The programme works by converting crosswords into weighted Constraint Satisfaction Problems (CSPs) and then applying a search technique to find a solution. Ginsberg concluded that the most effective technique was a modification of Limited Discrepancy Search (LDS). He also found that the commonly used Branch & Bound does not appear to be effective for a problem of this kind.
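To make the idea concrete, here is a small sketch of plain Limited Discrepancy Search on a toy crossword treated as a weighted CSP. It is not Ginsberg's modified version, whose details the article does not give. The slot names, word costs, the `crossings` format, and the choice to count every non-top-ranked word as one discrepancy are assumptions for illustration.

```python
def lds_solve(slots, candidates, crossings, max_discrepancies=3):
    """Plain limited discrepancy search over crossword slots.

    slots: slot names in the order they are filled.
    candidates: slot -> list of (word, cost), sorted best (lowest cost) first.
    crossings: tuples (slot_a, i, slot_b, j) meaning letter i of slot_a
        must equal letter j of slot_b.
    Returns (total_cost, assignment), or None if nothing is found.
    """
    def consistent(assignment):
        return all(
            assignment[a][i] == assignment[b][j]
            for a, i, b, j in crossings
            if a in assignment and b in assignment
        )

    def probe(depth, budget, assignment, cost):
        if depth == len(slots):
            return cost, dict(assignment)
        slot, best = slots[depth], None
        for rank, (word, word_cost) in enumerate(candidates[slot]):
            used = 0 if rank == 0 else 1  # deviating from the top choice costs 1
            if used > budget:
                break
            assignment[slot] = word
            if consistent(assignment):
                found = probe(depth + 1, budget - used, assignment, cost + word_cost)
                if found is not None and (best is None or found[0] < best[0]):
                    best = found
            del assignment[slot]
        return best

    # Allow 0 discrepancies first, then 1, then 2, and so on.
    for budget in range(max_discrepancies + 1):
        result = probe(0, budget, {}, 0)
        if result is not None:
            return result
    return None


# Toy puzzle: 1-Across and 1-Down share their first letter.
toy_candidates = {
    "1A": [("CAT", 1), ("BAT", 4)],
    "1D": [("BOW", 2), ("COW", 3)],
}
toy_crossings = [("1A", 0, "1D", 0)]
print(lds_solve(["1A", "1D"], toy_candidates, toy_crossings))
```

The search first trusts the top-ranked word in every slot, and only when that fails does it allow one departure from the ranking, then two. This matches the intuition that a good ranking is wrong in only a few places.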

Dr Fill was trained on language to draw linguistic inferences, an ability now being acknowledged as a step forward in Natural Language Processing (NLP) and a milestone in machine learning.

The performance in ACPT

Dr Fill’s performance was comparable to that of the very best human crossword solvers. Though the programme lagged in a few areas where humans have the edge, it outmatched them in others. It solved puzzles in a single minute, two minutes faster than humans. Whereas the human-solved crosswords were error-free, Dr Fill was not perfect on all puzzles: it stumbled on two and finished them with errors. Despite these errors, it won because no other solver came close to matching its speed.

New York Times crossword editor Will Shortz thinks that this year’s puzzles played to Dr Fill’s strengths. Although in awe of the programme’s ingenuity, he still thinks humans have the edge over machines in solving “messy, non-logical, and real-world problems,” like crosswords.

The success of Dr Fill is a milestone, as it will help in applying artificial intelligence to programmes that can solve real-world problems.
