If you have ever proofread an article for a friend or read research papers for work, you know the concentration and hard work that go into ensuring a written work is original. The publishing industry faces this challenge daily, and with far longer manuscripts. And as if plagiarism checks weren’t enough, we now have deepfakes, which allow authors to fabricate images. It is a double-edged sword: the same AI that makes such deepfakes possible is now stepping in to help organisations spot duplication.
In 2021, publishers such as the AACR, Wiley, and Frontiers have begun using AI on their peer-reviewed manuscripts to identify duplicated images and alert editors automatically. The software can tag images that have been doctored, rotated, flipped, filtered or stretched. These companies are early adopters, and the software itself is still in its initial stages; it is not yet a match for professional deepfakes. Still, the technique is a step in the right direction from manual scanning, which is not foolproof.
In a feature, Nature identified four publishers that have automated the process of spotting duplications before manuscripts are published. “Specialists say a wave of automated image-checking assistants could sweep through the scientific publishing industry in the next few years, much as using software to check manuscripts for plagiarism became routine a decade ago,” explained Nature.
Proofig, one of the most widely used image-checking tools in publishing, extracts the images from a manuscript and compares them against one another to identify shared features or duplications. Customers upload their documents to Proofig’s platform to run ‘issue detection’, and the software checks a paper in a couple of minutes at most. “Proofig issues a full report on each article within minutes,” the company states. Additionally, according to founder Dror Kolodkin-Gal, Proofig can correct for tricky issues such as the compression artefacts that arise when high-resolution raw data are compressed into smaller files.
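To make the idea of pairwise image comparison concrete, here is a minimal sketch of one common approach: reduce each image to a small "average hash" and compare every pair of images under all eight rotations and mirror flips. This is an illustration only, not Proofig's actual method, which the company has not described in this detail. The function names, the panel names, and the assumption that every image has already been shrunk to a small grayscale grid of the same size are all hypothetical.

```python
from itertools import combinations

# Assumption: each image is already a small grayscale thumbnail,
# stored as rows of pixel values between 0 and 255.
Image = list[list[int]]


def rotate90(img: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def orientations(img: Image):
    """Yield the 8 rotated and mirrored variants of an image."""
    current = img
    for _ in range(4):
        yield current
        yield flip_horizontal(current)
        current = rotate90(current)


def average_hash(img: Image) -> tuple[int, ...]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Count the positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_probable_duplicate(a: Image, b: Image, max_distance: int = 0) -> bool:
    """True if some rotation or flip of b hashes close to a."""
    target = average_hash(a)
    shape_a = (len(a), len(a[0]))
    for variant in orientations(b):
        if (len(variant), len(variant[0])) != shape_a:
            continue  # a rotated non-square image cannot line up with a
        if hamming(average_hash(variant), target) <= max_distance:
            return True
    return False


def flag_duplicates(images: dict[str, Image]) -> list[tuple[str, str]]:
    """Compare every pair of images and return the suspicious pairs."""
    return [
        (name_a, name_b)
        for (name_a, img_a), (name_b, img_b) in combinations(images.items(), 2)
        if is_probable_duplicate(img_a, img_b)
    ]


# Hypothetical panels: fig_2c is fig_1a rotated by 90 degrees.
panel_a = [[10, 200, 30], [40, 50, 220], [70, 80, 90]]
panels = {
    "fig_1a": panel_a,
    "fig_2c": rotate90(panel_a),
    "fig_3b": [[200, 10, 200], [10, 200, 10], [200, 10, 200]],
}
print(flag_duplicates(panels))
```

A flag from a check like this is only a prompt for a human to look, since a hash match says nothing about whether the repetition was intentional. Real tools also have to cope with cropping, stretching and partial overlaps, which this sketch ignores.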
A similar technology is AIRA (Artificial Intelligence Review Assistant), developed by the publisher Frontiers. For a year, the software has been used by the publisher’s internal research integrity team to run image checks on all submitted manuscripts. Unfortunately, according to a spokesperson who spoke to Nature, most flagged papers are incorrectly tagged, and editors actually follow up on only 10 per cent of the tags. Proofig is also notorious for flagging many errors that editors or earlier manual checkers made, but publishers are still hopeful about the technology’s future applications.
The technology will still require editors to oversee the flags, since an image may be deliberately repeated within a manuscript or may be a genuine copy-paste error; only the author and editor can decide which mistakes are real. The real concern is that, in the longer run, fraudsters could use AI to dupe the AI-based checking software in the publishing industry, as has happened with deepfakes. “These are extremely difficult to detect because of the precision of tampered aspects such as the skin tone, background merging, and audio features to match the original content. Having said that, the technology is still developing and tends to create loopholes in some features in the image/audio/video; humans can detect that,” explains Aishwarya Srinivasan, AI innovation leader at IBM.
Experts are also concerned about a paralysing reliance on software screening without safeguards against false positives or missed manipulations. This has led several publishers to form a board that will set standards both for software claiming to screen manuscripts for image repetition and for how editors should respond when doctored images are flagged.
Image duplication detection is still a nascent technology in the publishing industry. However, given the rich history of AI in publishing, leaders remain hopeful that the software will evolve with time. AI has previously been used to identify future research trends, discover peer reviewers, check text for plagiarism, and mitigate flawed reporting and data usage conflicts.