When Faridabad resident Karan received a call from a friend who had supposedly just met with an accident, asking him to transfer Rs 30,000 for treatment, he had little reason to doubt it. The caller sounded exactly like his friend and said he was using someone else’s phone because his own had been damaged in the accident.
Karan frantically transferred the money. Later, when he contacted his friend, he realised he had been the victim of a fraudulent AI voice call. He filed a complaint with the NIT Cyber police station, whose investigation revealed that a fraudster had used an AI voice impersonator to fake his friend’s voice and dupe him of his money. Such cases are rampant across the country.
Criminals have exploited deepfake technology to deceive individuals through fake calls and videos. A man in Kerala, for instance, fell victim to a deepfake call from a ‘friend’ claiming a medical emergency, losing Rs 40,000. The growing accessibility and sophistication of AI have once again transformed the nature of cyber crime. This new industry seems to be mushrooming fast, going from strength to strength: from WormGPT and FraudGPT to, now, impersonation scams.
A McAfee survey found that 25% of adults worldwide have fallen prey to an AI voice scam of some kind. India tops the list with an astounding 47% of respondents reporting incidents, followed by the United States at 14% and the UK at 8%. These frauds are carried out by extracting voice samples from social media platforms such as Instagram, Facebook and Twitter. As little as three seconds of audio is enough to clone a voice using voice-cloning technology.
Big Techs’ GenAI Poses New Challenges
Microsoft recently introduced a groundbreaking text-to-speech AI model called VALL-E. In a paper published this month, the company showed that VALL-E can replicate a person’s voice using just a brief three-second recording. Impressively, preliminary findings indicate that VALL-E can even capture and reproduce the emotional tone of the speaker.
VALL-E is trained on 60,000 hours of English speech, a dataset the company asserts is “hundreds of times larger than existing systems”, making it a significant advance over existing models in the realm of AI-driven voice synthesis.
So, a three-second recording of your voice, paired with something like Eleven Labs’ Multilingual v2, a foundational AI model that supports nearly 30 languages, definitely spells trouble. Add to that Meta’s SeamlessM4T, which is capable of translation across roughly 100 languages.
While some are naturally excited about the doors these AI tools could open in marketing, customer service, e-learning and entertainment, others are wary of what they could entail: an industry of AI-enabled criminals using them for all kinds of crimes. A new coming of Jamtara?
Cybercriminals are also using cloning tools like HeyGen, Murf, Resemble AI, Lyrebird, and ReadSpeaker to create near-perfect voice clones. These tools are inexpensive too, costing as little as $0.60, and their use is made even easier by the numerous tutorials available online. This easy access to generative AI models has allowed individuals with limited technical knowledge to carry out tasks once beyond their capabilities, letting inexperienced, tech-oblivious individuals with ill intent run scams at scale.
Diamond Cut Diamond
While these scamsters are using AI-enabled voice generators, law enforcement is wielding similar weaponry against them. The cyber police have been using AI tools to monitor SIM cards engaged in such scams and recently blocked upwards of 14,000 SIMs in Haryana’s Mewat district.
India’s Department of Telecommunications is also employing an AI-based facial recognition tool called ASTR to combat fraudulent SIM card use. It encodes the human faces in subscriber images using convolutional neural networks, accounting for factors such as face angle and image quality. ASTR then compares faces, groups similar ones, and identifies identical faces with at least 97.5% accuracy.
ASTR can detect all SIMs associated with a suspected face in under 10 seconds from a database of one crore (10 million) images. Additionally, it employs “fuzzy logic” to find approximate matches for subscriber names, accommodating typographical errors. The tool helps identify individuals holding multiple connections, or SIMs obtained under different names using the same photograph. The resulting list is shared with banks, payment wallets and social media platforms so these numbers can be disconnected. WhatsApp has collaborated with the government to disable fraudulent accounts, with similar efforts ongoing across other platforms.
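ASTR’s internals are not public, but the two matching steps described above, comparing face embeddings and fuzzy-matching subscriber names, can be illustrated with a minimal Python sketch. The toy four-dimensional vectors, threshold values and names below are purely hypothetical stand-ins for real CNN embeddings and subscriber records:

```python
from difflib import SequenceMatcher

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def same_face(emb1, emb2, threshold=0.975):
    # Treat two embeddings as the same person when similarity clears
    # a high threshold (echoing the 97.5% figure cited for ASTR;
    # the real system's scoring method is not disclosed).
    return cosine_similarity(emb1, emb2) >= threshold

def fuzzy_name_match(name1, name2, threshold=0.8):
    # Approximate string matching to absorb typos in subscriber
    # names, a stand-in for the "fuzzy logic" ASTR reportedly uses.
    ratio = SequenceMatcher(None, name1.lower(), name2.lower()).ratio()
    return ratio >= threshold

# Hypothetical usage: two near-identical embeddings and a name typo.
emb_a = [0.2, 0.8, 0.1, 0.5]
emb_b = [0.21, 0.79, 0.12, 0.5]
print(same_face(emb_a, emb_b))                          # True
print(fuzzy_name_match("Karan Sharma", "Karran Sharma"))  # True
```

In practice, a system like this would group millions of embeddings with approximate nearest-neighbour search rather than pairwise comparison; the sketch only shows the matching logic itself.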
Meanwhile, it’s also crucial to stay alert and adopt proactive measures at your end. Verify the caller’s identity, agree on a codeword in advance, or pose a question only your friend would answer correctly, to safeguard yourself if you’re ever in a sticky situation like Karan’s.