UF audio deepfake study could ultimately help the hearing-impaired detect fraud

When University of Florida doctoral student Magdalena Pasternak realized the voice urging her to take a phone survey was not human, she not only hung up but was left with nagging questions, leading her to launch a research project.
“The caller requested a moment of my time but then abruptly decided to send me an email,” she recalled. “It was at that moment I realized I was not speaking to a person, but an artificial voice intentionally designed to obscure its synthetic nature.”
Pasternak, a Ph.D. student in the Florida Institute for Cybersecurity Research (FICS), led a study with other UF researchers examining how audio deepfakes threaten hearing-impaired people who rely on cochlear implants (CIs).
Ultimately, the study indicates the need for enhanced deepfake detection systems in implants for the hearing impaired.
“Deaf and hard-of-hearing populations, especially cochlear implant users, perceive audio in very different ways from hearing persons,” said Kevin Butler, Ph.D., professor in the Department of Computer & Information Science and Engineering (CISE), FICS director and lead faculty investigator on the study.
“Conventional wisdom,” he added, “would indicate that CI users in particular would find it challenging-to-impossible to differentiate natural from synthetic speech. Surprisingly, we found that not all deepfakes are the same. Certain types of deepfake attacks are particularly problematic. This is where our focus on detection, particularly for this community, should be placed as we develop and implement solutions. Some of these alert mechanisms could potentially be built into assistive devices in the future.”
Butler contends this study is the first to examine how CI users perceive audio deepfakes and, by number of participants, one of the largest academic studies of their hearing perception.
It focused on how CI users respond to audio deepfakes, meaning synthetic audio generated by artificial intelligence, as well as how computer-based deepfake detectors performed on audio as CI users perceive it.
The results: Study participants without CIs were able to identify deepfakes with 78% accuracy, while CI users achieved only 67% accuracy. CI users also were twice as likely to misclassify deepfakes as real speech.
Advancements in hearing technology, such as cochlear implants, have significantly improved accessibility for individuals with hearing loss, enabling them to engage more effectively with audio-based interfaces like smartphones and voice assistants. Unfortunately, new technology also brings new risks to vulnerable members of the population.
The study was an interdisciplinary collaboration with Professor Susan Nittrouer, Ph.D., who leads the Speech Development Laboratory in UF’s College of Public Health and Health Professions (PHHP) and is an internationally renowned expert in perceptual processing of speech.
The work was recently published in the paper “Characterizing the Impact of Audio Deepfakes in the Presence of Cochlear Implant Simulated Audio,” which was presented at the Network and Distributed System Security Symposium earlier this year in San Diego.
Pasternak’s research focuses on security in large language models, deepfake detection and machine learning. Her work is supported by the Center for Privacy and Security of Marginalized and Vulnerable Populations (PRISM), a National Science Foundation project directed by Butler that aims to transform how the security community addresses the specific cybersecurity needs of vulnerable populations by developing tools and methods at the core of cybersecurity research and technology design.
Generative audio technologies can now create persuasive, human-sounding speech. While the same technology powers voice assistants such as Alexa, Cortana and Siri, it is considered a deepfake when used for malicious purposes.
Deepfakes have been used to breach confidentiality, extort money and spread misinformation. During the 2024 New Hampshire Democratic primary, over 20,000 voters received robocalls impersonating President Joe Biden, telling them not to vote.
CIs restore hearing by converting sound into electrical signals that stimulate the auditory nerve. They use a limited number of electrode channels, prioritizing speech-relevant frequencies and compressing sound, which flattens nuances such as pitch. As a result, CI users may need to rely on alternative auditory cues.
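To make that signal chain concrete, below is a minimal sketch of noise-band vocoding, a standard technique researchers use to simulate CI-processed audio for listeners with typical hearing (the study's paper refers to "cochlear implant simulated audio"). The channel count, band edges and envelope cutoff here are illustrative assumptions, not parameters from the study.

```python
# Minimal noise-band vocoder sketch (illustrative, not the study's pipeline):
# split speech into a few frequency bands, keep only each band's slow envelope,
# and use those envelopes to modulate band-limited noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def vocode(signal, fs, n_channels=8, f_lo=200.0, f_hi=7000.0, env_cutoff=160.0):
    """Return a noise-vocoded version of `signal` sampled at `fs` Hz."""
    # Logarithmically spaced band edges, loosely mimicking electrode frequency allocation.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(signal))                 # broadband noise carrier
    env_lp = butter(4, env_cutoff / (fs / 2), output="sos")  # envelope smoother
    out = np.zeros(len(signal))

    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass", output="sos")
        speech_band = sosfiltfilt(band, signal)
        # Envelope: rectify then low-pass; pitch and fine temporal structure are discarded.
        envelope = sosfiltfilt(env_lp, np.abs(speech_band))
        carrier = sosfiltfilt(band, noise)                   # noise limited to the same band
        out += envelope * carrier

    return out / (np.max(np.abs(out)) + 1e-12)               # normalize to avoid clipping
```

Listening to speech processed this way gives a rough sense of why fine pitch cues, which can help distinguish some synthetic voices from real ones, may simply not be available to CI users.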
Pasternak and her research team recruited 87 people with typical hearing and 35 CI users from the United Kingdom and the United States to evaluate and compare their deepfake-detection accuracy. The study addressed three questions: 1) How susceptible are CI users to audio deepfakes? 2) How effective are automated detection models on CI-simulated audio? 3) Can these models serve as substitutes for CI users?
Researchers used the Automatic Speaker Verification spoofing (ASVspoof) detection database, which incorporates two prevalent deepfake-generation techniques: speech synthesis and voice conversion. Speech synthesis, or text-to-speech (TTS), generates human-like speech from text, though it may lack natural human speech characteristics. Voice conversion (VC) modifies existing human speech to resemble a target voice, preserving those natural characteristics.
While CI users were able to detect TTS deepfakes, VC deepfakes proved a greater challenge: CI users were far more likely to accept them as authentic rather than recognize them as artificial.
Though current computer-based detection models do not fully capture the specific challenges CI users face, improving these proxy models could enhance deepfake detection systems for this population.
“Deepfake technology has advanced so rapidly in such a short span of time that many people are simply unaware of how sophisticated modern deepfakes have become,” Pasternak said. “While education and awareness programs can play a huge role in protecting users, we must also develop technological solutions that accommodate diverse auditory processing abilities. By doing so, we can ensure that all users, especially those with auditory implants, have the necessary defenses against ever-evolving threats.”