Chaos theory finds its voice

Talk may be cheap, but it is still rich with data, and George Mason's Anomadarshi Barua is using artificial intelligence (AI) and a new approach grounded in chaos theory to rebuild missing pieces of speech with high accuracy.

"When we talk, our voices have a frequency component from zero kilohertz (kHz) to around 8 kHz," said Barua, an assistant professor in the Department of Cyber Security Engineering. "Capturing all those frequencies normally requires devices to collect and process large amounts of data, which consumes power, storage, and computing resources."
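The data cost Barua describes follows from the Nyquist-Shannon sampling theorem: to capture frequencies up to 8 kHz, a recorder must sample at least twice that fast, or 16,000 times per second. The short Python sketch below compares raw data rates for full-band speech and for a narrowband recording that keeps only 0 to 4 kHz. The 16-bit sample depth, the single (mono) channel, and the 4 kHz narrowband cutoff are illustrative assumptions, not figures from Barua's paper.

```python
def nyquist_rate_hz(max_freq_hz: float) -> float:
    """Minimum sampling rate needed to represent content up to max_freq_hz."""
    return 2.0 * max_freq_hz


def raw_bitrate_bps(max_freq_hz: float, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Uncompressed PCM bit rate when sampling at exactly the Nyquist rate.

    bits_per_sample=16 and channels=1 are illustrative assumptions.
    """
    return nyquist_rate_hz(max_freq_hz) * bits_per_sample * channels


full_band = raw_bitrate_bps(8_000)    # speech content up to 8 kHz
narrow_band = raw_bitrate_bps(4_000)  # hypothetical device keeping only 0-4 kHz

print(f"full band:   {full_band / 1000:.0f} kbit/s")
print(f"narrow band: {narrow_band / 1000:.0f} kbit/s")
print(f"savings:     {1 - narrow_band / full_band:.0%}")
```

Under these assumptions, halving the captured bandwidth halves the raw data (256 kbit/s versus 128 kbit/s); the discarded upper band is what bandwidth extension later tries to restore.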

He will present a paper he coauthored on this topic at the Association for Computational Linguistics (ACL) annual meeting later this summer. ACL is one of the world's leading conferences in natural language processing and speech technologies, and the paper ranked in the top 15 percent of accepted submissions at a conference with an overall acceptance rate of 19 percent.

Small devices and sensors that run on limited battery power or bandwidth often cannot capture an entire signal. Instead, they record a narrower slice of frequencies and attempt to reconstruct the missing ones later, a process Barua says is known as bandwidth extension or bandwidth reconstruction.
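To make the setup concrete, the sketch below builds a toy "speech" signal from two tones, one at 1 kHz and one at 6 kHz, sampled at 16 kHz. It then discards everything above 4 kHz with an ideal brick-wall filter, mimicking a device that records only the lower slice. The tone frequencies, the 4 kHz cutoff, and the helper names are illustrative assumptions; the sketch shows only what a bandwidth-extension model is asked to recover, not how Barua's model recovers it.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz, enough to represent content up to 8 kHz
N = 160               # 10 ms of audio, giving 100 Hz per DFT bin


def dft(x):
    """Plain discrete Fourier transform (O(N^2), fine for a short toy signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of the reconstructed signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def lowpass(x, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Ideal brick-wall filter: zero every DFT bin whose frequency exceeds cutoff_hz."""
    spectrum = dft(x)
    n = len(x)
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n  # bins k and n-k share one frequency
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)


def tone_magnitude(x, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of the DFT bin nearest freq_hz."""
    k = round(freq_hz * len(x) / sample_rate)
    return abs(dft(x)[k])


# Toy wideband signal: a 1 kHz "low" tone plus a quieter 6 kHz "high" tone.
wideband = [math.sin(2 * math.pi * 1_000 * t / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * 6_000 * t / SAMPLE_RATE)
            for t in range(N)]

# What a bandwidth-limited device keeps: only content up to 4 kHz.
narrowband = lowpass(wideband, cutoff_hz=4_000)

for f in (1_000, 6_000):
    print(f"{f} Hz: wideband {tone_magnitude(wideband, f):6.1f}, "
          f"narrowband {tone_magnitude(narrowband, f):6.1f}")
```

The 1 kHz tone survives intact while the 6 kHz tone vanishes. Bandwidth extension is the inverse problem: given only something like `narrowband`, estimate the missing upper band.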

Imagine listening to a song through a wall and trying to mentally fill in the muffled higher notes. Barua's research teaches AI systems to do something similar, but with far greater mathematical precision. The team's major breakthrough came from incorporating a concept not often associated with speech: chaos theory.

"Our speech is actually a chaotic signal," Barua said. "'Chaos' does not mean completely random."

Instead, he describes speech as "deterministically random," meaning sounds and phonemes are strongly connected to one another in predictable ways. "You can actually determine what could be the next phoneme if you have the sufficient information from the previous phoneme," he said.
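"Deterministically random" has a precise meaning in chaos theory: each state follows exactly from the one before it, yet tiny differences in the starting point grow quickly. The logistic map, a textbook example of chaos and not part of Barua's method, shows both properties in a few lines of Python. The parameter r = 4 and the starting values are illustrative choices.

```python
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n); r = 4 puts the map in its chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2, 60)          # same start: identical path (deterministic)
c = logistic_trajectory(0.2 + 1e-10, 60)  # start nudged by one part in ten billion

print(a == b)                                            # same rule, same start
print(max(abs(x - y) for x, y in zip(a[:10], c[:10])))  # still tiny early on
print(max(abs(x - y) for x, y in zip(a, c)))            # grows large within a few dozen steps
```

The nudged trajectory obeys exactly the same rule, so short-range prediction works well, yet after a few dozen steps it no longer tracks the original. That loosely mirrors Barua's point that the next phoneme can be anticipated from the previous one.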

To capture those hidden relationships, the researchers developed what Barua calls a "chaotic discriminator" inside a machine-learning framework known as a generative adversarial network, or GAN. In simple terms, one part of the AI system generates reconstructed speech while another checks whether the recreated speech preserves the natural chaotic patterns found in human voices.

The approach significantly improved reconstruction quality compared to previous methods. "We are getting more improved results compared to the previous baseline," Barua said, "because of the incorporation of the chaotic properties."

Chaos-informed modeling also let the team dramatically shrink a key part of the AI system. "We actually reduce the size of the discriminator by 14 times," Barua said. Smaller models require less memory and computing power, making them more practical for real-world devices.

For Barua, whose earlier work focused primarily on cybersecurity before expanding into speech and natural language processing, the paper's acceptance at ACL marks an important milestone. "This is my first paper in natural language processing," he said.

While the current paper focuses on speech, Barua sees much broader possibilities ahead. The same reconstruction techniques could eventually be applied to electrical signals, sonar, lidar, and other sensing technologies. "This concept is not only limited to speech," he said. "We are trying to open up a larger branch in different signal modality."

And that message comes through loud and clear.