A two-member research team from the Indian Institute of Science (IISc) has developed a method to detect whispered speech even in a noisy recording. With further research, the method could be incorporated into a tool that reconstructs normal speech, which would be useful for people with laryngeal (voice box) cancer.
Voice activity detection is used in a variety of speech communication systems, including speech recognition, audio conferencing, and echo cancellation. It is even used to extend mobile phone battery life, by reducing the power wasted on transmitting noise when no one is speaking. Until now, voice activity detection was mostly limited to normally voiced speech, and whispers were ignored as part of the background noise.
The new study by G. Nisha Meenakshi and Prasanta Kumar Ghosh from the Department of Electrical Engineering, IISc, presents a novel method that overcomes this problem and detects whispered human speech even in a noisy recording.
Detecting whispered speech is more difficult than detecting voiced, or normal, speech because whispered speech lacks several easily identifiable characteristics. Nisha explains the differences: “The primary difference is that whispered speech lacks pitch, unlike normal speech. This is because there is no vibration of the vocal folds while whispering. This makes the whispered speech more noise-like compared to natural speech. Apart from this, the shape of the frequency spectrum of whispered speech is also different from that of the normal speech.”
The differences between these two types of speech call for a new approach. The IISc scientists addressed this by identifying a characteristic that differentiates whispers from noise. This characteristic, called the 'long-term logarithmic energy variation' of the signal, allows them to detect whispered speech, because the energy of whispered speech varies differently from that of noise.
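To give a rough feel for the idea, the sketch below measures how much the logarithm of short-frame energy varies over a longer window. It is only an illustration under stated assumptions, not the researchers' algorithm: the function names, the frame length, the window length, and the synthetic test signals (steady noise versus noise with a slowly changing loudness standing in for a whisper) are all hypothetical choices made for this example.

```python
import math
import random


def frame_log_energies(signal, frame_len):
    """Return the log of the mean energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small constant avoids log(0) on silent frames.
        energies.append(math.log(energy + 1e-12))
    return energies


def long_term_variation(log_energies, window):
    """Variance of the log energy over a sliding window of `window` frames."""
    variations = []
    for end in range(window, len(log_energies) + 1):
        chunk = log_energies[end - window:end]
        mean = sum(chunk) / window
        variations.append(sum((v - mean) ** 2 for v in chunk) / window)
    return variations


# Synthetic one-second signals at an assumed 16 kHz sampling rate.
rng = random.Random(0)
sample_rate = 16000
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
# Stand-in for a whisper: noise whose loudness rises and falls slowly.
whisper_like = [
    rng.gauss(0.0, 1.0) * (0.1 + abs(math.sin(2 * math.pi * 4 * t / sample_rate)))
    for t in range(sample_rate)
]

frame_len = 320  # 20 ms frames (assumed)
window = 25      # 0.5 s long-term window (assumed)

noise_var = long_term_variation(frame_log_energies(noise, frame_len), window)
whisper_var = long_term_variation(frame_log_energies(whisper_like, frame_len), window)

mean_noise = sum(noise_var) / len(noise_var)
mean_whisper = sum(whisper_var) / len(whisper_var)
print("steady noise:", mean_noise)
print("whisper-like:", mean_whisper)
```

In this toy setup the steady noise gives a much smaller variation than the whisper-like signal, so a threshold between the two values could separate them. A real detector would need to cope with real recordings and non-stationary noise.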
With further research, one can build tools to reconstruct normal speech from whispered speech. Such a tool would be immensely helpful for people who are being treated for laryngeal cancer.
“Many cases of laryngeal cancer require a surgical removal of the larynx [voice box], a procedure commonly called laryngectomy. Therefore, the speech from a laryngectomy patient is typically whispered. A device which reconstructs normal speech from whispered speech could render a voice to these patients”, says Nisha.
Building on this new technique, Nisha Meenakshi and Prasanta Kumar Ghosh now want to focus on a tool that differentiates whispered speech from normal speech in an audio sample.
About the authors
Nisha Meenakshi and Prasanta Kumar Ghosh are with the Department of Electrical Engineering, Indian Institute of Science (IISc), Bangalore-560012, India.
Contact: +91 80 2293 2694