StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin

Neil Shah, Neha Sahipjohn, Vishal Tambrahalli, Ramanathan Subramanian, Vineet Gandhi

Research output: Contribution to journal › Article › peer-review

Abstract

We introduce StethoSpeech, a silent speech interface that transforms flesh-conducted vibrations captured behind the ear into speech. It is designed to improve social interactions for people with voice disorders and to enable discreet public communication. Unlike prior efforts, StethoSpeech requires neither (a) paired speech data for the recorded vibrations nor (b) a specialized device for recording vibrations, as it works with an off-the-shelf clinical stethoscope. The novelty of our framework lies in the overall design, the simulation of ground-truth speech, and a sequence-to-sequence translation network that operates in the latent space. We present comprehensive experiments on the existing CSTR NAM TIMIT Plus corpus and on our proposed StethoText, a large-scale synchronized database of non-audible murmur and text for speech research. Our results show that StethoSpeech produces natural-sounding, intelligible speech and significantly outperforms existing methods on several quantitative and qualitative metrics. We also demonstrate that it generalizes to speakers not encountered during training and remains effective in challenging, noisy environments.
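To make the abstract's "sequence-to-sequence translation network, which works in the latent space" concrete, the following Python (PyTorch) snippet is a minimal sketch of such a mapping: murmur features in, speech latent features out, which a vocoder would then render as a waveform. The class name NAMToSpeechLatent, the GRU backbone, and all dimensions are illustrative assumptions, not the architecture described in the paper.

```python
# Hypothetical sketch: translate NAM (stethoscope vibration) feature
# sequences into speech latent sequences. Architecture and dimensions
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class NAMToSpeechLatent(nn.Module):
    def __init__(self, nam_dim=80, latent_dim=256, hidden=512):
        super().__init__()
        # Bidirectional GRU encoder over the murmur feature frames.
        self.encoder = nn.GRU(nam_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        # Project the encoder states into the speech latent space.
        self.proj = nn.Linear(2 * hidden, latent_dim)

    def forward(self, nam_feats):           # (batch, frames, nam_dim)
        h, _ = self.encoder(nam_feats)      # (batch, frames, 2 * hidden)
        return self.proj(h)                 # (batch, frames, latent_dim)

model = NAMToSpeechLatent()
nam = torch.randn(1, 200, 80)               # 200 frames of murmur features
speech_latents = model(nam)                 # a vocoder would decode these
print(speech_latents.shape)                 # torch.Size([1, 200, 256])
```

Working in a latent space rather than predicting waveforms directly keeps the translation network small and delegates waveform synthesis to a separate vocoder; the specific latent representation and vocoder used by StethoSpeech are detailed in the paper itself.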

Original language: English
Pages (from-to): 1-21
Number of pages: 21
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume: 8
Issue number: 3
DOIs
Publication status: Published - 9 Sept 2024
