Abstract
One of the advantages of multimodal HCI technology is the performance improvement that can be gained over conventional single-modality technology by employing complementary sensors in different modalities. Such information is particularly useful in practical, real-world applications where the application’s performance must be robust against all kinds of noise. An example is the domain of automatic speech recognition (ASR). Traditionally, ASR systems use information from the audio modality only, and in the presence of acoustic noise their performance drops quickly. However, it has been shown that incorporating additional visual speech information from the video modality improves performance significantly, so that audio-visual (AV) ASR systems can be employed in application areas where audio-only ASR systems would fail, thus opening new application areas for ASR technology. In this paper, a non-intrusive (no artificial markers), real-time 3D lip tracking system is presented, as well as its application to AV ASR. The multivariate statistical analysis ‘co-inertia analysis’ is also presented, which offers improved numerical stability over other multivariate analyses even for small sample sizes.
Original language | English |
---|---|
Title of host publication | Proceedings of the NICTA-HCSNet Multimodal User Interaction Workshop |
Editors | Fang Chen, Julian Epps |
Place of Publication | Australia |
Publisher | Australian Computer Society |
Pages | 25-32 |
Number of pages | 8 |
ISBN (Print) | 1445-1336 |
Publication status | Published - 2005 |
Event | MMUI2005, Sydney, Australia (Duration: 13 Sept 2005 → 14 Sept 2005) |
Conference
Conference | MMUI2005 |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 13/09/05 → 14/09/05 |