This paper presents AVOZES, the Audio-Video Australian English Speech data corpus. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed with an audio-video (AV) automatic speech recognition (ASR) task in mind, for research on the statistical relationship between audio and video speech parameters, but it may also be useful for other research tasks. AVOZES is the first published AV speaking-face data corpus for Australian English and is novel in its modular design and its use of a stereo camera system for the video recordings.
|Title of host publication||INTERSPEECH 2004 - ICSLP: 8th International Conference on Spoken Language Processing|
|Editors||S. H. Kim, D. H. Youn|
|Place of Publication||Canada|
|Number of pages||4|
|Publication status||Published - 2004|
|Event||INTERSPEECH 2004 - ICSLP 8th International Conference on Spoken Language Processing - Jeju, Korea, Republic of; Duration: 3 Oct 2004 → 7 Oct 2004|
Goecke, R., & Millar, J. (2004). The Audio-Video Australian English Speech Data Corpus AVOZES. In S. H. Kim & D. H. Youn (Eds.), INTERSPEECH 2004 - ICSLP: 8th International Conference on Spoken Language Processing (pp. 2525-2528). Canada: ISCA.