Abstract
A system capable of interpreting affect from a speaking face must recognise and fuse signals from multiple cues. Building such a system requires the integration of software components to perform tasks such as image registration, video segmentation, speech recognition and classification. Such software components tend to be idiosyncratic, purpose-built, and driven by scripts and textual configuration files. Integrating components to achieve the degree of flexibility needed for full multimodal affective recognition is challenging. We discuss the key requirements and describe a system for multimodal affect sensing that integrates such software components and meets these requirements.
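The integration problem the abstract describes can be pictured as a small orchestrator that chains purpose-built, script-driven components, each configured by its own textual configuration file. The sketch below is illustrative only and is not the paper's implementation; the stage names, commands, and the `pipeline.json` layout are all assumptions introduced for this example.

```python
import json
import subprocess

def run_pipeline(config_path):
    """Invoke each script-driven component in the order given by the config."""
    with open(config_path) as f:
        stages = json.load(f)["stages"]
    for stage in stages:
        # Each stage wraps an idiosyncratic, purpose-built tool driven by
        # its own textual configuration file (all names are hypothetical).
        cmd = [stage["command"], "--config", stage["config_file"]]
        subprocess.run(cmd, check=True)  # fail fast if a component errors

if __name__ == "__main__":
    # Hypothetical stage list: image registration -> video segmentation ->
    # speech recognition -> per-cue classification -> fusion.
    run_pipeline("pipeline.json")
```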
| Original language | English |
| --- | --- |
| Title of host publication | Invalid Code |
| Place of Publication | Germany |
| Publisher | Springer |
| Pages | 104-115 |
| Number of pages | 12 |
| Volume | 4868 |
| ISSN (Print) | 0302-9743 |
| Publication status | Published - 2008 |
| Event | 9th Annual Conference of the International Speech Communication Association - Brisbane, Australia. Duration: 22 Sept 2008 → 26 Sept 2008 |
Conference

| Conference | 9th Annual Conference of the International Speech Communication Association |
| --- | --- |
| Country/Territory | Australia |
| City | Brisbane |
| Period | 22/09/08 → 26/09/08 |