In recent years, the automatic detection of mental illness from the speech signal has attracted some initial interest. However, open questions remain, including how speech segments should be selected, which features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read-sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant-based features approached 80% for this database.
|Title of host publication||INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc.|
|Editors||Piero Cosi, Renato De Mori, Giuseppe Di Fabbrizio, Roberto Pieraccini|
|Place of Publication||Florence, Italy|
|Publisher||International Speech Communication Association|
|Number of pages||4|
|Publication status||Published - 27 Aug 2011|
|Event||INTERSPEECH 2011 12th Annual Conference of the International Speech Communication Association - Florence, Italy|
Duration: 27 Aug 2011 → 31 Aug 2011
Cummins, N., Epps, J., Breakspear, M., & Goecke, R. (2011). An Investigation of Depressed Speech Detection: Features and Normalization. In P. Cosi, R. De Mori, G. Di Fabbrizio, & R. Pieraccini (Eds.), INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. (pp. 2997-3000). Florence, Italy: International Speech Communication Association.