An Investigation of Depressed Speech Detection: Features and Normalization

Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

56 Citations (Scopus)

Abstract

In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, open questions include how speech segments should be selected, which features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read-sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant-based features approached 80% for this database.
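The abstract does not say which normalization scheme or dynamic features were used. As a minimal sketch only, the code below assumes per-speaker mean and variance normalization of frame-level features (such as MFCCs) and a simple central-difference delta as the "dynamic information". The function names `speaker_normalize` and `delta` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def speaker_normalize(frames, speaker_ids, eps=1e-8):
    """Z-normalize each feature dimension with per-speaker statistics.

    frames: list of equal-length feature vectors (e.g. MFCC frames).
    speaker_ids: one speaker label per frame.
    Assumption: mean/variance normalization; the paper may use another scheme.
    """
    # Group frame indices by speaker.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    normalized = [None] * len(frames)
    for indices in by_speaker.values():
        # Transpose this speaker's frames so each column is one feature dimension.
        columns = list(zip(*(frames[i] for i in indices)))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        for i in indices:
            normalized[i] = [
                (x - m) / (s + eps) for x, m, s in zip(frames[i], means, stds)
            ]
    return normalized


def delta(frames):
    """Central-difference delta features, with edge frames replicated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out


if __name__ == "__main__":
    # Toy two-dimensional "features" for two speakers (made-up values).
    toy_frames = [[1.0, 10.0], [3.0, 10.0], [100.0, 5.0], [200.0, 7.0]]
    toy_speakers = ["a", "a", "b", "b"]
    print(speaker_normalize(toy_frames, toy_speakers))
    print(delta(toy_frames))
```

After normalization each speaker's features have roughly zero mean and unit variance, so a classifier sees deviations from that speaker's own baseline and not absolute values.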
Original language: English
Title of host publication: INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc.
Editors: Piero Cosi, Renato De Mori, Giuseppe Di Fabbrizio, Roberto Pieraccini
Place of Publication: Florence, Italy
Publisher: International Speech Communication Association
Pages: 2997-3000
Number of pages: 4
ISBN (Print): 9781618392701
Publication status: Published - 27 Aug 2011
Event: INTERSPEECH 2011 12th Annual Conference of the International Speech Communication Association - Florence, Italy
Duration: 27 Aug 2011 to 31 Aug 2011

Cite this

Cummins, N., Epps, J., Breakspear, M., & Goecke, R. (2011). An Investigation of Depressed Speech Detection: Features and Normalization. In P. Cosi, R. De Mori, G. Di Fabbrizio, & R. Pieraccini (Eds.), INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. (pp. 2997-3000). Florence, Italy: International Speech Communication Association.
Cummins, Nicholas ; Epps, Julien ; Breakspear, Michael ; Goecke, Roland. / An Investigation of Depressed Speech Detection: Features and Normalization. INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. editor / Piero Cosi ; Renato De Mori ; Giuseppe Di Fabbrizio ; Roberto Pieraccini. Florence, Italy : International Speech Communication Association, 2011. pp. 2997-3000
@inproceedings{885bd6f921e4440d8e9487dc0c9aab6d,
title = "An Investigation of Depressed Speech Detection: Features and Normalization",
abstract = "In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, open questions include how speech segments should be selected, which features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read-sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant-based features approached 80{\%} for this database.",
keywords = "affective state recognition, depressed speech, feature comparison",
author = "Nicholas Cummins and Julien Epps and Michael Breakspear and Roland Goecke",
year = "2011",
month = "8",
day = "27",
language = "English",
isbn = "9781618392701",
pages = "2997--3000",
editor = "Piero Cosi and {De Mori}, Renato and {Di Fabbrizio}, Giuseppe and Roberto Pieraccini",
booktitle = "INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc.",
publisher = "International Speech Communication Association",

}

Cummins, N, Epps, J, Breakspear, M & Goecke, R 2011, An Investigation of Depressed Speech Detection: Features and Normalization. in P Cosi, R De Mori, G Di Fabbrizio & R Pieraccini (eds), INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. International Speech Communication Association, Florence, Italy, pp. 2997-3000, INTERSPEECH 2011 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 27/08/11.

An Investigation of Depressed Speech Detection: Features and Normalization. / Cummins, Nicholas; Epps, Julien; Breakspear, Michael; Goecke, Roland.

INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. ed. / Piero Cosi; Renato De Mori; Giuseppe Di Fabbrizio; Roberto Pieraccini. Florence, Italy : International Speech Communication Association, 2011. p. 2997-3000.

TY - GEN

T1 - An Investigation of Depressed Speech Detection: Features and Normalization

AU - Cummins, Nicholas

AU - Epps, Julien

AU - Breakspear, Michael

AU - Goecke, Roland

PY - 2011/8/27

Y1 - 2011/8/27

AB - In recent years, the problem of automatic detection of mental illness from the speech signal has gained some initial interest; however, open questions include how speech segments should be selected, which features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders. In this paper, these questions are addressed empirically using classifier configurations employed in emotion recognition from speech, evaluated on a 47-speaker depressed/neutral read-sentence speech database. Results demonstrate that (1) detailed spectral features are well suited to the task, (2) speaker normalization provides benefits mainly for less detailed features, and (3) dynamic information appears to provide little benefit. Classification accuracy using a combination of MFCC and formant-based features approached 80% for this database.

KW - affective state recognition

KW - depressed speech

KW - feature comparison

M3 - Conference contribution

SN - 9781618392701

SP - 2997

EP - 3000

BT - INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc.

A2 - Cosi, Piero

A2 - De Mori, Renato

A2 - Di Fabbrizio, Giuseppe

A2 - Pieraccini, Roberto

PB - International Speech Communication Association

CY - Florence, Italy

ER -

Cummins N, Epps J, Breakspear M, Goecke R. An Investigation of Depressed Speech Detection: Features and Normalization. In Cosi P, De Mori R, Di Fabbrizio G, Pieraccini R, editors, INTERSPEECH 2011 12th Annual Conference of the International Speech Comm. Assoc. Florence, Italy: International Speech Communication Association. 2011. p. 2997-3000