TY - JOUR
T1 - Automatic Depression Classification Based on Affective Read Sentences: Opportunities for Text-Dependent Analysis
AU - Stasak, Brian
AU - Epps, Julien
AU - Goecke, Roland
N1 - Funding Information: The work of Brian Stasak and Julien Epps was partly supported by ARC Discovery Project DP130101094 led by Roland Goecke and partly supported by ARC Linkage Project LP160101360, Data61-CSIRO. The Black Dog Institute (Sydney, Australia) provided the clinical depression speaker database.
N1 - Publisher Copyright: © 2019 Elsevier B.V.
PY - 2019/12
Y1 - 2019/12
AB - In the future, automatic speech-based analysis of mental health could become widely available to help augment conventional healthcare evaluation methods. For speech-based patient evaluations of this kind, protocol design is a key consideration. Read speech provides an advantage over other verbal modes (e.g. automatic, spontaneous) by providing a clinically stable and repeatable protocol. Further, text-dependent speech helps to reduce phonetic variability and delivers controllable linguistic/affective stimuli, thereby allowing more precise analysis of recorded stimuli deviations. The purpose of this study is to investigate speech disfluency behaviors in non-depressed/depressed speakers using read-aloud text containing constrained affective-linguistic criteria. Herein, using the Black Dog Institute Affective Sentences (BDAS) corpus, analysis demonstrates statistically significant feature differences in speech disfluencies: compared with non-depressed speakers, depressed speakers show relatively higher recorded frequencies of hesitations (55% increase) and speech errors (71% increase). Our study examines both manually and automatically labeled speech disfluency features, demonstrating that detailed disfluency analysis leads to considerable gains, of up to 100% in absolute depression classification accuracy, especially with affective considerations, when compared with the affect-agnostic acoustic baseline (65%).
KW - Digital phenotyping
KW - Digital medicine
KW - Paralinguistics
KW - Valence
KW - Speech elicitation
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85073675194&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/automatic-depression-classification-based-affective-read-sentences-opportunities-textdependent-analy-1
U2 - 10.1016/j.specom.2019.10.003
DO - 10.1016/j.specom.2019.10.003
M3 - Article
SN - 0167-6393
VL - 115
SP - 1
EP - 14
JO - Speech Communication
JF - Speech Communication
ER -