Multimodal Depression Detection: Fusion Analysis of Paralinguistic, Head Pose and Eye Gaze Behaviors

Sharifa Alghowinem, Roland Goecke, Michael Wagner, Julien Epps, Matthew Hyett, Gordon Parker, Michael Breakspear

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

An estimated 350 million people worldwide are affected by depression. Using affective sensing technology, our long-term goal is to develop an objective multimodal system that augments clinical opinion during the diagnosis and monitoring of clinical depression. This paper takes a step towards such a system: feature selection, classification and fusion experiments are conducted to infer which types of behaviour (verbal and nonverbal), and which combinations of behaviours, best discriminate between depression and non-depression. Using statistical features extracted from speaking behaviour, eye activity and head pose, we characterise the behaviour associated with major depression and examine classification performance for the individual modalities and for their fusion. On a real-world, clinically validated dataset of 30 severely depressed patients and 30 healthy control subjects, a Support Vector Machine is used for classification together with several feature selection techniques. Given the statistical nature of the extracted features, feature selection based on t-tests performed better than the other methods. Individual modality classification results were considerably higher than chance level (83 percent for speech, 73 percent for eye, and 63 percent for head). Fusing all modalities yields a marked improvement over the unimodal systems, which demonstrates the complementary nature of the modalities. Among the different fusion approaches examined here, feature fusion performed best, with up to 88 percent average accuracy; we attribute this to the compatible nature of the extracted statistical features.
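
To make the pipeline in the abstract concrete, the following is a minimal sketch in Python with NumPy, SciPy and scikit-learn: t-test-based feature selection, a linear SVM per modality, and feature-level fusion by concatenation, with a majority-vote decision fusion for comparison. The feature matrices are random placeholders, and the dimensionalities, kernel and cross-validation setup are illustrative assumptions, not the authors' exact configuration.

import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, cross_val_predict

rng = np.random.default_rng(0)

# Placeholder feature matrices for 60 subjects (30 depressed, 30 controls),
# matching the dataset size in the paper. Real features would be statistical
# functionals of speech, eye-activity and head-pose measurements; the
# dimensionalities here are invented.
n = 60
y = np.array([1] * 30 + [0] * 30)  # 1 = depressed, 0 = healthy control
modalities = {
    "speech": rng.normal(size=(n, 200)),
    "eye":    rng.normal(size=(n, 80)),
    "head":   rng.normal(size=(n, 40)),
}

def ttest_select(X, y, k=20):
    """Keep the k features whose two-sample t-test p-values are smallest."""
    _, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
    return np.argsort(p)[:k]

# Unimodal classification: t-test feature selection, then a linear SVM.
# (For brevity, selection is done outside the CV loop; a faithful protocol
# would repeat it inside each fold to avoid information leakage.)
selected, svm = {}, SVC(kernel="linear")
for name, X in modalities.items():
    selected[name] = X[:, ttest_select(X, y)]
    acc = cross_val_score(svm, selected[name], y, cv=5).mean()
    print(f"{name:6s} accuracy: {acc:.2f}")

# Feature-level fusion: concatenate the selected features of all modalities
# into one vector per subject and train a single classifier.
X_fused = np.hstack(list(selected.values()))
acc_feat = cross_val_score(svm, X_fused, y, cv=5).mean()
print(f"feature fusion accuracy: {acc_feat:.2f}")

# Decision-level fusion, for comparison: majority vote over the three
# unimodal predictions.
votes = np.stack([cross_val_predict(svm, Xs, y, cv=5) for Xs in selected.values()])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print(f"decision fusion accuracy: {(majority == y).mean():.2f}")

On real features, the feature-fusion classifier sees one concatenated statistical vector per subject, which matches the abstract's observation that compatible statistical features favour feature-level over decision-level fusion.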

Original language: English
Article number: 7763752
Pages (from-to): 478-490
Number of pages: 13
Journal: IEEE Transactions on Affective Computing
Volume: 9
Issue number: 4
ISSN: 1949-3045
Early online date: 1 Dec 2016
DOI: 10.1109/TAFFC.2016.2634527
Publication status: Published - 1 Oct 2018
Keywords: depression detection, multimodal fusion, speaking behaviour, eye activity, head pose
Related grant: http://purl.org/au-research/grants/arc/DP130101094

Cite this

Alghowinem, Sharifa; Goecke, Roland; Wagner, Michael; Epps, Julien; Hyett, Matthew; Parker, Gordon; Breakspear, Michael. Multimodal Depression Detection: Fusion Analysis of Paralinguistic, Head Pose and Eye Gaze Behaviors. In: IEEE Transactions on Affective Computing. 2018; Vol. 9, No. 4, pp. 478-490.
