Multi-level action detection via learning latent structure

Behzad BOZORGTABAR, Roland GOECKE

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

Detecting actions in videos remains a demanding task because of the large intra-class variation caused by varying pose, motion and scale. Conventional approaches use a Bag-of-Words model in the form of space-time motion feature pooling followed by learning a classifier. However, since informative body-part motion appears only in specific regions of the body, these methods have limited capability. In this paper, we seek to learn a model of the interaction among regions of interest via a graph structure. We first discover several space-time video segments representing persistent moving body parts observed sparsely in the video. Then, by learning the hidden graph structure (a subset of the graph), we identify both spatial and temporal relations between subsets of these segments. To capture the more discriminative motion patterns and handle different interactions between body parts, from simple to composite actions, we present a multi-level action model representation. Consequently, for action classification, the classifier learned through each action model labels the test video according to the action model that gives the highest probability score. Experiments on challenging datasets such as MSR II and UCF-Sports, which include complex motions and dynamic backgrounds, demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art methods in this context.
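
The final classification rule in the abstract (label the test video with the action model that gives the highest score) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the linear-plus-logistic scorer, the function names `model_score` and `classify`, the action labels, and all weights and feature values are hypothetical stand-ins for the learned multi-level action models.

```python
import math
from typing import Dict, Sequence


def logistic(z: float) -> float:
    """Map a raw response to (0, 1); a stand-in for a probability score."""
    return 1.0 / (1.0 + math.exp(-z))


def model_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical per-action scorer: logistic of a linear response."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return logistic(sum(w * x for w, x in zip(weights, features)))


def classify(models: Dict[str, Sequence[float]], features: Sequence[float]) -> str:
    """Label a test video with the action whose model scores highest."""
    if not models:
        raise ValueError("at least one action model is required")
    scores = {label: model_score(w, features) for label, w in models.items()}
    return max(scores, key=scores.get)


# Made-up weights for two action models and a made-up test feature vector.
models = {"diving": [1.0, -0.5], "running": [-0.2, 0.8]}
print(classify(models, [0.9, 0.1]))
```

Each action gets its own scorer and the decision is an argmax over the per-model scores; any scorer that returns comparable values across models could replace the hypothetical one used here.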
Original language: English
Title of host publication: Proceedings of the IEEE International conference on image processing (ICIP 2015)
Editors: Abhay M. Chopde, Rambabu Vatti
Place of Publication: Quebec City, Canada
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 3004-3008
Number of pages: 5
Volume: 1
ISBN (Electronic): 9781479983391
ISBN (Print): 9781479983391
DOIs: https://doi.org/10.1109/ICIP.2015.7351354
Publication status: Published - 27 Sep 2015
Event: The IEEE International Conference on Image Processing ICIP 2015 - Quebec, Quebec, Canada
Duration: 27 Sep 2015 - 30 Sep 2015

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 2015-December
ISSN (Print): 1522-4880

Conference

Conference: The IEEE International Conference on Image Processing ICIP 2015
Abbreviated title: ICIP 2015
Country: Canada
City: Quebec
Period: 27/09/15 - 30/09/15

Cite this

BOZORGTABAR, B., & GOECKE, R. (2015). Multi-level action detection via learning latent structure. In A. M. Chopde, & R. Vatti (Eds.), Proceedings of the IEEE International conference on image processing (ICIP 2015) (Vol. 1, pp. 3004-3008). [7351354] (Proceedings - International Conference on Image Processing, ICIP; Vol. 2015-December). Quebec City, Canada: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICIP.2015.7351354
BOZORGTABAR, Behzad ; GOECKE, Roland. / Multi-level action detection via learning latent structure. Proceedings of the IEEE International conference on image processing (ICIP 2015). editor / Abhay M. Chopde ; Rambabu Vatti. Vol. 1 Quebec City, Canada : IEEE, Institute of Electrical and Electronics Engineers, 2015. pp. 3004-3008 (Proceedings - International Conference on Image Processing, ICIP).
@inproceedings{ddda75106835454abbd766736e1819b5,
title = "Multi-level action detection via learning latent structure",
abstract = "Detecting actions in videos is still a demanding task due to large intra-class variation caused by varying pose, motion and scales. Conventional approaches use a Bag-of-Words model in the form of space-time motion feature pooling followed by learning a classifier. However, since the informative body parts motion only appear in specific regions of the body, these methods have limited capability. In this paper, we seek to learn a model of the interaction among regions of interest via a graph structure. We first discover several space-time video segments representing persistent moving body parts observed sparsely in video. Then, via learning the hidden graph structure (a subset of the graph), we identify both spatial and temporal relations between the subsets of these segments. In order to seize the more discriminative motion patterns and handle different interactions between body parts from simple to composite action, we present a multi-level action model representation. Consequently, for action classification, the classifier learned through each action model labels the test video based on the action model that gives the highest probability score. Experiments on challenging datasets, such as MSR II and UCF-Sports including complex motions and dynamic backgrounds, demonstrate the effectiveness of the proposed approach that outperforms state-of-the-art methods in this context",
keywords = "Action detection, latent structure, multi-level video representation, pooling regions",
author = "Behzad BOZORGTABAR and Roland GOECKE",
year = "2015",
month = "9",
day = "27",
doi = "10.1109/ICIP.2015.7351354",
language = "English",
isbn = "9781479983391",
volume = "1",
series = "Proceedings - International Conference on Image Processing, ICIP",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
pages = "3004--3008",
editor = "Chopde, {Abhay M.} and Rambabu Vatti",
booktitle = "Proceedings of the IEEE International conference on image processing (ICIP 2015)",
address = "Quebec City, Canada",
}

BOZORGTABAR, B & GOECKE, R 2015, Multi-level action detection via learning latent structure. in AM Chopde & R Vatti (eds), Proceedings of the IEEE International conference on image processing (ICIP 2015). vol. 1, 7351354, Proceedings - International Conference on Image Processing, ICIP, vol. 2015-December, IEEE, Institute of Electrical and Electronics Engineers, Quebec City, Canada, pp. 3004-3008, The IEEE International Conference on Image Processing ICIP 2015, Quebec, Canada, 27/09/15. https://doi.org/10.1109/ICIP.2015.7351354

Multi-level action detection via learning latent structure. / BOZORGTABAR, Behzad; GOECKE, Roland.

Proceedings of the IEEE International conference on image processing (ICIP 2015). ed. / Abhay M. Chopde; Rambabu Vatti. Vol. 1 Quebec City, Canada : IEEE, Institute of Electrical and Electronics Engineers, 2015. p. 3004-3008 7351354 (Proceedings - International Conference on Image Processing, ICIP; Vol. 2015-December).

TY - GEN

T1 - Multi-level action detection via learning latent structure

AU - BOZORGTABAR, Behzad

AU - GOECKE, Roland

PY - 2015/9/27

Y1 - 2015/9/27

AB - Detecting actions in videos is still a demanding task due to large intra-class variation caused by varying pose, motion and scales. Conventional approaches use a Bag-of-Words model in the form of space-time motion feature pooling followed by learning a classifier. However, since the informative body parts motion only appear in specific regions of the body, these methods have limited capability. In this paper, we seek to learn a model of the interaction among regions of interest via a graph structure. We first discover several space-time video segments representing persistent moving body parts observed sparsely in video. Then, via learning the hidden graph structure (a subset of the graph), we identify both spatial and temporal relations between the subsets of these segments. In order to seize the more discriminative motion patterns and handle different interactions between body parts from simple to composite action, we present a multi-level action model representation. Consequently, for action classification, the classifier learned through each action model labels the test video based on the action model that gives the highest probability score. Experiments on challenging datasets, such as MSR II and UCF-Sports including complex motions and dynamic backgrounds, demonstrate the effectiveness of the proposed approach that outperforms state-of-the-art methods in this context

KW - Action detection

KW - latent structure

KW - multi-level video representation

KW - pooling regions

UR - http://www.scopus.com/inward/record.url?scp=84956698372&partnerID=8YFLogxK

U2 - 10.1109/ICIP.2015.7351354

DO - 10.1109/ICIP.2015.7351354

M3 - Conference contribution

SN - 9781479983391

VL - 1

T3 - Proceedings - International Conference on Image Processing, ICIP

SP - 3004

EP - 3008

BT - Proceedings of the IEEE International conference on image processing (ICIP 2015)

A2 - Chopde, Abhay M.

A2 - Vatti, Rambabu

PB - IEEE, Institute of Electrical and Electronics Engineers

CY - Quebec City, Canada

ER -

BOZORGTABAR B, GOECKE R. Multi-level action detection via learning latent structure. In Chopde AM, Vatti R, editors, Proceedings of the IEEE International conference on image processing (ICIP 2015). Vol. 1. Quebec City, Canada: IEEE, Institute of Electrical and Electronics Engineers. 2015. p. 3004-3008. 7351354. (Proceedings - International Conference on Image Processing, ICIP). https://doi.org/10.1109/ICIP.2015.7351354