TY - JOUR
T1 - Learned 3D Shape Representations Using Fused Geometrically Augmented Images: Application to Facial Expression and Action Unit Detection
AU - Taha, Bilal
AU - Hayat, Munawar
AU - Berretti, Stefano
AU - Hatzinakos, Dimitrios
AU - Werghi, Naoufel
N1 - Funding Information:
Manuscript received May 15, 2019; revised December 26, 2019 and February 19, 2020; accepted March 19, 2020. Date of publication March 30, 2020; date of current version September 3, 2020. This work was supported by a research fund from the Center for Cyber-Physical Systems (C2PS) at Khalifa University, Ref: RC1-C2PS-T4. This article was recommended by Associate Editor J. Han. (Corresponding author: Bilal Taha.) Bilal Taha is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada, and also with the Vector Institute for Artificial Intelligence, Toronto, ON M5G 1M1, Canada (e-mail: [email protected]).
Publisher Copyright:
© 1991-2012 IEEE.
PY - 2020/9
Y1 - 2020/9
AB - In this paper, we propose an approach to learn generic multi-modal mesh surface representations using a novel scheme for fusing texture and geometric data. Our approach defines an inverse mapping between different geometric descriptors, computed on the mesh surface or its down-sampled version, and the corresponding 2D texture image of the mesh, allowing the construction of fused geometrically augmented images (FGAI). This new fused modality enables us to learn feature representations from 3D data in a highly efficient manner by simply employing standard CNNs in a transfer-learning mode. The proposed approach is both computationally and memory efficient, preserves intrinsic geometric information, and learns highly discriminative feature representations by effectively fusing shape and texture information at the data level. The efficacy of our approach is demonstrated for the tasks of facial action unit detection and expression classification. Extensive experiments conducted on the Bosphorus and BU-4DFE datasets show that our method produces a significant performance boost over state-of-the-art solutions.
KW - convolutional neural networks
KW - expression recognition
KW - facial action units
KW - fused geometrically augmented images
KW - mesh surface
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85091236517&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2020.2984241
DO - 10.1109/TCSVT.2020.2984241
M3 - Article
AN - SCOPUS:85091236517
SN - 1051-8215
VL - 30
SP - 2900
EP - 2916
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 9
M1 - 9050730
ER -