MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding

S. Madan, S. Ghosh, L. R. Sookha, M. A. Ganaie, R. Subramanian, A. Dhall, T. Gedeon

Research output: A Conference proceeding or a Chapter in Book; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

Estimating the Most Important Person (MIP) in any social event setup is a challenging problem mainly due to contextual complexity and scarcity of labeled data. Moreover, the causality aspects of MIP estimation are quite subjective and diverse. To this end, we aim to address the problem by annotating a large-scale 'in-the-wild' dataset for identifying human perceptions about the 'Most Important Person (MIP)' in an image. The paper provides a thorough description of our proposed Multimodal Large Language Model (MLLM) based data annotation strategy, and a thorough data quality analysis. Further, we perform a comprehensive benchmarking of the proposed dataset utilizing state-of-the-art MIP localization methods, indicating a significant drop in performance compared to existing datasets. The performance drop shows that the existing MIP localization algorithms must be more robust with respect to 'in-the-wild' situations. We believe the proposed dataset will play a vital role in building the next-generation social situation understanding methods. The dataset and associated code will be made available for research purposes.

Original language: English
Title of host publication: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Editors: Sharon X. Huang, Peyman Milanfar, Vishal M. Patel, Qiang Qiu, Srirangaraj Setlur
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1467-1476
Number of pages: 10
ISBN (Electronic): 9798331510831
Publication status: Published - 2025
Event: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States
Duration: 28 Feb 2025 to 4 Mar 2025

Publication series

Name: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025

Conference

Conference: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Country/Territory: United States
City: Tucson
Period: 28/02/25 to 4/03/25
