TY - JOUR
T1 - A saddle point-guided clustering algorithm for data with complex structure
AU - Huang, Xiaogang
AU - Jin, Jun
AU - Zhuang, Dan
AU - Ma, Tiefeng
AU - van de Velden, Michel
AU - Liu, Shuangzhe
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/12/3
Y1 - 2025/12/3
N2 - Density-based clustering is a fundamental unsupervised learning technique with broad applications across diverse domains. However, most existing density-based clustering algorithms have difficulty in discovering clusters of different densities and sizes, especially in the case of significant variations in the density of points within the same cluster. In this paper, we propose a novel clustering algorithm that addresses this limitation by integrating a mode-seeking phase with a fragmented cluster fusion phase. In the mode-seeking phase, the dataset is partitioned into initial clusters, where each cluster center represents a point whose density is a local maximum, and the remaining points are assigned through an innovative majority voting mechanism. Natural clusters containing multiple local maxima of the underlying density function are thus partitioned into multiple initial clusters accordingly. To accurately capture the intrinsic cluster structure of the data, we introduce a saddle point-guided cluster similarity measure that quantifies the homogeneity between adjacent initial clusters; here, a saddle point is defined as the point of maximum density located on the boundary between the two clusters. With the cluster similarity, the obtained initial clusters are processed for merging using a fragmented cluster fusion technique based on minimum spanning forest (MSF). Extensive experiments on 31 benchmark datasets—including 20 synthetic and 11 real-world datasets—demonstrate the superior performance of the proposed algorithm. Specifically, it achieves the highest NMI scores on 23 datasets and the highest ARI scores on 21 datasets.
AB - Density-based clustering is a fundamental unsupervised learning technique with broad applications across diverse domains. However, most existing density-based clustering algorithms have difficulty in discovering clusters of different densities and sizes, especially in the case of significant variations in the density of points within the same cluster. In this paper, we propose a novel clustering algorithm that addresses this limitation by integrating a mode-seeking phase with a fragmented cluster fusion phase. In the mode-seeking phase, the dataset is partitioned into initial clusters, where each cluster center represents a point whose density is a local maximum, and the remaining points are assigned through an innovative majority voting mechanism. Natural clusters containing multiple local maxima of the underlying density function are thus partitioned into multiple initial clusters accordingly. To accurately capture the intrinsic cluster structure of the data, we introduce a saddle point-guided cluster similarity measure that quantifies the homogeneity between adjacent initial clusters; here, a saddle point is defined as the point of maximum density located on the boundary between the two clusters. With the cluster similarity, the obtained initial clusters are processed for merging using a fragmented cluster fusion technique based on minimum spanning forest (MSF). Extensive experiments on 31 benchmark datasets—including 20 synthetic and 11 real-world datasets—demonstrate the superior performance of the proposed algorithm. Specifically, it achieves the highest NMI scores on 23 datasets and the highest ARI scores on 21 datasets.
KW - Cluster similarity
KW - Density-based clustering
KW - Mode-seeking
KW - Saddle point
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=105021853789&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2025.114726
DO - 10.1016/j.knosys.2025.114726
M3 - Article
AN - SCOPUS:105021853789
SN - 0950-7051
VL - 331
SP - 1
EP - 16
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 114726
ER -