GriT-DBSCAN: a spatial clustering algorithm for very large databases

Xiaogang Huang, Tiefeng Ma, Conan Liu, Shuangzhe Liu

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of DBSCAN is its O(n^2) worst-case time complexity. To address this limitation, we propose GriT-DBSCAN, a new grid-based algorithm for exact DBSCAN in Euclidean space, which is based on the following two techniques. First, we introduce a grid tree that organizes the non-empty grids so that queries for non-empty neighboring grids can be answered efficiently. Second, by utilizing the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a given threshold. We theoretically demonstrate that GriT-DBSCAN has excellent reliability in terms of time complexity. In addition, we obtain two variants of GriT-DBSCAN, one by incorporating heuristics and one by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results show that our algorithms outperform existing algorithms.
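
To make the two sub-problems named in the abstract concrete, the following is a minimal Python sketch of the generic grid-based setting, not the GriT-DBSCAN method itself. It assumes the cell side length eps / sqrt(d) that is common in grid-based DBSCAN, replaces the paper's grid tree with a plain dictionary lookup, and replaces the paper's iterative pruning with a brute-force scan that stops at the first qualifying pair. The function names build_grid, neighbor_cells, and sets_within are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Map each point to a cell of side eps / sqrt(d); only non-empty cells are stored.

    Assumption: this cell size is the usual grid-based DBSCAN choice,
    not a detail taken from the abstract.
    """
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for p in points:
        cell = tuple(math.floor(x / side) for x in p)
        grid[cell].append(p)
    return dict(grid)


def neighbor_cells(grid, cell, d):
    """Return the non-empty cells that could hold a point within eps of `cell`.

    Stand-in for the grid tree query: enumerates every index offset up to
    ceil(sqrt(d)) per dimension and keeps the cells present in the dictionary.
    """
    reach = math.ceil(math.sqrt(d))
    found = []
    for offset in product(range(-reach, reach + 1), repeat=d):
        other = tuple(c + o for c, o in zip(cell, offset))
        if other != cell and other in grid:
            found.append(other)
    return found


def sets_within(a, b, eps):
    """True if some pair (p in a, q in b) lies at distance <= eps.

    Brute-force baseline with early exit; the paper's pruning technique
    is not reproduced here.
    """
    eps_sq = eps * eps
    for p in a:
        for q in b:
            if sum((x - y) ** 2 for x, y in zip(p, q)) <= eps_sq:
                return True
    return False


points = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (3.0, 3.0)]
grid = build_grid(points, eps=1.0)
nearby = neighbor_cells(grid, (0, 0), d=2)
```

The offset enumeration grows exponentially with the dimension d, and the pairwise scan is quadratic in the cell sizes in the worst case; these are the costs that the grid tree and the pruning technique described above are meant to reduce.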

Original language: English
Article number: 109658
Pages (from-to): 1-18
Number of pages: 18
Journal: Pattern Recognition
Volume: 142
Publication status: E-pub ahead of print - 6 May 2023
