Tri-SEM: A shape-aware robust regression method via chain-like segmentation and residual analysis

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Outliers pose a significant threat to the reliability of regression analysis. Unlike traditional robust methods that primarily rely on numerical optimization, this paper introduces Tri-SEM, a shape-aware robust regression framework that leverages the geometric and morphological structure of data through a flexible three-stage architecture: Split, Extraction, and Merge. In the Split stage, data are partitioned into chain-like segments using the Anderson-Darling test, projection analysis, and convex hull detection to isolate potential outliers, with clustering performed in a 2-D projected space for computational efficiency. In the Extraction stage, a subset of clean segments is selected by jointly considering their size and median squared residuals. In the Merge stage, reliable inliers are integrated using a histogram transition detector on 1-D residuals, capturing residual distribution patterns to construct the final regression estimate. Comprehensive experiments on diverse datasets demonstrate Tri-SEM's clear superiority in both prediction accuracy and estimation bias: it achieved the best overall rank and the highest prediction accuracy on 30 of the 35 datasets, while consistently outperforming the second-ranked method (MM-estimator) in estimation bias, achieving a relative improvement exceeding 90% on more than half (54.3%) of the datasets. Extensive ablation, sensitivity, convergence, and runtime analyses confirm the method's robustness, efficiency, and adaptability across a wide range of data scenarios.
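The abstract says the Extraction stage selects clean segments "by jointly considering their size and median squared residuals" but does not give the selection rule. The following is only a minimal sketch of one way such a rule could look, not the authors' implementation. It assumes that each segment is scored by the median squared residual of its own least-squares line, and that a segment is kept when it passes two thresholds. The function names, `min_size`, `max_med_sq`, the hand-made segments, and the toy data are all hypothetical. The Split and Merge stages are not reproduced here.

```python
import numpy as np

def segment_score(x, y):
    """Fit a least-squares line to one segment.

    Returns (segment size, median squared residual of that fit).
    """
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return len(x), float(np.median(residuals ** 2))

def extract_clean_segments(x, y, segments, min_size=5, max_med_sq=0.05):
    """Keep segments that are large enough and fit a line tightly.

    min_size and max_med_sq are hypothetical thresholds chosen for this
    illustration; the paper's actual criterion is not given in the abstract.
    """
    kept = []
    for idx in segments:
        size, med_sq = segment_score(x[idx], y[idx])
        if size >= min_size and med_sq <= max_med_sq:
            kept.append(idx)
    return kept

# Toy data (assumed): 20 inliers on y = 2x + 1 plus 6 scattered outliers.
x_in = np.linspace(0.0, 10.0, 20)
y_in = 2.0 * x_in + 1.0
x_out = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_out = np.array([30.0, -5.0, 25.0, -10.0, 40.0, 0.0])
x = np.concatenate([x_in, x_out])
y = np.concatenate([y_in, y_out])

# Hand-made segments standing in for the output of a Split stage.
segments = [np.arange(0, 10), np.arange(10, 20), np.arange(20, 26)]

kept = extract_clean_segments(x, y, segments)

# Refit on the points of the retained segments only.
inliers = np.concatenate(kept)
slope, intercept = np.polyfit(x[inliers], y[inliers], deg=1)
print(len(kept), slope, intercept)
```

In this toy setup the two collinear segments pass both thresholds, the scattered segment fails the residual threshold, and the refit on the retained points is therefore unaffected by the outliers. Using the median rather than the mean of the squared residuals keeps a segment's score from being dominated by a few extreme points inside it.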

    Original language: English
    Article number: 113092
    Pages (from-to): 1-13
    Number of pages: 13
    Journal: Pattern Recognition
    Volume: 175
    Publication status: Published - 2026
