Distributed Penalized Modal Regression for Massive Data

Jun Jin, Shuangzhe Liu, Tiefeng Ma

Research output: Contribution to journal › Article › peer-review

Abstract

Researchers are frequently confronted with the challenges of computing on massive data, owing to the limited primary memory of computers. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. The authors therefore extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting the entire dataset into several blocks, applying the MR method to the data in each block, and combining the block-level regression results via a weighted average, which approximates the regression estimates obtained from the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as the traditional MR estimator computed on the entire dataset. The authors also investigate a variable selection approach based on multiple hypothesis testing to select significant parametric components, and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical effectiveness of the proposed methods.
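
The split, fit, and combine steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear model, a Gaussian kernel with a fixed bandwidth `h`, the basic (unmodified) MEM iteration, and combining weights proportional to block size, none of which are specified in the abstract. The function names `mem_modal_regression` and `dc_modal_regression` are hypothetical, and the penalization and variable selection steps are omitted.

```python
import numpy as np

def mem_modal_regression(X, y, h, n_iter=200, tol=1e-8):
    """Linear modal regression via a basic MEM iteration (Gaussian kernel).

    Assumption: this is the plain MEM scheme, not the modified MEM
    algorithm proposed in the paper.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting value
    for _ in range(n_iter):
        resid = y - X @ beta
        # E-step: kernel weights, large for observations near the current fit
        w = np.exp(-0.5 * (resid / h) ** 2)
        w /= w.sum()
        # M-step: weighted least squares with the E-step weights
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def dc_modal_regression(X, y, n_blocks, h):
    """Divide and conquer: fit each block separately, then average.

    Assumption: weights proportional to block size; the paper's
    weighting scheme may differ.
    """
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    estimates = np.array(
        [mem_modal_regression(X[idx], y[idx], h) for idx in blocks]
    )
    weights = np.array([len(idx) for idx in blocks], dtype=float)
    weights /= weights.sum()
    return weights @ estimates  # weighted average of block estimates

# Simulated illustration (hypothetical data, symmetric noise)
rng = np.random.default_rng(0)
n = 20_000
beta_true = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_dc = dc_modal_regression(X, y, n_blocks=10, h=1.0)
beta_full = mem_modal_regression(X, y, h=1.0)
```

In this sketch only one block needs to be processed at a time, so in a genuinely memory-bound setting each block could be read from disk, fitted, and discarded before the next is loaded. With symmetric noise, as simulated here, `beta_dc` and `beta_full` should both land near `beta_true`.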

Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: Journal of Systems Science and Complexity
Publication status: E-pub ahead of print - 14 Oct 2022
