TY - JOUR
T1 - Distributed Penalized Modal Regression for Massive Data
AU - Jin, Jun
AU - Liu, Shuangzhe
AU - Ma, Tiefeng
N1 - Funding Information:
This research was supported by the Fundamental Research Funds for the Central Universities under Grant No. JBK1806002 and the National Natural Science Foundation of China under Grant No. 11471264.
Publisher Copyright:
© 2022, The Editorial Office of JSSC & Springer-Verlag GmbH Germany.
PY - 2023/4
Y1 - 2023/4
N2 - Nowadays, researchers are frequently confronted with challenges in massive data computing arising from the limited primary memory of computers. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. To this end, the authors extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting the entire dataset into several blocks, implementing the MR method on the data in each block, and deriving final results by combining the block-level regression results via a weighted average, which provides approximate estimates of the regression results on the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as traditional MR on the entire dataset. The authors also investigate a multiple-hypothesis-testing variable selection approach to select significant parametric components and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical and effective performance of the proposed methods.
AB - Nowadays, researchers are frequently confronted with challenges in massive data computing arising from the limited primary memory of computers. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. To this end, the authors extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting the entire dataset into several blocks, implementing the MR method on the data in each block, and deriving final results by combining the block-level regression results via a weighted average, which provides approximate estimates of the regression results on the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as traditional MR on the entire dataset. The authors also investigate a multiple-hypothesis-testing variable selection approach to select significant parametric components and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical and effective performance of the proposed methods.
KW - Asymptotic distribution
KW - divide-and-conquer
KW - massive data
KW - modal regression
KW - multiple hypothesis testing
UR - http://www.scopus.com/inward/record.url?scp=85140070522&partnerID=8YFLogxK
U2 - 10.1007/s11424-022-1197-2
DO - 10.1007/s11424-022-1197-2
M3 - Article
AN - SCOPUS:85140070522
SN - 1009-6124
VL - 36
SP - 798
EP - 821
JO - Journal of Systems Science and Complexity
JF - Journal of Systems Science and Complexity
IS - 2
ER -