Author : Feng Zhang
Publisher : Stanford University
ISBN 13 :
Total Pages : 91 pages
Book Synopsis Cross-validation and Regression Analysis in High-dimensional Sparse Linear Models by : Feng Zhang
Download or read book Cross-validation and Regression Analysis in High-dimensional Sparse Linear Models written by Feng Zhang and published by Stanford University. This book was released in 2011 with a total of 91 pages. Available in PDF, EPUB and Kindle. Book excerpt: Modern scientific research often involves experiments with at most hundreds of subjects but with tens of thousands of variables for every subject. The challenge of high dimensionality has reshaped statistical thinking and modeling. Variable selection plays a pivotal role in high-dimensional data analysis, and the combination of sparsity and accuracy is crucial for both statistical theory and practical applications. Regularization methods are attractive for tackling these sparsity and accuracy issues.

The first part of this thesis studies two regularization methods. First, we consider the orthogonal greedy algorithm (OGA) used in conjunction with a high-dimensional information criterion introduced by Ing & Lai (2011). Although OGA has been shown to perform very well for weakly sparse regression models, in practice one does not know a priori that the actual model is weakly sparse, and we address this problem by developing a new cross-validation approach. OGA can be viewed as L0 regularization for weakly sparse regression models. When such sparsity fails, as revealed by the cross-validation analysis, we propose a new way to combine L1 and L2 penalties, which we show to have important advantages over previous regularization methods.

The second part of the thesis develops a Monte Carlo Cross-Validation (MCCV) method to estimate the distribution of out-of-sample prediction errors when a training sample is used to build a regression model for prediction. Asymptotic theory and simulation studies show that the proposed MCCV method mimics the actual (but unknown) prediction error distribution even when the number of regressors exceeds the sample size. MCCV therefore provides a useful tool for comparing the predictive performance of different regularization methods on real (rather than simulated) data sets.
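The orthogonal greedy algorithm named in the excerpt is a stepwise procedure: at each step it adds the predictor most correlated with the current residual, then refits by least squares on all selected predictors. Below is a minimal sketch in Python, assuming the columns of X are standardized; the function name oga and the fixed step budget max_steps are illustrative only, and the sketch does not include the high-dimensional information criterion of Ing & Lai (2011) used in the thesis to choose the stopping point.

```python
import numpy as np

def oga(X, y, max_steps):
    """Orthogonal greedy algorithm (orthogonal matching pursuit):
    greedily add the column most correlated with the residual,
    then refit by least squares on the selected set.
    Assumes the columns of X are standardized."""
    n, p = X.shape
    selected = []
    residual = y.copy()
    beta = np.zeros(0)
    for _ in range(max_steps):
        # Correlation of each column with the current residual.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf          # never re-pick a column
        j = int(np.argmax(scores))
        selected.append(j)
        # Orthogonalization step: refit y on all selected columns.
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ beta
    return selected, beta
```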
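The excerpt does not spell out the thesis's new combination of L1 and L2 penalties, so the sketch below shows the standard elastic-net-style objective, minimizing (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2 by proximal gradient descent, purely to make the combined-penalty idea concrete; the thesis proposes a different combination with claimed advantages over this kind of penalty.

```python
import numpy as np

def elastic_net_ista(X, y, lam1, lam2, n_iter=500):
    """Proximal gradient descent (ISTA) for
    (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2."""
    n, p = X.shape
    b = np.zeros(p)
    # Step size from the Lipschitz constant of the smooth part.
    L = np.linalg.norm(X, 2) ** 2 / n + lam2
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n + lam2 * b
        z = b - grad / L
        # Soft-thresholding = proximal operator of the L1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)
    return b
```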
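Monte Carlo cross-validation repeatedly draws random train/test splits, fits on the training part, and pools the held-out prediction errors; the empirical distribution of those errors is the object whose behavior the thesis's asymptotic theory studies. A generic sketch follows, where fit and predict are placeholder callables standing in for any regression method, and train_frac and n_splits are illustrative defaults rather than the thesis's choices.

```python
import numpy as np

def mccv_error_distribution(X, y, fit, predict,
                            train_frac=0.8, n_splits=200, rng=None):
    """Monte Carlo cross-validation: for each random split, fit on
    the training part and record squared prediction errors on the
    held-out part.  The pooled errors approximate the out-of-sample
    prediction error distribution."""
    rng = np.random.default_rng(rng)
    n = len(y)
    n_train = int(train_frac * n)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        model = fit(X[tr], y[tr])
        errors.append((y[te] - predict(model, X[te])) ** 2)
    return np.concatenate(errors)
```

Because fit and predict are passed in, the same routine can be run once per regularization method (for example, OGA versus a combined L1/L2 penalty) and the resulting error distributions compared on the same real data set, which is the use the synopsis describes.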