Variable Selection for High-dimensional Data with Error Control

Author : Han Fu (Ph. D. in biostatistics)

Book Synopsis Variable Selection for High-dimensional Data with Error Control by : Han Fu (Ph. D. in biostatistics)

Download or read book Variable Selection for High-dimensional Data with Error Control written by Han Fu (Ph. D. in biostatistics). This book was released in 2022. Available in PDF, EPUB and Kindle. Book excerpt: Many high-throughput genomic applications involve a large set of covariates, and it is crucial to discover which variables are truly associated with the response. Researchers often want to select variables whose associations are genuine and reproducible in follow-up studies. Effectively controlling the false discovery rate (FDR) increases the reproducibility of the discoveries and has been a major challenge in variable selection research, especially for high-dimensional data. Existing error control approaches include augmentation approaches, such as model-X knockoffs, which use artificial variables as benchmarks for decision making. We introduce another augmentation-based selection framework, extended from a Bayesian screening approach, called reference distribution variable selection. Ordinal responses, which were not previously considered in this area, were used to compare different variable selection approaches. We constructed various importance measures that fit into the selection frameworks, using either L1 penalized regression or machine learning techniques, and compared these measures in terms of FDR and power using simulated data. Moreover, we applied these selection methods to high-throughput methylation data to identify features associated with the progression from normal liver tissue to hepatocellular carcinoma and to further compare and contrast their performance. Having established the effectiveness of FDR control for model-X knockoffs, we turned our attention to another important data type: survival data with long-term survivors. Medical breakthroughs in recent years have led to cures for many diseases, resulting in increased observations of long-term survivors.
The mixture cure model (MCM) is a type of survival model that is often used when a cured fraction exists. Unfortunately, currently few variable selection methods exist for MCMs when there are more predictors than samples. To fill the gap, we developed penalized MCMs for high-dimensional datasets which allow for identification of prognostic factors associated with both cure status and/or survival. Both parametric models and semi-parametric proportional hazards models were considered for modeling the survival component. For penalized parametric MCMs, we demonstrated how the estimation proceeded using two different iterative algorithms, the generalized monotone incremental forward stagewise (GMIFS) and Expectation-Maximization (E-M). For semi-parametric MCMs where multiple types of penalty functions were considered, the coordinate descent algorithm was combined with E-M for optimization. The model-X knockoffs method was combined with these algorithms to allow for FDR control in variable selection. Through extensive simulation studies, our penalized MCMs have been shown to outperform alternative methods on multiple metrics and achieve high statistical power with FDR being controlled. In two acute myeloid leukemia (AML) applications with gene expression data, our proposed approaches identified important genes associated with potential cure or time-to-relapse, which may help inform treatment decisions for AML patients.
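As an illustration of the knockoff-based selection mentioned in the excerpt above, here is a minimal Python sketch of the knockoff+ thresholding step. It is not the thesis author's code: the function name `knockoff_plus_select`, the example statistics `W`, and the target level `q` are hypothetical. The sketch assumes the feature statistics W_j have already been computed, with large positive values favouring a real variable over its knockoff copy.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ threshold at target FDR q.

    W is a 1-D array of knockoff feature statistics (assumed precomputed).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics act as a benchmark for false positives;
        # the "+1" is the knockoff+ correction.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int)

# Hypothetical statistics for ten variables.
W = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.3, 0.2]
selected = knockoff_plus_select(W, q=0.2)
print(selected)
```

With few variables the "+1" correction makes small target levels hard to reach, so an empty selection at a strict `q` is expected behaviour rather than a failure.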

Statistics for High-Dimensional Data

Author : Peter Bühlmann
Publisher : Springer Science & Business Media
ISBN 13 : 364220192X
Total Pages : 568 pages
Book Rating : 4.6/5 (422 download)

Book Synopsis Statistics for High-Dimensional Data by : Peter Bühlmann

Download or read book Statistics for High-Dimensional Data written by Peter Bühlmann and published by Springer Science & Business Media. This book was released on 2011-06-08 with total page 568 pages. Available in PDF, EPUB and Kindle. Book excerpt: Modern statistics deals with large and complex data sets, and consequently with models containing a large number of parameters. This book presents a detailed account of recently developed approaches, including the Lasso and versions of it for various models, boosting methods, undirected graphical modeling, and procedures controlling false positive selections. A special characteristic of the book is that it contains comprehensive mathematical theory on high-dimensional statistics combined with methodology, algorithms and illustrations with real data examples. This in-depth approach highlights the methods’ great potential and practical applicability in a variety of settings. As such, it is a valuable resource for researchers, graduate students and experts in statistics, applied mathematics and computer science.

Statistical Analysis for High-Dimensional Data

Author : Arnoldo Frigessi
Publisher : Springer
ISBN 13 : 3319270990
Total Pages : 313 pages
Book Rating : 4.3/5 (192 download)

Book Synopsis Statistical Analysis for High-Dimensional Data by : Arnoldo Frigessi

Download or read book Statistical Analysis for High-Dimensional Data written by Arnoldo Frigessi and published by Springer. This book was released on 2016-02-16 with total page 313 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on future research directions, the contributions will benefit graduate students and researchers in computational biology, statistics and the machine learning community.

Variable Selection in High Dimensional Data Analysis with Applications

Total Pages : 108 pages

Book Synopsis Variable Selection in High Dimensional Data Analysis with Applications

Download or read book Variable Selection in High Dimensional Data Analysis with Applications. This book was released in 2015 with a total of 108 pages. Available in PDF, EPUB and Kindle.

Partially Linear Models

Author : Wolfgang Härdle
Publisher : Springer Science & Business Media
ISBN 13 : 3642577008
Total Pages : 210 pages
Book Rating : 4.6/5 (425 download)

Book Synopsis Partially Linear Models by : Wolfgang Härdle

Download or read book Partially Linear Models written by Wolfgang Härdle and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 210 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.

Applied Biclustering Methods for Big and High-Dimensional Data Using R

Author : Adetayo Kasim
Publisher : CRC Press
ISBN 13 : 1315356392
Total Pages : 433 pages
Book Rating : 4.3/5 (153 download)

Book Synopsis Applied Biclustering Methods for Big and High-Dimensional Data Using R by : Adetayo Kasim

Download or read book Applied Biclustering Methods for Big and High-Dimensional Data Using R written by Adetayo Kasim and published by CRC Press. This book was released on 2016-08-18 with a total of 433 pages. Available in PDF, EPUB and Kindle. Book excerpt: Proven Methods for Big Data Analysis. As big data has become standard in many application areas, challenges have arisen related to methodology and software development, including how to discover meaningful patterns in the vast amounts of data. Addressing these problems, Applied Biclustering Methods for Big and High-Dimensional Data Using R shows how to apply biclustering methods to find local patterns in a big data matrix. The book presents an overview of data analysis using biclustering methods from a practical point of view. Real case studies in drug discovery, genetics, marketing research, biology, toxicity, and sports illustrate the use of several biclustering methods. References to technical details of the methods are provided for readers who wish to investigate the full theoretical background. All the methods are accompanied by R examples that show how to conduct the analyses. The examples, software, and other materials are available on a supplementary website.

Uncertainty Quantification Techniques in Statistics

Author : Jong-Min Kim
Publisher : MDPI
ISBN 13 : 3039285467
Total Pages : 128 pages
Book Rating : 4.0/5 (392 download)

Book Synopsis Uncertainty Quantification Techniques in Statistics by : Jong-Min Kim

Download or read book Uncertainty Quantification Techniques in Statistics written by Jong-Min Kim and published by MDPI. This book was released on 2020-04-03 with total page 128 pages. Available in PDF, EPUB and Kindle. Book excerpt: Uncertainty quantification (UQ) is a mainstream research topic in applied mathematics and statistics. To identify UQ problems, diverse modern techniques for large and complex data analyses have been developed in applied mathematics, computer science, and statistics. This Special Issue of Mathematics (ISSN 2227-7390) includes diverse modern data analysis methods such as skew-reflected-Gompertz information quantifiers with application to sea surface temperature records, the performance of variable selection and classification via a rank-based classifier, two-stage classification with SIS using a new filter ranking method in high throughput data, an estimation of sensitive attribute applying geometric distribution under probability proportional to size sampling, combination of ensembles of regularized regression models with resampling-based lasso feature selection in high dimensional data, robust linear trend test for low-coverage next-generation sequence data controlling for covariates, and comparing groups of decision-making units in efficiency based on semiparametric regression.

Modern Statistical Methods for Health Research

Author : Yichuan Zhao
Publisher : Springer Nature
ISBN 13 : 3030724379
Total Pages : 506 pages
Book Rating : 4.0/5 (37 download)

Book Synopsis Modern Statistical Methods for Health Research by : Yichuan Zhao

Download or read book Modern Statistical Methods for Health Research written by Yichuan Zhao and published by Springer Nature. This book was released on 2021-10-14 with a total of 506 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book brings together the voices of leading experts in the frontiers of biostatistics, biomedicine, and the health sciences to discuss the statistical procedures, useful methods, and novel applications in biostatistics research. It also includes discussions of potential future directions of biomedicine and new statistical developments for health research, with the intent of stimulating research and fostering the interactions of scholars across health research related disciplines. Topics covered include: health data analysis and applications to EHR data; clinical trials, FDR, and applications in health science; big network analytics and its applications in GWAS; survival analysis and functional data analysis; and graphical modelling in genomic studies. The book will be valuable to data scientists and statisticians who are working in biomedicine and health, other practitioners in the health sciences, and graduate students and researchers in biostatistics and health.

Variable Selection for High-dimensional Complex Data

Author : Fei Xue

Book Synopsis Variable Selection for High-dimensional Complex Data by : Fei Xue

Download or read book Variable Selection for High-dimensional Complex Data written by Fei Xue. This book was released in 2019. Available in PDF, EPUB and Kindle.

Multiple Classifier Systems

Author : Friedhelm Schwenker
Publisher : Springer
ISBN 13 : 3319202480
Total Pages : 240 pages
Book Rating : 4.3/5 (192 download)

Book Synopsis Multiple Classifier Systems by : Friedhelm Schwenker

Download or read book Multiple Classifier Systems written by Friedhelm Schwenker and published by Springer. This book was released on 2015-06-02 with total page 240 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 12th International Workshop on Multiple Classifier Systems, MCS 2015, held in Günzburg, Germany, in June/July 2015. The 19 revised papers presented were carefully reviewed and selected from 25 submissions. The papers address issues in multiple classifier systems and ensemble methods, including pattern recognition, machine learning, neural network, data mining and statistics. They are organized in topical sections on theory and algorithms and application and evaluation.

Journal of the American Statistical Association

Total Pages : 898 pages

Book Synopsis Journal of the American Statistical Association

Download or read book Journal of the American Statistical Association. This book was released in 2009 with a total of 898 pages. Available in PDF, EPUB and Kindle.

Monte-Carlo Simulation-Based Statistical Modeling

Author : Ding-Geng (Din) Chen
Publisher : Springer
ISBN 13 : 9811033072
Total Pages : 440 pages
Book Rating : 4.8/5 (11 download)

Book Synopsis Monte-Carlo Simulation-Based Statistical Modeling by : Ding-Geng (Din) Chen

Download or read book Monte-Carlo Simulation-Based Statistical Modeling written by Ding-Geng (Din) Chen and published by Springer. This book was released on 2017-02-01 with total page 440 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book brings together expert researchers engaged in Monte-Carlo simulation-based statistical modeling, offering them a forum to present and discuss recent issues in methodological development as well as public health applications. It is divided into three parts, with the first providing an overview of Monte-Carlo techniques, the second focusing on missing data Monte-Carlo methods, and the third addressing Bayesian and general statistical modeling using Monte-Carlo simulations. The data and computer programs used here will also be made publicly available, allowing readers to replicate the model development and data analysis presented in each chapter, and to readily apply them in their own research. Featuring highly topical content, the book has the potential to impact model development and data analyses across a wide spectrum of fields, and to spark further research in this direction.

Artificial Intelligence on Medical Data

Author : Mousumi Gupta
Publisher : Springer Nature
ISBN 13 : 9811901511
Total Pages : 474 pages
Book Rating : 4.8/5 (119 download)

Book Synopsis Artificial Intelligence on Medical Data by : Mousumi Gupta

Download or read book Artificial Intelligence on Medical Data written by Mousumi Gupta and published by Springer Nature. This book was released on 2022-07-23 with total page 474 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book includes high-quality papers presented at the Second International Symposium on Computer Vision and Machine Intelligence in Medical Image Analysis (ISCMM 2021), organized by Computer Applications Department, SMIT in collaboration with Department of Pathology, SMIMS, Sikkim, India, and funded by Indian Council of Medical Research, during 11 – 12 November 2021. It discusses common research problems and challenges in medical image analysis, such as deep learning methods. It also discusses how these theories can be applied to a broad range of application areas, including lung and chest x-ray, breast CAD, microscopy and pathology. The studies included mainly focus on the detection of events from biomedical signals.

Developing a Protocol for Observational Comparative Effectiveness Research: A User's Guide

Author : Agency for Health Care Research and Quality (U.S.)
Publisher : Government Printing Office
ISBN 13 : 1587634236
Total Pages : 236 pages
Book Rating : 4.5/5 (876 download)

Book Synopsis Developing a Protocol for Observational Comparative Effectiveness Research: A User's Guide by : Agency for Health Care Research and Quality (U.S.)

Download or read book Developing a Protocol for Observational Comparative Effectiveness Research: A User's Guide written by Agency for Health Care Research and Quality (U.S.) and published by Government Printing Office. This book was released on 2013-02-21 with a total of 236 pages. Available in PDF, EPUB and Kindle. Book excerpt: This User’s Guide is a resource for investigators and stakeholders who develop and review observational comparative effectiveness research protocols. It explains how to (1) identify key considerations and best practices for research design; (2) build a protocol based on these standards and best practices; and (3) judge the adequacy and completeness of a protocol. Eleven chapters cover all aspects of research design, including: developing study objectives, defining and refining study questions, addressing the heterogeneity of treatment effect, characterizing exposure, selecting a comparator, defining and measuring outcomes, and identifying optimal data sources. Checklists of guidance and key considerations for protocols are provided at the end of each chapter. The User’s Guide was created by researchers affiliated with AHRQ’s Effective Health Care Program, particularly those who participated in AHRQ’s DEcIDE (Developing Evidence to Inform Decisions About Effectiveness) program. Chapters were subject to multiple internal and external independent reviews. For more information, please consult the Agency website: www.effectivehealthcare.ahrq.gov

Univariate and Bivariate Variable Selection in High Dimensional Data

Author : Vivian Wai Ying Ng
Total Pages : 470 pages

Book Synopsis Univariate and Bivariate Variable Selection in High Dimensional Data by : Vivian Wai Ying Ng

Download or read book Univariate and Bivariate Variable Selection in High Dimensional Data written by Vivian Wai Ying Ng. This book was released in 2004 with a total of 470 pages. Available in PDF, EPUB and Kindle.

Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data

Author : Arkaprabha Ganguli
Book Rating : 4.3/5 (797 download)

Book Synopsis Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data by : Arkaprabha Ganguli

Download or read book Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data written by Arkaprabha Ganguli. This book was released in 2023. Available in PDF, EPUB and Kindle. Book excerpt: The field of statistical machine learning has seen a surge in popularity for feature selection methods for ultra-high dimensional datasets due to their wide applicability in scientific domains ranging from genetics to astronomy. These applications typically involve a vast number of potential features and a quantitative response or outcome variable. It is also often observed or hypothesized that only a small subset of these features is truly associated with the response. Traditional feature selection algorithms are motivated by the need to uncover the true sparsity pattern buried in the ultra-high dimensional data setting. However, these methods may lead to many false discoveries, providing poor scientific insight into the underlying relationship. Error-controlled methods are designed to address this issue by controlling the expected proportion of falsely identified features among the selected ones. In this thesis, we develop and study two novel feature selection methods for ultrahigh dimensional data with False Discovery Rate (FDR) control, with a real-world application in the context of diffusion magnetic resonance imaging (DMRI) tractography data.
In the first chapter, we propose a p-value-free FDR controlling method for feature selection. Most state-of-the-art methods in the literature for controlling FDR rely on p-values, which depend on specific assumptions on the data distribution and may be questionable in some high-dimensional settings. To overcome this problem, we propose a 'screening & cleaning' strategy consisting of assigning importance scores to the predictors, followed by constructing an estimate of the FDR. We study the theoretical properties of the method and demonstrate its superior performance compared to existing methods in an extensive simulation study. Finally, we apply the method to a gene expression dataset and identify important genes associated with drug sensitivity.
In the second chapter, we extend the feature selection method from a linear model to a non-linear and non-parametric setting by utilizing the Deep Learning (DL) framework. DL has been at the center of analytics in recent years due to its impressive empirical success in analyzing complex data objects. Despite this success, most existing tools behave like black-box machines, hence the increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning has emerged as a promising tool in this realm. However, the recent developments do not accommodate ultra-high dimensional and highly correlated features or high noise levels. In this article, we propose a novel screening and cleaning method with the aid of deep learning for a data-adaptive multi-resolutional discovery of highly correlated predictors with a controlled FDR. Extensive empirical evaluations over a wide range of simulated scenarios and several real datasets demonstrate the effectiveness of the proposed method in achieving high power while keeping the false discovery rate at a minimum.
In the third and final chapter, we apply the proposed feature selection methods to the brain imaging tractography dataset. Our motivation comes from evidence from studies of dementia showing that some older adults continue to maintain their cognitive abilities despite signs of ongoing neuropathological diseases. Commonly referred to as cognitive reserve, this phenomenon has unclear neurobiological substrates, and a current understanding of corresponding markers is lacking. This study aims at investigating the immense system of structural connections between brain regions constituting subcortical white matter (WM) as potential markers of cognitive reserve. Diffusion MRI tractography is an established computational neuroimaging method to model WM fiber organization throughout the brain. Standard statistical analyses capable of leveraging the high dimensionality of tractography data face additional methodological complications beyond those encountered in typical feature selection problems. Our proposed methodology is specifically tailored for addressing these concerns. Extensive simulation studies on synthetic datasets mimicking the real tractography dataset demonstrate a substantial gain in power with minimal false discoveries, compared with state-of-the-art methods for feature selection. Our application to predicting cognitive reserve in a clinical aging neuroimaging tractography dataset produces anatomically meaningful discoveries in brain regions associated with risk and resilience to neurodegeneration.
Overall, this thesis presents novel and effective methods for feature selection in ultrahigh dimensional settings. Our proposed framework would benefit researchers and professionals who face the difficulty of choosing pertinent variables from large, correlated datasets in diverse fields, ranging from finance and social sciences to biology.
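The excerpt above evaluates selection methods by power and false discoveries in simulations where the true features are known. The following Python sketch shows how those two quantities are commonly computed for a single simulated run; it is only an illustration, not the thesis's evaluation code, and the function name `fdp_and_power` and the example index sets are hypothetical. Averaging the false discovery proportion over many runs gives an empirical estimate of the FDR.

```python
def fdp_and_power(selected, true_support):
    """Return (false discovery proportion, power) for one simulated run.

    selected: indices chosen by a selection method.
    true_support: indices of features truly associated with the response.
    """
    selected = set(selected)
    true_support = set(true_support)
    false_discoveries = len(selected - true_support)
    true_discoveries = len(selected & true_support)
    # By convention the proportion is 0 when nothing is selected.
    fdp = false_discoveries / max(1, len(selected))
    power = true_discoveries / max(1, len(true_support))
    return fdp, power

# Hypothetical run: features 0-4 are truly associated, the method picks four.
fdp, power = fdp_and_power(selected=[0, 1, 2, 7], true_support=[0, 1, 2, 3, 4])
print(fdp, power)  # 0.25 0.6
```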

High-Dimensional Probability

Author : Roman Vershynin
Publisher : Cambridge University Press
ISBN 13 : 1108415199
Total Pages : 299 pages
Book Rating : 4.1/5 (84 download)

Book Synopsis High-Dimensional Probability by : Roman Vershynin

Download or read book High-Dimensional Probability written by Roman Vershynin and published by Cambridge University Press. This book was released on 2018-09-27 with a total of 299 pages. Available in PDF, EPUB and Kindle. Book excerpt: An integrated package of powerful probabilistic tools and key applications in modern mathematical data science.