Statistical Methods for Alternative Splicing Using RNA Sequencing

Author : Yu Hu

Book Synopsis Statistical Methods for Alternative Splicing Using RNA Sequencing by : Yu Hu

Statistical Methods for Alternative Splicing Using RNA Sequencing was written by Yu Hu and released in 2018. Book excerpt: The emergence of RNA-seq technology has made it possible to estimate isoform-specific gene expression and detect differential alternative splicing between conditions, thus providing an effective way to discover disease susceptibility genes. Analysis of alternative splicing, however, is challenging because various biases present in RNA-seq data complicate the analysis and, if not appropriately corrected, affect gene expression estimation and downstream modeling. Motivated by these issues, my dissertation focused on statistical problems related to the analysis of alternative splicing in RNA-seq data. In Part I of my dissertation, I developed PennSeq, a method that accounts for non-uniform read distribution in isoform expression estimation. PennSeq models non-uniformity using the empirical read distribution in RNA-seq data, and it is the first method to model non-uniformity at the isoform level. Compared to existing approaches, PennSeq allows bias correction at a much finer scale and achieves higher estimation accuracy. In Part II of my dissertation, I developed PennDiff, a method that detects differential alternative splicing using RNA-seq. This approach avoids multiple testing for exons originating from the same isoform(s) and is able to detect differential alternative splicing at both the exon and gene level, with more flexibility and higher sensitivity than existing methods. In Part III of my dissertation, I focused on problems arising from single-cell RNA-seq (scRNA-seq), a newly developed technology that measures the cellular heterogeneity of gene expression in single cells.
Compared to bulk tissue RNA-seq, analysis of scRNA-seq data is more challenging due to high technical variability across cells and extremely low sequencing depth. To overcome these challenges, I developed SCATS, a method that detects differential alternative splicing with scRNA-seq data. SCATS employs an empirical Bayes approach to model technical noise using external RNA spike-ins, and it groups informative reads sharing the same isoform(s) to detect splicing changes. SCATS showed superior performance in both simulation and real data analyses. In summary, the methods developed in my dissertation provide biomedical researchers with a set of powerful tools for transcriptomic data analysis and will aid novel scientific discovery.
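
The idea of grouping reads that share the same compatible isoform(s) can be illustrated with a small Python sketch. It is not SCATS itself: it assumes a hypothetical representation in which each isoform is an ordered list of exon IDs and each read is the ordered run of exons it covers, and it treats a read as compatible with an isoform when that run appears contiguously in the isoform. The names `is_compatible` and `group_reads` are illustrative only.

```python
from collections import Counter


def is_compatible(read_exons, isoform_exons):
    """Return True if the read's ordered exon run appears contiguously in the isoform."""
    n = len(read_exons)
    return any(
        tuple(isoform_exons[i:i + n]) == tuple(read_exons)
        for i in range(len(isoform_exons) - n + 1)
    )


def group_reads(reads, isoforms):
    """Count reads by the frozenset of isoform names each read is compatible with."""
    groups = Counter()
    for read in reads:
        compatible = frozenset(
            name for name, exons in isoforms.items() if is_compatible(read, exons)
        )
        if compatible:  # reads consistent with no isoform are ignored
            groups[compatible] += 1
    return groups


# Hypothetical cassette-exon gene: exon 2 is either included or skipped.
isoforms = {"inclusion": [1, 2, 3], "skipping": [1, 3]}
reads = [(1, 2), (2, 3), (1, 3), (1,), (2,), (3,)]
groups = group_reads(reads, isoforms)
# Groups shared by every isoform carry no information about splicing.
informative = {k: v for k, v in groups.items() if len(k) < len(isoforms)}
```

Here the exon-2 read and the 1-2 and 2-3 junction reads support only the inclusion isoform, the 1-3 junction read supports only the skipping isoform, and reads on exon 1 or exon 3 alone are shared by both, so only the first two groups are kept as informative.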

Statistical Methods for Bulk and Single-cell RNA Sequencing Data

Author : Wei Li
Total Pages : 207 pages

Book Synopsis Statistical Methods for Bulk and Single-cell RNA Sequencing Data by : Wei Li

Statistical Methods for Bulk and Single-cell RNA Sequencing Data was written by Wei Li and released in 2019. Book excerpt: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies of bulk tissues. Recently, emerging single-cell RNA sequencing (scRNA-seq) technologies have enabled the investigation of transcriptomic landscapes at single-cell resolution, providing a chance to characterize stochastic heterogeneity within a cell population. The analysis of bulk and single-cell RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. The first part of this dissertation focuses on the statistical challenges in the transcript-level analysis of bulk RNA-seq data. Next-generation RNA-seq technologies have been widely used to assess full-length RNA isoform structure and abundance in a high-throughput manner, enabling us to better understand alternative splicing and the mechanisms of transcriptional regulation. However, accurate isoform identification and quantification from RNA-seq data are challenging due to the information lost in sequencing experiments. In Chapter 2, given the fast accumulation of multiple RNA-seq datasets from the same biological condition, we develop a statistical method, MSIQ, to achieve more accurate isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. The MSIQ method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples and assigning higher weights to the consistent group.
We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy of MSIQ compared with alternative methods through both simulation and real data studies. In Chapter 3, we introduce a novel method, AIDE, the first approach that directly controls false isoform discoveries by implementing the statistical model selection principle. Solving the isoform discovery problem in a stepwise manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. Our results demonstrate that AIDE has the highest precision compared to the state-of-the-art methods, and it is able to identify isoforms with biological functions in pathological conditions. The second part of this dissertation discusses two statistical methods to improve scRNA-seq data analysis, which is complicated by the excess missing values, the so-called dropouts due to low amounts of mRNA sequenced within individual cells. In Chapter 5, we introduce scImpute, a statistical method to accurately and robustly impute the dropouts in scRNA-seq data. The scImpute method automatically identifies likely dropouts, and only performs imputation on these values by borrowing information across similar cells. Evaluation based on both simulated and real scRNA-seq data suggests that scImpute is an effective tool to recover transcriptome dynamics masked by dropouts, enhance the clustering of cell subpopulations, and improve the accuracy of differential expression analysis. In Chapter 6, we propose a flexible and robust simulator, scDesign, to optimize the choices of sequencing depth and cell number in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. It is the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. 
In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings.
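
To make the idea of imputing only likely dropouts concrete, here is a deliberately simplified Python sketch. It is not scImpute's statistical model; it swaps in a nearest-neighbor heuristic under stated assumptions: cells are compared by Euclidean distance on log1p counts, a zero is treated as a likely dropout when at least a fraction `min_frac` of the `k` nearest cells express that gene, and only such zeros are replaced by the neighbors' mean. The function name and both parameters are hypothetical.

```python
import math


def impute_dropouts(counts, k=2, min_frac=0.5):
    """Replace likely-dropout zeros with the mean of the k nearest cells.

    counts: list of cells, each a list of per-gene counts (not modified).
    A zero is treated as a likely dropout if at least `min_frac` of the
    k nearest neighbors (Euclidean distance on log1p counts) express the gene.
    """
    logged = [[math.log1p(x) for x in cell] for cell in counts]
    imputed = [list(cell) for cell in counts]
    for c, cell in enumerate(counts):
        # Sort the other cells by distance to cell c and keep the k closest.
        ranked = sorted(
            (math.dist(logged[c], logged[o]), o)
            for o in range(len(counts)) if o != c
        )
        neighbors = [o for _, o in ranked[:k]]
        for g, value in enumerate(cell):
            if value != 0:
                continue  # observed nonzero values are never changed
            nbr_vals = [counts[o][g] for o in neighbors]
            expressed = sum(v > 0 for v in nbr_vals) / len(nbr_vals)
            if expressed >= min_frac:
                imputed[c][g] = sum(nbr_vals) / len(nbr_vals)
    return imputed


counts = [
    [10, 5, 0],
    [12, 0, 0],  # gene 1 looks like a dropout in this cell
    [11, 6, 0],
    [0, 0, 9],
    [0, 1, 8],
    [1, 0, 10],
]
result = impute_dropouts(counts, k=2)
```

Nonzero values are left untouched, in the spirit of imputing only likely dropouts; `k` and `min_frac` would need tuning on real data.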

Statistical Methods for Deep Sequencing Data

Author : Shihao Shen
Total Pages : 88 pages

Book Synopsis Statistical Methods for Deep Sequencing Data by : Shihao Shen

Statistical Methods for Deep Sequencing Data was written by Shihao Shen and released in 2012. Book excerpt: Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (Multivariate Analysis of Transcript Splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns in RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypothesis of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT-PCR validation rate of 86% for differential alternative splicing events with a MATS FDR of …
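
The quantity compared in this kind of analysis is usually an exon inclusion level. The Python sketch below uses a length-normalized definition, psi = (I/l_I) / (I/l_I + S/l_S), where I and S are inclusion and skipping read counts and l_I, l_S are effective lengths (an assumption about the exact normalization). For the test, it swaps MATS's Bayesian MCMC procedure for a much simpler frequentist binomial likelihood ratio test between two samples, so it illustrates the question being asked rather than the method in the excerpt.

```python
import math


def inclusion_level(inc, skip, len_inc, len_skip):
    """Length-normalized exon inclusion level (psi)."""
    a, b = inc / len_inc, skip / len_skip
    return a / (a + b)


def _binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant term; 0 * log(0) is taken as 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out


def lrt_two_samples(inc1, skip1, inc2, skip2):
    """LRT of equal inclusion proportions in two samples; returns (statistic, p-value)."""
    n1, n2 = inc1 + skip1, inc2 + skip2
    p1, p2 = inc1 / n1, inc2 / n2
    p0 = (inc1 + inc2) / (n1 + n2)  # pooled estimate under the null
    stat = 2 * (
        _binom_loglik(inc1, n1, p1) + _binom_loglik(inc2, n2, p2)
        - _binom_loglik(inc1, n1, p0) - _binom_loglik(inc2, n2, p0)
    )
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) survival function
    return stat, p_value


psi = inclusion_level(30, 10, len_inc=2, len_skip=1)
stat, p = lrt_two_samples(80, 20, 20, 80)
```

Because psi is a one-to-one function of the raw inclusion proportion once the effective lengths are fixed, testing equal raw proportions is equivalent to testing equal psi in this simplified setting.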

Statistical Methods for Analyzing mRNA Isoform Variation in Large-scale RNA-seq Data

Author : Levon Demirdjian
Total Pages : 113 pages

Book Synopsis Statistical Methods for Analyzing mRNA Isoform Variation in Large-scale RNA-seq Data by : Levon Demirdjian

Statistical Methods for Analyzing mRNA Isoform Variation in Large-scale RNA-seq Data was written by Levon Demirdjian and released in 2018. Book excerpt: Alternative splicing (AS) is a major source of cellular and functional complexity in the eukaryotic transcriptome and plays a critical role in many developmental processes and diseases. Variation in AS is an important mechanism by which mutations cause disease, and it has been hypothesized that over half of all known disease-causing mutations affect splicing patterns. Next-generation RNA sequencing (RNA-seq) technology has enabled the accumulation of large-scale sequencing data from diverse human tissues and populations and has provided an important resource for discovering variation in AS, yet the size and complexity of large-scale RNA-seq datasets continue to pose significant data analysis challenges. In this work, we propose new statistical methodologies that more effectively leverage complex RNA-seq data structures for studying AS. In the first part of this work, we propose a sensitive and robust methodology called PAIRADISE for detecting genetic and allelic variation of alternative splicing in population-scale transcriptome datasets. PAIRADISE uses a novel statistical framework to detect allele-specific alternative splicing (ASAS) from population-scale RNA-seq data. A key feature of PAIRADISE is a statistical model that aggregates ASAS signals across multiple replicates of a given individual or across multiple individuals in a population. PAIRADISE consistently outperforms alternative statistical models in simulation studies and boosts the power of ASAS detection when applied to replicate or population-scale RNA-seq data. Next, we introduce the rMATS-Iso statistical framework for quantifying AS in modules with complex patterns of AS using replicate RNA-seq data.
Importantly, rMATS-Iso leverages an EM algorithm to disambiguate short RNA-seq reads that may be consistent with multiple mRNA isoforms. As a result, rMATS-Iso can accommodate complex patterns of AS within a splicing module, where transcripts can be defined by any combination of exons, splice site choices, and so on. In addition, rMATS-Iso uses a likelihood ratio test to detect differential splicing between sample groups and quantifies the extent to which each individual isoform contributes to the overall difference. With the continued development of next-generation sequencing methods, we anticipate that both PAIRADISE and rMATS-Iso will have broad utility in elucidating the landscape of alternative splicing variation, as well as other forms of mRNA isoform variation, in human populations.
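
A minimal EM sketch for splitting ambiguous reads among isoforms, in Python. It assumes a simplified generative model, not necessarily the one used by rMATS-Iso: reads are pre-grouped by the tuple of isoforms they are compatible with, a read from isoform i has likelihood proportional to 1/l_i under uniform positional sampling, the E-step divides each group's reads among its compatible isoforms in proportion to alpha_i / l_i, and the M-step re-estimates the read fractions alpha. The effective lengths passed as `lengths` are a hypothetical input.

```python
def em_isoform_abundance(class_counts, lengths, n_iter=1000, tol=1e-12):
    """EM estimate of isoform transcript fractions from ambiguous reads.

    class_counts: {tuple of compatible isoform names: number of reads}
    lengths: {isoform name: effective length}; a read from isoform i has
             likelihood 1 / lengths[i] under uniform positional sampling.
    """
    isoforms = sorted(lengths)
    n_reads = sum(class_counts.values())
    alpha = {i: 1.0 / len(isoforms) for i in isoforms}  # fraction of reads per isoform
    for _ in range(n_iter):
        # E-step: expected number of reads originating from each isoform.
        expected = dict.fromkeys(isoforms, 0.0)
        for compat, count in class_counts.items():
            weights = {i: alpha[i] / lengths[i] for i in compat}
            z = sum(weights.values())
            for i, w in weights.items():
                expected[i] += count * w / z
        # M-step: read fractions are the expected shares of all reads.
        new_alpha = {i: expected[i] / n_reads for i in isoforms}
        change = max(abs(new_alpha[i] - alpha[i]) for i in isoforms)
        alpha = new_alpha
        if change < tol:
            break
    # Longer isoforms yield more reads per molecule, so divide out length.
    raw = {i: alpha[i] / lengths[i] for i in isoforms}
    z = sum(raw.values())
    return {i: raw[i] / z for i in isoforms}


classes = {("A",): 30, ("B",): 10, ("A", "B"): 40}
estimates = em_isoform_abundance(classes, {"A": 1000, "B": 1000})
```

With equal lengths, the 40 shared reads end up split 3:1, matching the ratio of the uniquely assigned reads, so the estimate for A converges to 0.75.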

Statistical Analysis of RNA-Seq Alternative Splicing Data and Gas Chromatography-Mass Spectrometry Data

Author : Yi Yi
Total Pages : 144 pages

Book Synopsis Statistical Analysis of RNA-Seq Alternative Splicing Data and Gas Chromatography-Mass Spectrometry Data by : Yi Yi

Statistical Analysis of RNA-Seq Alternative Splicing Data and Gas Chromatography-Mass Spectrometry Data was written by Yi Yi and released in 2016. Book excerpt: With the rapid growth of biochemical technologies in recent years, large and diverse data have been generated from every branch of biology. These data hold valuable scientific insight but present challenges for modeling, computation, and interpretation. In this work, I present statistical models for two types of bioinformatic data: RNA-Seq alternative splicing and GCMS metabolomics. The R packages grMATS and gcmsDecon are available for download. Next-generation sequencing produces rich RNA-Sequencing data in which alternative splicing events can be observed. Replicate multivariate analysis of transcript splicing (rMATS) has shown advantages over other existing methods for detecting differential alternative splicing from replicate RNA-Seq data. However, the current rMATS framework handles only two-isoform splicing events, which limits its usage. In this paper, we present a generalized rMATS framework that handles multiple-isoform splicing events; the model can also be extended to compare differential splicing among multiple groups. We provide a generalized likelihood ratio test whose null hypothesis allows a user-defined threshold of splicing change for isoforms. We show that our test statistic follows a mixture of chi-square distributions whose coefficients depend on the values of the true parameters; when the true parameters are unknown, a least favorable test statistic is computed. We show the efficacy of our model in both 27+3 simulations and a real dataset. Given the high demand for methods for multiple-isoform RNA-Seq data, our model will be useful in RNA-Seq research projects.
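
The null distribution mentioned here, a mixture of chi-square distributions (often called a chi-bar-square distribution), is straightforward to evaluate once the mixing weights are known. The Python sketch below computes its tail probability from closed-form chi-square survival functions for integer degrees of freedom. The weights are supplied by the caller, since in the described test they depend on the true parameters, and the least favorable computation is not reproduced.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 0.

    df = 0 is treated as a point mass at zero, as is usual in chi-bar-square mixtures.
    """
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + total


def chibar_sf(x, weights):
    """Tail probability of sum_k weights[k] * chi-square(k), with k starting at 0."""
    return sum(w * chi2_sf(x, k) for k, w in enumerate(weights))


# 50:50 mixture of a point mass at 0 and chi-square(1), a common boundary case.
p_mixture = chibar_sf(2.71, [0.5, 0.5])
```

For that 50:50 mixture, the 5% critical value is about 2.71, compared with about 3.84 for a plain chi-square with one degree of freedom, which is why ignoring the mixture structure makes such tests conservative.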
As the collection of metabolic end-products, the metabolome reflects the overall activity of the metabolic network and plays an important role in modern biochemical research. Monitoring metabolites and relating their changes to the influence of other factors is a major scientific interest. Gas Chromatography-Mass Spectrometry (GCMS) produces, from biological samples, a metabolomic data type in which each metabolite is broken into different masses (whose relative proportions form a mass spectrum) that co-elute within a retention time range over which the spectrum is unchanged. This unique signature enables individual metabolite identification and allows library construction for the whole metabolome. However, GCMS cannot cleanly separate the elutions of different metabolites, which poses a challenging problem of deconvolution and library matching. In addition, metabolome studies usually involve multiple biological samples in order to understand which metabolites are related to diseases, and building the multiple correspondence across all samples further complicates the task. We propose an automatic rank-based non-negative matrix factorization model to streamline spectral deconvolution, multiple correspondence, metabolite selection, and library matching. We apply the program to 27 simulated datasets as well as 2 real contrived datasets. All results show the superior performance of our model over existing software.
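
The spectral deconvolution step can be illustrated with plain non-negative matrix factorization. This Python sketch uses the standard Lee and Seung multiplicative updates for the squared-error objective, factoring a scans-by-m/z intensity matrix V into elution profiles W and spectra H. It is a generic NMF: it does not implement the rank-based model, multiple correspondence, or library matching described above, and the toy matrix and `rank` value are assumptions.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n).

    Uses multiplicative updates for squared error; eps avoids division by zero.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update spectra H with W fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        # Update elution profiles W with H fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


# Hypothetical scans x m/z intensities built from two overlapping components.
V = [[2, 1, 0], [0, 1, 3], [2, 2, 3]]
W0, H0 = nmf(V, rank=2, n_iter=0)  # the random starting point only
W, H = nmf(V, rank=2, n_iter=300)
```

Multiplicative updates keep every entry non-negative without explicit constraints, which matches the physical meaning of elution profiles and spectral intensities.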

RNA-seq Data Analysis

Author : Eija Korpelainen
Publisher : CRC Press
ISBN 13 : 1466595019
Total Pages : 322 pages

Book Synopsis RNA-seq Data Analysis by : Eija Korpelainen

RNA-seq Data Analysis was written by Eija Korpelainen, published by CRC Press, and released on 2014-09-19. Book excerpt: The State of the Art in Transcriptome Analysis. RNA sequencing (RNA-seq) data offers unprecedented information about the transcriptome, but harnessing this information with bioinformatics tools is typically a bottleneck. RNA-seq Data Analysis: A Practical Approach enables researchers to examine differential expression at gene, exon, and transcript levels…

Statistical Analysis of Next Generation Sequencing Data

Author : Somnath Datta
Publisher : Springer
ISBN 13 : 3319072129
Total Pages : 438 pages

Book Synopsis Statistical Analysis of Next Generation Sequencing Data by : Somnath Datta

Statistical Analysis of Next Generation Sequencing Data was written by Somnath Datta, published by Springer, and released on 2014-07-03. Book excerpt: Next Generation Sequencing (NGS) is the latest high-throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis of NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians, who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials, and personalized medicine. About the editors: Somnath Datta is Professor and Vice Chair of Bioinformatics and Biostatistics at the University of Louisville. He is a Fellow of the American Statistical Association, a Fellow of the Institute of Mathematical Statistics, and an Elected Member of the International Statistical Institute. He has contributed to numerous research areas in statistics, biostatistics, and bioinformatics. Dan Nettleton is Professor and Laurence H. Baker Endowed Chair of Biological Statistics in the Department of Statistics at Iowa State University. He is a Fellow of the American Statistical Association and has published research on a variety of topics in statistics, biology, and bioinformatics.

Alternative Splicing Analysis Using RNA-seq Data

Author : David James Hiller

Book Synopsis Alternative Splicing Analysis Using RNA-seq Data by : David James Hiller

Alternative Splicing Analysis Using RNA-seq Data was written by David James Hiller and released in 2010. Book excerpt: A single gene may produce multiple variants of a protein, called isoforms. This process is mediated by alternative splicing of the mRNA transcript. We discuss some of the challenges inherent in isoform quantification and novel isoform discovery on a given sample, and the steps we have taken to address these challenges. In particular, in the first portion of this thesis we discuss the isoform quantification problem and give a technical condition under which it is identifiable. In the second half of this thesis, we present a mathematical formulation of a novel approach to the isoform discovery problem. We show that our approach performs well on simulated data under both uniform and significantly biased read sampling, and that it also returns reasonable results on several test genes taken from an actual RNA-seq data set.
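
One standard way to phrase an identifiability condition for isoform quantification is linear-algebraic: if the expected count in each read category is a linear combination of isoform abundances, the abundances are identifiable exactly when the category-by-isoform matrix has full column rank. The Python sketch below checks that condition with exact rational Gaussian elimination. It uses a 0/1 compatibility matrix as a hypothetical stand-in for sampling rates and is not claimed to be the specific technical condition derived in the thesis.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_identifiable(design):
    """design[c][i]: contribution of isoform i to read category c (rows = categories)."""
    return matrix_rank(design) == len(design[0])


# Rows: exon-1 body, junction 1-2, junction 2-3, junction 1-3.
# Columns: isoform A = exons 1-2-3, isoform B = exons 1-3.
with_junctions = [[1, 1], [1, 0], [1, 0], [0, 1]]
# If only exon-1 and exon-3 body reads are observed, both rows fit both isoforms.
bodies_only = [[1, 1], [1, 1]]
```

Junction reads are what break the tie here: without them every observed category is shared, the matrix loses rank, and no amount of sequencing depth can separate the two isoforms.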

Statistical Methods for RNA-sequencing Data

Author : Rhonda Bacher
Total Pages : 84 pages

Book Synopsis Statistical Methods for RNA-sequencing Data by : Rhonda Bacher

Statistical Methods for RNA-sequencing Data was written by Rhonda Bacher and released in 2017. Book excerpt: Major methodological and technological advances in sequencing have inspired ambitious biological questions that were previously elusive. Addressing such questions with novel and complex data requires statistically rigorous tools. In this dissertation, I develop, evaluate, and apply statistical and computational methods for the analysis of high-throughput sequencing data. A unifying theme of this work is that all of these methods are aimed at RNA-seq data. The first method focuses on characterizing gene expression in RNA-seq experiments with ordered conditions. The second focuses on single-cell RNA-seq data, for which we develop a normalization method that accounts for a previously unknown technical artifact in the data. Finally, we develop a simulation to recapitulate the source of the artifact in silico.

Statistical Methods for Whole Transcriptome Sequencing

Author : Cheng Jia

Book Synopsis Statistical Methods for Whole Transcriptome Sequencing by : Cheng Jia

Statistical Methods for Whole Transcriptome Sequencing was written by Cheng Jia and released in 2017. Book excerpt: RNA-Sequencing (RNA-Seq) has enabled detailed, unbiased profiling of whole transcriptomes with incredible throughput. Recent technological breakthroughs have pushed the frontier of RNA expression measurement to the single-cell level (scRNA-Seq). In both bulk and single-cell RNA-Seq analyses, modeling the noise structure embedded in the data is crucial for drawing correct inferences. In this dissertation, I developed a series of statistical methods that account for the technical variation specific to RNA-Seq experiments in the context of isoform- or gene-level differential expression analyses. In the first part of my dissertation, I developed MetaDiff (https://github.com/jiach/MetaDiff), a random-effects meta-regression model that allows the incorporation of uncertainty in isoform expression estimation into isoform differential expression analysis. This framework was further extended to detect splicing quantitative trait loci with RNA-Seq data. In the second part of my dissertation, I developed TASC (Toolkit for Analysis of Single-Cell data; https://github.com/scrna-seq/TASC), a hierarchical mixture model that explicitly adjusts for cell-to-cell technical differences in scRNA-Seq analysis using an empirical Bayes approach. This framework can be adapted to perform differential gene expression analysis. In the third part of my dissertation, I developed TASC-B, an extension of TASC that models zero-inflation induced by transcriptional bursting. This model can identify and test for differences in the level of transcriptional bursting. Compared to existing methods, these new tools have been shown to better control the false discovery rate in situations where technical noise cannot be ignored.
They also display superior power in both our simulation studies and real world applications.
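
The core idea of weighting each estimate by its uncertainty while allowing extra between-sample variation can be illustrated with the classic DerSimonian-Laird random-effects estimator. This Python sketch pools per-sample estimates (for example, log isoform expression) given their estimation variances. It is a simple random-effects meta-analysis without covariates, not MetaDiff's meta-regression model, and the inputs are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled mean, its standard error, and tau^2 (between-sample variance)."""
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q measures heterogeneity beyond the stated estimation variances.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every sample's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


pooled, se, tau2 = dersimonian_laird([0.0, 10.0], [1.0, 1.0])
```

When the estimates agree more closely than their variances predict, tau^2 is truncated at zero and the result reduces to ordinary inverse-variance pooling; noisy samples are therefore down-weighted either way.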

Alternative Splicing

Author : Peter Scheiffele
Publisher : Springer Nature
ISBN 13 : 1071625217
Total Pages : 356 pages

Book Synopsis Alternative Splicing by : Peter Scheiffele

Alternative Splicing was written by Peter Scheiffele, published by Springer Nature, and released on 2022-07-27. Book excerpt: This detailed volume collects commonly used and cutting-edge methods to analyze alternative splicing, a key step in gene regulation. After an introduction to the alternative splicing mechanism and its targeting in therapeutic strategies, the book continues with techniques for analyzing alternative splicing profiles in complex biological systems, visualizing and localizing alternatively spliced transcripts at cellular and sub-cellular resolution, probing regulators of alternative splicing, and assessing the functional consequences of alternative splicing. Written for the highly successful Methods in Molecular Biology series, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, reproducible protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and practical, Alternative Splicing: Methods and Protocols serves as an ideal guide both for RNA aficionados who want to implement novel approaches in their labs and for novices undertaking alternative splicing projects.

Deep Sequencing Data Analysis

Author : Noam Shomron
Publisher : Humana Press
ISBN 13 : 9781627035132

Book Synopsis Deep Sequencing Data Analysis by : Noam Shomron

Deep Sequencing Data Analysis was written by Noam Shomron, published by Humana Press, and released on 2013-07-20. Book excerpt: The new genetic revolution is fuelled by Deep Sequencing (or Next Generation Sequencing) instruments which, in essence, read billions of nucleotides per reaction. Effectively, when carefully planned, they can be applied to any experimental question that can be translated into reading nucleic acids. In Deep Sequencing Data Analysis, expert researchers in the field detail methods that are now commonly used to study the multifaceted deep sequencing data field. These include techniques for compressing the generated data, Chromatin Immunoprecipitation (ChIP-seq), and various approaches for the identification of sequence variants. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of necessary materials and reagents, step-by-step, readily reproducible protocols, and key tips on troubleshooting and avoiding known pitfalls. Authoritative and practical, Deep Sequencing Data Analysis seeks to aid scientists in further understanding key data analysis procedures for deep sequencing data interpretation.

Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments

Author : Zijian Ni (Ph.D.)

Book Synopsis Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments by : Zijian Ni (Ph.D.)

Statistical Methods for Improving Data Quality in Modern RNA Sequencing Experiments was written by Zijian Ni (Ph.D.) and released in 2022. Book excerpt: RNA sequencing (RNA-seq) has revolutionized the measurement of transcriptome-wide gene expression over the last two decades. Modern RNA sequencing techniques such as single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have been developed in recent years, allowing researchers to quantify gene expression at single-cell resolution or to profile gene activity patterns in 2-dimensional space across tissue. While useful, data collected with these techniques always come with noise, and appropriate filtering and cleaning are required for reliable downstream analyses. In this dissertation, I investigate multiple quality-related issues in scRNA-seq and ST experiments, and I develop, implement, evaluate, and apply statistical methods to adjust for them. A unifying theme of this work is that all of these methods aim at improving data quality and allowing for better power and precision in downstream analyses. For scRNA-seq data, the quality issue we discuss in this dissertation is distinguishing barcodes associated with real cells from those binding only background noise. In droplet-based scRNA-seq experiments, raw data contain both cell barcodes that should be retained for downstream analysis and background barcodes that are uninformative and should be filtered out. Because ambient RNAs are present in all barcodes, cell barcodes are not easily distinguished from background barcodes. Misclassifying either background barcodes or cell barcodes leads to misleading results in downstream analyses. Existing filtering methods test barcodes individually and consequently do not leverage the strong cell-to-cell correlation present in most datasets.
To improve cell detection, we introduce CB2, a cluster-based approach for distinguishing real cells from background barcodes. As demonstrated in simulated and case study datasets, CB2 has increased power for identifying real cells, which allows for the identification of novel subpopulations and improves downstream differential expression analyses. We then present a benchmark study to evaluate the performance of cell detection methods, including CB2, on public scRNA-seq datasets covering a variety of experimental protocols. In recent years, variants of scRNA-seq techniques have been developed for specialized biological tasks. While the data structures remain the same as in standard scRNA-seq experiments, the underlying data properties can differ substantially. Here, we propose the first benchmark study to provide a thorough comparison of existing cell detection methods in scRNA-seq data and to guide users in choosing appropriate methods for their experiments. Evaluation metrics include power, precision, computational efficiency, robustness, and accessibility. In addition, we provide investigation and guidance on appropriately choosing filtering parameters in order to improve data quality. For ST data, we uncover, for the first time, a novel quality issue: genes expressed in one tissue region bleed out and contaminate nearby tissue regions. ST is a powerful and widely used approach for profiling transcriptome-wide gene expression across a tissue, with emerging applications in molecular medicine and tumor diagnostics. Recent ST experiments utilize slides containing thousands of spots with spot-specific barcodes that bind RNAs. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case owing to bleed from nearby spots, an artifact we refer to as spot swapping. We design a creative human-mouse chimeric ST experiment to validate the existence of spot swapping.
Spot swapping hinders inferences of region-specific gene activities and tissue annotations. To decontaminate ST data, we propose SpotClean, a probabilistic model that measures the spot swapping effect and estimates gene expression using an EM algorithm. SpotClean is shown to provide a more accurate estimation of the underlying gene expression, increase the specificity of marker gene signals, and, more importantly, allow for improved tumor diagnostics.
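
The general idea of using EM to recover each spot's own expression from counts that have partly bled in from neighbors can be illustrated with a generic Poisson deconvolution EM (the Richardson-Lucy update). The Python sketch assumes a hypothetical, known bleeding kernel in which column i gives the fraction of spot i's true counts observed at each spot, and models observed counts as Poisson with means equal to the kernel-weighted true expression. SpotClean's actual model, including how it estimates the bleeding pattern, is not reproduced here.

```python
import math


def em_decontaminate(observed, kernel, n_iter=500):
    """EM (Richardson-Lucy) estimate of true per-spot expression for one gene.

    observed[j]: count at spot j.
    kernel[j][i]: fraction of spot i's true expression observed at spot j.
    Model: observed[j] ~ Poisson(sum_i kernel[j][i] * lam[i]).
    """
    n = len(observed)
    lam = [max(sum(observed) / n, 1e-9)] * n  # flat, strictly positive start
    col_sums = [sum(kernel[j][i] for j in range(n)) for i in range(n)]
    for _ in range(n_iter):
        mu = [sum(kernel[j][i] * lam[i] for i in range(n)) for j in range(n)]
        # Multiplicative EM update: redistribute each observed count to its possible sources.
        lam = [
            lam[i] / col_sums[i]
            * sum(kernel[j][i] * observed[j] / mu[j] for j in range(n) if mu[j] > 0)
            for i in range(n)
        ]
    return lam


def poisson_loglik(observed, kernel, lam):
    """Poisson log-likelihood of the observed counts (assumes every mean is positive)."""
    n = len(observed)
    total = 0.0
    for j in range(n):
        mu = sum(kernel[j][i] * lam[i] for i in range(n))
        total += observed[j] * math.log(mu) - mu - math.lgamma(observed[j] + 1)
    return total


# Three spots in a row; each keeps 80% of its RNA and leaks the rest to neighbors.
kernel = [
    [0.8, 0.1, 0.0],
    [0.2, 0.8, 0.2],
    [0.0, 0.1, 0.8],
]
observed = [80, 20, 0]  # the gene is truly expressed only at spot 0
estimate = em_decontaminate(observed, kernel)
```

Because each kernel column sums to 1, every update preserves the total count, so the EM only moves counts back toward the spots they most plausibly came from; here the 20 counts at spot 1 are reassigned to spot 0.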

Applications of RNA-Seq and Omics Strategies

Author :
Publisher : BoD – Books on Demand
ISBN 13 : 9535135031
Total Pages : 330 pages

Book Synopsis Applications of RNA-Seq and Omics Strategies by : Fabio Marchi

Applications of RNA-Seq and Omics Strategies, written by Fabio Marchi and published by BoD – Books on Demand, was released on 2017-09-13 (330 pages). Book excerpt: The great potential of RNA sequencing and other "omics" techniques has led to the production of a huge amount of data aimed at answering many different questions surrounding science's great unknowns. This book presents an overview of powerful and cost-efficient methods for the comprehensive analysis of RNA-Seq data, introducing and revising advanced concepts in data analysis using the most current algorithms. It also offers a holistic view of the broader context in which the transcriptome operates, encompassing biological areas with remarkable technological advances in systems biology, from microorganisms to precision medicine.

Statistical Methods for the Analysis of RNA Sequencing Data

Author :
Publisher :
ISBN 13 :
Total Pages : 340 pages

Book Synopsis Statistical Methods for the Analysis of RNA Sequencing Data by : Man-Kee Maggie Chu

Statistical Methods for the Analysis of RNA Sequencing Data, written by Man-Kee Maggie Chu, was released in 2014 (340 pages). Book excerpt: The next-generation sequencing technology RNA sequencing (RNA-seq) has become increasingly popular relative to traditional microarrays in transcriptome analyses. Statistical methods for gene expression analysis differ between these two technologies because array-based technology measures intensities, which are modeled with continuous distributions, whereas RNA-seq provides absolute quantification of gene expression as counts of reads. Reliable statistical methods are needed to exploit the information from these rapidly evolving sequencing technologies, and limited work has been done on expression analysis of time-course RNA-seq data. Functional clustering is an important method for examining gene expression patterns and thus discovering co-expressed genes to better understand biological systems, so clustering-based approaches for analyzing repeated digital gene expression measures are in demand. In this dissertation, we propose a model-based clustering method for identifying gene expression patterns in time-course RNA-seq data. Our approach employs a longitudinal negative binomial mixture model to describe over-dispersed time-course gene count data. The effectiveness of the proposed clustering method is assessed using simulated data and illustrated with real data from time-course genomic experiments. Because of the complexity and size of genomic data, the choice of good starting values is an important issue for the proposed clustering algorithm, and a reliable initialization strategy is needed for cluster-wise regression, specifically for time-course discrete count data.
We modify common existing initialization procedures to suit our model-based clustering algorithm; the procedures are evaluated through a simulation study on artificial datasets and applied to real genomic examples to identify the optimal initialization method. Another common issue in gene expression analysis is the presence of missing values in the datasets. Various treatments for missing values in genomic datasets have been developed, but limited work has been done on RNA-seq data. In the current work, we examine the performance of various imputation methods and their impact on the clustering of time-course RNA-seq data. We develop a cluster-based imputation method that is specifically suited to missing values in RNA-seq datasets. Simulation studies are provided to assess the performance of the proposed imputation approach.
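The excerpt describes the clustering model only at a high level. As a rough illustration, the sketch below fits a k-component negative binomial mixture to a genes × time-points count matrix by EM. Several simplifying assumptions are ours, not the dissertation's: time points are independent given the cluster, the dispersion `r` is known and shared, starting means are taken from genes at evenly spaced ranks of total count (one simple initialization among many), and missing values (`np.nan`) are skipped in the likelihood and then imputed with the fitted mean of each gene's most probable cluster. The function names are hypothetical.

```python
import numpy as np


def nb_loglik_core(y, mu, r):
    """Part of the NB(mean mu, size r) log-pmf that depends on mu; the
    lgamma terms are identical for every cluster and cancel in the E-step."""
    return y * np.log(mu / (mu + r)) + r * np.log(r / (mu + r))


def fit_nb_mixture(counts, k, r=5.0, n_iter=100):
    """EM for a k-component NB mixture; counts is genes x time, nan = missing."""
    Y = np.asarray(counts, dtype=float)
    obs = ~np.isnan(Y)
    Y0 = np.where(obs, Y, 0.0)                 # missing cells are masked out below
    n = Y.shape[0]
    # Initialisation: genes at evenly spaced ranks of total observed count.
    order = np.argsort(Y0.sum(axis=1), kind="stable")
    start = order[np.linspace(0, n - 1, k).round().astype(int)]
    mu = np.where(obs[start], Y0[start], np.nanmean(Y, axis=0)) + 0.5
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: sum the log-likelihood over observed time points only.
        ll = np.stack([(nb_loglik_core(Y0, mu[j], r) * obs).sum(axis=1)
                       for j in range(k)], axis=1)
        ll += np.log(np.maximum(weights, 1e-300))
        ll -= ll.max(axis=1, keepdims=True)    # stabilise before exponentiating
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: with r fixed, the NB MLE of each mean is a weighted average.
        weights = resp.mean(axis=0)
        mu = (resp.T @ Y0) / np.maximum(resp.T @ obs.astype(float), 1e-12)
        mu = np.maximum(mu, 1e-8)              # keep log(mu) finite
    best = resp.argmax(axis=1)
    # Cluster-based imputation: fill each gap with its gene's cluster mean.
    imputed = np.where(obs, Y, mu[best])
    return mu, resp, imputed


low = [[5, 6, 4, 5], [6, 5, 5, 4], [4, 5, 6, 5], [5, 4, 5, 6], [5, 5, 5, 5]]
high = [[100, 200, 300, 400], [110, 190, 310, 390], [90, 210, 290, 410],
        [100, 205, 295, 400], [105, 195, 305, 395]]
counts = np.array(low + high, dtype=float)
counts[0, 2] = np.nan                          # one missing measurement
mu, resp, imputed = fit_nb_mixture(counts, k=2)
print(resp.argmax(axis=1), imputed[0, 2])
```

With a fixed dispersion the M-step has a closed form, which keeps the sketch short; estimating `r` per cluster would need a numerical update inside the loop.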

Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing

Author :
Publisher : Springer
ISBN 13 : 3319313509
Total Pages : 404 pages

Book Synopsis Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing by : Ana M. Aransay

Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing, written by Ana M. Aransay and published by Springer, was released on 2016-06-02 (404 pages). Book excerpt: High-throughput sequencing (HTS) technologies have conquered the genomics and epigenomics worlds. HTS methods have wide applications and can be used to sequence everything from whole or partial genomes, transcriptomes, and non-coding RNAs to ribosome profiling and single-cell sequencing. Given such a diversity of alternatives, research scientists without HTS experience need information to choose the most suitable methodology or combination of platforms and to design experiments that achieve their specific objectives. Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing aims to collect in a single volume all the aspects that should be taken into account when HTS technologies are incorporated into a research project, together with the reasons behind them. Moreover, examples of several successful strategies are analyzed to highlight the crucial features. This book will be of use to all scientists who are unfamiliar with HTS and want to incorporate such technologies into their research.

The Statistics of RNA Splicing in Single Cells

Author :
Publisher :
ISBN 13 :
Total Pages : 0 pages

Book Synopsis The Statistics of RNA Splicing in Single Cells by : Julia Eve Olivieri

The Statistics of RNA Splicing in Single Cells, written by Julia Eve Olivieri, was released in 2022. Book excerpt: Although the amount of single-cell RNA-sequencing (scRNA-seq) data has increased exponentially in recent years, analysis of RNA splicing in these datasets remains virtually nonexistent, largely because of the sparsity and bias of the data. In this thesis, we introduce a new method for analyzing differential splicing at the single-cell level and apply it to make new biological discoveries. We start by introducing the SpliZ, which quantifies the alternative splicing of each gene as a single number for each cell in the dataset. We verify the validity of the SpliZ through simulation, comparison with existing methods, and rediscovery of known true positives in the human lung. Next, we apply the SpliZ to over 200,000 cells from human, mouse, and mouse lemur to create a comprehensive atlas of cell-type-resolved alternative splicing, with experimental validation of two examples. We discover that unsupervised clustering of cells based only on the SpliZ scores of RPS24 and ATP5F1C accurately recapitulates their division into stromal, immune, and epithelial compartments. Correlating the SpliZ with developmental time reveals previously unknown conserved splicing changes throughout spermatogenesis. Finally, we apply the SpliZ to spatial transcriptomics data to discover spatially resolved RNA splicing patterns in the mouse brain; for Myl6 and Gng13, these patterns are more significantly localized than gene expression. The SpliZ opens the door to widespread analysis of alternative splicing in scRNA-seq data.
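The excerpt does not define the SpliZ formula, so the sketch below is not the SpliZ. It is a deliberately simplified, hypothetical score that only illustrates the general idea of summarizing one gene's splicing in each cell as a single number relative to the whole population. For each 5' donor site, the observed 3' acceptors are ranked by position, each read's scaled rank is standardized against all reads from that donor, and the standardized values are averaged within each cell. The input format `(cell_id, donor, acceptor)` and the function name `rank_residual_score` are assumptions.

```python
from collections import defaultdict

import numpy as np


def rank_residual_score(reads):
    """reads: list of (cell_id, donor, acceptor) junction reads for ONE gene.
    Returns {cell_id: score}. Hypothetical simplified score, not the SpliZ."""
    # 1. For each donor, rank its observed acceptors by position and rescale
    #    ranks to [0, 1] so donors with different partner counts are comparable.
    acceptors = defaultdict(set)
    for _, donor, acceptor in reads:
        acceptors[donor].add(acceptor)
    scaled = {}
    for donor, accs in acceptors.items():
        ordered = sorted(accs)
        denom = max(len(ordered) - 1, 1)
        for rank, acceptor in enumerate(ordered):
            scaled[(donor, acceptor)] = rank / denom
    # 2. Population mean and SD of the scaled rank, per donor, over all reads.
    by_donor = defaultdict(list)
    for _, donor, acceptor in reads:
        by_donor[donor].append(scaled[(donor, acceptor)])
    stats = {d: (np.mean(v), np.std(v)) for d, v in by_donor.items()}
    # 3. Standardize each read and average within its cell. Donors with no
    #    variation (SD 0) carry no splicing information and are skipped.
    per_cell = defaultdict(list)
    for cell, donor, acceptor in reads:
        mean, sd = stats[donor]
        if sd > 0:
            per_cell[cell].append((scaled[(donor, acceptor)] - mean) / sd)
    return {cell: float(np.mean(v)) for cell, v in per_cell.items()}


# Cell A uses only the nearer acceptor of donor 100, cell B only the farther one;
# donor 500 has a single acceptor, so it does not affect either score.
reads = ([("A", 100, 200)] * 3 + [("B", 100, 300)] * 3
         + [("A", 500, 700), ("B", 500, 700)])
print(rank_residual_score(reads))
```

Cells whose reads fall only on uninformative donors are absent from the returned dictionary rather than assigned a score, which mirrors how sparse single-cell data leave many gene-cell pairs unscored.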