Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics

Author : Fanny Perraudeau
Publisher :
ISBN 13 :
Total Pages : 246 pages

Book Synopsis Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics by : Fanny Perraudeau

Download or read book Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics written by Fanny Perraudeau. This book was released in 2018 with 246 total pages. Available in PDF, EPUB and Kindle. Book excerpt: I propose statistical methods and software for the analysis of single-cell transcriptome sequencing (scRNA-seq) and metagenomics data. Specifically, I present a general and flexible zero-inflated negative binomial-based wanted variation extraction (ZINB-WaVE) method, which extracts low-dimensional signal from scRNA-seq read counts, accounting for zero inflation (dropouts), over-dispersion, and the discrete nature of the data. Additionally, I introduce an application of the ZINB-WaVE method that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq differential expression pipelines for zero-inflated data, boosting performance for scRNA-seq analysis. Finally, I present a method to estimate bacterial abundances in human metagenomes using full-length 16S sequencing reads.
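
As a rough illustration of the zero-inflated negative binomial (ZINB) distribution named in the excerpt, the sketch below evaluates a ZINB probability mass function and the posterior probability that an observed zero belongs to the count component rather than the zero-inflation component. The function names and the parameterization (mean `mu`, dispersion `theta`, zero-inflation probability `pi`) are assumptions made for this example. This is not the ZINB-WaVE software interface, and the weight shown is only one plausible way to down-weight excess zeros.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Mixture of a point mass at zero (probability pi) and a negative binomial."""
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * nb_pmf(y, mu, theta)


def zero_weight(mu, theta, pi):
    """Posterior probability that an observed zero came from the NB component."""
    nb_zero = (1.0 - pi) * nb_pmf(0, mu, theta)
    return nb_zero / (pi + nb_zero)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for demonstration.
    print(zinb_pmf(0, mu=2.0, theta=0.5, pi=0.3))
    print(zero_weight(mu=2.0, theta=0.5, pi=0.3))
```

A weight near 1 says the zero is well explained by the count model. A weight near 0 flags it as a likely excess zero.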

Statistical Simulation and Analysis of Single-cell RNA-seq Data

Author : Tianyi Sun
Publisher :
ISBN 13 :
Total Pages : 0 pages

Book Synopsis Statistical Simulation and Analysis of Single-cell RNA-seq Data by : Tianyi Sun

Download or read book Statistical Simulation and Analysis of Single-cell RNA-seq Data written by Tianyi Sun and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: The recent development of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies by revealing the genome-wide gene expression levels within individual cells. In contrast to bulk RNA sequencing, scRNA-seq technology captures cell-specific transcriptome landscapes, which can reveal crucial information about cell-to-cell heterogeneity across different tissues, organs, and systems and enable the discovery of novel cell types and new transient cell states. According to search results from PubMed, from 2009-2023, over 5,000 published studies have generated datasets using this technology. Such large volumes of data call for high-quality statistical methods for their analysis. In the three projects of this dissertation, I have explored and developed statistical methods to model the marginal and joint gene expression distributions and determine the latent structure type for scRNA-seq data. In all three projects, synthetic data simulation plays a crucial role. My first project focuses on the exploration of the Beta-Poisson hierarchical model for the marginal gene expression distribution of scRNA-seq data. This model is a simplified mechanistic model with biological interpretations. Through data simulation, I demonstrate three typical behaviors of this model under different parameter combinations, one of which can be interpreted as one source of the sparsity and zero inflation that is often observed in scRNA-seq datasets. Further, I discuss parameter estimation methods of this model and its other applications in the analysis of scRNA-seq data. My second project focuses on the development of a statistical simulator, scDesign2, to generate realistic synthetic scRNA-seq data. 
Although dozens of simulators have been developed before, they lack the capacity to simultaneously achieve the following three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill in this gap, scDesign2 is developed as a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple scRNA-seq protocols and other single-cell gene expression count-based technologies. Compared with existing simulators, scDesign2 is advantageous in its transparent use of probabilistic models and is unique in its ability to capture gene correlations via copula. We verify that scDesign2 generates more realistic synthetic data for four scRNA-seq protocols (10x Genomics, CEL-Seq2, Fluidigm C1, and Smart-Seq2) and two single-cell spatial transcriptomics protocols (MERFISH and pciSeq) than existing simulators do. Under two typical computational tasks, cell clustering and rare cell type detection, we demonstrate that scDesign2 provides informative guidance on deciding the optimal sequencing depth and cell number in single-cell RNA-seq experimental design, and that scDesign2 can effectively benchmark computational methods under varying sequencing depths and cell numbers. With these advantages, scDesign2 is a powerful tool for single-cell researchers to design experiments, develop computational methods, and choose appropriate methods for specific data analysis needs. My third project focuses on deciding latent structure types for scRNA-seq datasets. Clustering and trajectory inference are two important data analysis tasks that can be performed for scRNA-seq datasets and will lead to different interpretations. However, as of now, there is no principled way to tell which one of these two types of analysis results is more suitable to describe a given dataset. In this project, we propose two computational approaches that aim to distinguish cluster-type vs. trajectory-type scRNA-seq datasets. The first approach is based on building a classifier using eigenvalue features of the gene expression covariance matrix, drawing inspiration from random matrix theory (RMT). The second approach is based on comparing the similarity of real data and simulated data generated by assuming the cell latent structure as clusters or a trajectory. While both approaches have limitations, we show that the second approach gives more promising results and has room for further improvements.
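
The Beta-Poisson hierarchical model mentioned in the first project can be simulated in a few lines. The sketch below assumes one common form of the model, in which a per-cell rate fraction is drawn from a Beta distribution and the count is Poisson with mean `scale` times that fraction. The names `alpha`, `beta`, `scale`, and all functions are hypothetical labels for this example, not taken from the dissertation.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's method (suitable for small rates)."""
    threshold = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_beta_poisson(alpha, beta, scale, n_cells, seed=0):
    """Simulate counts for one gene: p ~ Beta(alpha, beta), count ~ Poisson(scale * p)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        p = rng.betavariate(alpha, beta)
        counts.append(sample_poisson(scale * p, rng))
    return counts


def zero_fraction(counts):
    """Share of cells with a zero count."""
    return sum(c == 0 for c in counts) / len(counts)


if __name__ == "__main__":
    # Small shape parameters push p toward 0 or 1, which yields many zeros.
    sparse = sample_beta_poisson(alpha=0.2, beta=0.2, scale=20, n_cells=2000, seed=1)
    # Large shape parameters concentrate p near 0.5, so zeros become rare.
    dense = sample_beta_poisson(alpha=10, beta=10, scale=20, n_cells=2000, seed=1)
    print(zero_fraction(sparse), zero_fraction(dense))
```

Under these assumed settings, the two parameter regimes share the same mean count but differ sharply in sparsity, which is the kind of behavior the excerpt attributes to the model.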

Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data

Author : Dylan Maxwell Cable
Publisher :
ISBN 13 :
Total Pages : 39 pages

Book Synopsis Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data by : Dylan Maxwell Cable

Download or read book Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data written by Dylan Maxwell Cable and published by . This book was released on 2020 with total page 39 pages. Available in PDF, EPUB and Kindle. Book excerpt: Spatial transcriptomic technologies measure gene expression at increasing spatial resolution, approaching individual cells. One limitation of current technologies is that spatial measurements may contain contributions from multiple cells, hindering the discovery of cell type-specific spatial patterns of localization and expression. In this thesis, I will explore the development of Robust Cell Type Decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA sequencing data to decompose mixtures, such as those observed in spatial transcriptomic technologies. Our RCTD approach accounts for platform effects introduced by systematic technical variability inherent to different sequencing modalities. We demonstrate RCTD provides substantial improvement in cell type assignment in Slide-seq data by accurately reproducing known cell type and subtype localization patterns in the cerebellum and hippocampus. We further show the advantages of RCTD by its ability to detect mixtures and identify cell types on an assessment dataset. Finally, we show how RCTD’s recovery of cell type localization uniquely enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD has the potential to enable the definition of spatial components of cellular identity, uncovering new principles of cellular organization in biological tissue.
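
To make the idea of decomposing a mixed spatial measurement into cell type contributions concrete, here is a deliberately simplified sketch. It swaps RCTD's statistical model for plain non-negative least squares solved by projected gradient descent, and it ignores platform effects entirely. The function name, the toy profiles, and the data layout are all assumptions for illustration.

```python
def decompose_mixture(profiles, mixture, n_iter=500):
    """Estimate cell type proportions for one spatial measurement.

    profiles: one row per gene, each row holding the mean expression of that
              gene in every reference cell type.
    mixture:  observed expression per gene at the spatial location.
    Returns non-negative weights normalized to sum to 1.
    """
    n_genes, n_types = len(profiles), len(profiles[0])
    # 1 / trace(P^T P) never exceeds 1 / largest eigenvalue, so it is a safe step size.
    step = 1.0 / sum(v * v for row in profiles for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(row[t] * weights[t] for t in range(n_types)) - mixture[g]
            for g, row in enumerate(profiles)
        ]
        grad = [
            sum(profiles[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, grad)]
    total = sum(weights)
    if total == 0:
        return weights
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy reference profiles for two hypothetical cell types over four genes.
    profiles = [[10.0, 0.0], [0.0, 8.0], [5.0, 5.0], [1.0, 2.0]]
    # A location that is 30% type A and 70% type B.
    mixture = [0.3 * a + 0.7 * b for a, b in profiles]
    print(decompose_mixture(profiles, mixture))
```

Least squares is used here only because it fits in a few lines. A likelihood-based model for count data, as the excerpt describes, would treat low and high counts differently.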

Computational Methods for Single-Cell Data Analysis

Author : Guo-Cheng Yuan
Publisher : Humana Press
ISBN 13 : 9781493990566
Total Pages : 271 pages
Book Rating : 4.9/5 (95 download)

Book Synopsis Computational Methods for Single-Cell Data Analysis by : Guo-Cheng Yuan

Download or read book Computational Methods for Single-Cell Data Analysis written by Guo-Cheng Yuan and published by Humana Press. This book was released on 2019-02-14 with 271 total pages. Available in PDF, EPUB and Kindle. Book excerpt: This detailed book provides state-of-the-art computational approaches to further explore the exciting opportunities presented by single-cell technologies. Each chapter details a computational toolbox aimed at overcoming a specific challenge in single-cell analysis, such as data normalization, rare cell-type identification, and spatial transcriptomics analysis, all with a focus on hands-on implementation of computational methods for analyzing experimental data. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Computational Methods for Single-Cell Data Analysis aims to cover a wide range of tasks and serves as a vital handbook for single-cell data analysis.

Computational Methods for Next Generation Sequencing Data Analysis

Author : Ion Mandoiu
Publisher : John Wiley & Sons
ISBN 13 : 1118169484
Total Pages : 460 pages
Book Rating : 4.1/5 (181 download)

Book Synopsis Computational Methods for Next Generation Sequencing Data Analysis by : Ion Mandoiu

Download or read book Computational Methods for Next Generation Sequencing Data Analysis written by Ion Mandoiu and published by John Wiley & Sons. This book was released on 2016-10-03 with total page 460 pages. Available in PDF, EPUB and Kindle. Book excerpt: Introduces readers to core algorithmic techniques for next-generation sequencing (NGS) data analysis and discusses a wide range of computational techniques and applications This book provides an in-depth survey of some of the recent developments in NGS and discusses mathematical and computational challenges in various application areas of NGS technologies. The 18 chapters featured in this book have been authored by bioinformatics experts and represent the latest work in leading labs actively contributing to the fast-growing field of NGS. The book is divided into four parts: Part I focuses on computing and experimental infrastructure for NGS analysis, including chapters on cloud computing, modular pipelines for metabolic pathway reconstruction, pooling strategies for massive viral sequencing, and high-fidelity sequencing protocols. Part II concentrates on analysis of DNA sequencing data, covering the classic scaffolding problem, detection of genomic variants, including insertions and deletions, and analysis of DNA methylation sequencing data. Part III is devoted to analysis of RNA-seq data. This part discusses algorithms and compares software tools for transcriptome assembly along with methods for detection of alternative splicing and tools for transcriptome quantification and differential expression analysis. Part IV explores computational tools for NGS applications in microbiomics, including a discussion on error correction of NGS reads from viral populations, methods for viral quasispecies reconstruction, and a survey of state-of-the-art methods and future trends in microbiome analysis. 
Computational Methods for Next Generation Sequencing Data Analysis: reviews computational techniques such as new combinatorial optimization methods, data structures, high performance computing, machine learning, and inference algorithms; discusses the mathematical and computational challenges in NGS technologies; and covers NGS error correction, de novo genome and transcriptome assembly, variant detection from NGS reads, and more. This text is a reference for biomedical professionals interested in expanding their knowledge of computational techniques for NGS data analysis. The book is also useful for graduate and post-graduate students in bioinformatics.

Benchmarking Statistical and Machine-Learning Methods for Single-cell RNA Sequencing Data

Author : Nan Xi
Publisher :
ISBN 13 :
Total Pages : 203 pages

Book Synopsis Benchmarking Statistical and Machine-Learning Methods for Single-cell RNA Sequencing Data by : Nan Xi

Download or read book Benchmarking Statistical and Machine-Learning Methods for Single-cell RNA Sequencing Data written by Nan Xi and published by . This book was released on 2021 with total page 203 pages. Available in PDF, EPUB and Kindle. Book excerpt: The large-scale, high-dimensional, and sparse single-cell RNA sequencing (scRNA-seq) data have raised great challenges in the pipeline of data analysis. A large number of statistical and machine learning methods have been developed to analyze scRNA-seq data and answer related scientific questions. Although different methods claim advantages in certain circumstances, it is difficult for users to select appropriate methods for their analysis tasks. Benchmark studies aim to provide recommendations for method selection based on an objective, accurate, and comprehensive comparison among cutting-edge methods. They can also offer suggestions for further methodological development through massive evaluations conducted on real data. In Chapter 2, we conduct the first, systematic benchmark study of nine cutting-edge computational doublet-detection methods. In scRNA-seq, doublets form when two cells are encapsulated into one reaction volume by chance. The existence of doublets, which appear as but are not real cells, is a key confounder in scRNA-seq data analysis. Computational methods have been developed to detect doublets in scRNA-seq data; however, the scRNA-seq field lacks a comprehensive benchmarking of these methods, making it difficult for researchers to choose an appropriate method for their specific analysis needs. Our benchmark study compares doublet-detection methods in terms of their detection accuracy under various experimental settings, impacts on downstream analyses, and computational efficiency. Our results show that existing methods exhibited diverse performance and distinct advantages in different aspects. 
In Chapter 3, we develop an R package, DoubletCollection, to integrate the installation and execution of different doublet-detection methods. Traditional benchmark studies can quickly become out of date due to their static design and the rapid growth of available methods. DoubletCollection addresses this issue in benchmarking doublet-detection methods for scRNA-seq data. DoubletCollection provides a unified interface to perform and visualize downstream analysis after doublet detection. Additionally, we created a protocol using DoubletCollection to execute and benchmark doublet-detection methods. This protocol can automatically accommodate new doublet-detection methods in the fast-growing scRNA-seq field. In Chapter 4, we conduct the first comprehensive empirical study to explore the best modeling strategy for autoencoder-based imputation methods specific to scRNA-seq data. Autoencoder-based imputation is a family of promising methods to denoise sparse scRNA-seq data; however, the design of autoencoders has not been formally discussed in the literature. Current autoencoder-based imputation methods either borrow the practice from other fields or design the model on an ad hoc basis. We find that method performance is sensitive to the key hyperparameters of autoencoders, including architecture, activation function, and regularization. Their optimal settings on scRNA-seq data are largely different from those on other data types. Our results emphasize the importance of exploring hyperparameter space in such complex and flexible methods. Our work also points out the future direction of improving current methods.

Statistical Methods for RNA-sequencing Data

Author : Rhonda Bacher
Publisher :
ISBN 13 :
Total Pages : 0 pages

Book Synopsis Statistical Methods for RNA-sequencing Data by : Rhonda Bacher

Download or read book Statistical Methods for RNA-sequencing Data written by Rhonda Bacher. This book was released in 2017. Available in PDF, EPUB and Kindle. Book excerpt: Major methodological and technological advances in sequencing have inspired ambitious biological questions that were previously elusive. Addressing such questions with novel and complex data requires statistically rigorous tools. In this dissertation, I develop, evaluate, and apply statistical and computational methods for analysis of high-throughput sequencing data. A unifying theme of this work is that all these methods are aimed at RNA-seq data. The first method focuses on characterizing gene expression in RNA-seq experiments with ordered conditions. The second focuses on single-cell RNA-seq data, where we develop a method for normalization to account for a previously unknown technical artifact in the data. Finally, we develop a simulation in order to recapitulate the source of the artifact in silico.

Computational Methods for the Analysis of Genomic Data and Biological Processes

Author : Francisco A. Gómez Vela
Publisher : MDPI
ISBN 13 : 3039437712
Total Pages : 222 pages
Book Rating : 4.0/5 (394 download)

Book Synopsis Computational Methods for the Analysis of Genomic Data and Biological Processes by : Francisco A. Gómez Vela

Download or read book Computational Methods for the Analysis of Genomic Data and Biological Processes written by Francisco A. Gómez Vela and published by MDPI. This book was released on 2021-02-05 with total page 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

Statistical Methods for Bulk and Single-cell RNA Sequencing Data

Author : Wei Li
Publisher :
ISBN 13 :
Total Pages : 207 pages

Book Synopsis Statistical Methods for Bulk and Single-cell RNA Sequencing Data by : Wei Li

Download or read book Statistical Methods for Bulk and Single-cell RNA Sequencing Data written by Wei Li and published by . This book was released on 2019 with total page 207 pages. Available in PDF, EPUB and Kindle. Book excerpt: Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies on bulk tissues. Recently, the emerging single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of transcriptomic landscapes at a single-cell resolution, providing a chance to characterize stochastic heterogeneity within a cell population. The analysis of bulk and single-cell RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging up to date. The first part of this dissertation focuses on the statistical challenges in the transcript-level analysis of bulk RNA-seq data. The next-generation RNA-seq technologies have been widely used to assess full-length RNA isoform structure and abundance in a high-throughput manner, enabling us to better understand the alternative splicing process and transcriptional regulation mechanism. However, accurate isoform identification and quantification from RNA-seq data are challenging due to the information loss in sequencing experiments. In Chapter 2, given the fast accumulation of multiple RNA-seq datasets from the same biological condition, we develop a statistical method, MSIQ, to achieve more accurate isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. The MSIQ method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples and allowing for higher weights on the consistent group. 
We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy of MSIQ compared with alternative methods through both simulation and real data studies. In Chapter 3, we introduce a novel method, AIDE, the first approach that directly controls false isoform discoveries by implementing the statistical model selection principle. Solving the isoform discovery problem in a stepwise manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. Our results demonstrate that AIDE has the highest precision compared to the state-of-the-art methods, and it is able to identify isoforms with biological functions in pathological conditions. The second part of this dissertation discusses two statistical methods to improve scRNA-seq data analysis, which is complicated by the excess missing values, the so-called dropouts due to low amounts of mRNA sequenced within individual cells. In Chapter 5, we introduce scImpute, a statistical method to accurately and robustly impute the dropouts in scRNA-seq data. The scImpute method automatically identifies likely dropouts, and only performs imputation on these values by borrowing information across similar cells. Evaluation based on both simulated and real scRNA-seq data suggests that scImpute is an effective tool to recover transcriptome dynamics masked by dropouts, enhance the clustering of cell subpopulations, and improve the accuracy of differential expression analysis. In Chapter 6, we propose a flexible and robust simulator, scDesign, to optimize the choices of sequencing depth and cell number in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. It is the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. 
In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings.

Computational Methods for the Analysis of Single-Cell RNA-Seq Data

Author : Marmar Moussa
Publisher :
ISBN 13 :
Total Pages :

Book Synopsis Computational Methods for the Analysis of Single-Cell RNA-Seq Data by : Marmar Moussa

Download or read book Computational Methods for the Analysis of Single-Cell RNA-Seq Data written by Marmar Moussa. This book was released in 2019. Available in PDF, EPUB and Kindle. Book excerpt: Single cell transcriptional profiling is critical for understanding cellular heterogeneity and identification of novel cell types and for studying growth and development of tissues and tumors. Leveraging recent advances in single cell RNA sequencing (scRNA-Seq) technology requires novel methods that are robust to high levels of technical and biological noise and scale to datasets of millions of cells. In this work, we address several challenges in the analysis workflow of scRNA-Seq data. First, we propose novel computational approaches for unsupervised clustering of scRNA-Seq data based on the Term Frequency - Inverse Document Frequency (TF-IDF) transformation that has been successfully used in text analysis. Here, we present empirical experimental results showing that TF-IDF methods consistently outperform commonly used scRNA-Seq clustering approaches. Second, we study the so-called 'drop-out' effect that is considered one of the most notable challenges in scRNA-Seq analysis, where only a fraction of the transcriptome of each cell is captured. The random nature of drop-outs, however, makes it possible to consider imputation methods as a means of correcting for drop-outs. In this part we study existing scRNA-Seq imputation methods and propose a novel iterative imputation approach based on efficiently computing highly similar cells. We then present results of a comprehensive assessment of existing and proposed methods on real scRNA-Seq datasets with varying per-cell sequencing depth. Third, we present a computational method for assigning and/or ordering cells based on their cell-cycle stages from scRNA-Seq data. And finally, we present a web-based interactive computational workflow for analysis and visualization of scRNA-Seq data.
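
The TF-IDF transformation carries over from text to expression data by treating each cell as a document and each gene as a term. The sketch below uses one textbook variant (term frequency as the share of a cell's counts, inverse document frequency as the log of cells over detecting cells). The exact variant used in the dissertation is not stated in the excerpt, so the formula and the function name here are assumptions.

```python
import math


def tfidf_transform(counts):
    """Apply a basic TF-IDF transform to a cells-by-genes count matrix.

    counts: list of cells, each a list of per-gene read counts.
    Returns a matrix of the same shape with TF-IDF scores.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Document frequency: number of cells in which each gene is detected.
    doc_freq = [sum(1 for cell in counts if cell[g] > 0) for g in range(n_genes)]
    # Genes seen in every cell get idf 0; undetected genes are also set to 0.
    idf = [math.log(n_cells / df) if df > 0 else 0.0 for df in doc_freq]
    scores = []
    for cell in counts:
        total = sum(cell)
        scores.append(
            [(c / total if total > 0 else 0.0) * idf[g] for g, c in enumerate(cell)]
        )
    return scores


if __name__ == "__main__":
    # Three hypothetical cells measured over three genes.
    counts = [[5, 0, 1], [3, 2, 0], [4, 0, 0]]
    for row in tfidf_transform(counts):
        print(row)
```

Genes detected in every cell receive a score of zero, so the transform emphasizes genes that distinguish cells from one another, which is what makes it a candidate input for clustering.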

Statistical and Computational Methods for Analyzing High-Throughput Genomic Data

Author : Jingyi Li
Publisher :
ISBN 13 :
Total Pages : 226 pages

Book Synopsis Statistical and Computational Methods for Analyzing High-Throughput Genomic Data by : Jingyi Li

Download or read book Statistical and Computational Methods for Analyzing High-Throughput Genomic Data written by Jingyi Li. This book was released in 2013 with 226 total pages. Available in PDF, EPUB and Kindle. Book excerpt: In the burgeoning field of genomics, high-throughput technologies (e.g. microarrays, next-generation sequencing and label-free mass spectrometry) have enabled biologists to perform global analysis on thousands of genes, mRNAs and proteins simultaneously. Extracting useful information from enormous amounts of high-throughput genomic data is an increasingly pressing challenge to statistical and computational science. In this thesis, I will address three problems in which statistical and computational methods were used to analyze high-throughput genomic data to answer important biological questions. The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with existing deterministic isoform assembly algorithms, SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms.
Another advantage of SLIDE is its flexibility of incorporating other transcriptomic data into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected. This is a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage that mRNA levels explain in the variance of protein abundances. While the percentages estimated from the two approaches vary on different sets of genes, all suggest that previous proteome-wide surveys have significantly underestimated the importance of transcription.
In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. To understand the similarity of gene expression patterns throughout their development time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells.
Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells.
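
The hypergeometric alignment test described in this excerpt can be illustrated with a minimal sketch. The function name, argument names, and toy counts below are hypothetical and are not taken from the thesis; the sketch only assumes the usual one-sided setup, in which N orthologous gene pairs are considered in total, K of them are associated with a given fly stage/tissue/cell, n with a given worm stage/tissue/cell, and k with both.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of orthologous gene pairs considered (assumed universe)
    K: pairs associated with the fly stage/tissue/cell
    n: pairs associated with the worm stage/tissue/cell
    k: observed number of pairs associated with both (the test statistic)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    total = comb(N, n)
    lo = max(k, 0)
    hi = min(K, n)
    # Sum the probabilities of overlaps at least as large as the observed one.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / total


# Toy numbers for illustration only: 10 pairs, 5 associated with each stage,
# and all 5 shared between the two stages.
p_value = hypergeom_upper_tail(k=5, N=10, K=5, n=5)
print(p_value)  # 1/252, about 0.00397
```

A small p-value indicates more shared associated orthologs than expected if the two sets of associated genes were independent. In a real analysis across many stage pairs, some multiple-testing correction would presumably be needed, which this sketch does not attempt.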

Gene Expression Data Analysis


Publisher : CRC Press
ISBN 13 : 1000425754
Total Pages : 276 pages
Book Rating : 4.0/5 (4 download)



Book Synopsis Gene Expression Data Analysis by : Pankaj Barah

Download or read book Gene Expression Data Analysis written by Pankaj Barah and published by CRC Press. This book was released on 2021-11-08 with total page 276 pages. Available in PDF, EPUB and Kindle. Book excerpt: Development of high-throughput technologies in molecular biology during the last two decades has contributed to the production of tremendous amounts of data. Microarray and RNA sequencing are two such widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. Data produced from such experiments are voluminous (both in dimensionality and numbers of instances) and evolving in nature. Analysis of huge amounts of data toward the identification of interesting patterns that are relevant for a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge. Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real life or in a simulated environment. This book discusses a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge in statistical and machine-learning-based methods for analyzing gene expression data. 
Key Features:
An introduction to the Central Dogma of molecular biology and information flow in biological systems
A systematic overview of the methods for generating gene expression data
Background knowledge on statistical modeling and machine learning techniques
Detailed methodology of analyzing gene expression data with an example case study
Clustering methods for finding co-expression patterns from microarray, bulk RNA, and scRNA data
A large number of practical tools, systems, and repositories that are useful for computational biologists to create, analyze, and validate biologically relevant gene expression patterns
Suitable for multidisciplinary researchers and practitioners in computer science and the biological sciences

Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing


Total Pages : 176 pages
Book Rating : 4.:/5 (122 download)



Book Synopsis Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing by : Hui Ting Grace Yeo

Download or read book Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing written by Hui Ting Grace Yeo and published by . This book was released on 2020 with total page 176 pages. Available in PDF, EPUB and Kindle. Book excerpt: Single-cell RNA-sequencing (scRNA-seq) enables transcriptome-wide measurements of single cells at scale. As scRNA-seq datasets grow in complexity and size, more complex computational methods are required to distill raw data into biological insight. In this thesis, we introduce computational methods that enable analysis of novel scRNA-seq perturbational assays. We also develop computational models that seek to move beyond simple observations of cell states toward more complex models of underlying biological processes. In particular, we focus on cellular differentiation, which is the process by which cells acquire some specific form or function. First, we introduce barcodelet scRNA-seq (barRNA-seq), an assay which tags individual cells with RNA ‘barcodelets’ to identify them based on the treatments they receive. We apply barRNA-seq to study the effects of the combinatorial modulation of signaling pathways during early mESC differentiation toward germ layer and mesodermal fates. Using a data-driven analysis framework, we identify combinatorial signaling perturbations that drive cells toward specific fates. Second, we describe poly-adenine CRISPR gRNA-based scRNA-seq (pAC-seq), a method that enables the direct observation of guide RNAs (gRNAs) in scRNA-seq. We apply it to assess the phenotypic consequences of CRISPR/Cas9-based alterations of gene cis-regulatory regions. We find that the power to detect transcriptomic effects depends on factors such as the rate of mono/biallelic loss, baseline gene expression, and the number of cells per target gRNA. Third, we propose a generative model for analyzing scRNA-seq data containing unwanted sources of variation.
Using only weak supervision from a control population, we show that the model enables removal of nuisance effects from the learned representation without prior knowledge of the confounding factors. Finally, we develop a generative modeling framework that learns an underlying differentiation landscape from population-level time-series data. We validate the modeling framework on an experimental lineage tracing dataset, and show that it is able to recover the expected effects of known modulators of cell fate in hematopoiesis.

Statistical and Computational Methods for Microbiome Multi-Omics Data


Publisher : Frontiers Media SA
ISBN 13 : 2889660915
Total Pages : 170 pages
Book Rating : 4.8/5 (896 download)



Book Synopsis Statistical and Computational Methods for Microbiome Multi-Omics Data by : Himel Mallick

Download or read book Statistical and Computational Methods for Microbiome Multi-Omics Data written by Himel Mallick and published by Frontiers Media SA. This book was released on 2020-11-19 with total page 170 pages. Available in PDF, EPUB and Kindle. Book excerpt: This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Statistical Methods for the Analysis of Genomic Data


Publisher : MDPI
ISBN 13 : 3039361406
Total Pages : 136 pages
Book Rating : 4.0/5 (393 download)



Book Synopsis Statistical Methods for the Analysis of Genomic Data by : Hui Jiang

Download or read book Statistical Methods for the Analysis of Genomic Data written by Hui Jiang and published by MDPI. This book was released on 2020-12-29 with total page 136 pages. Available in PDF, EPUB and Kindle. Book excerpt: In recent years, technological breakthroughs have greatly enhanced our ability to understand the complex world of molecular biology. Rapid developments in genomic profiling techniques, such as high-throughput sequencing, have brought new opportunities and challenges to the fields of computational biology and bioinformatics. Furthermore, by combining genomic profiling techniques with other experimental techniques, many powerful approaches (e.g., RNA-Seq, ChIP-Seq, single-cell assays, and Hi-C) have been developed in order to help explore complex biological systems. As a result of the increasing availability of genomic datasets, in terms of both volume and variety, the analysis of such data has become a critical challenge as well as a topic of great interest. Therefore, statistical methods that address the problems associated with these newly developed techniques are in high demand. This book includes a number of studies that highlight the state-of-the-art statistical methods for the analysis of genomic data and explore future directions for improvement.

Revealing Translational and Fundamental Insights Via Computational Analysis of Single-cell Sequencing Data


Total Pages : 0 pages
Book Rating : 4.:/5 (14 download)



Book Synopsis Revealing Translational and Fundamental Insights Via Computational Analysis of Single-cell Sequencing Data by : Jessica Lu Zhou

Download or read book Revealing Translational and Fundamental Insights Via Computational Analysis of Single-cell Sequencing Data written by Jessica Lu Zhou and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Single-cell sequencing has emerged as a powerful tool for dissecting cellular heterogeneity and providing cell type-specific biological insights. Single-cell sequencing technologies have rapidly proliferated over the last decade, leading to an explosion of data generated from such experiments. However, several challenges exist in the computational analysis of single-cell sequencing data due to its large and complex nature, including the need for sophisticated statistical methods to distinguish biologically meaningful signals from noise, the integration of single-cell sequencing data with other types of biological information, and the development of scalable and reproducible computational pipelines that can handle data of this size and complexity. In this dissertation, I present two distinct projects analyzing single-cell sequencing data. The first is of an analytical nature and tackles a translational question. In this project, I built computational pipelines for processing and analyzing single-nucleus RNA- and ATAC-sequencing datasets generated from the amygdalae of genetically diverse heterogeneous stock rats, which were subjected to a behavioral protocol for studying addiction-like behaviors following cocaine self-administration. In doing so, I provide a standard reference for analyzing such data as well as reveal cell type-specific insights into the molecular underpinnings of cocaine addiction. The second project is oriented towards methods development and seeks to understand the fundamental biological question of transcriptional regulation.
Here, I developed a statistical framework for simulating and modeling data from single-cell CRISPR regulatory screens and used it to perform a genome-wide interrogation of epistatic-like interactions between enhancer pairs. I found that multiple enhancers act together in a multiplicative fashion with little evidence for interactive effects between them. This work revealed novel insights into the collective behavior of multiple regulatory elements and provides a tool that can be applied to future datasets generated from such experiments. This dissertation exemplifies how computational methods can be applied in different contexts to extract meaning from a variety of single-cell sequencing modalities. By tackling both a translational and fundamental biological question, I have showcased the breadth of what can be revealed by studying single-cell sequencing data and the computational methods necessary to extract this information.
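
As an illustration of what "multiplicative with little evidence for interactive effects" can mean, one common way to write such a model is a log-linear mean with an interaction term. The notation below is an assumption made for illustration and is not taken from the dissertation: Y is the expression of a target gene in a cell, and x_A, x_B are 0/1 indicators of whether enhancers A and B are perturbed in that cell.

```latex
\log \mathbb{E}[Y \mid x_A, x_B]
  = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_{AB}\, x_A x_B
\quad\Longrightarrow\quad
\frac{\mathbb{E}[Y \mid 1, 1]}{\mathbb{E}[Y \mid 0, 0]}
  = e^{\beta_A}\, e^{\beta_B}\, e^{\beta_{AB}}
```

Under this sketch, a purely multiplicative pair corresponds to \beta_{AB} \approx 0, so that the joint fold change is simply the product of the two individual fold changes e^{\beta_A} and e^{\beta_B}; a clearly nonzero \beta_{AB} would indicate an epistatic-like interaction.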

Bioinformatics Analysis of Single Cell Sequencing Data and Applications in Precision Medicine


Publisher : Frontiers Media SA
ISBN 13 : 2889635287
Total Pages : 136 pages
Book Rating : 4.8/5 (896 download)



Book Synopsis Bioinformatics Analysis of Single Cell Sequencing Data and Applications in Precision Medicine by : Jialiang Yang

Download or read book Bioinformatics Analysis of Single Cell Sequencing Data and Applications in Precision Medicine written by Jialiang Yang and published by Frontiers Media SA. This book was released on 2020-02-27 with total page 136 pages. Available in PDF, EPUB and Kindle. Book excerpt: