A Model Based Framework For High Throughput Genomic Data Enhancement

A Model-based Framework for High-throughput Genomic Data Enhancement

Author : Dongrui Zhong
Publisher :
ISBN 13 :
Total Pages : 0 pages
Book Rating : 4.:/5 (133 download)

Book Synopsis A Model-based Framework for High-throughput Genomic Data Enhancement by : Dongrui Zhong

A Model-based Framework for High-throughput Genomic Data Enhancement was written by Dongrui Zhong and released in 2021. Book excerpt: Reusability is part of the FAIR data principles, which aim to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. We developed a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We found that when using predicted data to conduct other analyses, it is not optimal to use all the predicted data. Instead, one should use only the subset of data that can be predicted accurately. We proposed a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we showed that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses. The performance of conventional machine learning methods relies greatly on the quality of the data. In the field of bioinformatics, researchers often need to work with huge amounts of data from different resources, such as Gene Expression Omnibus (GEO) and the Cancer Genome Atlas Program (TCGA).
It is possible that gene expression data collected in different studies are somewhat inconsistent, for reasons such as differences in the experimental design or in the platform (different microarray or RNA-seq platforms) used to read the gene expression data. This makes cross-study analysis difficult, for example when transferring models among studies. Some normalization methods, such as quantile normalization and rank normalization, are widely used to deal with such inconsistency. However, models trained using normalized data could still suffer from extreme values or large variation of errors in the original data, since conventional normalization methods may not completely overcome these issues. We propose to use the pairwise rank information among the gene expression values (paired predictive variables, or PPV), calculated from the original gene expression values, to train machine learning models. Our results show that PPV gives more statistically robust models, while the interpretability of variables is largely maintained, and the model performance is equivalent to or better than that of models trained using original gene expression data with conventional normalization methods. The raw gene expression values can be decomposed into four components: the true value, systematic errors, random errors, and unexpected errors (producing outliers when large enough). Normalization methods are used to remove systematic errors, and outlier detection methods can be applied to detect outliers. Few methods, if any, have been developed to remove random errors. In this study, we developed an efficient method based on probabilistic principal component analysis (PPCA) to remove both random and unexpected errors in gene expression data by borrowing information across genes and samples using an internal prediction strategy.
The proposed method, gene-expression-data-enhancement (GEDE), first estimates a covariance matrix for all the genes using an existing dataset with a relatively large number of samples. It then infers enhanced gene expressions for a dataset to be analyzed by predicting gene expression values using expression values of other genes. We showed that the enhanced version of a gene expression profile has higher quality than the original profile in both a simulation study and real data analysis. In real data analysis, we showed that enhanced gene expression profiles gave better differential gene expression analysis results than original profiles. We recommend gene expression enhancement as an essential step in gene expression data analysis pipelines.
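
The pairwise rank idea described in the excerpt can be illustrated with a small sketch. The excerpt does not specify how PPV features are constructed, so the code below assumes one plausible reading: for every pair of genes, a binary feature records whether the first gene is expressed above the second in a given sample. The function name `pairwise_rank_features` and the toy data are hypothetical and not taken from the thesis.

```python
import numpy as np
from itertools import combinations


def pairwise_rank_features(expr, gene_names):
    """Turn an expression matrix into pairwise rank indicators.

    expr: array of shape (n_samples, n_genes) with raw expression values.
    Returns (features, names), where features[s, p] is 1 if, in sample s,
    the first gene of pair p is expressed above the second gene, else 0.
    This encoding is an assumption made for illustration only.
    """
    expr = np.asarray(expr, dtype=float)
    pairs = list(combinations(range(expr.shape[1]), 2))
    i_idx = np.array([i for i, _ in pairs], dtype=int)
    j_idx = np.array([j for _, j in pairs], dtype=int)
    features = (expr[:, i_idx] > expr[:, j_idx]).astype(np.int8)
    names = [f"{gene_names[i]}>{gene_names[j]}" for i, j in pairs]
    return features, names


# Toy data: two samples, three genes (values are made up).
raw = np.array([[5.0, 2.0, 9.0],
                [1.0, 4.0, 3.0]])
genes = ["geneA", "geneB", "geneC"]
ppv_raw, ppv_names = pairwise_rank_features(raw, genes)

# Simulate a per-sample monotone distortion, such as a platform-specific
# scale factor followed by a log transform.
distorted = np.log2(raw * np.array([[10.0], [0.5]]) + 1.0)
ppv_distorted, _ = pairwise_rank_features(distorted, genes)

print(ppv_names)
print(ppv_raw)
print(np.array_equal(ppv_raw, ppv_distorted))
```

Because the features depend only on the within-sample ordering of genes, any strictly increasing per-sample transformation leaves them unchanged, which is one reason rank-based features may be less sensitive to platform differences and extreme values. The number of pairs grows quadratically with the number of genes, so a practical pipeline would likely need to select pairs rather than enumerate all of them.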

Scalable Optimization Algorithms for High-throughput Genomic Data

Author :
Publisher :
ISBN 13 :
Total Pages : 222 pages
Book Rating : 4.:/5 (913 download)

Book Synopsis Scalable Optimization Algorithms for High-throughput Genomic Data by :

Scalable Optimization Algorithms for High-throughput Genomic Data was released in 2015 and has 222 pages. No author, publisher, or excerpt is listed.

Bioinformatics

Author : Zoé Lacroix
Publisher : Academic Press
ISBN 13 : 155860829X
Total Pages : 466 pages
Book Rating : 4.5/5 (586 download)

Book Synopsis Bioinformatics by : Zoé Lacroix

Bioinformatics, written by Zoé Lacroix and published by Academic Press, was released on 2003-07-18 and has 466 pages. Book excerpt: The heart of the book lies in the collaborative efforts of eight distinct bioinformatics teams that describe their own unique approaches to data integration and interoperability. Each system receives its own chapter, in which the lead contributors provide insight into the specific problems being addressed by the system, why the particular architecture was chosen, and details on the system's strengths and weaknesses. In closing, the editors provide important criteria for evaluating these systems that bioinformatics professionals will find valuable. The book provides a clear overview of the state of the art in data integration and interoperability in genomics, highlighting a variety of systems and giving insight into the strengths and weaknesses of their different approaches.

Bayesian Hierarchical Modeling of High-throughput Genomic Data with Applications to Cancer Bioinformatics and Stem Cell Differentiation

Author :
Publisher :
ISBN 13 :
Total Pages : 278 pages
Book Rating : 4.:/5 (941 download)

Book Synopsis Bayesian Hierarchical Modeling of High-throughput Genomic Data with Applications to Cancer Bioinformatics and Stem Cell Differentiation by :

Bayesian Hierarchical Modeling of High-throughput Genomic Data with Applications to Cancer Bioinformatics and Stem Cell Differentiation was released in 2015 and has 278 pages; no author or publisher is listed. Book excerpt: Advances in the ability to obtain genomic measurements have continually outpaced advances in the ability to interpret them in a statistically rigorous manner. In this dissertation, I develop, evaluate, and apply Bayesian hierarchical modeling frameworks to uncover novel insights in cancer bioinformatics as well as explore and characterize stem cell expression heterogeneity. The first framework integrates diverse sets of genomic information to identify cancer patient subgroups. The recently developed survLDA (survival-supervised latent Dirichlet allocation) model is able to capture patient heterogeneity as well as incorporate many diverse data types, but the potential in utilizing the model for predictive inference has yet to be explored. This is evaluated empirically and under simulation studies to show that in order to accurately identify patient subgroups, the necessary sample size depends on the size of the model being used (number of topics), the size of each patient's document, and the number of patients considered. The second framework is a Model-based Approach for identifying Driver Genes in Cancer (MADGiC), which infers causal genes in cancer based on somatic mutation profiles. The model takes advantage of external data sources regarding background mutation rates and the potential for specific mutations to result in functional consequences. In addition, it leverages information about key mutational patterns that are typical of driver genes. As such, MADGiC encodes valuable prior information in a novel manner and incorporates several key sources of information that were previously only considered in isolation.
This results in improved inference of driver genes, as demonstrated in simulation and case studies. Finally, the third framework identifies genes that exhibit differential regulation of expression at the single-cell level. Specifically, it is known that gene expression often occurs in a stochastic, bursty manner. When profiling across many cells, these bursty gene expression patterns may be exhibited by multimodal distributions. Identifying these bursty expression patterns as well as detecting differences across biological conditions, which may represent differential regulation, is an important first step in many single-cell experiments. We develop a Bayesian nonparametric mixture modeling approach that explicitly accounts for these multimodal patterns and demonstrate its utility using simulation and case studies.

Transformation of High-Throughput Data Into Hierarchical Cellular Models Enables Biological Prediction and Discovery

Author : Michael Harris Kramer
Publisher :
ISBN 13 :
Total Pages : 128 pages
Book Rating : 4.:/5 (951 download)

Book Synopsis Transformation of High-Throughput Data Into Hierarchical Cellular Models Enables Biological Prediction and Discovery by : Michael Harris Kramer

Transformation of High-Throughput Data Into Hierarchical Cellular Models Enables Biological Prediction and Discovery was written by Michael Harris Kramer and released in 2016; it has 128 pages. Book excerpt: A holy grail of bioinformatics is the creation of whole-cell models with the ability to enhance human understanding and facilitate discovery. To this end, a successful and widely used effort is the Gene Ontology (GO), a massive project to manually annotate genes into terms describing molecular functions, biological processes and cellular components and provide relations between terms, e.g. capturing that "small ribosomal subunit" and "large ribosomal subunit" come together to make "ribosome". GO is widely used to understand the function of a gene or group of genes. Unfortunately, GO is limited by the effort required to create and update it by hand. It exists only for well-studied organisms and even then in one, generic form per organism with limited overall genome coverage and bias towards well-studied genes and functions. It is not possible to learn about an uncharacterized gene or discover a new function using GO, and one cannot quickly assemble an ontology model for a new organism, cell-type or disease-state. Here we change this state of affairs by developing and utilizing the concept of purely data-driven gene ontologies. In chapter two, we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to computationally infer a data-driven ontology whose coverage and power are equivalent to those of the manually-curated GO. In chapter three we further develop the algorithmic foundations for data-driven ontologies, laying the groundwork for machine learning to intelligently integrate many types of experimental data into ontology models.
In chapter four, we focus on a cellular process (autophagy in Saccharomyces cerevisiae) and develop a framework (Active Interaction Mapping) which guides experimental selection, systematically improves an existing process-specific ontology model and uncovers new autophagy biology. Finally, in chapter five, we illustrate the power of hierarchical whole-cell ontology models for biological modeling by demonstrating an ontology-based framework for translation of genotype to phenotype. Overall, this work provides a roadmap to construct data-driven, hierarchical models of gene function for the whole cell or a specific cellular process and illustrates the power of these models for both discovery of new biology and biological modeling.

Effective Screening and Joint Modeling for High-Throughput Genomic Data Analysis

Author : Jacob Davis Rhyne
Publisher :
ISBN 13 :
Total Pages : 135 pages
Book Rating : 4.:/5 (11 download)

Book Synopsis Effective Screening and Joint Modeling for High-Throughput Genomic Data Analysis by : Jacob Davis Rhyne

Effective Screening and Joint Modeling for High-Throughput Genomic Data Analysis was written by Jacob Davis Rhyne and released in 2019; it has 135 pages. No publisher or excerpt is listed.

Analysis of High-throughput Genomic Data

Author : Yen-Tsung Huang
Publisher :
ISBN 13 :
Total Pages : 368 pages
Book Rating : 4.:/5 (795 download)

Book Synopsis Analysis of High-throughput Genomic Data by : Yen-Tsung Huang

Analysis of High-throughput Genomic Data was written by Yen-Tsung Huang and released in 2012; it has 368 pages. No publisher or excerpt is listed.

High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain, Volume II

Author : Andreas Hund
Publisher : Frontiers Media SA
ISBN 13 : 2832545459
Total Pages : 161 pages
Book Rating : 4.8/5 (325 download)

Book Synopsis High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain, Volume II by : Andreas Hund

High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain, Volume II, written by Andreas Hund and published by Frontiers Media SA, was released on 2024-03-01 and has 161 pages. Book excerpt: This Research Topic is part of the High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain series. The discipline of “High Throughput Field Phenotyping” (HTFP) has gained momentum in the last decade. HTFP includes a wide range of disciplines such as plant science, agronomy, remote sensing, and genetics; as well as biochemistry, imaging, computation, agricultural engineering, and robotics. High throughput technologies have substantially increased our ability to monitor and quantify field experiments and breeding nurseries at multiple scales. HTFP technology can not only rapidly and cost-effectively replace tedious and subjective ratings in the field, but can also unlock the potential of new, latent phenotypes representing underlying biological function. These advances have also provided the ability to follow crop growth and development across seasons at high and previously inaccessible spatial and temporal resolutions. By combining these data with measurements of all environmental factors affecting plant growth and yield (“Envirotyping”), genotypic-specific reaction norms and phenotypic plasticity may be elucidated.

Statistical and Computational Methods for Analyzing High-Throughput Genomic Data

Author : Jingyi Li
Publisher :
ISBN 13 :
Total Pages : 226 pages
Book Rating : 4.:/5 (858 download)

Book Synopsis Statistical and Computational Methods for Analyzing High-Throughput Genomic Data by : Jingyi Li

Statistical and Computational Methods for Analyzing High-Throughput Genomic Data was written by Jingyi Li and released in 2013; it has 226 pages. Book excerpt: In the burgeoning field of genomics, high-throughput technologies (e.g. microarrays, next-generation sequencing and label-free mass spectrometry) have enabled biologists to perform global analysis on thousands of genes, mRNAs and proteins simultaneously. Extracting useful information from enormous amounts of high-throughput genomic data is an increasingly pressing challenge to statistical and computational science. In this thesis, I will address three problems in which statistical and computational methods were used to analyze high-throughput genomic data to answer important biological questions. The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with existing deterministic isoform assembly algorithms, SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms.
Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected. This is a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage that mRNA levels explain in the variance of protein abundances. While the percentages estimated from the two approaches vary on different sets of genes, all suggest that previous protein-wide surveys have significantly underestimated the importance of transcription.
In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. Understanding the similarity of gene expression patterns throughout their developmental time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells.
Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells.
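
The hypergeometric alignment test described in the excerpt can be sketched in a few lines. The excerpt gives only the test statistic and the null hypothesis, so the code below assumes a standard one-sided overlap test: given a pool of orthologous gene pairs, it computes the probability of seeing at least the observed number of pairs associated with both stages if the two associated sets were drawn independently. The function name and the counts are hypothetical and are not taken from the thesis.

```python
from math import comb


def hypergeom_upper_tail(total, n_a, n_b, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    total:   number of orthologous gene pairs considered (assumed pool)
    n_a:     pairs associated with the stage in organism A
    n_b:     pairs associated with the stage in organism B
    overlap: pairs associated with both stages (the test statistic)
    """
    max_overlap = min(n_a, n_b)
    denominator = comb(total, n_b)
    numerator = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


# Toy example with made-up counts: 10 orthologous pairs in total,
# 4 associated with a fly stage, 3 with a worm stage, 2 shared.
p_value = hypergeom_upper_tail(total=10, n_a=4, n_b=3, overlap=2)
print(p_value)
```

A small p-value would indicate that the two stages share more associated orthologous pairs than expected under independence. When many stage pairs are compared, some form of multiple-testing correction would normally be applied; the excerpt does not say which one, if any, was used.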

Integrative Approaches for Mining High-throughput Genomic Data

Author : Kenneth Daily
Publisher :
ISBN 13 : 9781124988917
Total Pages : 115 pages
Book Rating : 4.9/5 (889 download)

Book Synopsis Integrative Approaches for Mining High-throughput Genomic Data by : Kenneth Daily

Integrative Approaches for Mining High-throughput Genomic Data was written by Kenneth Daily and released in 2011; it has 115 pages. Book excerpt: The study of transcriptional regulation encompasses many fields in molecular, cell, evolutionary, and computational biology. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically discover and map all these elements. High-throughput techniques have made genome-wide assays standard for the analysis of mechanisms of regulation, and the amount of data available for analysis is increasing exponentially. Computational techniques have been developed in tandem to process, synthesize, index, and store these datasets. We describe here results from various levels of the study of transcriptional regulation and the methods developed to facilitate analysis. First, we develop and improve a pipeline (termed MotifMap) for the search, storage, and integration of transcription factor binding sites in the species of multiple model organisms. We employ a phylogenetic footprinting approach to reduce the number of false positive sites reported, and evaluate the performance using high-throughput sequencing datasets for a number of transcription factors. Next, we employ this pipeline in conjunction with high-throughput sequencing data in a study to annotate retrotransposon insertion sites across the yeast genome. Specific elements are observed proximal to these insertion sites, and their identification is aided by the MotifMap pipeline. Lastly, we describe techniques to compress and store high-throughput sequencing data. Our algorithm's performance is comparable to standard compression techniques, while maintaining the ability to use the data for analysis.

The OME Framework for Genome-scale Systems Biology

Author :
Publisher :
ISBN 13 :
Total Pages : 23 pages
Book Rating : 4.:/5 (953 download)

Book Synopsis The OME Framework for Genome-scale Systems Biology by :

The OME Framework for Genome-scale Systems Biology was released in 2014 and has 23 pages; no author or publisher is listed. Book excerpt: The life sciences are undergoing continuous and accelerating integration with computational and engineering sciences. The biology that many in the field have been trained on may be hardly recognizable in ten to twenty years. One of the major drivers for this transformation is the blistering pace of advancements in DNA sequencing and synthesis. These advances have resulted in unprecedented amounts of new data, information, and knowledge. Many software tools have been developed to deal with aspects of this transformation and each is sorely needed [1-3]. However, few of these tools have been forced to deal with the full complexity of genome-scale models along with high-throughput genome-scale data. This particular situation represents a unique challenge, as it is simultaneously necessary to deal with the vast breadth of genome-scale models and the dizzying depth of high-throughput datasets. It has been observed time and again that as the pace of data generation continues to accelerate, the pace of analysis significantly lags behind [4]. It is also evident that, given the plethora of databases and software efforts [5-12], it is still a significant challenge to work with genome-scale metabolic models, let alone next-generation whole cell models [13-15]. We work at the forefront of model creation and systems scale data generation [16-18]. The OME Framework was born out of a practical need to enable genome-scale modeling and data analysis under a unified framework to drive the next generation of genome-scale biological models. Here we present the OME Framework. It exists as a set of Python classes. However, we want to emphasize the importance of the underlying design as an addition to the discussions on specifications of a digital cell.
A great deal of work and valuable progress has been made by a number of communities [13, 19-24] towards interchange formats and implementations designed to achieve similar goals. While many software tools exist for handling genome-scale metabolic models or for genome-scale data analysis, no implementations exist that explicitly handle data and models concurrently. The OME Framework structures data in a connected loop with models and the components those models are composed of. This results in the first full, practical implementation of a framework that can enable genome-scale design-build-test. Over the coming years many more software packages will be developed and tools will necessarily change. However, we hope that the underlying designs shared here can help to inform the design of future software.

Utilizing Genome-scale Models to Enhance High-throughput Data Analysis

Author : Aarash Bordbar
Publisher :
ISBN 13 : 9781303868894
Total Pages : 226 pages
Book Rating : 4.8/5 (688 download)

Book Synopsis Utilizing Genome-scale Models to Enhance High-throughput Data Analysis by : Aarash Bordbar

Utilizing Genome-scale Models to Enhance High-throughput Data Analysis was written by Aarash Bordbar and released in 2014; it has 226 pages. Book excerpt: Technological advances have revolutionized the life sciences. The cost of biological data generation has decreased exponentially in the past decade, allowing simultaneous measurement of various biomolecules, which has enabled biologists to study cells as systems of interacting components. However, it is becoming clear that the true bottleneck to biological understanding is not data generation, but data analysis, leaving many data sets incompletely analyzed. To cope with this data deluge, diverse and sophisticated data analysis techniques have been developed and implemented. In this thesis, methodological advances for genome-scale metabolic models are developed and applied to interpret high-throughput data sets to better understand cellular processes at the pathway, network, and dynamic levels. First, an unbiased, algorithmic approach to enumerate biological pathways is presented. The computed pathways are more consistent with the biomolecular interactions of the associated proteins and genes than even classical, human-defined pathways. Further, the computed pathways are utilized to discover novel and non-intuitive transcriptional regulatory interactions in Escherichia coli. Second, mRNA and protein expression omic data are used to build genome-scale metabolic models of the human alveolar macrophage and the murine macrophage-like cell line, RAW 264.7. Model simulations and prospective high-throughput experiments determined metabolism's role in macrophages during infection and inflammation at the network level. Finally, personalized data-driven kinetic models of human erythrocyte metabolism are presented.
Tailored to 24 individuals' plasma and intracellular metabolite levels, the kinetic models better represent the underlying genetic makeup of the individuals than the metabolite levels alone. Further, the personalized models were used to predict and interpret side effects on an individual basis.

Quantitative Integration of Biological Knowledge for the Analysis of High-throughput Genomic Data

Author : Thomas William Chittenden
Publisher :
ISBN 13 :
Total Pages : pages
Book Rating : 4.:/5 (824 download)

Book Synopsis Quantitative Integration of Biological Knowledge for the Analysis of High-throughput Genomic Data by : Thomas William Chittenden

Quantitative Integration of Biological Knowledge for the Analysis of High-throughput Genomic Data was written by Thomas William Chittenden and released in 2012; no publisher or page count is listed. Book excerpt: The development of high-throughput technologies has changed the way in which we approach questions in biology by allowing us to assess the relative state of tens of thousands of genes or gene products in a single assay. A great deal of research has focused on developing statistical methods to identify biologically relevant sets of genes whose collective state correlates with a given phenotype under study. However, placing these gene sets into an intellectual framework that allows for hypothesis generation and mechanistic interpretation remains a significant challenge. To address these issues, we first apply and then extend a well-established gene ontology singular enrichment analysis method to quantitatively assess overrepresented biological themes within lists of somatically mutated and abnormally expressed genes from publicly available human breast, colorectal, lung, prostate, and renal cancer datasets. We further validate the utility of this novel approach with actual experimental laboratory investigations. Finally, we describe a general strategy for constructing prediction models by integrating prior biological knowledge with gene expression data from three large human breast cancer datasets. We show how this biological network-based model improves performance and interoperability by identifying genes more closely related to breast cancer etiology and patient survival. The work presented throughout this manuscript indicates the utility and proposes the future development of such methodologies to address many of the contemporary concerns associated with the analysis of a wide array of high-dimensional genomic data types.

Enhanced biological mechanism study, drug discovery and individualized medicine with single-cell multiomics data and integrative analysis

Author : Panwen Wang
Publisher : Frontiers Media SA
ISBN 13 : 2832529658
Total Pages : 126 pages
Book Rating : 4.8/5 (325 download)

Book Synopsis Enhanced biological mechanism study, drug discovery and individualized medicine with single-cell multiomics data and integrative analysis by : Panwen Wang

Download or read book Enhanced biological mechanism study, drug discovery and individualized medicine with single-cell multiomics data and integrative analysis, written by Panwen Wang and published by Frontiers Media SA. This book was released on 2023-07-20 with a total of 126 pages. Available in PDF, EPUB and Kindle.

High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain

Author : Urs Schmidhalter
Publisher : Frontiers Media SA
ISBN 13 : 2889711595
Total Pages : 399 pages
Book Rating : 4.8/5 (897 download)

Book Synopsis High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain by : Urs Schmidhalter

Download or read book High-Throughput Field Phenotyping to Advance Precision Agriculture and Enhance Genetic Gain, written by Urs Schmidhalter and published by Frontiers Media SA. This book was released on 2021-08-10 with a total of 399 pages. Available in PDF, EPUB and Kindle.

Computational Tools for the Analysis of High-throughput Genome-scale Sequence Data

Author : David Adrian Lopez
Publisher :
ISBN 13 :
Total Pages : 83 pages
Book Rating : 4.:/5 (17 download)

Book Synopsis Computational Tools for the Analysis of High-throughput Genome-scale Sequence Data by : David Adrian Lopez

Download or read book Computational Tools for the Analysis of High-throughput Genome-scale Sequence Data, written by David Adrian Lopez and released in 2016 with a total of 83 pages. Available in PDF, EPUB and Kindle. Book excerpt: As high-throughput sequence data becomes increasingly used in a variety of fields, there is a growing need for computational tools that facilitate analyzing and interpreting the sequence data to extract biological meaning. To date, several computational tools have been developed to analyze raw and processed sequence data in a number of contexts. However, many of these tools focus primarily on well-studied reference organisms, and in some cases, such as the visualization of molecular signatures in expression data, there is a scarcity or complete absence of tools. Furthermore, the compendium of genome-scale data in publicly accessible databases can be leveraged to inform new studies. The focus of this dissertation is the development of computational tools and methods to analyze high-throughput genome-scale sequence data, as well as applications in mammalian, algal, and bacterial systems. Chapter 1 introduces the challenges of analyzing high-throughput sequence data. Chapter 2 presents the Signature Visualization Tool (SaVanT), a framework to visualize molecular signatures in user-generated expression data on a sample-by-sample basis. This chapter demonstrates that SaVanT can use immune activation signatures to distinguish patients with different types of acute infections (influenza A and bacterial pneumonia), and determine the primary cell types underlying different leukemias (acute myeloid and acute lymphoblastic) and skin disorders. Chapter 3 describes the Algal Functional Annotation Tool, which biologically interprets large gene lists, such as those derived from differential expression experiments. This tool integrates data from several pathway, ontology, and protein domain databases and performs enrichment testing on gene lists for several algal genomes. Chapter 4 describes a survey of the Chlamydomonas reinhardtii transcriptome and methylome across various stages of its sexual life cycle. This chapter discusses the identification and function of 361 gamete-specific and 627 zygote-specific genes, the first base-resolution methylation map of C. reinhardtii, and the changes in chloroplast methylation throughout key stages of its life cycle. Chapter 5 presents a comparative genomics approach to identifying previously uncharacterized bacterial microcompartment (BMC) proteins. Based on the genomic proximity of genes in 131 fully sequenced bacterial genomes, this chapter describes new putative microcompartments and their function.

Machine Learning for Large-scale Genomics

Author : Yifei Chen
Publisher :
ISBN 13 : 9781321448283
Total Pages : 125 pages
Book Rating : 4.4/5 (482 download)

Book Synopsis Machine Learning for Large-scale Genomics by : Yifei Chen

Download or read book Machine Learning for Large-scale Genomics, written by Yifei Chen and released in 2014 with a total of 125 pages. Available in PDF, EPUB and Kindle. Book excerpt: Genomic malformations are believed to be the driving factors of many diseases. Therefore, understanding the intrinsic mechanisms underlying the genome and informing clinical practices have become two important missions of large-scale genomic research. Recently, high-throughput molecular data have provided abundant information about the whole genome, and have popularized computational tools in genomics. However, traditional machine learning methodologies often suffer from strong limitations when dealing with high-throughput genomic data, because the latter are usually very high dimensional, highly heterogeneous, and can show complicated nonlinear effects. In this thesis, we present five new algorithms or models to address these challenges, each of which is applied to a specific genomic problem. Project 1 focuses on model selection in cancer diagnosis. We develop an efficient algorithm (ADMM-ENSVM) for the Elastic Net Support Vector Machine, which achieves simultaneous variable selection and max-margin classification. On a colon cancer diagnosis dataset, ADMM-ENSVM shows advantages over other SVM algorithms in terms of diagnostic accuracy, feature selection ability, and computational efficiency. Project 2 focuses on model selection in gene correlation analysis. We develop an efficient algorithm (SBLVGG), using a methodology similar to that of ADMM-ENSVM, for the Latent Variable Gaussian Graphical Model (LVGG). LVGG models the marginal concentration matrix of observed variables as a combination of a sparse matrix and a low-rank one. Evaluated on a microarray dataset containing 6,316 genes, SBLVGG is notably faster than the state-of-the-art LVGG solver, and shows that most of the correlation among genes can be effectively explained by only tens of latent factors. Project 3 focuses on ensemble learning in cancer survival analysis. We develop a gradient boosting model (GBMCI), which does not explicitly assume particular forms of hazard functions, but trains an ensemble of regression trees to approximately optimize the concordance index. We benchmark the performance of GBMCI against several popular survival models on a large-scale breast cancer prognosis dataset. GBMCI consistently outperforms other methods based on a number of feature representations, which are heterogeneous and contain missing values. Project 4 focuses on deep learning in gene expression inference (GEIDN). GEIDN is a large-scale neural network, which can infer ~21k target genes jointly from ~1k landmark genes and can naturally capture hierarchical nonlinear interactions among genes. We deploy deep learning techniques (dropout, momentum training, GPU computing, etc.) to train GEIDN. On a dataset of ~129k complete human transcriptomes, GEIDN outperforms both k-nearest neighbor regression and linear regression in predicting >99.96% of the target genes. Moreover, increased network scales help to improve GEIDN, while increased training data benefits GEIDN more than other methods. Project 5 focuses on deep learning in annotating coding and noncoding genetic variants (DANN). DANN is a neural network to differentiate evolutionarily derived alleles from simulated ones with 949 highly heterogeneous features. It can capture nonlinear relationships among features. We train DANN with the same deep learning techniques used for GEIDN. DANN achieves an 18.90% relative reduction in the error rate and a 14.52% relative increase in the area under the curve over CADD, a state-of-the-art algorithm to annotate genetic variants based on the linear SVM.
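The excerpt names the concordance index as the quantity GBMCI approximately optimizes but does not define it. The sketch below shows one common definition (Harrell's C for right-censored data): among all pairs of patients whose ordering in time is known, it is the fraction in which the patient with the earlier event was given the higher risk score. The function and argument names are hypothetical, the toy data are made up, and this is not the GBMCI training procedure, only the evaluation metric under the stated assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up times
    events:      1/True if the event was observed, 0/False if censored
    risk_scores: model output, where a higher score means higher predicted risk

    Illustrative sketch only; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event got the higher risk: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. Because the index depends only on pairwise orderings, it is not differentiable in the scores, which is consistent with the excerpt's wording that the model optimizes it only approximately.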