Computational Methods for Transcriptome-based Cellular Phenotyping

Author: Matthew Nathan Bernstein
Total Pages: 160
Release: 2019

Although the basic chemical mechanisms of cellular biology are now well known, we are still a long way from understanding how phenotypes emerge from these basic mechanisms. Within the last decade, RNA-sequencing (RNA-seq) has become a ubiquitous technology for measuring the transcriptome, which provides a snapshot of gene expression across the entire genome. An improvement in our ability to predict how phenotypes emerge from the complex patterns of gene expression, a task we refer to as transcriptome-based cellular phenotyping (TBCP), would lead to considerable medical and technological advancements. Machine learning promises to be an apt approach for TBCP because it can overcome the noise inherent in RNA-seq data and because it does not require a priori knowledge of the rules and patterns that lead from gene expression to phenotype. Furthermore, there exist large, public databases of RNA-seq data that promise to be a valuable source of training data for developing machine learning algorithms to perform TBCP. Unfortunately, this opportunity is impeded by a number of challenges inherent in these databases, including poorly structured metadata and data heterogeneity.

In this thesis, I present three projects that advance the state of the art in leveraging the trove of publicly available gene expression data for TBCP.

In the first project, we address the problem of poorly structured metadata in public genomics databases. We specifically focus on the Sequence Read Archive (SRA), which is the premier repository of raw RNA-seq data curated by the National Institutes of Health; however, our work generalizes to other databases. Existing approaches treat metadata normalization as a named entity recognition problem, where the goal is to tag metadata with terms from controlled vocabularies whenever those terms are mentioned in the metadata.
We reframe this problem as an inference task, in which we tag the metadata with only those terms that describe the underlying biology of the described sample rather than with all mentioned terms. By doing so, we achieve much higher precision than existing methods while maintaining competitive recall.

In the second project, we leverage the normalized metadata produced by the first project to train predictive models of phenotype from RNA-seq-derived gene expression data. We specifically focus on the cell type prediction task: given an RNA-seq sample, we wish to predict the cell type from which the sample was derived. Cell type prediction is an important step in many transcriptomic analyses, including the annotation of cell types in single-cell RNA-seq datasets. This work represents the first effort towards a cell type prediction task that utilizes the full potential of publicly available RNA-seq data.

Finally, in the third project, we build on the second project to address the task of cell type prediction on sparse single-cell RNA-seq (scRNA-seq) data produced by novel droplet-based technologies. These droplet-based scRNA-seq technologies enable the sequencing of higher numbers of cells at the cost of a lower read depth per cell. Such low read depths result in fewer genes with detected expression per cell. We explore the effects of applying cell type classifiers trained on dense, bulk RNA-seq data to sparse scRNA-seq data and propose a novel probabilistic generative model for adapting the bulk-trained classifiers to sparse input data.
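
The link between read depth and the number of detected genes can be illustrated with a small simulation. The sketch below is only an illustration and is not the generative model proposed in the thesis: it assumes that reads are drawn from genes in proportion to their abundance, and the expression profile, the gene names, and the helper functions `downsample_counts` and `detected_genes` are all hypothetical.

```python
import random


def downsample_counts(expression, depth, rng):
    """Draw `depth` reads, picking each gene with probability
    proportional to its abundance in `expression`."""
    genes = list(expression)
    weights = [expression[g] for g in genes]
    counts = dict.fromkeys(genes, 0)
    for gene in rng.choices(genes, weights=weights, k=depth):
        counts[gene] += 1
    return counts


def detected_genes(counts):
    """Number of genes with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


rng = random.Random(0)

# Hypothetical expression profile: 2,000 genes with skewed abundances.
profile = {f"gene{i}": rng.paretovariate(1.5) for i in range(2000)}

# Assumed depths: a deep (bulk-like) sample and a shallow (droplet-like) cell.
deep = downsample_counts(profile, depth=200_000, rng=rng)
shallow = downsample_counts(profile, depth=2_000, rng=rng)

print("genes detected at high depth:", detected_genes(deep))
print("genes detected at low depth: ", detected_genes(shallow))
```

Under these assumptions the shallow sample detects fewer genes than the deep one, which is the kind of sparsity a classifier trained on bulk data would face when applied to droplet-based scRNA-seq data.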


Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing
Language: en
Pages: 176
Authors: Hui Ting Grace Yeo
Type: BOOK - Published: 2020

Single-cell RNA-sequencing (scRNA-seq) enables transcriptome-wide measurements of single cells at scale. As scRNA-seq datasets grow in complexity and size, more
Computational Methods for Single-Cell Data Analysis
Language: en
Pages: 271
Authors: Guo-Cheng Yuan
Categories: Science
Type: BOOK - Published: 2019-02-14 - Publisher: Humana Press

This detailed book provides state-of-art computational approaches to further explore the exciting opportunities presented by single-cell technologies. Chapters
Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics
Language: en
Pages: 246
Authors: Fanny Perraudeau
Type: BOOK - Published: 2018

I propose statistical methods and software for the analysis of single-cell transcriptome sequencing (scRNA-seq) and metagenomics data. Specifically, I present a
Evolution of Translational Omics
Language: en
Pages: 354
Authors: Institute of Medicine
Categories: Science
Type: BOOK - Published: 2012-09-13 - Publisher: National Academies Press

Technologies collectively called omics enable simultaneous measurement of an enormous number of biomolecules; for example, genomics investigates thousands of DN