GSBS Dissertations and Theses



Approval Date


Document Type

Doctoral Dissertation

Academic Program

Bioinformatics and Computational Biology, MD/PhD


Program in Bioinformatics and Integrative Biology

First Thesis Advisor

Zhiping Weng, PhD


Keywords

ENCODE, enhancer, regulatory element, genome, epigenome, DNase, ChIP-seq, Big Data, visualization


The goal of the Encyclopedia of DNA Elements (ENCODE) project has been to characterize all the functional elements of the human genome. These elements include expressed transcripts and genomic regions that are bound by transcription factors (TFs), occupied by nucleosomes, occupied by nucleosomes with modified histones, or hypersensitive to DNase I cleavage. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental technique for detecting TF binding in living cells, and the genomic regions bound by TFs are called ChIP-seq peaks. ENCODE has performed and compiled results from tens of thousands of experiments, including ChIP-seq, DNase, RNA-seq, and Hi-C.

These efforts have culminated in two web-based resources from our lab, Factorbook and SCREEN, for exploring epigenomic data for both human and mouse. Factorbook is a peak-centric resource that presents data such as motif enrichment and histone modification profiles for transcription factor binding sites computed from ENCODE ChIP-seq data. SCREEN provides an encyclopedia of ~2 million regulatory elements, including promoters and enhancers, identified using ENCODE ChIP-seq and DNase data, with an extensive user interface for searching and visualization.

While we have successfully used the thousands of available ENCODE ChIP-seq experiments to build the Encyclopedia and visualizers, we have also struggled with the practical and theoretical impossibility of performing every possible assay on every possible biosample under every conceivable biological scenario. We have therefore used machine learning techniques to predict TF binding sites and enhancer locations, and we demonstrate that machine learning is critical for deciphering the functional regions of the genome.



Rights and Permissions

Licensed under a Creative Commons license

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Available for download on Friday, June 29, 2018