UMass Chan Medical School Faculty Publications
UMMS Affiliation
Program in Bioinformatics and Integrative Biology; Graduate School of Biomedical Sciences
Publication Date
2021-10-12
Document Type
Article Preprint
Disciplines
Amino Acids, Peptides, and Proteins | Bioinformatics | Computational Biology
Abstract
The human genome contains roughly 1,600 transcription factors (TFs) (1), DNA-binding proteins that recognize characteristic sequence motifs to exert regulatory effects on gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX (2), and in vivo, using techniques including ChIP-seq (3, 4). We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook that significantly expands its coverage of cell types and TFs. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and from HT-SELEX experiments; this motif catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers novel tools for applying the motif models within machine learning frameworks and for using these models in integrative analysis, including annotation of variants and of disease and trait heritability. We will continue to expand the resource as ENCODE Phase IV data are released.
Keywords
Bioinformatics, Factorbook, transcription factors, ENCODE
Rights and Permissions
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
DOI of Published Version
10.1101/2021.10.11.463518
Source
bioRxiv 2021.10.11.463518; doi: https://doi.org/10.1101/2021.10.11.463518.
Related Resources
Now published in Nucleic Acids Research, doi: 10.1093/nar/gkab1039.
Journal/Book/Conference Title
bioRxiv
Repository Citation
Pratt HE, Andrews G, Phalke N, Purcaro MJ, van der Velde A, Moore JE, Weng Z. (2021). Factorbook: an Updated Catalog of Transcription Factor Motifs and Candidate Regulatory Motif Sites [preprint]. UMass Chan Medical School Faculty Publications. https://doi.org/10.1101/2021.10.11.463518. Retrieved from https://escholarship.umassmed.edu/faculty_pubs/2093
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Comments
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.