Determination and inference of eukaryotic transcription factor sequence specificity
Program in Systems Biology; Program in Molecular Medicine
Cell Biology | Computational Biology | Systems Biology
Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only approximately 1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for approximately 34% of the approximately 170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
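The inference step described in the abstract, where a motif measured for one TF is assigned to an uncharacterized TF with a closely related DBD, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences and motif strings, and the single 70% amino acid identity cutoff are all hypothetical and chosen only for illustration. The sketch also assumes the DBD sequences have already been aligned to equal length.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues between two pre-aligned DBD sequences.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def infer_motif(
    query_dbd: str,
    characterized: Dict[str, Tuple[str, str]],
    threshold: float = 70.0,  # hypothetical cutoff, not taken from the paper
) -> Optional[Tuple[str, str, float]]:
    """Return (source TF, motif, identity) for the most similar characterized
    DBD, or None if no characterized DBD reaches the identity threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for tf_id, (dbd, motif) in characterized.items():
        identity = percent_identity(query_dbd, dbd)
        if identity >= threshold and (best is None or identity > best[2]):
            best = (tf_id, motif, identity)
    return best


# Toy data: TF name -> (aligned DBD sequence, motif consensus). All invented.
characterized = {
    "TF_A": ("RKRGRQTYTR", "TAATTA"),
    "TF_B": ("CPECGKSFSQ", "GGGCGG"),
}

hit = infer_motif("RKRGRQTYSR", characterized)
miss = infer_motif("AAAAAAAAAA", characterized)
print(hit)
print(miss)
```

A single global cutoff is the simplest possible design; in practice the level of DBD similarity at which motifs can be transferred reliably may differ between DBD classes, so a per-class threshold table would be a natural extension of this sketch.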
DOI of Published Version
Cell. 2014 Sep 11;158(6):1431-43. doi: 10.1016/j.cell.2014.08.009.
Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano J, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJ, Bouget F, Ratsch G, Larrondo LF, Ecker JR, Hughes TR. (2014). Determination and inference of eukaryotic transcription factor sequence specificity. Systems Biology Publications. https://doi.org/10.1016/j.cell.2014.08.009. Retrieved from https://escholarship.umassmed.edu/sysbio_pubs/51