GSBS Dissertations and Theses

Publication Date


Document Type

Doctoral Dissertation

Academic Program

Biochemistry and Molecular Pharmacology


Molecular, Cell and Cancer Biology

First Thesis Advisor

Scot Wolfe, Ph.D.


DNA, Drosophila Proteins, Homeodomain Proteins, Transcription Factors, Regulatory Elements, Transcriptional, Two-Hybrid System Techniques


From the yeast genome completed in 1996 to the 12 Drosophila genomes published earlier this year, little more than a decade has produced an enormous amount of genomic data. Yet even with this mountain of genetic information, the regulatory networks that control gene expression remain poorly defined. In part, this is due to the vast amount of non-coding DNA (over 98% of the human genome) that must be interpreted. It is also due to the large number of transcription factors, potentially 2,000 in the human genome, that may contribute to any given network directly or indirectly. Certainly, one of the central limitations has been the paucity of transcription factor (TF) specificity data that would aid in predicting regulatory targets throughout a genome.

The general lack of specificity data has hindered the prediction of regulatory targets for individual TFs as well as for groups of factors that function within a common regulatory pathway. A large collection of factor specificities would allow combinatorial prediction of regulatory targets that considers all factors actively expressed in a given cell under a given condition. Herein we describe substantial improvements to a previously described bacterial one-hybrid system, increasing its sensitivity and dynamic range so that it is amenable to high-throughput analysis of sequence-specific TFs. To date, we have characterized 108 (14.3%) of the predicted TFs in Drosophila, spanning a broad range of DNA-binding domain families, which demonstrates the feasibility of characterizing a large number of TFs with this technology.
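As a generic illustration of how a set of selected binding sites can be summarized as a specificity model, the sketch below builds a log-odds position weight matrix (PWM) from equal-length aligned sites and scores a candidate sequence against it. It is a minimal sketch, not the analysis pipeline used in this work; the example sites, the pseudocount of 0.5, the uniform background, and the function names `build_pwm` and `score` are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

BASES = "ACGT"
PWM = List[Dict[str, float]]


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: Optional[Dict[str, float]] = None) -> PWM:
    """Log-odds PWM (log2 of observed/background frequency) from aligned sites.

    Hypothetical helper; assumes uppercase A/C/G/T sites of equal length.
    """
    if not sites:
        raise ValueError("need at least one site")
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm: PWM = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        # Pseudocounts keep bases never seen at this position from giving log2(0).
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score(pwm: PWM, kmer: str) -> float:
    """Sum of per-position log-odds for a k-mer exactly as long as the matrix."""
    if len(kmer) != len(pwm):
        raise ValueError("k-mer length must match matrix width")
    return sum(col[b] for col, b in zip(pwm, kmer))


if __name__ == "__main__":
    # Hypothetical selected sites sharing a TAAT core.
    sites = ["TAATTA", "TAATTG", "TAATCA", "TAATTA"]
    pwm = build_pwm(sites)
    print(round(score(pwm, "TAATTA"), 2), round(score(pwm, "GCGCGC"), 2))
```

A sequence resembling the selected sites scores well above an unrelated one; the pseudocount is a design choice that trades sharpness of the matrix against robustness when only a few sites are available.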

To fully exploit our large database of binding specificities, we have created a GBrowse-based search tool that allows an end user to examine the overrepresentation of binding sites for any number of individual factors, as well as combinations of these factors, in up to six Drosophila genomes. We have used this tool to demonstrate that a collection of factor specificities within a common pathway successfully predicts previously validated cis-regulatory modules within a genome. Furthermore, our database provides a complete catalog of DNA-binding specificities for all 84 homeodomains in Drosophila. This catalog enabled us to propose and test a detailed set of recognition rules for homeodomains and to use this information to predict the specificities of the majority of homeodomains in the human genome.
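One generic way to look for overrepresented sites of several factors is to scan each factor's matrix along both strands and report windows that contain at least a chosen number of sites, treating them as candidate cis-regulatory modules. The sketch below assumes that simple site-clustering approach for illustration only; it is not the algorithm behind the GBrowse-based tool. The toy matrices, thresholds, window size, factor labels, and helper names (`toy_pwm`, `site_hits`, `dense_windows`) are all hypothetical.

```python
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def toy_pwm(consensus: str, match: float = 1.0, mismatch: float = -1.0) -> PWM:
    """Hypothetical stand-in for a measured matrix: rewards the consensus base."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def site_hits(seq: str, pwm: PWM, threshold: float) -> List[int]:
    """Start positions where either strand scores at or above threshold.

    Assumes an uppercase A/C/G/T sequence (no N characters).
    """
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        kmer = seq[i:i + width]
        rc = kmer.translate(COMPLEMENT)[::-1]  # reverse complement
        fwd = sum(col[b] for col, b in zip(pwm, kmer))
        rev = sum(col[b] for col, b in zip(pwm, rc))
        if max(fwd, rev) >= threshold:
            hits.append(i)
    return hits


def dense_windows(seq: str, factors: Dict[str, Tuple[PWM, float]],
                  window: int, min_sites: int) -> List[int]:
    """Window starts whose combined site count (by site start) reaches min_sites."""
    all_hits: List[int] = []
    for pwm, threshold in factors.values():
        all_hits.extend(site_hits(seq, pwm, threshold))
    result = []
    for start in range(max(1, len(seq) - window + 1)):
        n = sum(1 for h in all_hits if start <= h < start + window)
        if n >= min_sites:
            result.append(start)
    return result


if __name__ == "__main__":
    # Hypothetical sequence with one planted site for each of two factors.
    seq = "G" * 10 + "TAATTA" + "GG" + "CACGTG" + "G" * 20
    factors = {"hd": (toy_pwm("TAATTA"), 6.0), "ebox": (toy_pwm("CACGTG"), 6.0)}
    print(dense_windows(seq, factors, window=20, min_sites=2))
```

With a 20 bp window and a requirement of two sites, this prints the window starts 0 through 10, the only windows containing both planted sites. Requiring sites for several factors at once is what makes the search combinatorial; a real analysis would also need a background model to judge whether a cluster is genuinely overrepresented.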



Rights and Permissions

Copyright is held by the author, with all rights reserved.