Recognition models to predict DNA-binding specificities of homeodomain proteins
Authors
Christensen, Ryan G.
Enuameh, Metewo Selase
Noyes, Marcus Blaine
Brodsky, Michael H.
Wolfe, Scot A.
Stormo, Gary D.
UMass Chan Affiliations
Program in Molecular Medicine
Program in Gene Function and Expression
Department of Biochemistry and Molecular Pharmacology
Document Type
Journal Article
Publication Date
2012-06-15
Keywords
DNA-Binding Proteins
Homeodomain Proteins
Models, Genetic
Computational Biology
Genetics and Genomics
Abstract
MOTIVATION: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors, the best available methods use k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and, as a consequence, an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes.
RESULTS: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using an RF-based model.
CONTACT: stormo@wustl.edu.
Source
Bioinformatics. 2012 Jun 15;28(12):i84-i89.
DOI
10.1093/bioinformatics/bts202
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/43987
PubMed ID
22689783
Rights
© The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
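The KNN baseline described in the abstract predicts a protein's specificity as the average of the specificities of its k most similar proteins with known specificities. The Python sketch below illustrates that idea under stated assumptions: position frequency matrices (PFMs) are stored as aligned columns of A, C, G, T frequencies, all of the same length, and "most similar" is approximated by a hypothetical measure that counts identical residues at a chosen set of positions. The function names, positions, and toy data are illustrative only; this is not the authors' code, and it does not reproduce the RF-based model behind PreMoTF.

```python
from typing import List, Sequence, Tuple

# A PFM is a list of columns; each column holds frequencies for A, C, G, T.
PFM = List[List[float]]


def identity_similarity(seq_a: str, seq_b: str, positions: Sequence[int]) -> int:
    """Count identical residues at the chosen positions.

    Hypothetical similarity measure: the abstract does not specify how
    "most similar" is defined, so this simply counts matches.
    """
    return sum(1 for i in positions if seq_a[i] == seq_b[i])


def knn_predict_pfm(
    query: str,
    training: Sequence[Tuple[str, PFM]],
    positions: Sequence[int],
    k: int,
) -> PFM:
    """Predict a PFM for `query` as the column-wise mean of its k nearest neighbors' PFMs.

    Assumes every training PFM is aligned and has the same number of columns.
    """
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training proteins")
    # Rank training proteins by similarity to the query, most similar first.
    # sorted() stays stable with reverse=True, so ties keep their training order.
    ranked = sorted(
        training,
        key=lambda item: identity_similarity(query, item[0], positions),
        reverse=True,
    )
    neighbors = [pfm for _, pfm in ranked[:k]]
    n_columns = len(neighbors[0])
    # Average each base's frequency across the k neighbors, column by column.
    return [
        [sum(pfm[col][base] for pfm in neighbors) / k for base in range(4)]
        for col in range(n_columns)
    ]


# Toy, hypothetical data: 5-residue sequences paired with 2-column PFMs.
TRAINING = [
    ("QKNRQ", [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]),
    ("QKNRK", [[0.8, 0.2, 0.0, 0.0], [0.0, 0.0, 0.2, 0.8]]),
    ("KRSAT", [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]),
]
POSITIONS = [0, 2, 4]  # hypothetical specificity-determining positions

if __name__ == "__main__":
    prediction = knn_predict_pfm("QANRQ", TRAINING, POSITIONS, k=2)
    for column in prediction:
        print([round(f, 2) for f in column])
```

Running the script prints the two averaged columns, [0.9, 0.1, 0.0, 0.0] and [0.0, 0.0, 0.1, 0.9]: the two toy proteins that share the most residues with the query at the chosen positions determine the prediction, and the dissimilar third protein is ignored. According to the abstract, SVM and RF models improve significantly on this kind of neighbor averaging for HD proteins.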