Program in Molecular Medicine
Amino Acids, Peptides, and Proteins | Computational Biology | Genomics | Nucleic Acids, Nucleotides, and Nucleosides
Genome-wide measurement of mRNA or protein levels provides broad data sets for biological discovery. However, subsequent computational methods are essential for uncovering the functional implications of the data as well as intuitively visualizing the findings. Current computational tools are biased toward well-described pathways, limiting their utility for novel discovery. Recently, we developed an annotation and category enrichment tool for Caenorhabditis elegans genomic data, WormCat, that provides an intuitive visualization output. Unlike GO, which excludes genes with no annotation information, WormCat 2.0 retains these genes as a special UNASSIGNED category. Here, we show that the UNASSIGNED gene category enrichment exhibits tissue-specific expression patterns and includes genes with biological functions. Poorly annotated genes have previously been considered to lack homologs in closely related species. Instead, we find that around 3% of the UNASSIGNED genes have poorly characterized human orthologs. These human orthologs themselves have little annotation information. A recently developed method that incorporates lineage relationships (abSENSE) indicates that failure of BLAST to detect homology explains the apparent lineage specificity for many UNASSIGNED genes, suggesting that a larger subset could be related to human genes. WormCat provides an annotation strategy that allows association of UNASSIGNED genes with specific phenotypes and known pathways. Our analysis indicates that the UNASSIGNED gene category contains candidates that merit further functional study, which could yield insight into understudied areas of biology.
Genomics, WormCat 2.0, genes, Caenorhabditis elegans
Rights and Permissions
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
DOI of Published Version
bioRxiv 2021.11.11.467968; doi: https://doi.org/10.1101/2021.11.11.467968.
Higgins DP, Weisman CM, Lui D, D’Agostino FA, Walker AK. (2021). WormCat 2.0 defines characteristics and conservation of poorly annotated genes in Caenorhabditis elegans [preprint]. University of Massachusetts Medical School Faculty Publications. https://doi.org/10.1101/2021.11.11.467968. Retrieved from https://escholarship.umassmed.edu/faculty_pubs/2095
Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 License.