UMass Chan Medical School Faculty Publications

UMMS Affiliation

Program in Molecular Medicine

Publication Date


Document Type

Article Preprint


Amino Acids, Peptides, and Proteins | Computational Biology | Genomics | Nucleic Acids, Nucleotides, and Nucleosides


Genome-wide measurement of mRNA or protein levels provides broad data sets for biological discovery. However, subsequent computational methods are essential for uncovering the functional implications of the data as well as intuitively visualizing the findings. Current computational tools are biased toward well-described pathways, limiting their utility for novel discovery. Recently, we developed an annotation and category enrichment tool for Caenorhabditis elegans genomic data, WormCat, that provides an intuitive visualization output. Unlike GO, which excludes genes with no annotation information, WormCat 2.0 retains these genes as a special UNASSIGNED category. Here, we show that enrichment of the UNASSIGNED gene category exhibits tissue-specific expression patterns and that the category includes genes with biological functions. Poorly annotated genes have previously been considered to lack homologs in closely related species. Instead, we find that around 3% of the UNASSIGNED genes have poorly characterized human orthologs. These human orthologs themselves have little annotation information. A recently developed method that incorporates lineage relationships (abSENSE) indicates that failure of BLAST to detect homology explains the apparent lineage specificity for many UNASSIGNED genes, suggesting that a larger subset could be related to human genes. WormCat provides an annotation strategy that allows association of UNASSIGNED genes with specific phenotypes and known pathways. Our analysis indicates that the UNASSIGNED gene category contains candidates that merit further functional study, which could yield insight into understudied areas of biology.
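
The idea of retaining unannotated genes as their own category during enrichment can be illustrated with a minimal sketch. This is not WormCat's code: the one-sided hypergeometric test, the function names, the toy gene identifiers, and the METABOLISM label are all assumptions made for illustration, since this record does not describe the statistics the tool uses.

```python
from collections import Counter
from math import comb

UNASSIGNED = "UNASSIGNED"  # label for genes with no annotation (hypothetical constant)


def hypergeom_tail(hits, list_size, category_size, genome_size):
    """One-sided hypergeometric p-value: P(X >= hits)."""
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(genome_size, list_size)


def category_enrichment(gene_list, genome, annotation):
    """Return {category: p-value}, keeping unannotated genes as UNASSIGNED
    instead of dropping them from the analysis."""
    def label(gene):
        return annotation.get(gene, UNASSIGNED)

    background = Counter(label(g) for g in genome)
    observed = Counter(label(g) for g in gene_list)
    return {
        cat: hypergeom_tail(observed[cat], len(gene_list), background[cat], len(genome))
        for cat in observed
    }


# Toy data (entirely made up): 20 genes, 12 annotated and 8 without annotation.
genome = [f"g{i}" for i in range(20)]
annotation = {f"g{i}": "METABOLISM" for i in range(12)}

# An input list in which 4 of 5 genes lack annotation.
gene_list = ["g0", "g12", "g13", "g14", "g15"]
p_values = category_enrichment(gene_list, genome, annotation)
```

A tool that discards unannotated genes would report nothing for four of the five input genes here, whereas this sketch returns a p-value for UNASSIGNED alongside the annotated category.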


Genomics, WormCat 2.0, genes, Caenorhabditis elegans

Rights and Permissions

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

DOI of Published Version



bioRxiv 2021.11.11.467968; doi: Link to preprint on bioRxiv.


This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.

Journal/Book/Conference Title


Creative Commons License

This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 License.