A Common Class of Transcripts with 5'-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification [preprint]
Authors
Cenik, Can
Chua, Hon Nian
Singh, Guramrit
Akef, Abdalla
Snyder, Michael P.
Palazzo, Alexander F.
Moore, Melissa J.
Roth, Frederick P.
UMass Chan Affiliations
RNA Therapeutics Institute
Department of Biochemistry and Molecular Pharmacology
Document Type
Preprint
Publication Date
2016-09-06
Keywords
bioinformatics
transcripts
introns
5' untranslated regions
N1-methyladenosines
Amino Acids, Peptides, and Proteins
Bioinformatics
Genetic Phenomena
Nucleic Acids, Nucleotides, and Nucleosides
Abstract
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the Exon Junction Complex (EJC) at non-canonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ~20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for non-canonical binding by the Exon Junction Complex.
Source
bioRxiv 057455; doi: https://doi.org/10.1101/057455. Link to preprint on bioRxiv service.
DOI
10.1101/057455
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/29335
Related Resources
Now published in RNA, doi: 10.1261/rna.059105.116.
Rights
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
Distribution License
http://creativecommons.org/licenses/by-nd/4.0/