University of Massachusetts Medical School Faculty Publications

UMMS Affiliation

Program in Bioinformatics and Integrative Biology; RNA Therapeutics Institute; Program in Molecular Medicine; Graduate School of Biomedical Sciences

Publication Date


Document Type



Developmental Biology | Genetics and Genomics | Nucleic Acids, Nucleotides, and Nucleosides


In the male germ cells of placental mammals, 26-30-nt-long PIWI-interacting RNAs (piRNAs) emerge when spermatocytes enter the pachytene phase of meiosis. In mice, pachytene piRNAs derive from ~100 discrete autosomal loci that produce canonical RNA polymerase II transcripts. These piRNA clusters bear 5' caps and 3' poly(A) tails, and often contain introns that are removed before nuclear export and processing into piRNAs. What marks pachytene piRNA clusters to produce piRNAs, and what confines their expression to the germline? We report that an unusually long first exon (≥10 kb) or a long, unspliced transcript correlates with germline-specific transcription and piRNA production. Our integrative analysis of transcriptome, piRNA, and epigenome datasets across multiple species reveals that a long first exon is an evolutionarily conserved feature of pachytene piRNA clusters. Furthermore, a highly methylated promoter, often containing a low or intermediate level of CG dinucleotides, correlates with germline expression and somatic silencing of pachytene piRNA clusters. Pachytene piRNA precursor transcripts bind THOC1 and THOC2, THO complex subunits known to promote transcriptional elongation and mRNA nuclear export. Together, these features may explain why the major sources of pachytene piRNA clusters specifically generate these unique small RNAs in the male germline of placental mammals.


Data integration, Epigenetics, Gene regulation, Spermatogenesis

Rights and Permissions

Copyright © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

DOI of Published Version



Yu T, Fan K, Özata DM, Zhang G, Fu Y, Theurkauf WE, Zamore PD, Weng Z. Long first exons and epigenetic marks distinguish conserved pachytene piRNA clusters from other mammalian genes. Nat Commun. 2021 Jan 4;12(1):73. doi: 10.1038/s41467-020-20345-3. PMID: 33397987; PMCID: PMC7782496.

Related Resources

Link to Article in PubMed

Journal/Book/Conference Title

Nature communications

PubMed ID


Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.