University of Massachusetts Medical School Faculty Publications

UMMS Affiliation

Department of Psychiatry

Publication Date

2021-05-27

Document Type

Article Preprint

Disciplines

Genomics

Abstract

In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. The new T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding. The newly completed regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies for the first time.

Keywords

Genomics, human genome, heterochromatin, proteins

Rights and Permissions

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

DOI of Published Version

10.1101/2021.05.26.445798

Source

bioRxiv 2021.05.26.445798; doi: https://doi.org/10.1101/2021.05.26.445798.

Comments

This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.

Full author list omitted for brevity. For the full list of authors, see article.

Journal/Book/Conference Title

bioRxiv

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Genomics Commons
