High-throughput Genome Scaffolding from in vivo DNA Interaction Frequency
Abstract
Despite the advancement of DNA-sequencing technology, assembly of complex genomes remains a major challenge. Surprisingly, the quality of published assemblies of complex genomes has in fact decreased, due to the discrepancy between the rate of advancement of short read sequencing and that of scaffolding technology. Short read sequencing typically yields accurate, but disconnected, contigs. However, high-throughput scaffolding of contigs into chromosomes, based on long-insert paired-end read libraries, is a difficult task and yields highly fragmented genomes. Further scaffolding, which is required to improve the degree of completion of genome sequences, typically relies on laborious or low-throughput methods. We have developed a novel sequencing-based high-throughput approach to genome assembly, based on the notion that loci that are near each other in the genomic sequence have a high probability of interacting with each other. Using probabilistic models, we demonstrate that genome-wide in vivo chromatin interaction frequency measurements, easily measurable with 3C-based experiments, can be used as genomic distance proxies to accurately detect the position of individual contigs over large distances without requiring any sequence overlap. Furthermore, we demonstrate our approach can karyotype and scaffold an entire genome de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size, is simple, robust, scalable and applicable to any species.
DOI
10.13028/4b87-yz32
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/27971
Notes
Abstract of poster presented at the 2014 UMass Center for Clinical and Translational Science Research Retreat, held on May 20, 2014 at the University of Massachusetts Medical School, Worcester, Mass.
Rights
Copyright the Author(s)
Distribution License
http://creativecommons.org/licenses/by-nc-sa/3.0/
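Illustrative sketch
The abstract describes using interaction frequency as a proxy for genomic distance in order to position contigs. The toy Python example below is one way to picture that idea; it is not the authors' model. It assumes a hypothetical power-law decay of contact frequency with distance and Poisson-distributed counts, and every name and parameter in it (expected_contacts, place_contig, scale, alpha, min_distance) is invented for illustration.

```python
import math


def expected_contacts(distance, scale=1000.0, alpha=1.0, min_distance=1.0):
    """Assumed power-law decay of contact frequency with genomic distance.

    The functional form and the default parameters are illustrative
    assumptions, not values taken from the abstract.
    """
    return scale * max(distance, min_distance) ** (-alpha)


def poisson_log_likelihood(observed, expected):
    """Log-probability of an observed count under a Poisson mean."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)


def place_contig(bin_positions, contact_counts, candidate_positions, **decay):
    """Return the candidate position with the highest total log-likelihood.

    bin_positions: coordinates of already-placed loci (arbitrary units).
    contact_counts: observed contacts between the unplaced contig and each locus.
    """
    def score(candidate):
        return sum(
            poisson_log_likelihood(
                count, expected_contacts(abs(candidate - pos), **decay)
            )
            for pos, count in zip(bin_positions, contact_counts)
        )

    return max(candidate_positions, key=score)


# Toy data: simulate noise-free counts for a contig whose true position is 32.
bin_positions = [0, 10, 20, 30, 40, 50]
true_position = 32
contact_counts = [
    round(expected_contacts(abs(true_position - pos))) for pos in bin_positions
]

# Scan integer candidate positions and keep the best-scoring one.
candidates = list(range(1, 50))
best = place_contig(bin_positions, contact_counts, candidates)
print("estimated position:", best)
```

In this sketch the contig is placed without any sequence overlap: only its contact counts with already-placed loci are used, which is the property the abstract emphasizes. Real data would need a fitted decay curve and a noise model appropriate to the experiment.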