Program in Systems Biology; Department of Biochemistry and Molecular Pharmacology
Computational Biology | Genetic Phenomena | Genomics | Structural Biology | Systems Biology
BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study.
RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments.
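The sketch below illustrates two ideas from the results: why a naive correlation over whole contact matrices can be misleading, and how the ratio of intra- to interchromosomal contacts is computed. It is a toy, not the authors' method. The simulated matrices, the 1/distance decay, the log-normal noise, and all function names (`simulate_matrix`, `mean_stratified_correlation`, `intra_inter_ratio`) are hypothetical and chosen only for illustration. The per-distance comparison is loosely in the spirit of HiCRep's stratification by genomic distance, but it omits HiCRep's smoothing and weighting. The two simulated "replicates" share only distance decay and have independent noise. The naive Pearson correlation should therefore come out high anyway, while the per-diagonal average should stay near zero.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(m):
    """All entries on or above the main diagonal, flattened."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i, n)]

def diagonal(m, d):
    """Entries at a fixed bin distance d (one genomic-distance stratum)."""
    return [m[i][i + d] for i in range(len(m) - d)]

def mean_stratified_correlation(a, b, max_dist):
    """Unweighted mean of per-diagonal Pearson correlations for d = 1..max_dist.
    Hypothetical simplification; not HiCRep's stratum-adjusted coefficient."""
    rs = [pearson(diagonal(a, d), diagonal(b, d)) for d in range(1, max_dist + 1)]
    return sum(rs) / len(rs)

def simulate_matrix(n, rng):
    """Toy symmetric contact matrix: a distance decay shared by all replicates,
    multiplied by independent log-normal noise (no real biological signal)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            expected = 1000.0 / (1 + j - i)  # hypothetical 1/distance decay
            m[i][j] = m[j][i] = expected * rng.lognormvariate(0.0, 0.5)
    return m

def intra_inter_ratio(contacts):
    """Ratio of intra- to interchromosomal read counts.
    contacts: iterable of (chrom1, chrom2, count) tuples (hypothetical format)."""
    intra = sum(c for c1, c2, c in contacts if c1 == c2)
    inter = sum(c for c1, c2, c in contacts if c1 != c2)
    return intra / inter if inter else math.inf

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = simulate_matrix(100, rng), simulate_matrix(100, rng)
    # Naive correlation is inflated by the distance decay both matrices share...
    print("naive Pearson:", round(pearson(upper_triangle(a), upper_triangle(b)), 2))
    # ...while comparing within distance strata removes most of that effect.
    print("per-distance mean:", round(mean_stratified_correlation(a, b, 10), 2))
    print("intra/inter:", intra_inter_ratio([("chr1", "chr1", 90), ("chr1", "chr2", 10)]))
```

The per-diagonal comparison is used here because distance decay affects both replicates in the same way. Correlating within a single distance stratum therefore tests whether the two matrices agree beyond that shared trend.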
CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution, and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as in 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
Rights and Permissions
© The Author(s). 2019 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
DOI of Published Version
Genome Biol. 2019 Mar 19;20(1):57. doi: 10.1186/s13059-019-1658-7.
Yardimci GG, Ozadam H, Lajoie BR, Zhan Y, Dekker J, Noble WS. (2019). Measuring the reproducibility and quality of Hi-C data. Open Access Articles. https://doi.org/10.1186/s13059-019-1658-7. Retrieved from https://escholarship.umassmed.edu/oapubs/3791
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.