---
title: 'iced: fast and memory efficient normalization of contact maps'
tags:
  - Hi-C contact count
  - Normalization
authors:
  - name: Nelle Varoquaux
    orcid: 0000-0002-8748-6546
    affiliation: 1
  - name: Nicolas Servant
    orcid: 0000-0003-1678-7410
    affiliation: "3, 4, 5"
affiliations:
  - name: University of California, Berkeley
    index: 1
  - name: Institut Curie
    index: 3
  - name: INSERM
    index: 4
  - name: Mines ParisTech
    index: 5
date: 2019, February, 7th
bibliography: paper.bib
---

Summary

Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale.

iced implements fast and memory-efficient normalization methods, such as the ICE normalization strategy and the SCN algorithm. It is included in the HiC-Pro pipeline, which processes data from raw FASTQ files to normalized contact maps. iced has since grown beyond a normalization package and now contains a number of utility functions that may be useful when analyzing and processing Hi-C data.
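To illustrate the idea behind iterative correction, here is a minimal dense NumPy sketch. It is not iced's actual implementation: the function name `ice_normalize_dense` and its parameters `max_iter` and `tol` are hypothetical, and the toy matrix is made up for the example. The sketch assumes a symmetric matrix of non-negative counts and repeatedly divides each entry by the product of its row and column scaling factors until all non-empty bins have roughly the same total.

```python
import numpy as np


def ice_normalize_dense(counts, max_iter=100, tol=1e-6):
    """Balance a symmetric contact matrix (illustrative sketch only).

    Returns the balanced matrix and the accumulated per-bin bias, such that
    balanced * outer(bias, bias) recovers the input counts.
    """
    mat = np.array(counts, dtype=float)  # work on a copy
    bias = np.ones(mat.shape[0])
    for _ in range(max_iter):
        marg = mat.sum(axis=0)
        nonzero = marg > 0
        # Scale each non-empty bin by its marginal relative to the mean;
        # empty bins keep a factor of 1 and therefore stay empty.
        scale = np.ones_like(marg)
        scale[nonzero] = marg[nonzero] / marg[nonzero].mean()
        mat /= scale[:, None] * scale[None, :]
        bias *= scale
        # Stop once every scaling factor is close to 1.
        if np.abs(scale - 1).max() < tol:
            break
    return mat, bias


# Toy symmetric contact matrix (made-up values).
counts = np.array([[10.0, 4.0, 2.0],
                   [4.0, 8.0, 6.0],
                   [2.0, 6.0, 12.0]])
normalized, bias = ice_normalize_dense(counts)
```

After the call, the column sums of `normalized` should be approximately equal, and `bias` holds the estimated per-bin scaling factors.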

Moving from sequencing reads to a normalized contact map is a challenging task. Hi-C usually requires several millions to billions of paired-end sequencing reads, depending on genome size and on the desired resolution. Managing these data thus requires optimized bioinformatics workflows able to extract the contact frequencies in reasonable computational time and with reasonable resource and storage requirements. The final step of such a pipeline is typically a normalization step, essential to ensure accurate analysis and proper interpretation of the results.

We propose here a fast Python implementation of the iterative correction method [@imakaev:iterative] and of SCN [@cournac:normalization]. iced emphasizes ease of use, performance, maintainability, and memory efficiency. This implementation leverages a memory-efficient data format for Hi-C maps, and outperforms HiCorrector [@li:hi-corrector], a parallelized C++ implementation of the same algorithm, in both speed and memory usage.
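The text does not spell out which data format is used. As a hedged illustration of why a sparse representation helps, the sketch below assumes that only the non-zero upper-triangular entries of a symmetric contact map are stored as coordinate triplets (row, column, value). The helper names `to_upper_triplets` and `marginals_from_triplets` are hypothetical and the banded toy matrix is made up. The sketch compares the memory footprint of both representations and shows that the per-bin marginals needed by a normalization step can be computed directly from the triplets.

```python
import numpy as np


def to_upper_triplets(dense):
    """Keep only non-zero upper-triangular entries as (rows, cols, values)."""
    rows, cols = np.nonzero(np.triu(dense))
    return rows, cols, dense[rows, cols]


def marginals_from_triplets(rows, cols, values, n_bins):
    """Per-bin sums of the full symmetric matrix, computed from triplets."""
    # Every stored entry contributes to the bin given by its row index.
    marg = np.bincount(rows, weights=values, minlength=n_bins)
    # Off-diagonal entries also stand for their mirrored (col, row) entry.
    offdiag = rows != cols
    marg += np.bincount(cols[offdiag], weights=values[offdiag],
                        minlength=n_bins)
    return marg


# Toy banded contact map: contacts only between nearby bins (made-up values).
n_bins = 1000
idx = np.arange(n_bins)
dense = np.zeros((n_bins, n_bins))
for offset, count in [(0, 50.0), (1, 20.0), (2, 5.0)]:
    i = idx[: n_bins - offset]
    dense[i, i + offset] = count
    dense[i + offset, i] = count

rows, cols, values = to_upper_triplets(dense)
sparse_bytes = rows.nbytes + cols.nbytes + values.nbytes
marg = marginals_from_triplets(rows, cols, values, n_bins)
print(f"dense: {dense.nbytes} bytes, triplets: {sparse_bytes} bytes")
```

Because Hi-C maps at high resolution are mostly empty away from the diagonal, the triplet storage grows with the number of observed contacts rather than with the square of the number of bins.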

References