DNA damage pathway activation following formaldehyde fixation
When mammalian cells are exposed to genotoxic stress, the DNA damage pathway is activated, resulting in poly ADP-ribosylation of proteins at DNA damage sites and accumulation of phospho-H2A.X (pH2A.X) at the damaged sites. Because formaldehyde, which is used to fix cells prior to accessible chromatin analysis, is a potent genotoxic agent, we investigated whether DNA damage pathways are activated during the formaldehyde fixation process. We used anti-pH2A.X and anti-poly(ADP-ribose) antibodies in immunocytochemistry to visualize and quantify these two marks in the nucleus. HCT116 cells were fixed with formaldehyde concentrations ranging from 0.2 to 4% for 5, 10, and 20 min, and the accumulation of pH2A.X foci and poly ADP-ribosylation was measured. The highest accumulation of both pH2A.X and poly ADP-ribosylation was observed with 0.2% formaldehyde, irrespective of fixation time (Fig. 1A). Both signals dropped as the formaldehyde concentration was raised to 1% and remained at that level at 4%. At 4% formaldehyde, a 5-min fixation did not generate additional pH2A.X or poly ADP-ribosylation signal and was comparable to the longer 10- and 20-min fixations.
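The concentration-by-time comparison above amounts to averaging a per-nucleus measurement within each fixation condition. The sketch below illustrates that summary step only, assuming nuclei have already been segmented and exported as hypothetical `(percent, minutes, signal)` tuples; the function name and input format are illustrative, not the authors' image-analysis pipeline.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_condition(nuclei):
    """Mean per-nucleus signal for each (formaldehyde %, fixation minutes) pair.

    `nuclei` is an iterable of (percent, minutes, signal) tuples, where `signal`
    is one segmented nucleus's pH2A.X foci count or poly(ADP-ribose) intensity.
    This input format is a hypothetical export, not the authors' pipeline.
    """
    groups = defaultdict(list)
    for percent, minutes, signal in nuclei:
        groups[(percent, minutes)].append(signal)
    # Sort so conditions are reported from lowest to highest concentration/time.
    return {cond: mean(values) for cond, values in sorted(groups.items())}


# Toy numbers for illustration only; they are not measured values.
example = [(0.2, 5, 10), (0.2, 5, 14), (4, 5, 2), (4, 5, 4)]
print(summarize_by_condition(example))
```

In practice one would also report the spread per condition (for example, the distribution of foci per nucleus), since means alone can hide heterogeneous responses.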
Formaldehyde fixation condition and accessible chromatin labeling efficiency
Because universal NicE-seq uses the nicking enzyme Nt.CviPII to label accessible chromatin DNA in formaldehyde-fixed cells, we next examined NicE-seq labeling efficiency under various formaldehyde fixation conditions, asking whether the poly ADP-ribosylation and pH2A.X generated on chromatin during fixation affect accessible chromatin labeling. HCT116 cells fixed with 0.2, 1, or 4% formaldehyde were subjected to the universal NicE-seq labeling reaction with fluorescein-conjugated dATP in the nucleotide mixture for 0–2 h, and labeling efficiency was monitored by measuring fluorescein incorporation in the nuclei. As expected, no nuclei were fluorescein-labeled at time 0, and labeling increased over the 2-h time course. Surprisingly, however, we observed intense fluorescein labeling in cells fixed with 0.2% formaldehyde, despite the high poly ADP-ribosylation and pH2A.X accumulation in their nuclei (Fig. 1B). Cells fixed with 0.2% formaldehyde also clumped more than those fixed with 1 or 4% (data not shown), and their nuclear DAPI staining displayed lower pixel intensity. We hypothesized that 0.2% formaldehyde fixes cellular components poorly, resulting either in random nicks that allow greater polymerase-mediated incorporation of fluorescein-conjugated dATP, or in an altered nuclear structure that favors fluorophore incorporation relative to DAPI staining.
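Because the 0.2% condition combines stronger fluorescein labeling with weaker DAPI staining, one way to compare conditions is to normalize each nucleus's fluorescein signal to its DAPI signal. The sketch below computes such a background-subtracted ratio. This DAPI-normalized metric is a hypothetical illustration and is not stated to be the quantification used in Fig. 1B; inputs are assumed to be per-nucleus mean intensities from one segmentation.

```python
def dapi_normalized_labeling(fluorescein, dapi, fl_background=0.0, dapi_background=0.0):
    """Background-subtracted fluorescein / DAPI ratio for each nucleus.

    `fluorescein` and `dapi` are per-nucleus mean intensities from the same
    segmentation (hypothetical inputs). Nuclei without usable DAPI signal are skipped.
    """
    if len(fluorescein) != len(dapi):
        raise ValueError("channels must describe the same nuclei")
    ratios = []
    for fl, dna in zip(fluorescein, dapi):
        dna_signal = dna - dapi_background
        if dna_signal <= 0:
            continue  # no measurable DNA stain; the ratio would be undefined
        # Clamp negative fluorescein values (below background) to zero.
        ratios.append(max(fl - fl_background, 0.0) / dna_signal)
    return ratios


print(dapi_normalized_labeling([200, 50], [100, 100]))
```

A ratio of this kind can rise either because fluorescein incorporation increases or because DAPI staining falls, so it does not by itself distinguish the two hypotheses above.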
Enzymatic accessible chromatin library preparation from 4% formaldehyde-fixed cells
Fixation with 4% formaldehyde does not induce the poly ADP-ribosylation or pH2A.X accumulation observed at 0.2% (Fig. 1A), and 4% formaldehyde is commonly used in clinical laboratories for tissue fixation. We therefore developed a robust accessible chromatin protocol for 4%-fixed cells, using HCT116 as a model cell line. We adapted the universal NicE-seq protocol, which routinely uses 1% formaldehyde fixation, to 4% formaldehyde fixation and tested its efficiency with different cell numbers. Accessible chromatin analysis of low cell numbers, particularly fewer than 1000 cells, is challenging because genomic DNA must be isolated from very little material. We therefore modified the protocol to eliminate the DNA isolation and sonication steps before NGS library preparation. In the universal NicE-seq protocol, the labeling reaction incorporates 5mdCTP into accessible regions, and the resulting 5-methylcytosine (5mC) makes these regions resistant to repeated nicking and degradation by Nt.CviPII. We hypothesized that a second Nt.CviPII incubation after decrosslinking and proteinase K treatment would nick chromatin DNA outside the 5mC-labeled accessible regions. However, the decrosslinking reaction contains 0.8% SDS, which would render the nicking enzyme catalytically inactive. We therefore titrated SDS to find a concentration at which the nicking enzyme remains catalytically active: pUC19 DNA was incubated with various concentrations of SDS, Nt.CviPII was added, and its activity was assessed by analyzing the digestion products on an agarose gel. SDS strongly inhibited Nt.CviPII down to a concentration of 0.008% (Fig. 2A). Because SDS is crucial for protein denaturation, which aids proteinase K activity, it cannot be omitted from the reaction. This led us to investigate whether an SDS quencher, such as NP-40, Triton X-100, or sodium deoxycholate, could restore Nt.CviPII activity in the second digestion.
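The exact SDS concentrations tested in Fig. 2A are not listed here. As an illustration only, a tenfold serial dilution starting from the 0.8% SDS of the decrosslinking buffer (an assumption) spans the range down to the 0.008% mentioned above; percentages are assumed to be w/v.

```python
def tenfold_series(start_pct, n_steps):
    """SDS concentrations (assumed % w/v) for a tenfold serial dilution, highest first."""
    if n_steps < 1:
        raise ValueError("need at least one step")
    return [start_pct / 10**i for i in range(n_steps)]


# Assumption: start from the 0.8% SDS used for decrosslinking.
series = tenfold_series(0.8, 4)
for conc in series:
    print(f"{conc:g}% SDS")
```

A no-SDS control digest would normally accompany such a series so that full Nt.CviPII activity can be recognized on the gel.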
SDS inhibition of Nt.CviPII was effectively quenched by the addition of Triton X-100 (Fig. 2B). With 0.015% SDS in the reaction, a tenfold excess of Triton X-100 was optimal for Nt.CviPII activity.
Deproteination of crosslinked DNA by proteases requires a higher SDS concentration; we therefore used tenfold more SDS (0.15%) together with thermolabile proteinase K (TLPK). After deproteination, TLPK was heat-inactivated at 55 °C for 10 min, and the reaction was diluted tenfold and supplemented with Triton X-100 before the second Nt.CviPII digestion. We term this new method one-pot UniNicE-seq (Fig. 2C). To test this sonication-free, one-pot enzymatic method, we used 500 HCT116 cells fixed with 4% formaldehyde, prepared accessible chromatin libraries on beads in duplicate, and compared them with a previously published HCT116 accessible chromatin library prepared from purified DNA of 25,000 labeled cells. Both libraries had comparable FRiP scores (Fig. 2D). The Pearson's correlation between these libraries was r = 0.8, and they showed similar genomic feature distributions, TSS and enhancer profiles, and IGV tracks (Fig. 2E–I). Taken together, we concluded that the enzymatic accessible chromatin libraries produced by the two methods are comparable. Because the one-pot method requires neither DNA purification nor sonication, it may also be amenable to automation in the future.
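The dilution step above can be checked with C1·V1 = C2·V2: diluting 0.15% SDS tenfold gives 0.015% SDS, and a tenfold Triton X-100 excess over that SDS corresponds to 0.15% Triton X-100. The sketch below computes the volumes, assuming a hypothetical 10 µL deproteination reaction, a hypothetical 10% Triton X-100 stock, and that both detergent percentages are expressed in the same units.

```python
def second_digest_setup(rxn_ul, sds_pct, dilution, triton_ratio, triton_stock_pct):
    """Volumes for diluting a deproteinated reaction before the 2nd Nt.CviPII digest.

    Applies C1*V1 = C2*V2. `triton_ratio` is the Triton X-100 : SDS ratio in the
    final mix (interpreted here as 10 for the "tenfold" condition).
    """
    final_ul = rxn_ul * dilution
    final_sds_pct = sds_pct * rxn_ul / final_ul
    triton_pct = final_sds_pct * triton_ratio
    triton_stock_ul = triton_pct * final_ul / triton_stock_pct
    # Remaining volume is buffer, water, and enzyme (composition not specified here).
    other_ul = final_ul - rxn_ul - triton_stock_ul
    if other_ul < 0:
        raise ValueError("Triton stock too dilute for this final volume")
    return {
        "final_ul": final_ul,
        "final_sds_pct": final_sds_pct,
        "triton_pct": triton_pct,
        "triton_stock_ul": triton_stock_ul,
        "other_ul": other_ul,
    }


# Hypothetical 10 µL deproteination reaction and 10% Triton X-100 stock.
setup = second_digest_setup(rxn_ul=10, sds_pct=0.15, dilution=10,
                            triton_ratio=10, triton_stock_pct=10)
print(setup)
```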
Formaldehyde fixation conditions affect chromatin accessibility
To determine the effect of formaldehyde fixation conditions, we fixed HCT116 cells with 0.2, 1, or 4% formaldehyde prior to NicE-seq labeling, prepared accessible chromatin libraries in duplicate, and sequenced them in depth. First, we measured the fraction of reads in peaks (FRiP) as a library quality metric, using libraries downsampled to similar read numbers. The FRiP scores for the 1 and 4% formaldehyde-fixed libraries were ~0.19, compared with ~0.08 for the 0.2%-fixed library (Additional file 1: Fig. S1A). Although Pearson's correlations were comparable among all three conditions (r = 0.73–1.0), the TSS profile, which represents the bulk of accessible regions and is a good qualitative metric, was weakly enriched in 0.2% formaldehyde-fixed cells compared with 1 or 4% fixed cells (Additional file 1: Fig. S1B, C). Closer inspection of the IGV tracks also showed higher signal-to-noise ratios for 1 and 4% formaldehyde-fixed cells than for 0.2%, consistent with poor fixation producing non-specific nicks throughout the genome that contribute to the noise (Additional file 1: Fig. S1D). This result agrees with our earlier observation of higher fluorescein labeling intensity in 0.2%-fixed nuclei, which we attribute to non-specific labeling (Fig. 1B). It also indicates that fixation at or above 1% formaldehyde is suitable for genome-wide accessible chromatin analysis, since there was no significant difference in TSS heat maps or IGV signals between 1 and 4% fixed cells (Additional file 1: Fig. S1C, D).
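FRiP is the number of reads falling inside called peaks divided by the total number of reads, and downsampling first keeps the comparison fair across libraries of different depth. The sketch below shows both steps on in-memory data, assuming reads are reduced to `(chrom, position)` pairs and peaks are sorted, non-overlapping half-open intervals; in practice this is usually computed from BAM and BED files with tools such as bedtools or featureCounts.

```python
import bisect
import random


def downsample(reads, n, seed=0):
    """Randomly keep n reads so libraries are compared at equal depth."""
    if len(reads) <= n:
        return list(reads)
    return random.Random(seed).sample(reads, n)


def frip(reads, peaks):
    """Fraction of reads falling inside peaks.

    reads: list of (chrom, position) pairs, e.g. read 5' ends or nick sites.
    peaks: dict mapping chrom -> sorted, non-overlapping half-open (start, end).
    """
    if not reads:
        return 0.0
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in peaks.items()}
    in_peaks = 0
    for chrom, pos in reads:
        ivs = peaks.get(chrom)
        if not ivs:
            continue
        # The last peak starting at or before pos is the only possible container.
        i = bisect.bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            in_peaks += 1
    return in_peaks / len(reads)
```

Fixing the random seed makes the downsampled FRiP reproducible between runs.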
Enzymatic accessible chromatin library preparation with varied cell numbers
Because the majority of clinical samples are fixed with 4% formaldehyde, and this condition does not induce poly ADP-ribosylation during fixation, we performed one-pot UniNicE-seq on 4% formaldehyde-fixed cells. We used four cell lines, HCT116, HeLa, HEK293, and GM12878, to establish general applicability and to investigate the limitations of the new method. Accessible chromatin libraries were prepared in replicate from 5000, 1000, 500, 100, and 25 cells and analyzed. The HCT116 replicates were compared for FRiP, Pearson correlation, and reproducibility of the TSS ± 2 kb profile (Additional file 1: Fig. S2). The replicate FRiP scores were reproducible (5 K cells, 0.187 and 0.188; 1 K cells, 0.158 and 0.16; 500 cells, 0.186 and 0.19; 100 cells, 0.136 and 0.142; 25 cells, 0.075 and 0.091; Additional file 1: Fig. S2A). Similarly, Pearson correlations between replicates were consistently above r = 0.94 for all cell numbers from 25 to 5 K (Additional file 1: Fig. S2B–F), and the TSS heat maps of replicates were almost identical (Additional file 1: Fig. S2G). Taken together, these results demonstrate that one-pot UniNicE-seq is technically reproducible. We next merged the replicates for each cell line and analyzed the merged data. The Pearson's correlations between libraries for HCT116 (r = 0.8–0.97; Fig. 3A), HeLa (r = 0.94–0.98; Additional file 1: Fig. S3A), HEK293 (r = 0.95–0.99; Additional file 1: Fig. S4A), and GM12878 (r = 0.76–0.95; Additional file 1: Fig. S5A) indicate good correlation between libraries despite variable cell numbers. The FRiP scores for HCT116 ranged from 0.09 to 0.15, indicating reliable peak identification (Fig. 3B). An UpSet plot of overlapping accessible chromatin peaks across cell numbers also showed that the majority of peaks (> 50%) were shared among them for HCT116 cells (Fig. 3C).
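Replicate and cross-library Pearson correlations of this kind are typically computed on read counts aggregated into genomic bins (for example with deepTools multiBamSummary and plotCorrelation). The sketch below shows the underlying calculation, assuming a single chromosome, read positions as integers, and a hypothetical 1 kb bin size; bins empty in both libraries are ignored, which is one of several reasonable design choices.

```python
import math
from collections import Counter


def bin_counts(positions, bin_size):
    """Count reads per fixed-width genomic bin (single chromosome for simplicity)."""
    return Counter(pos // bin_size for pos in positions)


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2, bin_size=1000):
    """Pearson's r of binned read counts over bins covered by either replicate."""
    c1, c2 = bin_counts(rep1, bin_size), bin_counts(rep2, bin_size)
    bins = sorted(set(c1) | set(c2))
    # Counter returns 0 for bins a replicate never hit.
    return pearson([c1[b] for b in bins], [c2[b] for b in bins])
```

Bin size matters: very small bins make counts sparse and lower r, while very large bins can inflate it.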
The peaks shared between 25 and 5 K cells were 50% for HCT116, 70% for HeLa, 75% for HEK293, and 65% for GM12878 cells (Fig. 3C; Additional file 1: Figs. S3B, S4B, S5B). As expected, accessible regions were enriched at TSSs and enhancers (Fig. 3D; Additional file 1: Figs. S3C, S4C, S5C), and the distribution of genomic features remained consistent from 25 to 5000 cells (Fig. 3E; Additional file 1: Figs. S3D, S4D, S5D). Inspection of the IGV tracks showed that the signal-to-noise ratios of accessible chromatin peaks were similar across cell numbers for all four cell lines (Fig. 3F; Additional file 1: Figs. S3E, S4E, S5E). However, as expected, reducing the cell number resulted in a loss of accessible chromatin peaks. To determine the genomic distribution of the non-overlapping peaks in the HCT116 data sets comprising 5 K, 1 K, 500, 100, and 25 cells, we performed peak annotation (Additional file 1: Fig. S6). The lost peaks were evenly distributed across genomic features, suggesting that increasing the cell number captures additional accessible regions without bias toward any particular feature. These results demonstrate robust accessible chromatin profiling at low cell numbers and suggest that the method can be broadly applied.
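A "shared peaks" percentage depends on direction: the fraction of 25-cell peaks found in the 5 K set generally differs from the reverse, and the text does not state which direction was used. The sketch below computes the fraction of query peaks that overlap at least one reference peak by at least one base, assuming the reference peaks have been merged into sorted, non-overlapping half-open intervals; the one-base overlap rule is an assumption.

```python
import bisect


def shared_fraction(query, reference):
    """Fraction of query peaks overlapping at least one reference peak.

    query: list of (chrom, start, end); reference: dict chrom -> sorted,
    non-overlapping half-open (start, end) intervals (e.g. merged peaks).
    """
    if not query:
        return 0.0
    starts = {c: [s for s, _ in ivs] for c, ivs in reference.items()}
    shared = 0
    for chrom, start, end in query:
        ivs = reference.get(chrom)
        if not ivs:
            continue
        # Last reference peak starting before `end`; because intervals are sorted
        # and non-overlapping, it has the largest end among all candidates.
        i = bisect.bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            shared += 1
    return shared / len(query)
```

Calling `shared_fraction(peaks_25, merged_5k)` and `shared_fraction(peaks_5k, merged_25)` (hypothetical variable names) on the same data would report both directions.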
Genomic and epigenomic accessible chromatin features are maintained in 25 cells
To further validate the distribution of accessible chromatin peaks at low cell numbers, we compared one-pot UniNicE-seq libraries prepared from 25 cells of the HeLa, HCT116, HEK293, and GM12878 cell lines. The FRiP scores of the cell lines ranged from 0.07 to 0.12 (Fig. 4A). Pearson's correlations between reads from the different cell lines ranged from 0.67 to 0.78, reflecting both similarities and differences related to their tissues of origin (Fig. 4B). These similarities and differences were also evident in the called accessible peaks; in particular, cell-type-specific peaks outnumbered common peaks (Fig. 4C). IGV tracks of the different cell lines clearly showed the unique and common accessible regions between cell types (Fig. 4D). Furthermore, accessible chromatin peaks were reproducible across one-pot UniNicE-seq, DNase-seq, ATAC-seq, and Omni-ATAC-seq data in HCT116, HeLa, HEK293, and GM12878, demonstrating the method's applicability and revealing cell line-specific accessible peaks (Additional file 1: Fig. S7). As expected, accessible chromatin peaks were mostly concentrated at gene promoters, specifically at transcription start sites (Fig. 4E; Additional file 1: Figs. S3C, S4C, S5C). Genic regions, including introns and coding exons, also displayed varying degrees of accessibility independent of cell line. Across cell lines, the 25-cell libraries showed essentially similar proportions of accessible peaks in each genic feature (Fig. 4F). In addition, comparing accessible chromatin heat maps around TSSs/TTSs from one-pot UniNicE-seq with RNA expression profiles showed that accessibility enrichment decreases concomitantly with transcript expression (Additional file 1: Fig. S8).
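TSS-centered profiles like those above aggregate reads in bins across a window around each TSS, flipping minus-strand genes so that upstream is always on the left. The sketch below illustrates that aggregation, assuming reads are given as sorted positions per chromosome and TSSs as hypothetical `(chrom, position, strand)` tuples; the ± 2 kb window and 100 bp bins are defaults chosen for illustration.

```python
import bisect


def tss_profile(reads, tss_sites, flank=2000, bin_size=100):
    """Mean reads per bin across TSS +/- flank, oriented in the direction of transcription.

    reads: dict chrom -> sorted read positions; tss_sites: list of
    (chrom, tss_position, strand) with strand "+" or "-".
    """
    n_bins = 2 * flank // bin_size
    totals = [0] * n_bins
    for chrom, tss, strand in tss_sites:
        positions = reads.get(chrom, [])
        lo = bisect.bisect_left(positions, tss - flank)
        hi = bisect.bisect_left(positions, tss + flank)
        for pos in positions[lo:hi]:
            # Flip minus-strand genes so upstream is always on the left.
            offset = pos - tss if strand == "+" else tss - pos
            idx = (offset + flank) // bin_size
            if 0 <= idx < n_bins:  # drops the single edge position a flip can push out
                totals[idx] += 1
    return [t / len(tss_sites) for t in totals] if tss_sites else totals
```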
Next, we compared accessible chromatin peaks obtained from 25 cells with various active and inactive chromatin marks and chromatin-binding proteins. We extracted the tag density distributions from ChIP-seq experiments (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3, and CTCF) in a ± 2-kb window around the identified accessible chromatin regions and generated heat maps. As expected, the active chromatin marks H3K4me1, H3K4me3, and H3K27ac, as well as CTCF, were enriched at accessible chromatin regions, whereas the repressive mark H3K9me3 was inversely correlated with accessibility (Fig. 4G). Similarly, regions marked by H3K36me3, which is enriched across gene bodies, appeared less accessible. The degree of correlation was indistinguishable between accessible peaks obtained from 25 and 5 K cells, confirming that accessible chromatin regions identified from low cell numbers preserve both genic and epigenetic features.
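Each ChIP-seq heat map described above is a matrix with one row per accessible region and one column per bin across the ± 2 kb window, usually sorted by total signal (tools such as deepTools computeMatrix and plotHeatmap are commonly used for this). The sketch below builds such a matrix for one mark, assuming tag positions on a single chromosome and accessible-peak centers as integers; the 200 bp bin size is a hypothetical default.

```python
import bisect


def signal_matrix(tags, centers, flank=2000, bin_size=200):
    """Rows: accessible-peak centers; columns: tag counts in bins across center +/- flank.

    tags: sorted ChIP-seq tag positions on one chromosome (simplifying assumption).
    """
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size
    matrix = []
    for c in centers:
        row = [0] * n_bins
        lo = bisect.bisect_left(tags, c - flank)
        hi = bisect.bisect_left(tags, c + flank)
        for pos in tags[lo:hi]:
            row[(pos - c + flank) // bin_size] += 1
        matrix.append(row)
    # Heat maps are conventionally ordered by total signal, strongest first.
    matrix.sort(key=sum, reverse=True)
    return matrix
```

Running this separately for each mark on the same peak centers gives matrices whose rows can be displayed side by side, so enrichment (H3K27ac-like) and depletion (H3K9me3-like) patterns can be compared directly.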
Comparison between formaldehyde-fixed ATAC-see and low-cell-number one-pot universal NicE-seq
Accessible chromatin methods such as DNase-seq, FAIRE-seq, and NicE-seq use formaldehyde-fixed cells, whereas ATAC-seq is often performed on unfixed cells. However, a combined visualization and sequencing method, termed ATAC-see, used formaldehyde-fixed GM12878 cells. We therefore performed one-pot UniNicE-seq on 25 and 500 fixed GM12878 cells and compared the results with the published ATAC-see data sets. Pearson's correlations between sequence reads for one-pot UniNicE-seq with 25 or 500 cells and ATAC-see were 0.78 and 0.71, respectively, suggesting high similarity (Fig. 5A). However, the FRiP of the ATAC-see data set was 2.5–4.0× lower than that of one-pot UniNicE-seq, suggesting that the Tn5 transposase-based accessible chromatin assay is relatively inefficient once cells are fixed (Fig. 5B). Consistent with this, read densities at both TSS ± 2.0 kb and enhancer ± 2.0 kb showed lower accessible chromatin enrichment in the ATAC-see data sets, even though 50,000 cells were used (Fig. 5C), and the IGV signals for ATAC-see were lower than those of low-cell-number one-pot UniNicE-seq (Fig. 5D). These results suggest that Tn5-mediated tagmentation in ATAC-see is less efficient as an accessible chromatin assay than Nt.CviPII-mediated one-pot UniNicE-seq.
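Comparing TSS profiles between data sets such as ATAC-see and one-pot UniNicE-seq is easier with a single summary number. The sketch below computes a simplified enrichment score, the TSS-centered bin divided by the mean of the outermost flank bins, resembling the TSS enrichment score used in ATAC-seq quality control. This metric is swapped in for illustration and is not stated to be what Fig. 5C reports; it assumes an odd number of bins so one bin sits on the TSS.

```python
def tss_enrichment(profile, flank_bins=1):
    """Central-bin signal divided by the mean of the outermost flank bins.

    profile: per-bin mean coverage across TSS +/- flank (odd length assumed,
    so that one bin sits on the TSS).
    """
    if len(profile) % 2 == 0 or len(profile) < 2 * flank_bins + 1:
        raise ValueError("need an odd-length profile longer than both flanks")
    flanks = list(profile[:flank_bins]) + list(profile[-flank_bins:])
    background = sum(flanks) / len(flanks)
    center = profile[len(profile) // 2]
    return center / background if background > 0 else float("inf")


print(tss_enrichment([1, 2, 8, 2, 1]))
```

Because the score is a ratio to local background, it is less sensitive to sequencing depth than raw read density, which helps when the compared libraries come from very different cell numbers.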
Comparison of 4% formaldehyde-fixed one-pot UniNicE-seq with unfixed ATAC-seq and Omni-ATAC-seq
We further compared one-pot UniNicE-seq data sets from 25 and 500 HCT116 cells fixed with 4% formaldehyde with methods that use unfixed cells, ATAC-seq (50 K cells) and Omni-ATAC-seq (50 K cells), to investigate the qualitative advantages of each method. All called accessible peaks were compared using an UpSet plot: about 13.4 K peaks were common to all methods, and large numbers of peaks remained method-specific (Additional file 1: Fig. S9A). The FRiP scores of ATAC-seq and Omni-ATAC-seq were higher than those of one-pot UniNicE-seq (Additional file 1: Fig. S9B), suggesting that unfixed cells yield a higher proportion of reads from accessible regions. However, metagene plots of the TSS ± 2 kb region and of enhancer start and end sites ± 2 kb showed stronger signal for one-pot UniNicE-seq with 25 or 500 cells (Additional file 1: Fig. S9C). These observations led us to perform peak annotation to determine the genomic origin of the accessible peaks from each method. One-pot UniNicE-seq, ATAC-seq, and Omni-ATAC-seq had similar percentage representation across all genomic features except promoters, where one-pot UniNicE-seq with small cell numbers displayed higher read density (Additional file 1: Fig. S9D). Visual inspection in the IGV browser likewise indicated no loss of accessible regions between methods (Additional file 1: Fig. S9E).
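An UpSet plot reports exclusive intersections: each peak is counted once, under the exact combination of data sets that contain it. The sketch below computes those counts, assuming peaks from all methods have already been merged into a shared set of consensus-peak IDs (that merging step is not shown and is an assumption).

```python
from collections import Counter


def upset_counts(sets):
    """Exclusive intersection sizes, as displayed in an UpSet plot.

    sets: dict name -> set of consensus-peak IDs. Each ID is counted once,
    under the exact combination of data sets that contain it.
    """
    membership = Counter()
    universe = set().union(*sets.values())
    for pid in universe:
        combo = frozenset(name for name, members in sets.items() if pid in members)
        membership[combo] += 1
    return membership


# Toy example with hypothetical peak IDs.
counts = upset_counts({"UniNicE_25": {1, 2, 3}, "ATAC": {2, 3, 4}, "Omni": {3}})
print(counts[frozenset({"UniNicE_25", "ATAC", "Omni"})])
```

The bar for peaks "common to all methods" corresponds to the combination containing every data set name.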
Low cell number one-pot UniNicE-seq compared with aggregated scATAC-seq
Next, we compared our low-cell-number accessible chromatin data sets with a scATAC-seq data set from GM12878 cells. A typical human scATAC-seq data set contains 100–1000 cells with 0.02–0.05 genome coverage per cell. Because the number of cis-regulatory elements in the genome exceeds the read coverage of a single cell, many elements are not represented by any mapped read in an individual cell. However, aggregated data from 384 individual GM12878 cells yielded an accessibility pattern similar to that produced by population-based ATAC-seq. We therefore compared one-pot UniNicE-seq of 500 cells with aggregated scATAC-seq of 384 cells. The FRiP scores of the 500-cell one-pot UniNicE-seq and the aggregated scATAC-seq data sets were 0.09 and 0.28, respectively, indicating a higher fraction of reads in peaks for aggregated scATAC-seq (Fig. 6A). The Pearson's correlation between the tag densities of one-pot UniNicE-seq and scATAC-seq was 0.60, indicating moderate correlation (Fig. 6B). A Venn diagram of all accessible peaks showed that ~40% of peaks were common to all data sets, indicating that cell-specific accessible regions are prominent (Fig. 6C). Tag density enrichment at promoters and enhancers was higher in aggregated scATAC-seq, consistent with its higher FRiP score (Fig. 6D). Genomic feature annotation further showed that scATAC-seq captured promoter, exon, and 5′ UTR regions more efficiently than 500-cell one-pot UniNicE-seq, whereas 500-cell one-pot UniNicE-seq was more efficient at capturing intergenic and intronic accessible regions (Fig. 6E). The IGV tracks were comparable between methods, suggesting that both methods are reproducible with low cell numbers (Fig. 6F).
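Feature percentages such as those in Fig. 6E come from assigning each peak to one genomic feature, which requires a precedence rule when a peak overlaps several features. The sketch below assigns each peak by its midpoint to the highest-priority overlapping feature and reports percentages. The precedence order and the midpoint rule are hypothetical choices; dedicated tools such as HOMER annotatePeaks.pl or ChIPseeker use their own rules, and the linear scan is for clarity rather than speed.

```python
from collections import Counter

# Hypothetical precedence: earlier entries win when features overlap.
PRIORITY = ["promoter", "5' UTR", "exon", "intron"]


def annotate(peaks, features, priority=PRIORITY):
    """Percentage of peaks assigned to each feature class.

    peaks: list of (chrom, start, end); features: list of (name, chrom, start, end)
    with half-open coordinates. Peaks hitting no listed feature are "intergenic".
    """
    rank = {name: i for i, name in enumerate(priority)}
    counts = Counter()
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        hits = [name for name, fchrom, fs, fe in features
                if fchrom == chrom and fs <= mid < fe and name in rank]
        label = min(hits, key=rank.__getitem__) if hits else "intergenic"
        counts[label] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {label: 100 * n / total for label, n in counts.items()}
```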