Tissue samples and genomic DNA extraction
The cerebellum, cortex and olfactory bulb were dissected from an 8-week-old female C57Bl/6 mouse. The cortex tissue came from the whole cerebral cortex excluding the hippocampus regions. The cerebellum and olfactory bulb tissues came from the whole tissues, respectively. The tissues for the technical replicates were from independent mice, and at least two replicates were included in the different assays as indicated. Genomic DNA was isolated using a DNA extraction kit (Qiagen, Cat#: 51306) according to the manufacturers’ instructions.
LC–ESI–MS analysis
The LC–ESI–MS analysis was performed according to our previous study [20]. Briefly, the percentages of 5mC and 5hmC were calculated by the following formula, where M (cytosine), M (5mC) and M (5hmC) are the molar quantities of cytosine.
$$5{\text{mC \% }} = \frac{{{\text{M}}\left( {5{\text{mC}}} \right)}}{{{\text{M}}\left( {\text{cytosine}} \right) + {\text{M}}\left( {5{\text{mC}}} \right) + {\text{M}}\left( {5{\text{hmC}}} \right)}} \times 100$$
$$5{\text{hmC \% }} = \frac{{{\text{M}}\left( {5{\text{hmC}}} \right)}}{{{\text{M}}\left( {\text{cytosine}} \right) + {\text{M}}\left( {5{\text{mC}}} \right) + {\text{M}}\left( {5{\text{hmC}}} \right)}} \times 100$$
oxBS-seq
The sequencing libraries were constructed as described [9] with minor modifications. Briefly, genomic DNA (0.5–1 μg) was end-repaired, A-tailed and ligated to methylated adaptors following the manufacturer’s instructions. The ligated fragments were then purified using the Bio-Rad Micro Bio-Spin P-6 SSC column (SSC buffer). After the purification, the samples were denatured using 1 M NaOH and oxidized using an oxidant solution (15 mM KRuO4 in 0.05 M NaOH). After the purification, the DNA underwent bisulfite conversion using the EZ DNA Methylation Gold kit (Zymo Research) according to the instruction manual. The library was sequenced using Illumina HiSeq X Ten. The paired reads were uniquely mapped to the reference genome (mm10, UCSC) by Bismark. The efficient conversion of 5hmC to uracil was calculated by a spiked 5hmC control from Zymo Research (Cat#: D5405).
TAB-seq
The sequencing libraries were constructed as described [6] with minor modifications. Briefly, genomic DNA, with spike-in controls, was glycosylated and oxidized using the kit from Wisegene (Cat#: K001). Then, the DNA underwent bisulfite conversion using the EZ DNA Methylation Gold kit (Zymo Research) according to the instruction manual. The library was sequenced using Illumina HiSeq X Ten. The paired reads were mapped uniquely to the reference genome (mm10, UCSC) by Bismark. The efficient conversion of unmodified cytosine to uracil and the efficient conversion of 5mC to 5caU/U were calculated using spiked M.SssI-treated lambda DNA.
The associations between the biological replicates for TAB-seq and oxBS-seq
The 5hmC and 5mC levels of the biological replicates were calculated in 1 Mb regions throughout the genome. Then, a hierarchical cluster analysis was performed using hcluster in R with ward.D methods.
Quantification of the 5hmC and 5mC levels of each CpG site
The 5hmC level or the 5mC level of each CpG site was calculated using the same methods as described [20]. Briefly, a binomial distribution model was performed to calculate the significance for the hydroxymethylation or methylation. Only sites with Benjamini–Hochberg-corrected binomial P value ≤ 0.05 and reads coverage ≥ 5 were considered hydroxymethylated or methylated. We counted the number of “C” bases from the sequencing reads as hydroxymethylated or methylated (denoted as NC) and the number of “T” bases as unmodified (denoted as NT). The hydroxymethylation level or methylation level was estimated as NC/(NC + NT).
The enrichment scores of the 5hmC- and 5mC-modified sites in the different genomic regions
The enrichment score was calculated by the following formula: the enrichment scorein the genomic element = log2 (# called modified sitesin the genomic element/# expected). # expected was computed as: # called modified sitesin the genome × # CpG sitesin the genomic element/# total CpG sitesin the genome. # denotes the number of sites.
Identification of the genes that enriched the co-modified CpG sites
First, the CpG sites that were simultaneously identified as 5hmC- and 5mC-modified sites were determined as the co-modified CpG sites. Then, the enrichment score of the sites in each gene was calculated by the following formula:
The enrichment scorein the gene = log2 (#the co-modified CpG sites)in the gene/# expected). # expected was computed as: # the co-modified CpG sitesin the genome × # CpG sitesin the gene/# total CpG sitesin the genome. # denotes the number of sites.
Identification of the methylation haplotype blocks (MHBs)
The MHBs were identified as described [7] with minor modifications. Briefly, the mouse genome was split into non-overlapping ‘sequenceable and mappable’ segments using the oxBS-seq data from the mouse cerebellum, cortex and olfactory bulb tissues. The “sequenceable and mappable” segments are defined as the high-quality mapping regions which were assembled from the pair-end reads. The mapped reads from the oxBS-seq datasets were converted into methylation haplotypes within each segment. The methylation linkage disequilibrium was calculated on the combined methylation haplotypes. We then partitioned each segment into MHBs. Candidate MHBs were defined as the genomic region in which the LD r2 value of two adjacent CpG sites was no less than 0.5. Then, the LD r2 matrix of the segments was calculated. The candidate MHBs were extended if the LD r2 value of the CpG sites in the segments was no less than 0.5.
Identification of the hydroxymethylation haplotype blocks (hMHBs)
We identified hMHBs using the similar theoretical framework as MHBs with minor modifications. Briefly, for TAB-seq data, the readout of both methylated and unmodified sites will be “T,” while the readout of hydroxymethylated sites will be “C.” Thus, we focused on the gene body regions which enriched 5hmC modifications to identify the co-hydroxymethylation blocks (hMHBs). In this way, we can avoid the false positively called as co-hydroxymethylation blocks for the highly methylated regions (such as intergenic regions) and unmethylated regions (such as promoter regions and CpG islands) with TAB-seq data. Additionally, the average 5hmC level of each called CpG sites is lower than the average 5mC level; thus, we tried different cutoffs for the LD r2 score of linkage disequilibrium two adjacent CpG sites hydroxymethylation.
The coordination between the hMHBs and MHBs in gene body regions
The coordination analysis was performed by random sampling method. First, the same amount of regions with same length distribution as MHBs within gene body regions were randomly sampled and repeated 10,000 times. Then, the overlap between randomly sampled regions and the hMHBs was calculated as the expected number of overlapped blocks between hMHBs and MHBs within gene body regions. Statistical significance was calculated by hypergeometric test.
Enrichment analysis of the methylation haplotype blocks for known genomic regulatory elements
The enrichment analysis was performed by random sampling as previously described [21]. Genomic regions with the same number of MHBs and CpGs were randomly sampled within the mappable CpG regions and repeated 10,000 times. Then, we obtained the expected MHBs in the genomic element. All of the genomic coordinates were based on the mm10 mouse genomic sequence.
The enrichment score was calculated by the following formula:
The enrichment scorein the genomic element = log2 (# called MHBsin the genomic element/# expected).
Methylation haplotype load (MHL)
The MHLs were identified as described [7], which is the normalized fraction of methylated haplotypes at different lengths.
Identification of tissue-specific MHBs
The tissue-specific MHBs were identified as described [7] with minor modifications. To investigate the tissue-specific MHBs, the tissue-specific index (TSI) was defined. An empirical threshold TSI > 0.6 was used to define the tissue-specific MHBs.
$${\text{TSI}} = \frac{{\mathop \sum \nolimits_{j = 1}^{n} 1 - \frac{{10^{{{\text{MHL(}}j )}} }}{{10^{{{\text{MHL}}\left( {\hbox{max} } \right)}} }}}}{n - 1}$$
where n indicates the number of the tissues, MHL(j) denotes the MHL of jth tissue and MHL max denotes the MHL of the highest methylated tissue.
Enrichment analysis of the MHBs for the co-modified CpG sites
First, the co-modified CpG sites in each MHB were calculated. Then, the enrichment score of the co-modified CpG sites in each MHB was calculated by the following formula:
The enrichment scorein the MHB = log2 (#the co-modified CpG sites)in the MHB/# expected). # expected was computed as: # the co-modified CpG sitesin the genome × # CpG sitesin the MHB/# total CpG sitesin the genome. # denotes the number of sites.
Motif analysis
We searched for the enrichment of known motifs using the Homer tool [22]. To search for the motifs within a single tissue, we used default parameters with a fragment size for motif searching of 200 bp.
External datasets
According to the H3K4me1 and H3K27ac marks, two types of enhancers were distinguished as follows: active enhancers that were simultaneously marked by a distal H3K4me1 and H3K27ac and poised enhancers that were solely marked by a distal H3K4me1. The H3K4me1 and H3K27ac peaks of the adult mouse cortex, olfactory bulb and cerebellum tissues were acquired from the ENCODE project.