Ecient and Accurate Determination of Genome-Wide DNA Methylation Patterns in Arabidopsis with Enzymatic Methyl Sequencing.

Background: 5’ methylation of cytosines in DNA molecules is an important epigenetic mark in eukaryotes. Bisulte sequencing is the gold standard of DNA methylation detection, and whole-genome bisulte sequencing (WGBS) has been widely used to detect methylation at single-nucleotide resolution on a genome-wide scale. However, sodium bisulte is known to severely degrade DNA, which, in combination with biases introduced during PCR amplication, leads to unbalanced base representation in the nal sequencing libraries. Enzymatic conversion of unmethylated cytosines to uracils can achieve the same end product for sequencing as does bisulte treatment and does not affect the integrity of the DNA; enzymatic methylation sequencing may thus provide advantages over bisulte sequencing. Results: Using an enzymatic methyl-seq (EM-seq) technique to selectively deaminate unmethylated cytosines to uracils, we generated and sequenced libraries based on different amounts of Arabidopsis input DNA and different numbers of PCR cycles, and compared these data to results from traditional whole genome bisulte sequencing. We found that EM-seq libraries were more consistent between replicates and had higher mapping and lower duplication rates, lower background noise, higher average coverage, and higher coverage of total cytosines. Differential methylation region (DMR) analysis showed that WGBS tended to over-estimate methylation levels especially in CHG and CHH contexts, whereas EM-seq detected higher CG methylation levels in certain highly methylated areas. These phenomena can be mostly explained by a correlation of WGBS methylation estimation with GC content and methylated cytosine density. We used EM-seq to compare methylation between leaves and owers, and found that CHG methylation level is greatly elevated in owers, especially in pericentromeric regions. Conclusion: We suggest that EM-seq is a more accurate and reliable approach than WGBS to detect methylation. Compared to WGBS, the results of EM-seq are less affected by differences in library preparation conditions or by the skewed base composition in the converted DNA. It may therefore be more desirable to use EM-seq in methylation studies.


Introduction
The fth carbon position of cytosine in DNA can be covalently modi ed by the addition of a methyl group to form 5-methylcytosine (5-mC). DNA methylation that takes place at cytosine residues which are followed by guanine is termed CG methylation and is conserved in most eukaryotes. Non-CG methylation, where modi cation occurs in CHG and CHH contexts (where H corresponds to A, T, or C residues), occurs in plants and many other organisms. CHH methylation is also called asymmetrical methylation. DNA methylation is typically associated with the silencing of genes and repetitive DNAs such as transposable elements, however expressed genes are also found to be methylated. DNA methylation is involved in a large number of cellular processes including genomic imprinting, X chromosome inactivation, embryonic development, and transcriptional regulation of developmentally important genes, as well as in ensuring genome integrity and protecting against invasive DNAs [1][2][3][4][5].
The rst step in the study of DNA methylation is to determine whether or not a given cytosine residue is methylated. Indirect approaches to measure methylation include pull-down assays with methylationspeci c antibodies and methyl-binding proteins, and restriction digestion with enzymes with preferences for or against methylcytosines [6]. The direct approach, which can achieve single-nucleotide resolution, is sequencing. Since methylated cytosine pairs with guanine in the same way unmethylated cytosine does, traditional sequencing methods (based on base-pairing) are not able to differentiate between methylated and unmethylated cytosines. To solve this problem, sodium bisul te can be used to convert unmethylated cytosines to uracils, which are ampli ed as thymines in PCR; because methylated cytosines do not react with sodium bisul te, they remain as cytosines in the sequence. Thus, thymines detected in bisul te sequencing correspond to either thymines or unmethylated cytosines in the original DNA, and alignment with the original template sequence easily differentiates between them [7]. Since its development a little more than a decade ago, whole-genome bisul te sequencing (WGBS) has been successfully used to survey DNA methylation on a genome-wide scale [8][9][10][11]. While WGBS (which combines bisul te treatment with high-throughput sequencing) is the gold standard for measuring genome-wide methylation, it has several shortcomings. After bisul te conversion, DNA becomes C-poor, which can result in di culties for polymerase reactions, as well as with sequencing machines, basecallers, and aligners. Although recent improvements in PCR reagents, sequencing hardware/software, and bioinformatics tools have helped to alleviate these di culties, two fundamental problems remain. First, bisul te degrades the majority of the DNA during the conversion process (due to backbone scission induced by depyrimidination and perhaps depurination as well); second, bisul te preferentially damages DNA at unmethylated cytosines via depyrimidination (more effectively than at methylated cytosines) [12,13]. These properties of bisul te treatment make it challenging to perform WGBS from tissues that have limited starting material, and create a bias that can result in an over-estimation of methylation level by WGBS.
Recently, an enzymatic methyl-seq (EM-seq) technique was developed, which uses TET2 in the rst enzymatic step to oxidize methylated cytosines and APOBEC2 in the second enzymatic step to convert unmethylated cytosines to uracils [20]. During the subsequent PCR ampli cation, oxidized methylcytosines form base pairs with guanines and uracils form base pairs with adenine. Since the end products of WGBS and EM-seq are the same (methylated cytosines stay as cytosines and unmethylated cytosines appear as thymines in the sequence), the same analysis tools can be used. Because enzymatic reactions are non-destructive, EM-seq promises better yield and higher accuracy in the measurement of methylation levels [20]. In this study, we employed EM-seq to study DNA methylation in Arabidopsis thaliana and compared EM-seq results to those from WGBS. We also used EM-seq to examine the methylation differences between owers and leaves in Arabidopsis.

Results
Generating and sequencing EM-seq and WGBS libraries To systematically compare EM-seq and WGBS, we prepared Illumina sequencing libraries from different amounts of genomic DNA extracted from Arabidopsis owers -25ng, 50ng, 150ng, or 400ng (to encompass the range of starting material amounts that are typically used in an Arabidopsis methylome study). This was followed by either enzymatic (TET2 followed by APOBEC3A) or bisul te treatment, and PCR ampli cation of the converted products (for 6, 12, or 18 cycles) to complete the library preparations ( Fig. S1, see also Methods). Two replicates were performed for each condition. EM-seq libraries consistently showed higher mapping rates and lower duplication rates than WGBS libraries, and the variation between EM-seq libraries was smaller than the variation between WGBS libraries (Fig. 1ab, Table   S1a). In addition, within , higher mapping rates and lower duplication rates (manifested by higher effective read rate) were generally associated with moderate input amounts and lower PCR cycle numbers Table S1a. One of the causes of false methylation reporting in bisul te sequencing is nonconversion, in which the two strands of DNA occasionally fail to fully denature during bisul te treatment and are thus resistant to bisul te conversion [21]; this leads to the detection of several adjacent unconverted cytosines. In Arabidopsis, we previously introduced a lter that discards sequencing reads with three or more consecutive methylated cytosines in the CHH context [8]. This non-conversion lter works well in Arabidopsis, since CHHs are rarely methylated above 10% [22,23], so the chance of observing three consecutive methylated CHHs is below 0.1% and only a small number of real methylation signatures are discarded. However, it is not practical to use this lter in organisms that have high levels of CHH methylation. Very few EM-seq reads (1.56% to 2.01%) eligible for removal by the non-conversion lter were identi ed, while the ltered rates for WGBS libraries were much higher and showed greater variation (2.62% to 13.41%) (Fig. 1c, Table S1b). This suggests that the EM-seq method is generally free of the notorious non-conversion problem that is frequently observed in WGBS, and is therefore useful for methylation detection in organisms in which use of an arbitrary non-conversion lter is not suitable.
Consistent with this, we found that virtually no methylated cytosines were detected from the unmethylated Arabidopsis chloroplast genomes in any EM-seq library, a result that is only achieved or approached by the best WGBS library (400ng input with 12 cycles of PCR) (Fig. S2, Table S1c). This indicates that EM-seq has much lower background noise levels than WGBS.
As expected from these comparisons, EM-seq also displays advantages in terms of genomic coverage ( Fig. 1d). Perhaps more importantly, EM-seq is able to cover 22.07%, 22.10%, and 23.47% more CG, CHG, and CHH sites, respectively, than WGBS (Fig. 2a); these numbers are consistent with previous ndings of EM-seq in human cells [20] and the manufacturer's description of the product [24]. All EM-seq libraries exhibit similar cytosine coverage (CG: ranging from 5,556,957 to 5,602,669, CHG: ranging from 6,090,541 to 6,128,646, CHH: ranging from 31,123,001 to 31,315,262), while different preparation conditions clearly affect the performance of WGBS libraries (CG: ranging from 3,922,759 to 5,165,506, CHG: ranging from 4,26,9441 to 5,664,610, CHH: ranging from 22,080,267 to 28,678,168) (Fig. 2a).
Next, we examined the dependence of coverage on nucleotide composition. Dinucleotide pro les suggest that EM-seq has even coverage and is minimally affected by different dinucleotide combinations. In contrast, WGBS libraries show enrichment for dinucleotides containing Gs and depletion for dinucleotides contains Cs (Fig. 2b), consistent with the damaging effect of sodium bisul te on unmethylated cytosines [13]. These biases are more pronounced in libraries with 18 cycles of PCR (Fig. 2b), indicating that PCR ampli cation during WGBS library preparation favors unconverted DNA over converted DNA, due to the low melting temperature and thermostability of AT pairs [25,26]. This can further negatively affect accurate estimation of methylation level (see below). Similar biases were not observed in EM-seq libraries, although the product of TET2/APOBEC3A conversion is the same as that of bisul te conversion, which suggests that the polymerase used to amplify EM-seq libraries is superior to the one used for WGBS, in terms of avoiding the introduction of base biases. As previously shown for a library preparation kit similar to the one used in our study [27], WGBS libraries show enrichment for dinucleotides containing only A or T (Fig. 2b). We then decided to look directly at the dependence of coverage on GC content.
Again, EM-seq libraries show more even pro les over the majority of the GC content range than do WGBS libraries (Fig. 2c). WGBS libraries have clearly higher coverage in AT-rich regions than in GC-rich regions ( Fig. 2c), a known issue for bisul te sequencing [20,27] that is discussed above (Fig. 2b).
Overall, the quality metrics of our EM-seq libraries encouraged us to further explore whether it measures methylation more accurately than WGBS in Arabidopsis.

Arabidopsis DNA methylation levels measured by EM-seq and WGBS
We compared levels of CG, CHG, and CHH methylation measured by EM-seq and by WGBS and noted that EM-seq measured DNA methylation levels are lower ( Fig. 3a-c, Table S2), even if background noise is considered (see Fig. S2). This is consistent with the previous results obtained by the manufacturer for Arabidopsis [24]. Total DNA methylation levels estimated from EM-seq data (Fig. 3d) are close to the levels previously detected by liquid chromatography-mass spectrometry (LCMS) in Arabidopsis [24]. We observed that increasing the number of PCR cycles led to higher methylation levels in the respective WGBS libraries (especially for libraries with 18 cycles of PCR; Fig. 3a-d, Table S2) and reasoned that this likely re ects the above-mentioned preference of PCR ampli cation during WGBS library preparation for unconverted DNA (see Fig. 2b). To further test this hypothesis, we analyzed the correlation of methylation level with density of methylated cytosines in both EM-seq and WGBS. For this analysis we picked the best WGBS library (400ng input with 12 cycles of PCR) (see Fig. S2, Table S1c) and its EM-seq counterpart. As Fig. 3ef reveals, for CHG and CHH methylations, the differences in methylation levels between EM-seq and WGBS increases with cytosine methylation density, which is the expected result based on our hypothesis. Much less difference between EM-seq and WGBS was observed for CG methylation, possibly because CG methylation is more or less bimodal (either completely unmethylated or completely methylated) [8], and thus the PCR bias in WGBS library preparation toward methylated CG sites would have less in uence on CG methylation percentages. Nonetheless, CG methylation levels still appear to be moderately over-estimated by WGBS due to the selective damage of DNA containing unmethylated cytosines by sodium bisul te [13] (see also Fig. 2bc).
We next plotted both EM-seq and WGBS data across all ve Arabidopsis chromosomes (Fig. 4a). In general, methylation levels reported by WGBS are higher than those from EM-seq, especially in the case of CHH methylation, where some WGBS libraries made under suboptimal conditions (e.g. higher number of PCR cycles) suffer from severe over-estimation of methylation in the euchromatic arms of the chromosomes. Interestingly, Fig. 4a also reveals that methylation levels measured by EM-seq are higher in pericentromeres than those measured by WGBS. In the next section, we explore this further using differentially methylated region (DMR) analysis.
Arabidopsis has been shown to have two distinctive DNA methylation patterns: CG methylation in the body of protein-coding genes and all three types of methylation (CG, CHG, and CHH) in repetitive DNAs such as transposable elements (TEs) [8,9,[28][29][30]. In the past, metaplot analysis by WGBS has always reported some residual non-CG methylation inside gene bodies that were indistinguishable from noise [8,9] due to the background non-conversion issues associated with bisul te conversion (Fig. 1c, Fig. S2).
Since we now know that EM-seq has much lower background than WGBS, we ran the same metaplot analysis with EM-seq data and observed much reduced CHG and CHH methylation levels across gene bodies (Fig. 4b). The observed levels were still an order of magnitude higher than the pure background noise that can be inferred from chloroplast methylation, especially in the case of CHG (1.16% CHG methylation on average over gene body compared to 0.22% background) (Fig. S2, Table S1c), consistent with the known presence of low levels of non-CG methylation in gene bodies [31]. In terms of methylation in TEs, EM-seq produces similar metaplot pro les as WGBS, albeit with lower levels (especially for CHG and CHH (Fig. 4c), which is expected since WGBS tends to overestimate CHG and CHH methylations (Fig.  3, Table S2)).
We also compared the methylation patterns and levels in the chromosomal plots and metaplots containing only the best WGBS library (400ng input with 12 cycles of PCR) and its corresponding EM-seq library (Fig. S3). As expected, the methylation differences between this pair of libraries were smaller than the differences when other WGBS libraries are included in the comparison. Nevertheless, the basic patterns are the same as described above (see Fig. 4).
Since CHG and CHH methylations are maintained by RNA-directed DNA methylation (RdDM) [29], we looked at methylation in genomic regions bound by Polymerase V (PolV), which are often used as a proxy for RdDM target loci [32,33]. CHG and CHH methylations over the PolV ChIP-seq peaks were elevated to various extents in different WGBS libraries, while all the EM-seq libraries show similar levels (Fig. 4d).
As an example of a gene with a well-studied methylation pattern, we looked at methylation in the FLOWERINGWAGENINGEN (FWA) locus, a target of RdDM [34][35][36]. While methylated cytosines in non-CG contexts were detected at low levels throughout the FWA gene in almost all of the WGBS datasets, EM-seq data clearly shows that non-CG methylation only exists in the promoter/beginning of coding sequence (CDS) region of FWA (Fig. S4a), where the known patch of RdDM is known to occur. Even when using only data from the best WGBS library (400ng input with 12 cycles of PCR), we still see trace amounts of non-CG methylation downstream of the promoter/beginning of CDS region of FWA; the same places show no non-CG methylation in EM-seq data (Fig. S4b).
Differentially methylated region analyses in EM-seq and WGBS We performed pairwise differential methylation region (DMR) analysis both within the various datasets of EM-seq or WGBS and across EM-seq and WGBS datasets. Orders of magnitude fewer DMRs are called within the EM-seq libraries than within WGBS libraries (Fig. S5ab). A larger number of DMRs arose in comparisons between the EM-seq libraries made from the least input DNA amount and the most input DNA amount and between the EM-seq libraries made with the lowest number of PCR cycles and the highest number of PCR cycles (Fig. S5a). The same trends were observed in comparisons between WGBS libraries, although the WGBS comparisons produce much larger numbers, especially in the case of CHG and CHH methylation (Fig. S5b).
When comparing called DMRs between EM-seq libraries and WGBS libraries made from the same amount of input DNA and with the same number of PCR cycles, we noticed that there are many more WGBS hyper-DMRs (higher methylation in WGBS libraries) than EM-seq hyper-DMRs (Fig. 5a-c). WGBS libraries with 18 cycles of PCR were clearly the outliers, since they tended to have more WGBS hyper-DMRs than other conditions, and the situation is made worse by combining 18 cycles of PCR with 400ng of input DNA (Fig. 5a-c). Therefore, when making WGBS libraries, excess PCR ampli cation should be avoided, especially if starting with plenty of DNA. There are 7.94 (4123/519) times as many WGBS hyper-CG DMRs as EM-seq hyper-CG DMRs, while the numbers for CHG and CHH are 405.99 (110834/273) and 802.81 (660713/823) times respectively, suggesting that WGBS has more enrichment of CHG and CHH hyper-DMRs than CG hyper-DMRs (Fig. 5d, Table S3). This ts with our previous nding that CHG and CHH methylation are more over-estimated by WGBS than is CG methylation (Fig. 3, 4). For EM-seq hyper-DMRs, we saw some variation in DMR numbers among different library preparation conditions, for example more hyper-DMRs were seen between libraries with 12 cycles of PCR (Fig. 5a-c), however we suspect that this is rather due to the variation in WGBS libraries than the difference between WGBS and EM-seq libraries (since WGBS libraries have much higher variation among themselves than do EM-seq libraries (Fig. S5ab)).
We next plotted the methylation levels in the de ned EM-seq and WGBS hyper-DMRs (Fig. 5e). Interestingly, EM-seq hyper-CG and CHH DMRs on average have higher methylation levels than WGBS hyper-CG and CHH DMRs, respectively. The methylation level in EM-seq hyper-CHG DMRs is lower than that in WGBS hyper-CHG DMRs, which is the opposite of what is observed in EM-seq hyper-CG and CHH DMRs (Fig. 5e). There are very few EM-seq hyper-CHG DMRs, and many of them are obtained from comparison of EM-seq and WGBS in two conditions (400ng input, 6 and 12 cycles) (Fig. 5b), which could skew the result. Furthermore, most of the WGBS hyper-CHG DMRs are in pericentromeric heterochromatin regions (Fig. S5c) with high levels of methylation (see Fig. 4a). GC content analyses in DMRs indicate that EM-seq hyper-DMRs have signi cantly lower GC content than WGBS hyper-DMRs (Fig. 5f). One extreme case is the mitochondria chromosome (chrM), which has a ~10% higher GC content than the ve nuclear chromosomes (Fig. S6a). The relative difference in methylation levels between EM-seq and WGBS in chrM is larger than that in other chromosomes (Fig. S6b), and in fact the majority of the chrM is called as WBGS hyper-DMRs (Table S4). The methylation levels in chrM are generally very low as determined by EM-seq (Fig. S6b), which ts with the previous observation that WGBS hyper-DMRs tend to have lower methylation levels (Fig. 5e). We reasoned that in WGBS hyper-DMRs in nuclear chromosomes there are likely many sites that should have no or very low methylation (like chrM); this would make the average methylation levels in WGBS hyper-DMRs low, except for WGBS hyper-CHG DMRs for abovementioned reasons (see Fig. 5e, Fig. S5c).
Since GC content also greatly affects coverage (Fig. 2c), we wondered if over-(mainly CHG and CHH) and under-estimating (mainly CG) methylation by WGBS could be linked to coverage. First, we generated heatmaps and coverage plots of all the libraries across PolV ChIP-seq peaks, because these are the places with large increases in CHG and CHH methylation in WGBS libraries (Fig. 4d). We found that for EM-seq libraries, although coverage uctuates, the ranges are typically quite small. On the other hand, there is a signi cant increase in coverage coinciding with the center of PolV ChIP-seq peaks for all WGBS libraries (Fig. S7). A reasonable explanation for this is that PolV binding sites are targets of RdDM and contain methylated CHGs and CHHs that are not converted by bisul te treatment; they therefore become better templates for PCR ampli cation (see data from previous sections) and gain higher coverage than their surrounding regions. Therefore, methylation levels and coverage are positively correlated in this case. The majority of the EM-seq hyper-CG DMRs are located within pericentromere heterochromatins and are highly methylated (Fig. 4a, Fig. 5e, Fig S8a), but occasionally they can be found in euchromatin regions and within genes (Fig. S8b). We chose the best WGBS library (400ng input with 12 cycles of PCR) and its corresponding EM-seq library and analyzed their coverage across EM-seq hyper-DMRs (Fig. S8ce). Interestingly, WGBS coverage spikes in EM-seq hyper-DMRs as well, and EM-seq coverage again shows only small uctuations. The low GC content of EM-seq DMRs (Fig. 5f) could be the cause of high coverage in WGBS libraries (see Fig. 2c). Moreover, according to Fig. 2c, in regions where GC content is low (~30% or less, approximately the range of GC content found in EM-seq hyper-DMRS, see Fig. 5f), a further reduction in GC content (e.g. that caused by bisul te treatment) will induce a sharp increase in WGBS coverage. This effect likely outweighs the PCR preference for unconverted DNA and causes WGBS to under-estimate methylation levels in these regions.

Methylation differences between Arabidopsis leaves and owers detected by EM-seq
We applied the EM-seq method to analyze methylation differences between Arabidopsis leaves and owers. We used 150 ng input DNA and 6 cycles of PCR, and generated 4 replicates for each tissue. Overall, CG and CHG methylation levels were slightly higher in owers than in leaves, and CHH methylation was about the same in both tissues (Fig. 6a). Metaplots in genes reveal that there are very small differences between leaves and owers in gene body methylation levels -CG plots are almost identical and CHG and CHH are close to noise levels in both tissues (Fig. 6b). For TEs, CG is very slightly increased and CHH is very slightly decreased in owers compared to leaves (Fig. 6c), and the same trends can be observed in pericentromeric heterochromatins in chromosomal methylation plots (Fig. 6d). The most striking nding from these analyses is that CHG methylation is signi cantly higher in owers than in leaves (Fig. 6cd). Consistent with this, a much larger number of ower hyper-DMRs are called in the CHG context than in CG or CHH (Fig. 7a, Table S5). Flower hyper-CHG DMRs are located mainly in pericentromeric heterochromatins and not inside or close to genes (Fig. 6d, Fig. 7bc). DMRs for non-CG methylations are not enriched within genes, as non-CG methylations are usually not found in the genes (Fig. 6b, Fig 7c).
CHG methylation is mainly mediated by CHROMOMETHYLASE2 (CMT2) and CMT3 via a self-reinforcing loop involving histone H3K9 methylation [22,37,38]. Indeed, when examining the transposons in heterochromatin regions (that have ower hyper-CHG DMRs) we observed a higher level of H3K9me2 by ChIP-seq in owers than in leaves, whereas there was very little change in the H3K9me2 levels of euchromatic TEs that did not show ower hyper-CHG DMRs (Fig. 7d). We note that the absolute level of H3K9me2 is much higher in heterochromatin regions than in chromosome arms (Fig. 7d), which is a characteristic of the Arabidopsis epigenome [39]. These results are consistent with a more active CHG and H3K9me2 self-reinforcing loop in owers affecting heterochromatic TEs.
Although the overall differences in CG methylation within gene bodies are minimal between leaves and owers (Fig. 6b), hyper-CG DMRs from both owers and leaves were enriched in gene exons and introns ( Fig. 7c; see an example in Fig. S9a). Indeed, when we plot CG methylation on genes with hyper-CG DMRs in either leaves (Fig. S9b, left panel) or owers (Fig. S9b, right panel), we observe clear differences. Previous studies have shown that gene body-methylated genes in plants are often house-keeping genes, constitutively expressed, and exhibit moderately high expression levels [28,40,41]. We found that majority of the genes with hyper-CG methylations in either leaves or owers are differentially expressed between leaves and owers (Fig. S9c). However, this does not seem to be speci c, as we obtained similar results when looking at randomly selected genes (data not shown). This and the fact that both upregulated and downregulated genes in both tissues can show increased body CG methylation (Fig.  S9c) suggests that gene body methylation does not directly regulate the expression of these genes.

Discussion
Bisul te sequencing, despite being the gold standard for methylation detection, is known to have shortcomings, including DNA damage, false positives due to non-conversion, uneven and missing coverage, and biased representation of methylated versus unmethylated DNA in the nal library. In this study, we performed detailed analyses of these aspects and compared the results from WGBS to those from EM-seq, a newly developed, enzyme-based, bisul te-free method for methylation detection. Our WGBS ndings are consistent with those in previously published literature [7, 11-13, 21, 25-27]. In all the comparisons between EM-seq and WGBS, EM-seq appears to be mostly free of the problems that WGBS has, suggesting that it is a superior method. In addition, library preparation conditions such as input DNA amount and number of PCR ampli cation cycles have very little effect on the results of EM-seq, while the results of WGBS are greatly affected by these parameters, even when all WGBS libraries are generated in a single batch. Therefore, we propose that EM-seq is more desirable than WGBS, especially for big data projects that require integration of datasets obtained across a wide variety of source materials, locations and time points, and processed by personnel with different levels of expertise.
Because of its high reproducibility and low background, EM-seq is also suitable for projects aimed at revealing subtle methylation changes in different samples and/or conditions. We used EM-seq to study methylation differences between Arabidopsis leaves and owers, and showed that DMRs can be called with high con dence even in places where the two tissues have very similar methylation levels. This approach can be expanded to other tissues and across organisms. EM-seq can work with much lower amounts of starting DNA than WGBS (as low as 100 pg [20]), which makes it ideal for single-cell methylome studies. Currently, bisul te is used in these studies [42,43]. Based on the performance of EMseq observed here, we expect that substituting bisul te with TET2 and APOBEC3A will greatly enhance the success rate of single-cell methylome library generation and increase the coverage per cell.
It is worth noting that after methylated cytosines are oxidized by TET family enzymes to 5-fCs and 5-caCs, a different approach can be taken to differentiate methylated from unmethylated cytosines. In a recently published TET-assisted pyridine borane sequencing (TAPS) procedure [44], TET1 is used in the rst step (to catalyze a similar reaction to that catalyzed by TET2 in EM-seq), and pyridine borane is used in the second step to convert 5-fCs and 5-caCs to dihydrouracils (DHU). DHUs basepair with adenines during PCR and thus are ampli ed as thymines. A set of new bioinformatics tools is needed for TAPS, since in TAPS data methylated cytosines appears as thymines and unmethylated cytosines stay as cytosines, which is the opposite to WGBS and EM-sEq. TAPS shows promise in overcoming many of the issues of WGBS; however, since it introduces a different skewed base composition landscape for the current library preparation reagents, sequencers, and bioinformatics tools to deal with, it could potentially lead to complications.
There are new developments in long-read sequencing technologies that enable direct sequencing of the original DNA without fragmentation or ampli cation, thus bypassing the need for bisul te treatment. 5methylcytosines can be differentiated from other bases by the virtue of distinct polymerase kinetics [45,46] or unique electronic signal characteristics [47,48] displayed by different bases. Nevertheless, WGBS still compares favorably in terms of accuracy, reliability, and cost effectiveness, against these technologies, at least in their current iterations [49,50]. An interesting alternative to improve upon these technologies is to combine bisul te or enzymatic-based conversion with these long-read technologies. For example, bisul te treatment has been used in combination with PacBio SMRT sequencing [51] and TAPS has been used with both SMRT and Oxford Nanopore sequencing [50]. TET2 and APOPEC3A from the EM-seq protocol can likely be adapted to the same long-read technologies, and because of their nondestructive nature, they are expected to better preserve the intactness of high molecular weight genomic DNA. However, in both of the previously published methods, because sequencing of PCR products containing newly formed thymines (which correspond to unmethylated cytosines following bisul te treatment or methylated cytosines following TAPS treatment) outperforms sequencing of intermediates (like oxidized methylcytosines, uracils, or DHUs) in terms of sensitivity, accuracy, and minimal coverage required, the converted DNAs have to be ampli ed with region-speci c primers (aiming to obtain products up to 10 kilobases long) before being subjected to SMRT or Nanopore sequencing. This makes it impractical to use these methods for whole genome methylation measurement of most eukaryotic organisms. Further advancements in long-read sequencing technologies to allow unequivocal identi cation of the products of bisul te, TETs, pyridine borane, or APOBECs are needed for true high e ciency, ampli cation-free global detection of methylation.

Conclusion
Enzymatic methyl-seq (EM-seq) uses non-destructive enzymatic reactions, utilizing TET2 and APOBEC3A to convert unmethylated (but not methylated) cytosines to uracils. This approach generates the same product as bisul te treatment, which can then be sequenced and analyzed in the same way. Here we showed that compared to whole-genome bisul te sequencing (WBGS), EM-seq has a higher mapping rate, lower duplication rate, and lower false positive rate. EM-seq not only displays higher coverage than WGBS, but the coverage is also less affected by GC content. In terms of methylation detection, EM-seq covers more cytosines than WGBS and does not over-estimate methylation levels as WGBS does, especially in the context of CHG and CHH. EM-seq exhibits better consistency within libraries made from the same materials in all quality aspects examined and in report of methylation levels. Thus, in many respects EM-seq is superior to WGBS.

Plant materials
Arabidopsis plants of the Columbia-0 (Col-0) ecotype were used in this study. All plants were grown at 22°C in a long day (16 hour light, 8 hour dark) growth room. Flowers and leaves were collected from 4-5 week-old plants.
Genomic DNA extraction and fragmentation  [54]. Chromosome arm and heterochromatin regions were de ned with H3K9me2 ChIP-seq data [37]. Regions highly enriched with H3K9me2 ChIP-seq signal were de ned as heterochromatin regions.

Chromatin immunoprecipitation (ChIP) assays
H3K9me2 ChIP-seq in Arabidopsis leaves was previously published [57]. For H3K9me2 ChIP-seq in Arabidopsis owers, 3 grams of Arabidopsis Col-0 wild-type unopened ower buds were collected. The nuclei were isolated from these materials for in vitro cross-linking with 1% formaldehyde. Nuclei were lysed and the chromatin was sheared with Bioruptor Plus (Diagenode). The sheared chromatins were equally separated for two ChIPs. 5 μl of anti-H3K9me2 (ab1220, abcam) and anti-H3 (ab1791, abcam) antibodies were added for chromatin immunoprecipitation, respectively. This experiment was performed by closely following the protocol described in a previous paper [58]. ChIP-seq libraries were prepared from DNA extracted from the ChIP experiment using Ovation Ultra Low System V2 Kit following manufacturer instructions (NuGEN). The libraries were sequenced on a HiSeq 4000 sequencer (Illumina) to obtain single-end 50 bp reads. To assess differences in H3K9me2 level at heterochromatin and euchromatin regions, we selected heterochromatin TEs that overlap with ower hyper-CHG DMRs and euchromatic TEs that do not overlap with ower hyper-CHG DMRs and compared their H3K9me2 level in leaf and ower tissue respectively. H3K9me2 levels were calculated by converting ChIP seq reads count to RPKM with bamCoverage function in bedtools [59].

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
All authors consent to publication.

Availability of data and materials
High-throughput sequencing data generated in this study can be accessed through Gene Expression Omnibus (GEO) database under accession number GSE151616.