CMT3 and SUVH4/KYP silence the exonic Evelknievel retroelement to allow for reconstitution of CMT1 mRNA

Background The Chromomethylase 1 (CMT1) has long been considered a nonessential gene because, in certain Arabidopsis ecotypes, the CMT1 gene is disrupted by the Evelknievel (EK) retroelement, inserted within exon 13, or contains frameshift mutations, resulting in a truncated, non-functional protein. In contrast to other transposable elements, no transcriptional activation of EK was observed under stress conditions (e.g., protoplasting). Results We wanted to explore the regulatory pathway responsible for EK silencing in the Ler ecotype and its effect on CMT1 transcription. Methylome databases confirmed that EK retroelement is heavily methylated and methylation is extended toward CMT1 downstream region. Strong transcriptional activation of EK accompanied by significant reduction in non-CG methylation was found in cmt3 and kyp2, but not in ddm1 or RdDM mutants. EK activation in cmt3 and kyp2 did not interfere with upstream CMT1 expression but abolish transcription through the EK. We identified, in wild-type Ler, three spliced variants in which the entire EK is spliced out; one variant (25% of splicing incidents) facilitates proper reconstitution of an intact CMT1 mRNA. We could recover very low amount of the full-length CMT1 mRNA from WT Ler and Col, but not from cmt3 mutant. Conclusions Our findings highlight CMT3-SUVH4/KYP as the major pathway silencing the intragenic EK via inducing non-CG methylation. Furthermore, retroelement insertion within exons (e.g., CMT1) may not lead to a complete abolishment of the gene product when the element is kept silent. Rather the element can be spliced out to bring about reconstruction of an intact, functional mRNA and possibly retrieval of an active protein. Electronic supplementary material The online version of this article (10.1186/s13072-018-0240-y) contains supplementary material, which is available to authorized users.


Background
Because of their abundance and the potential to modify/ mutate the genome [1,2], transposable elements (TEs) are subjected to multiple layers of epigenetic control to ensure their silencing. Accordingly, silencing of TEs is accomplished by complementary epigenetic mechanisms that include chromatin compaction associated with DNA methylation and histone modification often mediated by small RNAs [3]. Furthermore, recent study revealed an interesting correlation between TEs location, namely, near a gene, within a gene, in a pericentromere/ TE island, or at the centromere core and the regulatory mechanism underlying their silencing [4]. In Arabidopsis, methylation in the context of CpG is maintained by METHYLTRANSFERASE 1 (MET1), a homolog of the mammalian Dnmt1 [5,6]; in met1 mutant, most CpG methylation is lost [7,8]. Methylation in the CHG (H represents A, C or T) context is maintained by a plantspecific DNA methyltransferase Chromomethylase 3 (CMT3) [9,10], which often requires methylation of histone H3 at lysine 9 by H3K9 methyltransferases SUVH4/

Open Access
Epigenetics & Chromatin *Correspondence: ggrafi@bgu.ac.il † Narendra Singh Yadav and Janardan Khadka have contributed equally to this paper 1 French Associates Institute for Agriculture and Biotechnology of Drylands, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, 84990 Midreshet Ben Gurion, Israel Full list of author information is available at the end of the article KYP, SUVH5 and SUVH6 [11]; CMT2 methyltransferase, a homolog of CMT3, was found to methylate cytosine specifically at the CHH context [12]. Genome-wide profiling of DNA methylation revealed that CMT3 preferentially methylates transposons including those that are present as single copies within the genome [13]. Notably, single mutants of met1 or cmt3 displayed significant accumulation of CACTA transcript [14], the most abundant class II superfamily of plant transposons [15]. However, mobilization of these elements was very limited in single met1 or cmt3 mutants, while in met1 cmt3 double-mutant high-frequency transposition of these elements was observed [14]. In a recent work, Khan et al. [16] showed that the class II transposon Tag1 is essentially silenced by CMT3 via gene body CHG methylation; Tag1 was strongly and slightly activated in cmt3 and ddm1 (decrease in DNA methylation 1), respectively. While maintenance of CG, CHG and CHH methylation is maintained by MET1, CMT3 and CMT2 [9,10,17], de novo methylation in all sequence contexts is essentially established by DOMAINS REARRANGED METHYL-TRANSFERASEs (DRM1/DRM2) [18,19]. DRM1/2 mediate de novo methylation via 23-24-nt small interfering RNAs (siRNAs)-directed DNA methylation (RdDM) pathway. RdDM is a complex mechanism, which involves multiple factors and steps including the formation of double-stranded RNA from Pol IV-derived transcripts by RNA-dependent RNA polymerase 2 (RDR2) and its processing by Dicer-like 3 (DCL3) into 23-24-nt siRNAs, which are exported to the cytoplasm [20]. Besides methyltransferases, the chromatin remodeling factor DDM1 appears to play a major role in maintaining cytosine methylation in CpG and non-CG contexts and silencing of genes and transposable elements. Accordingly, mutation of DDM1 has led to significant reduction in global cytosine methylation [21,22], particularly at heterochromatic H3K9me2-enriched regions [23], to activation of certain genes [24] and to reactivation and transposition of retroelements [25,26]. DDM1 appears to provide DNA methyltransferases such as CMT2 access to H1-containing heterochromatin to maintain silencing of TEs in cooperation with the RdDM pathway [12].
Chromomethylase 1 (CMT1)-a paralog of CMT3-has long been considered a nonessential gene because, in certain Arabidopsis ecotypes (e.g., Ler), the CMT1 gene is disrupted by a single-copy Evelknievel (EK) retroelement, inserted within exon 13, or contains frameshift mutations, resulting in a truncated, non-functional protein [27]. The EK retroelement contains perfect long-terminal repeats (LTRs) and encodes for a protein containing 1451 amino acids [27]. Here, we wanted to explore the regulatory pathway responsible for EK silencing in the Ler ecotype and how EK silencing affects CMT1 transcription. We report here that CMT3-SUVH4/KYP is the major pathway controlling silencing of the exonic EK via inducing non-CG methylation independently of DDM1 and RdDM and that EK silencing is required for reconstitution of intact CMT1 mRNA.

Evelknievel (EK) is heavily methylated in the Ler genome
We investigated the regulation of the intragene class I, copia-like Evelknievel (EK) retroelement, which is inserted in exon 13 of the CMT1 gene in the Ler, but not in the Col genome. We first screened available methylome database of WT Ler [28] for the methylation pattern of the EK retroelement and found (Fig. 1a) that EK is heavily methylated in all cytosine contexts in WT Ler, which is consistent with the data obtained using methylation-sensitive enzymes by Henikoff and Comai [27]. In addition, cytosine methylation is extended downstream from EK insertion site into the 3′ end of the CMT1 gene. We have analyzed available databases for sRNAs in the Ler ecotype corresponding to EK and 3′ CMT1 sequences. Most prominent are 24-nt sRNAs covering the entire EK and the 3′ CMT1 region overlapping the DNA methylation pattern at this locus (Additional file 1: Fig. S1), suggesting that the RdDM pathway might be involved in cytosine methylation at this locus.
EK is expressed in cmt3 and kyp2, but not in ddm1 or RdDM mutants Commonly, retrotransposons are activated following exposure of plants to stress (e.g., tissue culturing) or in meristematic tissues [29]. However, in contrast to other retroelements, EK was not activated following exposure to acute stress (protoplasting, Additional file 1: Fig. S2b), suggesting that this element is under tight regulation. To gain insight into the regulation of the exonic EK and its effect on CMT1 transcription, we screened several epigenetic mutants in the Ler background for EK activation. We selected mutants affecting DNA and histone methylation, including ago4 and hen1 involved in RdDM as well as ddm1, cmt3 and kyp2. We generated cDNA from total RNA prepared from leaves of WT Ler, ago4, hen1, cmt3, kyp2 and ddm1 followed by PCR to amplify EK coding region (EKcr). The results showed (Fig. 1b) that EK was strongly activated in cmt3 and kyp2 mutants, but not in WT Ler, ddm1, ago4 and hen1. Interestingly, similar to EK, transcription of AtCOPIA18A retroelements, of which a single copy exists in the Ler genome on chromosome 5, is up-regulated in kyp2 and cmt3 mutants, but not in WT Ler, ddm1, ago4 and hen1 (Fig. 1b). These results suggest that transcriptional silencing of the retroelements, EK and AtCOPIA18A, is maintained by CMT3 and SUVH4/KYP independently of DDM1 and the RdDM pathway. To further confirm the specificity of TE regulation by SUVH4/KYP-CMT3 pathway, we analyzed the expression of solo LTR previously reported to be a target of the RdDM pathway [30] and AtMu1 reported to be activated in ddm1 mutant [31]. Consistent with previous reports, solo LTR was expressed in ago4 and hen1 RdDM mutants, while AtMu1 showed expression in ddm1 as well as in cmt3 and kyp2 mutants (Fig. 1b).
We analyzed cytosine methylation in various lines by bisulfite sequencing. To this end, genomic DNAs prepared from WT Ler, hen1, ago4, rdr2, ddm1, cmt3 and kyp2 were treated with sodium bisulfite and the resulting DNAs were used as templates for PCR amplification of EK-5′ LTR-CMT1 region and EK coding region (EKcr). PCR fragments were cloned into pJET1.2, and multiple clones from each line were sequenced (Additional file 1: BS seq). Cytosine methylation was significantly reduced, particularly in CHG and CHH contexts, in cmt3 and kyp2 mutants both in EKcr and in the EK-5′ LTR-CMT1 (Fig. 2a, b) where the 3′ CMT1 gene methylation status appears to be coherent with the methylation of EK in the mutants. Cytosine methylation, in all contexts, in ddm1, rdr2, hen1 and ago4 mutants were essentially similar to that of WT Ler. Notably, cmt3 mutant also displayed a significant effect on CG methylation, whereby 60% reduction in methylation was observed; such reduction is not seen in kyp2 mutant. Similar results were reported previously for cmt3 mutant displaying also a significant reduction in CG methylation [9]. This is probably due to dependency of CG methylation on non-CG methylated sites, which was observed in several mutants including kyp suvh5 suvh6 triple mutant and cmt3 [23]. More prominent effect was observed in the EK-5′ LTR-CMT1 where methylation at all cytosine contexts was completely erased in cmt3 and kyp2 mutants both in EK 5′ LTR and in CMT1 downstream sequence. We further confirmed the EKcr methylation pattern by chop PCR using various methylation-sensitive restriction enzymes (Additional file 1: Fig. S2).

Silencing of EK is required for reconstitution of an intact CMT1 mRNA
We examined how hypomethylation and transcriptional activation of EK affect transcription of the CMT1 gene. Previously it has been reported that in the Arabidopsis No-0 ecotype EK insertion within exon 13 of the CMT1 gene, had no effect on transcription and splicing upstream the insertion site and the full-length cDNA corresponding to the expected size of CMT1 mRNA could be amplified; although the entire EK is spliced out, the reading frame is shifted, resulting in a truncated, non-functional protein [27]. To examine CMT1 expression upstream of the EK insertion site, we first generated primers corresponding to exon 10 and exon 11 (Fig. 3a, primer set P1/P2) shown previously to amplify CMT1 mRNA fragment from No-0 ecotype [27]. We used cDNA generated from RNA prepared from flowers (where CMT1 is highly expressed [27]) of The intragene Evelknievel (EK) retroelement is heavily methylated in the Ler genome. a The methylation pattern at the indicated cytosine contexts and their position along the EK-CMT1 sequence in WT Ler (red) and WT Col (blue) is shown (GSE34658, [28]). Arrows indicate the transcriptional direction of CMT1 and EK. Note that DNA methylation at all cytosine contexts is extended to CMT1 3′ end downstream from the EK insertion site and that EK is absent from WT Col (no coverage) and CMT1 gene is essentially unmethylated. Broken lines in Col indicate the absence of EK. b EK is activated in cmt3 and kyp2, but not in ddm1 or RdDM mutants. Transcriptional activation of EK and other TEs in WT Ler and in various epigenetic mutants. cDNAs were prepared from RNA extracted from leaves of WT Ler, ago4, hen1, kyp2, cmt3 and ddm1 and subjected to PCR to amplify EKcr coding region, AtCOPIA18A coding region, Solo LTR and AtMu1. UBQ10 was used as a reference. M indicates DNA molecular size markers WT Ler and of various mutant lines; CMT1 expression in Col flowers was used as a reference. Results showed (Fig. 3b) the amplification of a single fragment from WT Ler and mutant lines, which was comparable to that of the Col ecotype. Thus, the upstream CMT1 gene is transcribed in Ler and mutants similar to its expression in Col and independently of EK methylation and transcription. Further examination of the upstream CMT1 transcription using primers corresponding to exon 11 and exon 13 (primer set P3/P4, Fig. 3a) revealed the amplification of a single fragment in Col and two fragments, with similar intensities, in WT Ler and mutant lines; one fragment was comparable to that of Col (SP1) and the second was slightly larger (SP2), which appears to be due to intron 12 retention (Fig. 3b). We verified the identity of the amplified PCR fragments by sequencing, showing that indeed SP2 fragment is a product derived from alternative splicing and is composed of intron 12 (Additional file 1: Fig. S3). This spliced variant is predicted to have a stop codon within intron 12 that might result in a truncated protein lacking the downstream methylase catalytic domains.
Next, we examine CMT1 expression downstream from the EK insertion site by PCR using primers corresponding to exon 14 (P5) and exon 16 (P6) that expected to yield a PCR product of 227 bp. All lines examined showed recovery of a single PCR product of about the expected size. Yet, the RNA level in WT Col flowers was slightly higher than its level in WT Ler, ddm1, ago4 and rdr2 mutants. Interestingly, CMT1 expression downstream from the EK insertion site was significantly enhanced in cmt3-and kyp2-mutant flowers (Fig. 3c). CMT1 downstream expression is probably driven by the CMT1 promoter, leading to transcription through the EK retroelement up to the CMT1 transcription termination site whereby the entire EK can be further spliced out from this chimeric transcript [27]. Notably, a chimeric transcript containing both CMT1 and EK sequences could not be recovered in WT Ler or in any mutant in the Ler background (Additional file 1: Fig. S4). Thus, the enhanced CMT1 downstream expression in cmt3 and kyp2 mutants could have been resulted from EK 5′ LTR functions as a bidirectional promoter, a topic currently studied in the laboratory.
To test for splicing out of the entire EK transcript, we used a forward primer corresponding to exon 13 (P7) upstream from the EK insertion site and a reverse primer corresponding to exon 16 (P6) downstream from the EK insertion site. If the entire EK has been spliced out, it should yield a PCR product of about 268 bp. The results showed (Fig. 3d) as expected high CMT1 expression of this region in WT Col lacking EK. Low level of this region was recovered from WT Ler, ddm1, ago4 and rdr2 mutants, but no recovery of a PCR fragment could be detected in cmt3 and kyp2 mutants (Fig. 3d).
Direct sequencing of the resulting PCR products from WT Ler and detailed analysis of the sequencing chromatogram revealed three spliced variants of the entire EK from CMT1 transcript (Fig. 4a); these three spliced variants were confirmed by cloning the PCR products into pJET1.2 (Fig. 4b) and sequencing (Fig. 4c-e). The spliced variant SV1 has been described by Henikoff and Comai [27] in which the penultimate base of the EK 3′ LTR serves as a splice donor site, splicing out the entire EK together with a portion of exon 13 and intron 13, to the correct splice acceptor site of exon 14. As a result, the reading frame is shifted, resulting in a truncated, non-functional CMT1 protein (Fig. 4c). A second spliced variant 2 (SV2) in which the EK is spliced out using a non-canonical donor site (GC) within the duplicated sequence upstream to the EK insertion site (GGCTG-EK) and a canonical AG site contributed by the penultimate base (A) of the EK 5′ LTR and the first base of the The total number of CG (blue column), CHG (brown column) and CHH (green column) sites in amplified DNA sequences is indicated in square brackets on the right duplicated sequence downstream the EK insertion site (EK-GGCTG), resulting in the correct CMT1 reading frame that can potentially produce an intact, functional protein (Fig. 4d). The third spliced variant (SV3) in which the entire EK is spliced out using canonical sites (GT-AG) (see Fig. 4e) leads to a frameshift and production of a truncated, non-functional CMT1 protein due to a reading frameshift and premature termination (Fig. 4e).
To estimate the proportion of the different spliced variants, we cloned the PCR products derived from WT Ler cDNAs into pJET1.2. We randomly isolated colonies and analyzed them by PCR followed by separation on agarose gel. Based on the size of the PCR product, we could distinguish between SV1 (fast-migrating band) and SV2 + SV3, and the latter variants differ in 5 bp and run on the gel as one band (slow-migrating band) (Fig. 4b). . EK LTRs are marked by red arrowheads. b CMT1 upstream transcript is alternatively spliced showing intron 12 retention. cDNA was prepared from RNA extracted from flowers derived from WT Col, WT Ler, ddm1, kyp2, cmt3, ago4 and rdr2 and subjected to PCR to amplify CMT1 RNA sequences using primer set 1 + 2 (ex10-F + ex11-R) or primer set 3 + 4 (ex11-F + ex13-R). Note that primer set 3 + 4 yielded two PCR fragments SP1 and SP2. Actin was used as a reference. PCRs were performed for 35 cycles except for actin (30 cycles). M indicates DNA molecular size markers. The predicted alternative spliced variants SP1 and SP2 are schematically shown below. c CMT1 downstream expression using primer set 5 + 6 (ex14-F + ex16-R). WT Ler gDNA was used as a reference for PCR product derived from RNA. Note the enhanced downstream CMT1 expression in kyp2 and cmt3 mutants. Actin was used as a reference. Ler genomic DNA (gDNA) was used to confirm amplification from RNA. PCRs were for 35 cycles except for actin (30 cycles). d Analysis of splicing out of the entire EK retroelement using primer set 6 + 7 (ex13-F + ex16-R) on both sides of the EK insertion site. Note lack of a PCR product in kyp2 and cmt3 mutant. Actin was used as a reference RNA. Col genomic DNA (gDNA) was used to confirm amplification from RNA. PCR was for 42 cycles except for actin (30 cycles) Yadav et al. Epigenetics & Chromatin (2018) 11:69 This analysis revealed 224 positive colonies in which 130 (~ 58%) were related to SV1 and 94 (~ 42%) to SV2 + SV3.
To assess the proportion of SV2 and SV3 variants, we sequenced all 94 clones carrying the slow-migrating bands (SV2 + SV3) and 83 clones produced valid-specific sequences. Results showed that 51 (61%) of the clones contain the SV2 fragment and 32 (39%) clones the SV3 fragment. Thus, the proportion of the various spliced variants in WT Ler is 58% of SV1, 25% of SV2 and 17% of SV3 (Fig. 4c, e). Finally, we tested for the occurrence of the full-length CMT1 mRNA in WT Ler using RT-PCR. Total RNAs prepared from flowers of WT Col, Wt Ler and cmt3 (Ler background) were subjected to cDNA synthesis using oligo dT followed by first PCR to amplify CMT1 coding and 3′ UTR regions (Fig. 5a). The full-length CMT1 mRNA appears to be quite rare inasmuch as a faint band of the expected size was visible in Col, but not in Ler or cmt3 mutant (Fig. 5b). However, nested PCR using the first PCR product as template revealed a clear amplified CMT1 full-length coding region in both Col and Ler wild-type ecotypes, but not in cmt3 mutant (Fig. 5b). The identity of the PCR product as CMT1 was confirmed by sequencing of a PCR fragment derived from the amplified full-length CMT1 cDNA (Additional file 1: Fig. S5).

Epigenetic control of the exonic Evelknievel retroelement
We showed that the intragenic (exonic) single-copy Evelknievel (EK) retroelement located within exon 13 in the CMT1 gene is regulated by CMT3 and SUVH4/KYP that cooperate to maintain EK silencing via inducing non-CG methylation independently of DDM1 and the RdDM pathway. Thus, in addition to DDM1 and RdDM pathways, our data pointed to CMT3-SUVH4/KYP as an important, independent pathway controlling long TEs located within or near genes at euchromatic regions. Furthermore, silencing of EK by CMT3/KYP is required for splicing out of the entire EK and for reconstitution of a functional CMT1 mRNA. Thus, our data pointed to an interesting phenomenon whereby the function of CMT1 is rendered partially active by the action of its paralog CMT3 (see model Fig. 6). Consequently, retroelement insertion within an exon does not necessarily lead to complete abolishment of the gene product when the retroelement is kept silent. Rather the retroelement can be spliced out to bring about reconstitution of an intact, functional mRNA. We cannot exclude the possibility that the active CMT1 further participates in methylation and silencing of EK to ensure the persistence of its own expression (Fig. 6).
It is commonly accepted that KYP, SUVH5 and SUVH6 H3K9 methyltransferases are required for CMT3dependent CHG methylation genome wide [11]. Our data clearly showed that redundancy between KYP, SUVH5 and SUVH6 in methylating EK does not exist inasmuch as the sole mutation of KYP was sufficient to drive hypomethylation and expression of EK. Indeed, certain CHG methylation sites are controlled solely by KYP, while mutation in SUVH5 and SUVH6 had no notable effect on CHG methylation [23]. Furthermore, in an early work, Tompa et al. [13] performed a genome-wide mapping of DNA methylation in cmt3 mutant via fragmentation of the genome with methylation-sensitive enzymes followed by size fractionation and hybridization to microarrays. They identified eight loci displaying reduction in CHG methylation, four of which were found to be low copy number retrotransposons including AtCOPIA10 (At4g21360) and AtCOPIA33 (At2g09830); the later is located at the centromeric region of chromosome 2 near a highly active gene (At2g09990) encoding for 40S ribosomal protein.
Thus, it appears that the CMT3 and SUVH4/ KYP cooperate to maintain non-CG methylation as well as H3K9 methylation and consequently silencing of low copy number, long TEs, which are essentially located near or within active genes. Our analysis of the single-copy AtCOPIA18A retroelement (gene ID At5g35935), located on chromosome 5 near active genes (e.g., At5g35970), showed that this element is transcriptionally activated in both kyp2 and cmt3 mutants. In a recent article, Sigman and Slotkin [4] proposed that chromosomal location of TEs (i.e., near a gene, within a gene, in a pericentromere/ TE island, or at the centromere core) provides the first rule determining the specific regulation of TEs. Accordingly, Sigman and Slotkin [4] proposed that TEs located near genes as well as within genes are initially targeted for DNA methylation by the RNA-directed DNA methylation (RdDM). Once DNA methylation is established, maintenance of methylation by MET1 or CMT3 methyltransferases and histone H3 dimethylation by SUVH4/ KYP is sufficient to maintain methylation and propagate silencing. Consistent with this proposal is the finding that the EK retroelement inserted within the CMT1 gene is not activated in RdDM mutants, ago4 and hen1, probably because CMT3 and SUVH4/KYP can maintain EK methylation in the absence of RdDM pathway. Similarly, CpG methylation of the 35S promoter sequence was maintained by the activity of MET1 in the absence of RNA trigger [19]. However, unlike CMT3 and SUVH4/KYP, suppression of MET1 activity did not block the establishment of RNA-directed CpG methylation. Notably, analysis of available sRNA databases for the Ler ecotype revealed an overlap with the DNA methylation pattern, whereby 24-nt sRNAs, presume to induce DNA methylation at complementary sequences [32], are covering the entire EK and the 3′ CMT1 region (Additional file 1: Fig.  S1), suggesting that the RdDM pathway involves in DNA methylation and silencing of EK. However, our results showed that the RdDM is not able to maintain methylation in EK and 3′ CMT1 region in the absence of CMT3 or SUVH4/KYP. One possible explanation is that DNA methylation of the EK retroelement and downstream CMT1 is initially established by the RdDM pathway in conjunction with CMT3 and SUVH4/KYP methyltransferases, rather than with DRM methyltransferases. This may gain support by the findings that DRM and CMT3 may act in a complementary manner to maintain RNAdirected non-CG methylation and that mutation of CMT3 alone could release, to some extent, pre-established RNA-dependent transcriptional gene silencing [18,19]. Alternatively, sRNAs target methylation directly via the CMT3/KYP machinery rather than the canonical AGO4-RdDM pathway. Finally, we cannot exclude the possibility that besides their function in establishing and maintaining DNA methylation, the 23-24-nt sRNAs may be involved in posttranscriptional gene silencing by directing transcript cleavage [33], which could explain the lack of CMT1-EK transcript and the low level of CMT1 full-length mRNA in the Ler ecotype.

Interplay between EK activation and CMT1 transcription
Commonly TEs are considered as mutable elements in which their insertion into coding or regulatory regions of genes might interfere with proper gene transcription, leading to production of aberrant or new transcripts. Yet, plants have evolved various mechanisms to cope with TEs inserted near or within genes [34]. Accordingly, intragenic TEs do not necessarily interfere with proper transcription of the invaded genes due to an IBM2dependent mechanism that allows synthesis of full-length RNA over intragenic TEs carrying repressive epigenetic marks [35,36]. Our study showed that this is applicable also for exonic TEs. Although EK insertion resulted in alternative splicing proximal to the insertion site, namely retention of intron 12 (Additional file 1: Fig. S2), this pattern of CMT1 upstream transcription is retained whether EK is methylated and silent (WT Ler, ddm1, RdDM mutants) or unmethylated and strongly activated (cmt3 and kyp2). Yet, downstream CMT1 transcription is affected by EK insertion displaying a relatively low level of CMT1 RNA downstream of the insertion site. Similarly, genes carrying highly methylated intronic TEs displayed defects in transcription downstream of TE insertion site [35]. Methylation and silencing of EK appear to be critical for transcription through the EK element. The chimeric transcript undergoes further processing splicing out the entire EK, resulting in a transcript yielding a truncated, non-functional protein [27]. However, our study revealed the existence of two additional spliced variants, namely SV3 whose splicing follows the canonical GU-AG rule yielding a transcript encoding for a truncated protein and SV2 whose splicing is non-canonical (GC-AG, [37]) but allows for faithful reconstitution of the CMT1 RNA that potentially can yield a functional protein. We found that the frequent occurrence of this non-canonical splicing (GC-AG) of the entire EK from CMT1 transcript is about 25%. Splicing through non-canonical sites (e.g., GC-AG) has been reported in both plants and animals [38,39]. Thus, although it has been proposed previously that CMT1 is dispensable in certain Arabidopsis ecotypes carrying EK within the CMT1 gene (Ler, No-0, RLD, [27]), we cannot exclude the possibility that in these ecotypes a functional CMT1 mRNA is produced at a very low level (~ 12.5% considering intron 12 retention) due to splicing Fig. 6 A feed-forward model illustrating the regulation of Evelknievel (EK) retroelement by CMT3 and CMT1 proteins. a Silencing of EK inserted within the CMT1 gene. CMT3 in concert with SUVH4/KYP maintain cytosine methylation over the entire EK and 3′ CMT1 region, particularly at the CHG and CHH contexts (red and green lollipops, respectively) and to some extent also CG methylation (blue lollipops) as well as histone methylation at lysine 9, resulting in a complete silencing of EK. Consequently, transcription from the CMT1 promoter proceeds through the EK to the CMT1 transcription termination site. The CMT1/EK/CMT1 transcript undergoes splicing in which EK is spliced out faithfully (in 25% of the events), leading to reformation of mature CMT1 mRNA and possibly to retrieval of a functional CMT1 protein, which could potentially further methylates EK to ensure its own expression. b Activation of EK. In the absence of KYP or CMT3, EK is transcriptionally activated due to absence of CHH and CHG methylation, resulting in transcriptional interference of CMT1 rendering it incapable of transcription beyond the EK insertion site. This leads to formation of a truncated CMT1 mRNA and to an aberrant, non-functional CMT1 protein out of the entire EK. Indeed, using RT-PCR we could recover low amount of full-length CMT1 mRNA in WT Ler, but not in cmt3 mutant. Whether this level is sufficient for synthesizing a functional CMT1 protein is currently under investigation.
In cmt3 and kyp2 mutants, CMT1 downstream transcription driven by the CMT1 promoter was completely abolished. This can be explained by the oppositely oriented EK and CMT1 combined with a higher rate of EK transcription, resulting in a transcriptional interference [40] that renders CMT1 incapable of transcription beyond the EK insertion site (Fig. 5). Yet, EK activation in cmt3 and kyp2 led to strong expression of CMT1 downstream of the EK insertion site via alternative mechanism. We hypothesize that in cmt3 and kyp2 mutants the EK 5′ LTR may function as a bidirectional promoter driving both EK and CMT1 transcription in opposite directions, a topic currently studied in the laboratory. Precedence over LTRs functioning as bidirectional promoters was described in animals [41][42][43] as well as in plants [30,44].

DNA and RNA isolation, cDNA production and expression analysis
DNA was extracted from plant tissues using Genomic DNA Mini Kit (Geneaid, Taiwan). RNA was prepared from leaves, protoplasts or flowers using RNeasy Plant Mini Kit (Qiagen). One microgram of total RNA was used for cDNA production using the Verso cDNA Kit (Thermo Scientific) according to the manufacturer's protocol. The resulting cDNA was subjected to PCR to amplify the Evelknievel using primer set EK-RTF/EK-RTR and AtCOPIA18A using primers 18A-F/18A-R and Solo LTR using primer set Solo-F/Solo-R and AtMu1 using primer set AtMu1-F/AtMu1-R. Upstream analysis of CMT1 expression was performed using two primer sets, ex10-F(P1)/ex11-R(P2) and ex11F(P3)/ex13R(P4), and for downstream CMT1 expression, we used ex14-F(P5)/ex16-R(P6). Finally, for the analysis of splicing out of the entire EK we used ex13-F(P7)/ex16-R(P6) primer set. Actin or ubiquitin-10 was used as control (primer set ACT-F/ACT-R and UBQ10-F/UBQ10-R, respectively) (for primer sequences, see Additional file 1: Table S1). PCR conditions were as follows: 95 °C, 2 min; 30-40 cycles of 95 °C, 30 s; 60 °C, 30 s; 72 °C, 30 s; followed by 72 °C, 5 min. PCR products were resolved on 1.5% agarose gel stained with ethidium bromide.

Bisulfite sequencing
Bisulfite conversion was carried out using the Qiagen EpiTect Bisulfite Kit according to the manufacturer's instructions on genomic DNA extracted from rosette leaves of WT Ler and various mutant lines in the Ler background using Genomic DNA Mini Kit (Geneaid, Taiwan). The reactions with a C-to-T conversion rate higher than 98% (as determined by the sequencing of 10 clones of an unmethylated control DNA, a 1562-bp PCR product) were used for further analyses. The bisulfite-treated DNA was used for PCR amplification of selected target regions of Evelknievel retroelement using primer sets bsEK5LTR-F1/bsEx14-R1 and bsEKcr-F1/bsEKcr-R1 to amplify promoter and internal EK regions, respectively. To confirm identity of amplified EK fragments, the initial PCR was followed by nested PCR using primer sets bsEK5LTR-F2/bsEx14-R2 and bsEKcr-F2/bsEKcr-R2 to amplify promoter and internal EK regions, respectively. The conditions for both PCR reactions were as follows: 95 °C, 2 min, 30 cycles of 95 °C, 30 s, 60 °C, 30 s, and 72 °C, 30 s, followed by 72 °C, 5 min. The amplified sequences were separated by agarose gel electrophoresis before purification using the QIAquick PCR Purification Kit (Qiagen). The purified fragments were ligated into a pJET1.2 cloning vector using a CloneJET PCR Cloning Kit (Thermo Fisher Scientific) and transformed into competent E. coli (TOP10 cells). Positive clones were selected by colony PCR followed by plasmid isolation with the PrestoTM Mini Plasmid Kit (Geneaid). At least ten individual clones for each region of each genotype were sequenced using pJET1.2 primers at Macrogen Europe (Amsterdam, The Netherlands). The sequences were analyzed with Kismeth software [45] to obtain the percentage of methylated sites for each sequence context.

Analysis of DNA methylation by methylation-sensitive enzymes followed by PCR (Chop PCR)
Genomic DNA was subjected to Chop PCR (methylation-sensitive enzyme digestion followed by PCR) with methylation-sensitive enzymes including HpaII, MspI, Sau3AI and BglII and subjected to PCR to amplify various EK DNA sequences. The following primers were used (see Additional file 1: Table S1): EK-RTF/EK-RTR primer set for recovery after HpaII and MspI digestion, EKSau3-F/EKSau3-R and EKBgl-F/EKBgl-R primer sets for recovery after Sau3AI and BglII digestion, respectively.