ZFP57 regulation of transposable elements and gene expression within and beyond imprinted domains

Background KRAB zinc finger proteins (KZFPs) represent one of the largest families of DNA-binding proteins in vertebrate genomes and appear to have evolved to silence transposable elements (TEs) including endogenous retroviruses through sequence-specific targeting of repressive chromatin states. ZFP57 is required to maintain the post-fertilization DNA methylation memory of parental origin at genomic imprints. Here we conduct RNA-seq and ChIP-seq analyses in normal and ZFP57 mutant mouse ES cells to understand the relative importance of ZFP57 at imprints, unique and repetitive regions of the genome. Results Over 80% of ZFP57 targets are TEs, however, ZFP57 is not essential for their repression. The remaining targets lie within unique imprinted and non-imprinted sequences. Though the loss of ZFP57 influences imprinted genes as expected, the majority of unique gene targets lose H3K9me3 with little effect on DNA methylation and very few exhibit alterations in expression. Comparison of ZFP57 mutants with DNA methyltransferase-deleted ES cells (TKO) identifies a remarkably similar pattern of H3K9me3 loss across the genome. These data define regions where H3K9me3 is secondary to DNA methylation and we propose that ZFP57 is the principal if not sole methylation-sensitive KZFP in mouse ES cells. Finally, we examine dynamics of DNA and H3K9 methylation during pre-implantation development and show that sites bound by ZFP57 in ES cells maintain DNA methylation and H3K9me3 at imprints and at non-imprinted regions on the maternally inherited chromosome throughout preimplantation development. Conclusion Our analyses suggest the evolution of a rare DNA methylation-sensitive KZFP that is not essential for repeat silencing, but whose primary function is to maintain DNA methylation and repressive histone marks at germline-derived imprinting control regions. Electronic supplementary material The online version of this article (10.1186/s13072-019-0295-4) contains supplementary material, which is available to authorized users.


ES cell culture
Embryonic Stem cells were derived in ground state pluripotency conditions as described previously [1]. Briefly, morula stage embryos were collected at E2.5 and cultured for 2 days in KSOM with inhibitors of Erk and Gsk3 (2i) signaling pathways (PD0325901 and CHIR99021). The trophectoderm was then removed using complement immunosurgery and used for genotyping. The inner cell mass was cultured in N2B27 media supplemented with 2i and LIF until establishment of stable ES cell cultures. Each ES cell line was characterized by staining for alkaline phosphatase activity and expression of Oct-4 (SantaCruz, sc8628) and SSEA1 (Millipore, MAB4301) markers. Finally, each newly established line was karyotyped and screened for mycoplasma contamination prior to use in downstream experiments. ES cells were cultured under 2i/LIF conditions up to a maximum of 15 passages, matching the passage number for WT and KO cells for each experiment. A total of three WT and two zygotic ZFP57 KO ES cell lines were analysed for ChIP-seq and RNAseq experiments.

DNA and RNA extraction
DNA was extracted using lysis buffer (50mM Tris-Cl pH8.5; 5mM EDTA; 200mM NaCl; 0.25% SDS) and 1 µg/ml Proteinase K for 1 hr. The mixture was purified with phenol/chloroform extraction and precipitated with isopropanol. RNA was extracted using Trizol (SIGMA) according to the manufacturer's protocol. RNA/DNA was quantified using Picodrop and quality assayed by gel electrophoresis.

RNA-seq
For RNA-seq library preparation, total Trizol extracted RNA (1 µg) was ribodepleted using NEBNext rRNA depletion kit (NEB, E6310L) according to the manufacturer's protocol and purified using Agencourt RNAClean XP beads (Beckman Coulter, A63987). RNA-seq libraries were prepared using ScripSeq v2 kit (Illumina, SSV21124) with 10 cycles of amplification. Successful ribodepletion was confirmed by post-sequencing QC library analysis. All sequencing was performed with pairedend reads using Illumina HiSeq. For RNA-seq, we conducted three biological replicates for WT samples and two biological replicates for ZFP57 mutants.
For analysis, RNA-seq reads were first quality checked with FastQC (version 0.11.5), followed by adapter and base quality trimming with TrimGalore (version 0.2.7). Reads post-QC were mapped to the mouse reference genome (mm10) with STAR (version 2.5.2, --alignEndsType EndToEnd --outFilterMultimapNmax 300 --outSAMmultNmax 1 --outMultimapperOrder Random) [2]. Expression of unique genes and transposable elements was quantified with featureCounts [3], which were subsequently converted into FPKM (fragments per kilobase per million mapped reads), representing gene expression levels. Differential expression analysis was performed with DESeq2 [4]. Annotations for RefSeq genes and transposable elements were downloaded from the UCSC genome browser [5]. Visualization of RNA-seq data was performed with RSeQC (version 2.4) [6] to convert data into BigWig format and WashU Epigenome Browser [7] were utilized to visualize the data.
Similar QC steps including FastQC and TrimGalore were performed on ChIP-seq reads. Reads post-QC were aligned to the mouse reference genome (mm10) with BWA (version 0.6.2) with pairend reads mode [9]. Potential PCR duplicates were removed with Picard tools ("MarkDuplicates" function). Peak calling was performed with MACS2 (version 2.1.0) [10] with broad peak option using only uniquely aligned reads (MAPQ>10). ZFP57 peaks were normalized to the ChIP-seq library in ZFP57 mutant ES cells, and KAP1, H3K9me3 peaks were normalized to their corresponding input control. Peaks called in at least two of the three WT biological replicates were treated as highconfidence peaks, and were used for subsequent analyses. Enrichment analysis on genomic features was performed with bedtools2 software (version 2.27.0) to generate intersection, identify closest unique genes/repetitive elements and to generate control shuffled regions. Gene promoters were defined as 1kb ± transcription start sites (TSSs).
Repeat content of ZFP57 peaks was determined based on genomic overlaps with repeats annotated in UCSC Genome Browsers (http://genome.ucsc.edu/) [5]. Peaks were characterized into repetitive and unique according to two factors, 1) if the fraction of overlap with repeat is >10%, and 2) location of ZFP57 motif -whether it resides in the repeat or the unique proportion of the peak. Heat maps of ChIP-seq results were plotted using R Bioconductor package "Genomation" [11]. Visualization tracks were generated using bedtools2 genomecov (version 2.27.0, -pc -bga -scale) with scaling factor being per million (10 6 ) the number aligned reads. Visualization of ChIP-seq data was performed with the WashU Epigenome Browser [7] and IGB viewer [12].

Bisulfite treatment and pyrosequencing
Purified DNA was bisulfite converted using SIGMA Imprint (MOD50) DNA modification kit and subjected to PCR and pyrosequencing analysis as described previously [8].           bound regions (n=100 randomly selected) versus those targeted by ZFP57 (darker coloured).