Patient samples and clinical data
A total of 16 CLL patients were included in this study. The CLL peripheral blood mononuclear cell (PBMC) samples were collected from the Section of Hematology and Coagulation, Sahlgrenska University Hospital. The CLL patients were diagnosed according to recently revised criteria, and samples were collected at the time of diagnosis. Clinical and molecular data are summarized in Additional file 2: data file 1A. All patients provided informed consent in accordance with the Helsinki Declaration, and the study was approved by the local ethics review board. Genomic DNA and total RNA were extracted from CLL PBMCs and sorted B cell subpopulations with DNA (DNeasy Blood & Tissue Kit, 69504, Qiagen, Hilden, Germany) and RNA (miRNeasy mini kit, 217004, Qiagen, Hilden, Germany) extraction kits according to the manufacturer’s protocol. RNA quality was measured using the Experion RNA analysis kit (7007103, Bio-Rad, Hercules, USA). DNA and RNA from five age-matched sorted CD19+ B cell samples were purchased commercially (3H Biomedical, Uppsala, Sweden). The quality of this RNA was checked using a 2100 Bioanalyzer instrument (Agilent, Santa Clara, USA), and the samples were then sent for RNA sequencing.
Isolation of normal B cell subpopulations and CLL B cells from CLL PBMC samples
Four buffy coats from healthy blood donors, age-matched with the CLL patients, were collected from Sahlgrenska University Hospital. PBMCs were isolated from the buffy coats by Lymphoprep (1114545, Axis-Shield, Oslo, Norway) density gradient sedimentation and were then enriched for B cells on an AutoMACS using CD20 microbeads in accordance with the manufacturer’s instructions (Miltenyi Biotec, Bergisch Gladbach, Germany). After separation, the purity of the B lymphocyte preparations was checked by flow cytometry, which showed approximately 96% to 98% CD19+ cells. The enriched B cells were then stained with BB515-labeled anti-CD19, PE-labeled anti-IgM and BV421-labeled anti-CD27 antibodies before flow cytometric cell sorting on a BD FACSAria cell sorter (BD Bioscience, San Jose, USA). B cells were sorted into naïve (CD19+, CD27−), memory (CD19+, CD27+, IgM−) and marginal zone-like (CD19+, CD27+, IgM+) B cell populations. CLL B cell DNA used for mass spectrometry analysis was isolated from CLL patient PBMC samples using AutoMACS, in the same way as described for B cell isolation from normal PBMCs.
Selected reaction monitoring liquid chromatography tandem-mass spectrometry (HPLC-SRM-MS)
An SRM-based mass spectrometry assay (SRM-MS) was used to quantify 5-hydroxymethyl-2′-deoxycytidine (5hmdC) and 5-methyl-2′-deoxycytidine (5mdC) concentrations as a percentage of 2′-deoxyguanosine (dG), i.e., [5hmdC]/[dG] and [5mdC]/[dG]. The calibrated ranges for the analytes were 0–2.5% for 5hmdC and 0–25% for 5mdC, using a fixed amount of 40 pmol dG as an internal standard. The calibration points were run as single replicates because of the previously demonstrated high reproducibility of the assay. Measured 5hmdC levels in the samples ranged from 0.01% to 0.028%, and measured 5mdC levels ranged from 4.61% to 5.69%.
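The ratio described above can be illustrated with a minimal sketch. It assumes that analyte and dG amounts are available in the same unit (pmol here); the function names and the example amounts are hypothetical and are not taken from the study.

```python
# Calibrated ranges stated in the text, expressed as percent of dG.
CALIBRATED_RANGE = {"5hmdC": (0.0, 2.5), "5mdC": (0.0, 25.0)}


def percent_of_dg(analyte_pmol: float, dg_pmol: float) -> float:
    """Return the analyte amount as a percentage of dG, i.e. 100 * [analyte]/[dG]."""
    if dg_pmol <= 0:
        raise ValueError("dG amount must be positive")
    return 100.0 * analyte_pmol / dg_pmol


def within_calibration(analyte: str, percent: float) -> bool:
    """Check whether a percentage falls inside the calibrated range for the analyte."""
    low, high = CALIBRATED_RANGE[analyte]
    return low <= percent <= high


# Hypothetical amounts, with 40 pmol dG as in the text.
hmdc_percent = percent_of_dg(0.008, 40.0)  # about 0.02 % of dG
mdc_percent = percent_of_dg(2.0, 40.0)     # 5.0 % of dG
```

Both hypothetical values fall inside the calibrated ranges, so `within_calibration` returns `True` for them.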
MeDIP, hMeDIP and ChIP assay
MeDIP and hMeDIP assays were performed using the MagMeDIP (C02010021, Diagenode, Liege, Belgium) and hMeDIP (C02010030, Diagenode, Liege, Belgium) kits, respectively, according to the manufacturer’s instructions, using a mouse monoclonal antibody against 5-mC (33D3 clone, C15200081, Diagenode, Liege, Belgium) and a rat monoclonal antibody against 5-hmC (C15220001, Diagenode, Liege, Belgium). Briefly, genomic DNA (~3 µg for MeDIP and ~10 µg for hMeDIP) was sonicated five times, with four cycles of 30 s on and 30 s off each time, to obtain 300–600 bp fragments using the Bioruptor and the Shearing module kit (Diagenode, Liege, Belgium). One percent of the fragmented DNA was transferred to a fresh tube as the input sample. The sheared DNA samples were incubated with magnetic beads and antibody at 4 °C overnight. After overnight incubation, the unbound DNA was removed from the antibody-bead mix, which was then washed three times. The DNA was extracted from the beads and purified by the phenol:chloroform:isoamyl alcohol method.
ChIP was performed using the Shearing module kit and the OneDay ChIP Kit (Diagenode, Liege, Belgium) according to the manufacturer’s instructions. The antibodies used were a polyclonal antibody against H3K4me1 (C15410037, Diagenode, Liege, Belgium), a polyclonal antibody against H3K27ac (C15410174, Diagenode, Liege, Belgium) and IgG (negative control; OneDay ChIP Kit). In brief, the CLL PBMCs were formaldehyde-crosslinked, lysed, and sonicated four times for 5 cycles (each cycle 30 s on and 30 s off) with the Bioruptor and the Shearing module kit (Diagenode, Liege, Belgium). The concentration of the resulting DNA fragments was determined with a Qubit 2.0 fluorometer (Q32866, Invitrogen, Carlsbad, USA), and the samples were sent for MeDIP and hMeDIP sequencing, performed on the Illumina HiSeq 2000 platform.
Data processing and analysis of hMeDIP-seq, MeDIP-seq and ChIP-seq data
Adapter sequences were removed from raw sequencing reads using Cutadapt v2.2.1. Cleaned reads were then aligned to the human GRCh38 reference genome using Bowtie v1.0.0 with the parameters --best -n 2 -k 1 -m 1 -t. The sex chromosomes, X and Y, were excluded from further analysis to avoid sex-related bias.
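The exclusion of the sex chromosomes can be sketched as a simple filter. The example below is only an illustration: it assumes tab- or space-delimited BED-like records with the chromosome name in the first column, and it accepts both "chrX"/"chrY" and "X"/"Y" naming. At which stage of the pipeline the filtering was applied (reads or intervals) is not stated in the text, and the function name is hypothetical.

```python
# Chromosome names treated as sex chromosomes (both common naming styles).
SEX_CHROMOSOMES = {"chrX", "chrY", "X", "Y"}


def drop_sex_chromosomes(bed_lines):
    """Return the BED-like lines whose first column is not a sex chromosome.

    Blank lines are skipped; all other lines are kept unchanged.
    """
    kept = []
    for line in bed_lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if fields[0] not in SEX_CHROMOSOMES:
            kept.append(line)
    return kept


# Hypothetical intervals for illustration.
example = [
    "chr1\t1000\t1500\tpeak_1",
    "chrX\t2000\t2600\tpeak_2",
    "chr7\t300\t900\tpeak_3",
    "chrY\t50\t400\tpeak_4",
]
autosomal = drop_sex_chromosomes(example)
```

In this example only the chr1 and chr7 intervals remain in `autosomal`.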
Aligned reads were used to call peaks over the corresponding input samples with MACS v2.1.0, using the parameters -f BAM --broad --broad-cutoff 0.05 -B -g hs. After peak calling for each sample, the UCSC utility WigCorrelation was used on BED files to estimate the correlation between samples. Since the correlation between samples was high, a second round of peak calling was performed with the same parameters, this time jointly on all IGHV-mutated CLL samples, on all IGHV-unmutated CLL samples, and on all CLL samples together regardless of IGHV mutational status. The details and summary of all the reads obtained from the CLL samples and normal control samples used in this study are listed in Additional file 2: data file 1B. For MeDIP-seq and hMeDIP-seq, an additional step was performed in which CLL differentially methylated regions (DMRs) and CLL differentially hydroxymethylated regions (DhMRs) were identified using MACS v2.1.0 bdgdiff. The following comparisons were made: CLL samples versus sorted B cells, IGHV-unmutated CLL samples versus naïve B cells, and IGHV-mutated CLL samples versus memory B cells. Peak regions, DMRs and DhMRs were assigned to genes and other genomic features using HOMER v4.9 annotatePeaks with a custom GTF annotation file from Gencode v24. GeneSCF v1.1 was used for pathway enrichment analysis of protein-coding genes associated with DhMRs and DMRs, using the KEGG and NCG databases with cut-offs of p-value 0.05 and FDR 0.1. For visualization, HOMER v4.9 makeMetaGeneProfile and DeepTools v2.3.1 computeMatrix and plotProfile were used. Plotting was done in R v3.2.3 using ggplot2 and reshape2. All raw data have been deposited in GEO under accession number GSE113386 and will be publicly available for download after acceptance.
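As an illustration of the peak-calling step, the sketch below assembles a MACS2 `callpeak` command line with the parameters given above. The executable name `macs2`, the use of `-t`, `-c` and `-n` for treatment files, input files and output name, and all file names are assumptions added for the example. The command is only built as a list, not executed, and passing several BAM files after `-t` is one possible way to call peaks jointly on a group of samples.

```python
def macs2_broad_callpeak(treatment_bams, input_bams, name):
    """Build a MACS2 broad-peak command using the parameters from the text.

    treatment_bams / input_bams: lists of BAM paths (hypothetical file names).
    name: prefix for the output files.
    """
    if not treatment_bams or not input_bams:
        raise ValueError("at least one treatment and one input BAM are required")
    return [
        "macs2", "callpeak",
        "-t", *treatment_bams,      # IP samples (one or several, pooled)
        "-c", *input_bams,          # corresponding input samples
        "-f", "BAM",
        "--broad", "--broad-cutoff", "0.05",
        "-B",
        "-g", "hs",
        "-n", name,
    ]


# Hypothetical joint call on two IGHV-mutated samples.
cmd = macs2_broad_callpeak(
    ["cll_m1_hmedip.bam", "cll_m2_hmedip.bam"],
    ["cll_m1_input.bam", "cll_m2_input.bam"],
    "IGHV_mutated_hMeDIP",
)
print(" ".join(cmd))
```

Building the command as a list keeps the parameters in one place, so the per-sample and the joint rounds of peak calling differ only in the files passed in.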
Analysis of RNA-seq data
Raw reads containing adapter sequences were removed using Cutadapt v2.2.1. Cleaned reads were aligned to the GRCh38 reference genome using STAR v2.5.2b. Aligned reads were quantified using featureCounts from Subread v1.5.2 with the Gencode v24 annotation. Read counts were normalized to RPKM using an in-house script. Genes were classified as highly (RPKM 100 or more), intermediately (RPKM 10–100), lowly (RPKM 1–10) or not (RPKM less than 1) expressed. Differential expression analysis was performed in R v3.2.3 using edgeR. The following comparisons were made: CLL samples versus sorted B cells, IGHV-unmutated CLL samples versus naïve B cells, and IGHV-mutated CLL samples versus memory B cells. GeneSCF v1.1.2 was used for pathway enrichment analysis of differentially expressed (DE) protein-coding genes, using the KEGG and NCG databases with cut-offs of p-value 0.05 and FDR 0.1. To validate the gene expression levels using published CLL RNA-seq data, we obtained the raw RNA-seq data for 96 patients (55 IGHV-mutated and 41 IGHV-unmutated prognostic groups) along with 9 normal B cell samples, as described in our earlier paper.
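The normalization and the expression classes can be sketched as follows. This is not the in-house script, which is not available here; it uses the standard RPKM definition (reads per kilobase of gene length per million mapped reads). Which read total was used as the denominator and how the class boundaries at exactly 10 and 100 were handled are assumptions, and the function names and example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(value: float) -> str:
    """Assign the expression class from the text.

    Assumption: boundary values belong to the higher class
    (10 counts as intermediate, 1 counts as low).
    """
    if value >= 100:
        return "high"
    if value >= 10:
        return "intermediate"
    if value >= 1:
        return "low"
    return "not expressed"


# Hypothetical gene: 500 reads, 2,000 bp long, 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)   # 25.0
label = expression_class(value)       # "intermediate"
```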
Quantitative analysis of 5hmC levels
DNA glucosylation and restriction endonuclease digestions were performed using the EpiMark 5-hmC and 5-mC Analysis Kit (NEB, Ipswich, MA) as per the manufacturer’s instructions. The primer sequences used in this analysis are listed in Additional file 2: Supplementary Table 1. A total of 5 µg of genomic DNA was treated with T4 β-glucosyltransferase with and without UDP-glucose substrate at 37 °C overnight. Glucosylated DNA was digested with and without MspI and HpaII at 37 °C overnight. 5hmC levels were quantified by real-time qPCR with primers designed at peak regions containing a CCGG sequence in target genes that were shown to be differentially hydroxymethylated between CLL samples and normal B cells (Additional file 7).
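One common way to turn the qPCR readout of such an assay into a percentage is a delta-Ct calculation, sketched below. It is not necessarily the calculation used in this study. The sketch assumes 100% amplification efficiency, that glucosylated 5hmC protects the site from MspI digestion, and that the MspI digest without glucosylation serves as background. The function name and Ct values are hypothetical.

```python
def percent_5hmc(ct_uncut: float, ct_mspi_glucosylated: float, ct_mspi_unglucosylated: float) -> float:
    """Estimate percent 5hmC at one MspI site from three Ct values.

    ct_uncut: undigested control (total amplifiable template).
    ct_mspi_glucosylated: T4-BGT + UDP-glucose treated DNA, digested with MspI.
    ct_mspi_unglucosylated: DNA without glucosylation, digested with MspI (background).
    Assumes a doubling of product per cycle (100% efficiency).
    """
    protected = 2.0 ** (ct_uncut - ct_mspi_glucosylated)     # fraction surviving MspI after glucosylation
    background = 2.0 ** (ct_uncut - ct_mspi_unglucosylated)  # fraction surviving MspI without glucosylation
    return max(0.0, 100.0 * (protected - background))


# Hypothetical Ct values for a single locus.
estimate = percent_5hmc(ct_uncut=24.0, ct_mspi_glucosylated=27.0, ct_mspi_unglucosylated=30.0)
# 2**-3 - 2**-6 = 0.109375, i.e. 10.9375 %
```

Clamping at zero avoids reporting a negative percentage when the background digest happens to amplify slightly earlier than the glucosylated one.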
Analysis of super-enhancers
For the analysis of super-enhancers in CLL, the ROSE software was used with the following parameters: -g HG38 -i CLL-H3K27ac_peaks.gff -r f -r CLL_H3K27ac_aligned.bam -t 2500.
Cell lines, culture conditions, siRNA transfections and MTT assay
Two CLL cell lines, HG3 and MEC1, were used in this study for functional analysis. The cell lines were cultured in RPMI 1640 (Invitrogen, Carlsbad, USA) supplemented with 2 mM glutamine, 10% fetal bovine serum (FBS; Invitrogen, Carlsbad, USA), and 1× penicillin/streptomycin (Invitrogen, Carlsbad, USA). Transient transfections were carried out on the Amaxa 4D-Nucleofector™ System (Lonza Group AG, Basel, Switzerland) with the SF cell line Amaxa kit (V4XC-2032) according to the manufacturer’s instructions. We used MISSION pre-designed siRNA (Sigma-Aldrich, Missouri, USA) containing five small interfering RNAs (siRNAs) in equal concentrations for the NSMCE1, TUBGCP6 and TUBGCP3 genes. Pre-designed Stealth siRNAs were used for TET1 and TET2 (#HSS129586; #HSS12325; Thermo Fisher Scientific, Waltham, USA). The Silencer negative control siRNA (Thermo Fisher Scientific, Waltham, USA) was used as the control siRNA. Cell proliferation was analyzed by MTT assay 48 h after transfection with siRNAs specific for the selected target genes or with the control siRNA described above. The MTT assay was performed according to the manufacturer’s protocol using the CellTiter 96 Non-Radioactive Cell Proliferation Assay kit (G4000, Promega, Madison, USA).