Identification of activated enhancers and linked transcription factors in breast, prostate, and kidney tumors by tracing enhancer networks using epigenetic traits

Background Although technological advances now allow increased tumor profiling, a detailed understanding of the mechanisms leading to the development of different cancers remains elusive. Our approach toward understanding the molecular events that lead to cancer is to characterize changes in transcriptional regulatory networks between normal and tumor tissue. Because enhancer activity is thought to be critical in regulating cell fate decisions, we have focused our studies on distal regulatory elements and transcription factors that bind to these elements. Results Using DNA methylation data, we identified more than 25,000 enhancers that are differentially activated in breast, prostate, and kidney tumor tissues, as compared to normal tissues. We then developed an analytical approach called Tracing Enhancer Networks using Epigenetic Traits that correlates DNA methylation levels at enhancers with gene expression to identify more than 800,000 genome-wide links from enhancers to genes and from genes to enhancers. We found more than 1200 transcription factors to be involved in these tumor-specific enhancer networks. We further characterized several transcription factors linked to a large number of enhancers in each tumor type, including GATA3 in non-basal breast tumors, HOXC6 and DLX1 in prostate tumors, and ZNF395 in kidney tumors. We showed that HOXC6 and DLX1 are associated with different clusters of prostate tumor-specific enhancers and confer distinct transcriptomic changes upon knockdown in C42B prostate cancer cells. We also discovered de novo motifs enriched in enhancers linked to ZNF395 in kidney tumors. Conclusions Our studies characterized tumor-specific enhancers and revealed key transcription factors involved in enhancer networks for specific tumor types and subgroups. Our findings, which include a large set of identified enhancers and transcription factors linked to those enhancers in breast, prostate, and kidney cancers, will facilitate understanding of enhancer networks and mechanisms leading to the development of these cancers. Electronic supplementary material The online version of this article (doi:10.1186/s13072-016-0102-4) contains supplementary material, which is available to authorized users.

10531 (PRAD), 5364 (BRCA), and 6657 (KIRC) probes that were methylated in both normal and tumor samples (mean of β normal >0.8 (parameter, methcutoff), mean of β tumor >0.8, β≤0.8 in less than 5 tumors (parameter, minTumor)); we identified 4092 (PRAD), 7522 (BRCA), and 3910 (KIRC) probes that were hypermethylated in tumors as compared to normal tissues (mean of β leukocyte <0.2 (parameter, unmethcutoff), mean of β normal <0.2, β tumor >0.3 (parameter, hypercutoff) in more than 5 tumors (parameter, minTumor)), and we identified 6251 (PRAD), 19882 (BRCA), and 10730 (KIRC) probes that were hypomethylated in tumors as compared to normal tissues (mean of β leukocyte >0.8 (parameter, methcutoff), mean of β normal >0.8, β<0.7 (parameter, hypocutoff) in more than 5 tumors (parameter, minTumor)). In the TENET program, DNA methylation datasets of leukocytes, smooth muscles, and fibroblasts are included and there is an option for users to include additional DNA methylation datasets of other cell types to identify differentially methylated enhancer regions not affected by purity. For this study, we set hypomethylation and hypermethylation cut-offs based on the distribution of tumor cellularity among the tumor samples we used: most TCGA tumor samples had > 20% tumor purity, and dichotomization of data with β > 0.3 for the hypermethylation and β < 0.7 for the hypomethylation was used to alleviate the influence of variable levels of tumor purity. We designed TENET such that it can detect changes that occur in a small number of tumors (parameter, minTumor), using absolute DNA methylation differences between tumor and normal samples (Additional file 1: Figure S2). In order to identify links found in about 1% of tumor samples, we used 5 tumor samples as a threshold. TENET is designed for users to set their own thresholds for beta values and the number of tumor samples, depending on individual datasets. In addition, TENET can investigate DNA methylation and gene expression relationships (see below) among cases only, allowing the detection of enhancer to gene links uniquely found in a small subgroup of cases (i.e. tumors, in this study).
Step 2: Identification of differentially methylated enhancer probe to gene links using Z scores (see Additional file 1: Figure S2b, left panels and right top panel) The steps to identify statistically significantly associated enhancer probe-gene links are: 1) to alleviate the tumor purity effect on DNA methylation analysis, for each enhancer probe we grouped the tumor samples into a set of unmethylated tumors (for hypermeth: β<0.3 (parameter, hypercutoff), for hypometh: β<0.7 (parameter, hypocutoff)) and a set of methylated tumors (for hypermeth: β>0.3, for hypometh: β>0.7), 2) a Z score was measured by first subtracting the mean of gene expression levels in the unmethylated tumors from the mean of gene expression levels in the methylated tumors then dividing by the standard deviation of gene expression levels in the unmethylated tumors, 3) enhancer probe to gene links with a Z score > |±1.645|, one tailed test critical value for 95% confidence level, were selected (parameter, Zcutoff) (a Z score < -1.645 was used for positive enhancer-gene expression relationships (E N :G + or E T :G + ) and a Z score > 1.645 was used for negative enhancer gene-expression relationships (E N :Gor E T :G -)). When the parameter, usecaseonly is FALSE, all samples were used to group into either a set of unmethylated samples, or methylated samples depending on DNA methylation levels, and Z scores were calculated.
Step 3: Identification of significant enhancer probe to gene links using permutation (see Additional file 1: Figure S2b, right bottom panel) To identify statistically significant links, a permutation test was performed with all of the other hyper or hypomethylated probes for each gene from the links found in step 2. An empirical p value cut-off, 0.05 (parameter, permutation.cutoff) was used to further refine the set of enhancer probes linked to expression of specific genes in this study.
Step 4: Optimize selection of enhancer probe to gene links (see Additional file 1: Figure S2c) To ensure the enhancer probe to gene links have statistically significant changes in gene expression levels between the hyper or hypomethylated tumors and the normal tissue samples, a Wilcoxon rank sum test was used to test the null hypothesis that overall gene expression in hyper or hypomethylated tumor samples was different from that in normal samples (adj. p value <0.05 (parameter, adj.pval.cutoff); genes with expression levels higher in normal samples than in tumor samples were selected for categories of enhancer probe:gene links, E N :G + or E T :G -, and genes with expression levels higher in tumor samples than in normal samples were selected for categories of enhancer probe:gene links, E N :Gor E T :G + ; genes with low expression levels across samples (mean log2(RSEM+1) equal to 0) were removed. To select the enhancer probe and gene links found with a very substantial degree of change, only those enhancer probe-gene links having at least one tumor with a β>0.6 for hypermeth (parameter, hyper.stringency) and a β<0.4 for hypometh (parameter, hypo.stringency) were chosen for this study.
NOTE: Different cut-offs may work better for your own datasets (e.g. due to purity). Each parameter can be altered as the users prefer. We recommend users to try different cut-offs and confirm the relationship by visualizing in Step 5.

Step 5: Summarization and visualization of TENET results
To summarize the TENET results, all links including genomic coordinates of enhancer probes and TSS of associated genes, as well as p values, are written in *anno.txt files, and list of enhancer probes and genes are saved in *list.txt files.
To identify key transcription factors (using a 1,982 human transcription factor annotation from the ELMER package [1]), known tumor suppressors (using a set of 637 protein coding tumor suppressor genes from the TSGene resource [5], and known oncogenes (using a set of 537 known cancer genes from The Cancer Gene Census [6] for each tumor type, enhancer/gene/TF/known oncogene/known tumor suppressor gene frequency tables (*freq.txt, *table.txt) and histograms (*hist.pdf, *barplot.pdf) are generated (e.g. parameter, hypoGposHistogram).
Enhancer probe to gene link states of each sample is listed in *states.table.txt files as a binary format (enhancer gene links status; yes=1, no=0) (e.g. parameter, hypoGposStates). The status of each individual enhancer probe:gene link in each case sample (i.e. tumor, in this study) is annotated using a hypomethylation cut-off and a hypermethylation cut-off for DNA methylation and a mean of each gene expression level in case samples (E N :G + and E T :G -: lower than mean expression level of gene in case samples, E T :G + and E N :G -: higher than mean expression level of gene in case samples).
For visualization, scatterplots between DNA methylation and gene expression levels (e.g. parameter, hypoGposScatter), circosplots between enhancers and genes (e.g. parameter, makeCircos4gene), and genome browser tracks are made (e.g. parameter, hypoGposTracks). In the scatterplots, empirical p values measured in step 3 and Spearman's correlation coefficients between DNA methylation and gene expression of each link are included for users to help determine TENET program's parameters of their datasets. In order to confirm that the enhancer probe:gene links identified by TENET were not confounded by other factors, complex scatterplots can be made with additional information of tumor samples. If tumor purity estimates, copy number variation (CNV), and somatic mutation (SM) datasets are available, complex scatterplots between DNA methylation and gene expression levels showing tumor purity as dot size and CNV and SM as dot shape (see Additional file 1: Figure S2b) can be generated (e.g. parameter, hypoGposCScatter). For PRAD TENET results, average DNA-based purity estimates and CNV and SM data were downloaded from the TCGA Firebrowse portal [2] and included to confirm that the enhancer probe-gene links identified by TENET were not confounded by other factors (Additional file 1: Figure S2). NOTE: TENET is designed for users that each step can be run independently, so users can switch order of steps to run as they would like. For enhancer:gene links identification, we recommend users to run all of steps for linking (i.e. step 2, 3, and 4), reducing potential statistical bias; especially, when the parameter, minTumor is set with a very small number. Figure S1. Workflow of TENET. Starting from obtaining datasets (step 0), TENET identifies differentially methylated enhancer regions (step 1), and selects enhancer-gene links using statistical methods (step2 through 4). The enhancer-gene links found by TENET can be visualized by making tables, histograms, scatterplots, circos plots, and genome browser tracks (step 5).  To further visually validate that the enhancer probe:gene expression relationship is not confounded by tumor purity, the average DNA-based tumor purity estimates, measured by ABSOLUTE [7] and CLONET [8], are indicated by the size of each dot in the scatter plots in panels A and B. Additionally, somatic mutation information measured by MutSig2CV [9], and copy number variation of each gene analyzed by GISTIC2 calls [10] are reflected as dot shapes to investigate any effects of genetic alteration on gene expression [2]. Z score was calculated between an enhancer to all genes (e.g. n=20,531 for TCGA Level 3 RNA-seq datasets) in step 2 (top right). Using z scores, empirical p values were permutated between the gene selected in step 2 to all other hypo or hypermethylated enhancers (e.g. n=6,251 for hypomethylated enhancer probes in PRAD) in step 3 (bottom right). (c) To further optimize selection of enhancer:gene links, gene expression levels were compared between tumor and normal samples in step 4.

Link
Enhancer Gene