The SMART App: an interactive web application for comprehensive DNA methylation analysis and visualization

Background: Data mining of The Cancer Genome Atlas (TCGA) data has significantly facilitated cancer genome research and provided unprecedented opportunities for cancer researchers. However, existing web applications for DNA methylation analysis do not adequately address the needs of experimental biologists, and many additional functions are often required.
Results: To facilitate DNA methylation analysis, we present the SMART (Shiny Methylation Analysis Resource Tool) App, a user-friendly web application for comprehensively analyzing the DNA methylation data of the TCGA project. The SMART App integrates multi-omics and clinical data with DNA methylation and provides key interactive and customized functions, including CpG visualization, pan-cancer methylation profile, differential methylation analysis, correlation analysis and survival analysis, so that users can analyze DNA methylation in diverse cancer types in a multi-dimensional manner.
Conclusion: The SMART App serves as a new approach for users, especially wet-bench scientists with no programming background, to analyze scientific big data and facilitate data mining. The SMART App is available at http://www.bioinfo-zs.com/smartapp.


Introduction
All cancers arise as a result of the accumulation of somatic mutations, copy number alterations, and epigenetic modifications that alter transcription and protein expression. Thus, studies of molecular features such as DNA methylation may reveal the underlying mechanisms of carcinogenesis and progression. DNA methylation, the addition of a methyl group to DNA, plays a critical role in regulating gene expression [1]. It has been reported that DNA methylation at the promoter regions is often negatively correlated with gene expression while DNA methylation in gene bodies is often positively correlated with gene expression [2]. Abnormal DNA methylation patterns are found in every type of human cancer [3]. Many previous studies have shown that DNA methylation is involved in many aspects of carcinogenesis and provides potential biomarkers for evaluating the diagnosis and prognosis of cancer [4][5][6]. A recent study has also shown the association between DNA methylation and somatic copy number aberration, suggesting a much more complex mechanism beyond this modification [7].
The Cancer Genome Atlas (TCGA), a project supported by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), hosts a tremendous amount of multi-omics data and allows systematic study of the genetic or epigenetic basis of cancer [8]. However, accessing and analyzing the DNA methylation data from the TCGA database is quite difficult for scientists who have no computational background. Easy-to-use applications for analyzing the DNA methylation data of the TCGA database are therefore in demand. MethHC (http://methhc.mbc.nctu.edu.tw), Wanderer (http://maplab.imppc.org/wanderer/), MEXPRESS (https://mexpress.be), and MethSurv (https://biit.cs.ut.ee/methsurv/) are examples of web-based tools that allow researchers to integrate, analyze, and visualize DNA methylation [9][10][11][12]. MethHC enables users to browse the top 250 hyper- or hypo-methylated genes in 18 cancer types. Wanderer allows users to analyze DNA methylation and gene expression in a regional framework, MEXPRESS allows users to look at DNA methylation data in relation to its genomic location, and MethSurv can associate overall cancer survival with DNA methylation levels across a large body of TCGA data and many cancers. Although these tools are exceptionally valuable, they do not fully unlock the potential of the publicly available data. For example, they do not offer a function for users to explore the correlation between DNA methylation and transcript expression. In addition, none of the above tools help users visualize the chromosomal distribution of differentially methylated CpGs in diverse cancer types. Therefore, we developed the SMART App, which enables users to analyze DNA methylation and its association with other omics data. The SMART App can facilitate DNA methylation data mining and help reveal the complexity of epigenetic modifications.

Features
The SMART App offers interactive functions for users to analyze the DNA methylation in diverse cancer types in a multi-dimensional way.

Home
The home page first displays the number of DNA methylation samples available from the TCGA project, colored by sample type (i.e., Normal and Tumor), so that users can gain an overview of the sample size of the cancer type of interest. Next, the SMART App provides a quick search interface. Users can enter a gene symbol (e.g., ERBB2) into the 'Quick start' box to search for a gene of interest. After the user clicks the "Go" button, a circular plot showing the chromosomal distribution of all CpGs associated with the input gene is displayed. To give users more information about the CpGs and their genomic locations relative to transcripts, a detailed segment plot highlighting the transcripts, exons, UTR, CDS, CpG island regions, shelves and shores is displayed below (Fig. 1). This segment plot can help researchers identify CpGs whose methylation is potentially related to expression. The panel below summarizes the detailed information of these probes, and users can select one of these probes to view its pan-cancer methylation profile and identify aberrantly methylated sites for further analysis. Users can also view the CpG-aggregated pan-cancer methylation profile by selecting multiple CpGs at a time to explore the mean or median methylation of the selected CpGs. We previously identified TRIM58 as a novel prognosis-related, methylation-driven gene in lung squamous cell carcinoma [13]. Using the quick search function of the SMART App, it is easy to find that the mean methylation level of TRIM58 is significantly higher not only in lung squamous cell carcinoma but also in many other cancer types, including breast cancer, head and neck carcinoma, and lung adenocarcinoma, indicating its potential role in carcinogenesis in these cancer types.

Differential CpGs
Differential analysis, which compares tumor samples with normal samples, is a common approach in cancer research for identifying aberrantly methylated CpGs. Meanwhile, clustering of CpGs with similar methylation patterns along the chromosomes may reflect the genomic mechanisms leading to specific methylation characteristics [14]. Therefore, the SMART App allows users to set custom cut-off values for a given cancer type to dynamically obtain differentially methylated CpGs and their chromosomal distributions (Fig. 2). The delta Beta-value (or delta M value) of each probe is calculated as the mean Beta-value (or M value) in tumor samples minus the mean Beta-value (or M value) in normal samples. The p value is calculated using the Wilcoxon rank sum test and is adjusted using the Benjamini-Hochberg method. Moreover, for users who only want to visualize specific CpGs, the SMART App offers an extra function that allows users to plot selected CpGs flexibly. A detailed description can be found on the website.
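The delta calculation and the Benjamini-Hochberg adjustment described above can be sketched as follows. This is a minimal illustration in Python rather than the SMART App's own R code; the function names, probe names, and all numbers are hypothetical, and the raw p values are simply assumed to come from a per-probe rank sum test.

```python
from statistics import mean


def delta_methylation(tumor, normal):
    """Mean methylation in tumor samples minus mean in normal samples."""
    return mean(tumor) - mean(normal)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Walk from the largest to the smallest p value so that a running
    # minimum keeps the adjusted values monotone.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical Beta-values for two probes (illustrative numbers only).
probes = {
    "probe_A": {"tumor": [0.80, 0.75, 0.85], "normal": [0.30, 0.35, 0.25]},
    "probe_B": {"tumor": [0.50, 0.55, 0.45], "normal": [0.50, 0.52, 0.48]},
}
deltas = {
    name: delta_methylation(v["tumor"], v["normal"])
    for name, v in probes.items()
}

# Hypothetical raw p values, assumed to come from a rank sum test per probe.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

A probe would then be reported as differentially methylated when its absolute delta and adjusted p value pass the cut-offs chosen by the user.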

Methylation DIY
This module provides functions for users to comprehensively analyze DNA methylation, taking other omics data and clinical stages into consideration. The first panel generates custom box plots for users to compare CpGs of genes between normal and tumor samples in a given cancer type. Users can select multiple probes at   [15]. When IDH1 is selected, the returned box plots show that cg07640666, cg17353896 and cg24324379 are significantly hyper-methylated in the mutation group (Fig. 3a, p value < 0.05). Sun et al. observed a correlation between CNV and methylation and discussed the possible mechanisms relating to this event [7]. Here, the SMART App provides a function for researchers to study the possible association between CNV and DNA methylation. The results are displayed as box plots showing the correlation between CNV and methylation. With the SMART App, it can be observed that one CpG of TRIM58 (cg04902327) shows a lower level of methylation with low-level copy number amplification, whereas other CpGs of TRIM58 show a positive correlation with CNV in lung squamous cell carcinoma (Fig. 3c, p value < 0.05).

Correlation
DNA methylation is often correlated with gene expression. The correlation function of the SMART App performs correlation analysis between gene expression and methylation for any given TCGA cancer type, using Pearson, Spearman, or Kendall correlation statistics. UCSC Xena provides re-computed expression data of TCGA for 198,619 transcripts. Accordingly, two levels are available, and one can choose to analyze the correlation at the gene level or the transcript level. When the correlation is analyzed at the transcript level, a segment plot highlighting the genomic locations of the transcript and CpGs is displayed, and the distance of each probe to the TSS is shown in the table below so that users can locate the probes in the promoter region. The results are displayed as scatter and distribution plots (Fig. 4 and Additional file 1: Figure S1).
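As an illustration of the correlation statistics named above, the following sketch computes Pearson and Spearman coefficients between one probe's methylation and one gene's expression. It is written in plain Python and is not the SMART App's implementation; the function names and the sample values are hypothetical, and Kendall's statistic is omitted for brevity.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values for one probe and one gene.
methylation = [0.10, 0.25, 0.40, 0.60, 0.85]
expression = [9.1, 8.7, 7.9, 6.2, 5.0]

r = pearson(methylation, expression)
rho = spearman(methylation, expression)
```

Because expression falls steadily as methylation rises in this toy data, both coefficients are negative, which is the pattern typically reported for promoter CpGs.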

Survival
The SMART App performs overall survival (OS) and disease-free interval (DFI)-related survival analysis based on methylation levels. This function allows users to select the cancer types of interest for overall or disease-free survival analysis. Cox regression analysis is a popular method for evaluating the prognostic value of individual variables. To efficiently analyze the survival significance of methylation, the SMART App offers both univariate and multivariate Cox regression analyses. When performing multivariate Cox regression analysis, users can adjust for potential confounding factors, including age, gender, race and pathological stage. Users can copy and paste a list of CpGs into the box and select the cancer type of interest to conduct Cox regression analysis. The hazard ratio, 95% confidence interval, z score, and p value are reported. Once users have identified the significant variables, they can use the SMART App to draw survival curves. The thresholds for the high and low methylation level cohorts can be adjusted by users.
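The cohort partitioning and survival curves described above can be sketched as follows. This is an illustrative Python example that assumes a Kaplan-Meier estimator for the curves, which the text does not specify; the function names, the median threshold, and the data are all hypothetical, and Cox regression itself is not shown.

```python
from statistics import median


def split_by_threshold(methylation, threshold):
    """Return sample indices of the high and low methylation cohorts."""
    high = [i for i, v in enumerate(methylation) if v >= threshold]
    low = [i for i, v in enumerate(methylation) if v < threshold]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed and 0 if the sample was censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: methylation level, follow-up time, and event flag.
methylation = [0.2, 0.8, 0.6, 0.3, 0.9, 0.4]
times = [30, 10, 15, 40, 5, 25]
events = [1, 1, 0, 0, 1, 1]

# The median is used here as one possible user-chosen threshold.
threshold = median(methylation)
high, low = split_by_threshold(methylation, threshold)

curve_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
curve_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

Changing the threshold moves samples between the two cohorts and therefore changes both curves, which is why an adjustable cut-off is useful when exploring prognostic CpGs.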

Comparison with existing tools
Web tools for analyzing DNA methylation data from the TCGA project include MethHC, Wanderer, MEXPRESS, and MethSurv. MethHC was introduced in 2014 and enables users to identify the highest or lowest methylated genes, perform hierarchical cluster analysis, explore methylation profiles across tumors and conduct correlation analysis. However, the latest update of MethHC was in 2014. Wanderer is an interactive web application for exploring DNA methylation and gene expression. It provides a single-page interface to explore DNA methylation in a regional framework. MEXPRESS is a data visualization tool for DNA methylation analysis and was first introduced in 2015. It has since been updated with more data and improved figures. MethSurv is a Shiny application that mainly focuses on the clinical impact of DNA methylation. While these tools are extraordinarily valuable, they leave several functions unaddressed. The M value has been reported to be more statistically valid for differential analysis [16]. Although these tools commonly perform differential analyses, none of them allow users to use the M value for differential analysis. None of these tools allow users to pick a cancer type and visualize the chromosomal distribution of the aberrantly methylated CpGs. In addition, none of the existing tools allow users to analyze the correlation between methylation and expression at the transcript level. Finally, none of the tools provide customizable methylation thresholds for partitioning patient cohorts when plotting survival curves. A detailed comparison is shown in Table 1.

Discussion
The SMART App is an interactive web application for DNA methylation analysis based on the TCGA database. The SMART App enables experimental biologists without any programming background to perform various analyses relating to DNA methylation in diverse cancer types. Using the SMART App, one can easily explore the large DNA methylation data, ask specific scientific questions, and validate findings. For example, one can easily find that CpGs such as cg10983544 and cg20429172 are located at the promoter region of the transcript of TRIM58, and may ask whether these CpGs are aberrantly methylated and whether the methylation changes of these CpGs lead to gene expression alterations. One can also identify significantly hyper- and hypo-methylated CpGs based on custom thresholds. Moreover, one can explore the correlation between methylation and other omics and clinical data, analyze the prognostic value of CpGs and draw survival curves. The flexible parameters of the SMART App also enable users to customize the visualization of results. The SMART App is a user-friendly and intuitive tool for unlocking the potential value of the genomic data in TCGA, and it complements other available tools well.
Li et al. Epigenetics & Chromatin (2019) 12:71

Conclusion
The SMART App is a web-based tool to explore and interpret the DNA methylation data across 33 cancer types from the TCGA database. The source code of the SMART App is available for users to download under the GPLv3 license.

Fig. 4 Spearman correlation between expression (ZNF582) and DNA methylation (M value) in lung squamous cell carcinoma. a Gene-level correlation showing that the expression of ZNF582 is significantly negatively associated with the methylation of cg24733179, cg11740878, cg09568464, cg02763101, cg22647407, cg08464824, cg13916740, cg24039631, cg20984085, and cg25267765. b Transcript-level correlation showing that the expression of ENST00000301310.8 is significantly negatively associated with the methylation of cg24733179, cg11740878, cg09568464, cg02763101, cg22647407, cg08464824, cg13916740, cg24039631, cg20984085, and cg25267765

Methods
The SMART App is developed entirely in the R programming language using the Shiny framework and is freely available for all users. There is no login requirement for accessing any features in the SMART App. The SMART App has been most extensively tested in a Safari browser environment and is also compatible with other popular web browsers such as Chrome, Firefox, and Internet Explorer.
Both the Beta-value and the M value are commonly used in DNA methylation analysis. The M value has been reported to have a wider dynamic range and to be more appropriate for statistical analysis [16], whereas the Beta-value is much more biologically interpretable. Therefore, both types of methylation values are available in the SMART App.
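For readers unfamiliar with the two scales, the sketch below converts between them using the commonly used logit relationship, M = log2(Beta / (1 - Beta)). The text does not state the exact formula used by the SMART App, so this relationship, the absence of any offset term, and the function names are assumptions made for illustration.

```python
from math import log2


def beta_to_m(beta):
    """Convert a Beta-value in the open interval (0, 1) to an M value."""
    if not 0 < beta < 1:
        raise ValueError("Beta-value must lie strictly between 0 and 1")
    return log2(beta / (1 - beta))


def m_to_beta(m):
    """Convert an M value back to a Beta-value."""
    return 2 ** m / (2 ** m + 1)


# A Beta-value of 0.5 (half of the molecules methylated) maps to M = 0,
# and the M value grows without bound as the Beta-value approaches 0 or 1.
m_half = beta_to_m(0.5)
m_high = beta_to_m(0.8)
beta_back = m_to_beta(m_high)
```

The Beta-value reads directly as an approximate fraction of methylated molecules, while the unbounded M value spreads out the compressed extremes near 0 and 1.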
The SMART App outputs consist of figures and tables, which are available for users to download. Figures are rendered in Portable Document Format (PDF) and can be further edited using Adobe Illustrator. Tables are generated by the DT R package (https://rstudio.github.io/DT/), allowing for data querying and selection.