
Table 1 Computational methods for scoring genome-wide DNA methylation heterogeneity

From: Estimating genome-wide DNA methylation heterogeneity with methylation patterns

Method

Formula

 

Approach

Applicable to non-CG sites

Consideration of pattern similarity

Linearity of the score

Independent of methylation level¹

Genome-wide screening

Model-based (MeH)

Abundance based

\({\left(\sum_{i=1}^{R}{a}_{i}^{2}\right)}^{-1}\in \left\{1,\dots ,{2}^{w}\right\}\)

\(a:\) methylation patterns

Counting distinct methylation patterns

 

Pairwise-similarity based

\({\left(\sum_{i=1}^{R}\sum_{j=1}^{R}{d}_{ij}{p}_{ij}^{2}\right)}^{-1/2}>0\)

\(p:\) methylation patterns

Considering pairwise similarity between patterns

Phylogenetic-tree based

\({\left(\sum_{i=1}^{B}{L}_{i}{a}_{i}^{2}\right)}^{-1}>0.5\)

\(a:\) methylation patterns

Considering the total similarity among all patterns

Other methods

Methylation-concurrence [18, 22]

\(\frac{\sum_{c=1}^{C}\omega_{c}}{\sum_{c=1}^{C}\omega_{c}+\sum_{m=1}^{M}\omega_{m}+\sum_{u=1}^{U}\omega_{u}}\in [0,1)\)

\(\omega :\) reads covering CG sites

Measuring the methylation concurrence between patterns

   

 

Proportion of Discordant Reads (PDR) [18, 20]

\(\frac{\sum_{r\in R_{c}} I(\exists\, i,j\in r\ \mathrm{s.t.}\ x_{j,r}\ne x_{i,r})}{|R_{c}|}\in [0,1]\)

\(r:\) reads covering CG sites

Counting distinct methylation patterns among reads

   

Methylation entropy [17, 18, 23, 24]

\(\frac{1}{w}\sum_{k}-a_{k}\log_{2}a_{k}\in [0,1]\)

\(a:\) methylation patterns

Measuring the chaos among the reads of different methylation patterns

   

Epipolymorphism [18, 19]

\(1-\sum_{k}a_{k}^{2}\in [0,1)\)

\(a:\) methylation patterns

Estimating the probability of observing two different patterns at random

   

Fraction of Discordant Read Pairs (FDRP) [18]

\(\frac{\sum_{r_{s}\in R_{c}}\sum_{r_{t}\in R_{c},\,t>s} I(\exists\, i\in \{r_{s}\cap r_{t}\}\ \mathrm{s.t.}\ x_{i,r_{s}}\ne x_{i,r_{t}})}{\binom{|R_{c}|}{2}}\in [0,1]\)

\(r:\) reads covering CG sites

Calculating pairwise disagreement between any two reads

   

Quantitative FDRP (qFDRP)[18]

\(\frac{\sum_{r_{s}\in R_{c}}\sum_{r_{t}\in R_{c},\,t>s}\frac{\sum_{i\in \{r_{s}\cap r_{t}\}} I(x_{i,r_{s}}\ne x_{i,r_{t}})}{|\{r_{s}\cap r_{t}\}|}}{\binom{|R_{c}|}{2}}\in [0,1]\)

\(r:\) reads covering CG sites

Quantifying the similarity of paired-methyl reads by Hamming distance

 

  

Methylation Haplotype Load (MHL) [18, 21]

\(\frac{\sum_{l=0}^{L}(l+1)\frac{\sum_{r\in R_{c}}\sum_{i=1}^{|r|-l} I(x_{i,r}=1\wedge \dots \wedge x_{i+l,r}=1)}{\sum_{r\in R_{c}}(|r|-l)}}{\sum_{l=0}^{L}(l+1)}\in [0,1]\)

\(r:\) reads covering CG sites

Estimating the fraction of strings that are fully methylated for all possible lengths

 

  

¹ See the method description above (based on the formula, the design principle, and the literature)
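
As a rough illustration of how several formulas in Table 1 can be evaluated, the following Python sketch computes the abundance-based score, epipolymorphism, methylation entropy, PDR, FDRP, qFDRP, and MHL for one window. It is not the authors' implementation. It assumes a simplified setting that the table does not require: every read is a string of '0' (unmethylated) and '1' (methylated) covering the same w CG sites, so the overlap of any two reads is the whole window. All function names are hypothetical. The pairwise-similarity, phylogenetic-tree, and methylation-concurrence scores are omitted because the table does not define \(d_{ij}\), \(L_{i}\), or the read weights \(\omega\).

```python
from collections import Counter
from itertools import combinations
from math import log2


def pattern_frequencies(reads):
    """Relative abundance a_k of each distinct methylation pattern."""
    counts = Counter(reads)
    return [n / len(reads) for n in counts.values()]


def abundance_score(reads):
    """Abundance-based score: inverse of the sum of squared abundances."""
    return 1.0 / sum(a * a for a in pattern_frequencies(reads))


def epipolymorphism(reads):
    """Probability that two randomly drawn reads show different patterns."""
    return 1.0 - sum(a * a for a in pattern_frequencies(reads))


def methylation_entropy(reads):
    """Shannon entropy of the pattern abundances, divided by window size w."""
    w = len(reads[0])
    return sum(-a * log2(a) for a in pattern_frequencies(reads)) / w


def pdr(reads):
    """Fraction of reads that contain both methylated and unmethylated sites."""
    return sum(1 for r in reads if len(set(r)) > 1) / len(reads)


def fdrp(reads):
    """Fraction of read pairs that disagree at one or more shared sites."""
    pairs = list(combinations(reads, 2))
    return sum(1 for s, t in pairs if s != t) / len(pairs)


def qfdrp(reads):
    """Mean normalized Hamming distance over all read pairs."""
    pairs = list(combinations(reads, 2))
    w = len(reads[0])  # assumption: every pair overlaps on all w sites
    return sum(sum(x != y for x, y in zip(s, t)) / w for s, t in pairs) / len(pairs)


def mhl(reads):
    """Length-weighted fraction of fully methylated substrings."""
    w = len(reads[0])
    numerator, weight_sum = 0.0, 0
    for length in range(1, w + 1):  # length corresponds to l + 1 in Table 1
        total = sum(len(r) - length + 1 for r in reads)
        full = sum(
            r[i:i + length] == "1" * length
            for r in reads
            for i in range(len(r) - length + 1)
        )
        numerator += length * full / total
        weight_sum += length
    return numerator / weight_sum


if __name__ == "__main__":
    # Toy window with w = 4 CG sites and four reads (illustrative data only).
    reads = ["1111", "1111", "0000", "1100"]
    for score in (abundance_score, epipolymorphism, methylation_entropy,
                  pdr, fdrp, qfdrp, mhl):
        print(score.__name__, round(score(reads), 4))
```

For the toy window, three distinct patterns occur with abundances 0.5, 0.25, and 0.25, so the sum of squared abundances is 0.375; epipolymorphism is therefore 0.625 and the abundance-based score is 1/0.375. Handling reads that cover only part of the window would require restricting each pairwise comparison to the shared sites, as the FDRP and qFDRP formulas specify.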