Studies of DNA methylation contribute to the identification of epigenetic factors that influence gene expression and susceptibility to complex disease traits. With DNA sequence variants unable to account for all of the heritability of disease, behaviors, and other multifactorial phenotypes, such studies will likely help to explain the heritability gap left by conventional GWASs. Moreover, DNA methylation offers a natural interface for understanding the interactive effects of environmental and genetic factors (“nature” vs “nurture”) in relation to disease onset and progression.
ASM has typically been associated with genomic imprinting, but studies of mouse models have determined that ASM loci are far more widespread and are variably influenced by the underlying genotype. As the patterns of ASM will vary significantly between species, studies must be performed in humans to understand disease and the impact of chemical and physical agents on its etiology [13]. Research designs that use genetically isolated human populations can be useful due to their restricted genetic and environmental variation and the presence of extended pedigrees for tracking inheritance. This potentially enriches the ASM signal, making it easier to detect its influence with small sample sizes.
Studying genome-wide patterns of ASM in humans has recently become more feasible with the advent of NGS bisulphite sequencing technologies. Whilst whole genome bisulphite sequencing (WGBS) remains prohibitively expensive, genome-wide bisulphite sequencing using capture-based technology can characterise ASM for a substantial proportion of the methylome at reasonable cost. In this study, we applied this approach to a third-generation nuclear pedigree from the NI population (n = 24) and used genotype-independent methods to detect specific ASM loci and ASM regions (AMRs). Importantly, these AMRs tagged 79% of known imprinting regions, indicating the reliability of the methods. Large AMRs (> 200 CpGs and > 10 Kb in size) mapped primarily to Protocadherin and HOX gene clusters, suggesting epigenetic regulation at these transcriptionally complex regions. Protocadherin ASM has been identified in bees [14] and humans [15], and mono-allelic expression has been documented across this gene cluster previously [2, 16, 17]. In addition to the large HOXA AMR that we detected, ASM at some HOXD cluster genes (HOXD3 and HOXD4) has been previously reported in blood and different regions of the brain [18]. While a large proportion of ASM appears to be conserved across tissues, tissue-specific ASM patterns have also been detected [18]. It is plausible that ASM affects genomic control across complex regions such as Protocadherin and HOX: ASM of the different isoforms expressed in different cells and tissues would allow further fine-tuning of these systems, potentially providing an evolutionary advantage [19].
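The two summary quantities in this paragraph, the size filter for large AMRs and the proportion of known imprinting regions tagged by AMRs, can be illustrated with a minimal sketch. It assumes half-open genomic intervals and treats a known region as "tagged" when any AMR overlaps it by at least one base; that overlap rule, the `Region` type, the function names, and the toy coordinates are illustrative assumptions and are not taken from the study.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    n_cpgs: int = 0  # number of CpGs in the region (used for AMRs only)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_large_amr(amr: Region, min_cpgs: int = 200, min_bp: int = 10_000) -> bool:
    """Strict inequalities mirror the '> 200 CpGs and > 10 Kb' criterion."""
    return amr.n_cpgs > min_cpgs and (amr.end - amr.start) > min_bp


def fraction_tagged(known: list, amrs: list) -> float:
    """Fraction of known regions overlapped by at least one AMR (assumed rule)."""
    if not known:
        return 0.0
    tagged = sum(1 for k in known if any(overlaps(k, a) for a in amrs))
    return tagged / len(known)


# Toy data only; these coordinates are hypothetical.
amrs = [
    Region("chr7", 27_100_000, 27_115_000, n_cpgs=350),
    Region("chr11", 2_000_000, 2_001_000, n_cpgs=40),
]
known_imprinted = [
    Region("chr11", 2_000_500, 2_002_000),
    Region("chr15", 25_000_000, 25_010_000),
]

large_amrs = [a for a in amrs if is_large_amr(a)]
tagged_fraction = fraction_tagged(known_imprinted, amrs)
```

For real data sets, the quadratic overlap scan would normally be replaced by a sorted-interval or interval-tree lookup; the simple form is kept here for readability.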
When we tested for over-representation of AMRs with respect to annotated gene and regulatory features, we found that AMRs were highly enriched at non-coding RNA loci. At the genome-wide level, enrichment of ASM at ncRNAs and miRNAs has not been previously reported in humans, although it is known that imprinted gene domains also transcribe hundreds of miRNA and small nucleolar RNA (snoRNA) genes [20]. Furthermore, ASM of specific non-coding genes in imprinted regions has been reported. For example, down-regulation of MEG3, a long non-coding RNA gene, by parent-specific methylation has been observed in pancreatic islets of Type 2 diabetes mellitus patients, thus providing mechanistic support for co-localisation of non-coding RNA and ASM [21].
Chromatin sites were also enriched for AMRs. Nag et al. reported a chromatin signature for monoallelically expressed genes in which marks associated with active transcription (H3K36me3) and silencing (H3K27me3) occur simultaneously [22]. Similarly, we find that ASM regions are enriched in histone modifications that have been associated with both gene activity (e.g., H3K4 trimethylation, H3K9 acetylation, H3K27 acetylation, and monomethylation of H2BK5 and H4K20) and repression (e.g., trimethylation of H3K9, H3K27, and H3K79) [23]. Moreover, the coincidence of active H3K27 acetylation and repressive DNA methylation marks has been found at some enhancers, where these bivalent regions are stabilised by, and may require, DNA methylation to remain active [24].
Interestingly, there was enrichment of AMRs at subtelomeric regions, an observation not previously mentioned in other studies of AMRs [15]. The subtelomeric region is important for the process of homologous chromosome recognition and pairing. The functional relevance of ASM at these genomic regions is not clear, but Law et al. showed that ATRX, the protein encoded by the X-linked gene mutated in ATRX syndrome, binds to tandem repeats at the subtelomeric regions and is involved in allele-specific expression of genes in that region [25]. It may be useful to focus larger studies on co-association of tandem repeats and ASM at subtelomeric regions. A caveat of our observation of enrichment at subtelomeric regions is that these chromosomal locations contain CpG dense sequences [26]. Therefore, algorithms that model methylation across adjacent CpGs, such as the one used here to identify ASM, may introduce a bias towards CpG dense regions; this bias would potentially affect all CpG dense regions, not just those located near the telomere. Any technical bias towards enrichment at subtelomeric regions may be mitigated by the capture technology, which targets the gene-rich loci covered by the Illumina 450K DNA methylation array, whereas subtelomeric regions are relatively gene poor [26]. It is also worth noting that DNA methylation changes within subtelomeric regions are increasingly associated with human disease [27].
A total of 73 extended AMRs were found to be devoid of common SNVs (MAF > 0.01), indicating non-genetic inheritance at these loci, or at least cis-acting regulation. The ASM calling approach could be considered a limitation as it is an estimation-based method. Other methods that call ASM using genomic variation (heterozygous SNVs in the reads) will likely be more accurate at directly assigning ASM; one such method was very recently published [28]. This, however, inherently ties the calling of ASM to genomic variation, whereas the method implemented in this study is independent of genotype and is thus potentially able to identify additional regions of ASM, such as that noted on chromosome 21, which was recently found to be maternally imprinted [29]. A logical step to assess the influence of genotype on methylation would be to perform mQTL analysis; however, the current sample size is too small. Tycko et al. discuss the significance of using mQTLs and ASM in combination with GWAS to identify disease-associated regulatory sequences [30]. They also describe various cis-acting mechanisms that lead to mQTLs and ASM, such as allele-specific histone modifications and eQTLs, from which we can learn more about the biological functions of these DNA regulatory variants and of the transcription factors and pathways that interact with them.
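The filter for AMRs devoid of common SNVs can be sketched as an interval lookup against the positions of variants with MAF > 0.01. This is a minimal illustration only: the input layout (tuples of chromosome, position, and MAF), the half-open coordinate convention, and the function names are assumptions, and the toy values are not data from the study.

```python
from bisect import bisect_left
from collections import defaultdict


def index_common_snvs(snvs, maf_threshold=0.01):
    """Map each chromosome to a sorted list of positions of common SNVs.

    snvs: iterable of (chrom, pos, maf) tuples (hypothetical input layout).
    Only variants with maf strictly above the threshold are kept.
    """
    index = defaultdict(list)
    for chrom, pos, maf in snvs:
        if maf > maf_threshold:
            index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def is_snv_free(amr, index):
    """True if no indexed SNV falls inside the half-open interval [start, end)."""
    chrom, start, end = amr
    positions = index.get(chrom, [])
    i = bisect_left(positions, start)  # first SNV at or after the AMR start
    return i == len(positions) or positions[i] >= end


# Toy data only; positions and frequencies are hypothetical.
snvs = [("chr1", 150, 0.20), ("chr1", 450, 0.005), ("chr2", 80, 0.30)]
amrs = [("chr1", 100, 200), ("chr1", 400, 500), ("chr2", 100, 200)]

snv_index = index_common_snvs(snvs)
snv_free_amrs = [amr for amr in amrs if is_snv_free(amr, snv_index)]
```

In the toy data, the second AMR contains only a rare variant (MAF 0.005) and is therefore retained, which is the behaviour implied by restricting the filter to common SNVs.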
The use of a capture-based genome-wide bisulphite sequencing approach (as opposed to WGBS) means that we could not represent all AMRs (i.e., if an AMR extends beyond the capture region, we are unable to detect this). However, to account for bias in the enrichment approach, we provided the GAT software with a design file (padded 1000 bp either side of each capture region) as the search space, meaning that enrichment testing is restricted to the captured regions (and not the whole genome). Future studies using WGBS and larger extended pedigrees may provide a more accurate and comprehensive ASM map and relationship profile with regulatory regions. A study by Zink et al. used the pedigree structure of the Icelandic population to assign parent-of-origin to transmitted alleles and methylation levels across the genome. With these data, they were able to provide new insights into imprinted regions with high-resolution maps across key regions, e.g., the Angelman syndrome/Prader-Willi locus on 15q11.2 [31].
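Constructing a padded search space of this kind can be sketched as follows. The 1000 bp padding comes from the text; merging overlapping padded intervals, clipping at position 0, and the function name `pad_and_merge` are assumptions made for illustration, and this is not the GAT interface itself. Clipping at chromosome ends is omitted because it would require chromosome sizes.

```python
def pad_and_merge(regions, pad=1000):
    """Pad each (chrom, start, end) interval and merge any that then overlap.

    Intervals are half-open. Starts are clipped at 0; chromosome ends are not
    clipped because chromosome sizes are not available in this sketch.
    """
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in regions
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy capture regions only; coordinates are hypothetical.
capture_regions = [
    ("chr1", 5000, 6000),
    ("chr1", 7500, 8000),
    ("chr1", 500, 900),
    ("chr2", 100, 300),
]
search_space = pad_and_merge(capture_regions)
```

Merging matters because two capture regions closer than twice the padding would otherwise be counted twice in the search space, inflating the background against which enrichment is assessed.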
In summary, this study has used genome-wide bisulphite sequencing to map ASM in a multi-generational pedigree from the NI genetic isolate. We have confirmed that ASM regions are widespread and extend far beyond known genomic imprinting loci in humans. Importantly, our results show that AMRs are highly enriched in non-coding genomic regions, providing evidence for an integral role in gene regulatory networks. Moving research from mouse models to humans is crucial to further understand the complex interplay of allele-specific methylation and gene expression. Studies of pedigrees within genetically and environmentally limited isolates are a natural extension to mouse models. The knowledge gained from such experiments will advance our fundamental understanding of the complex patterns of epigenetic regulation in human populations and of the epigenetic basis of complex traits and diseases.