Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation

Background: X-chromosome inactivation (XCI) is the epigenetic inactivation of one of the two X chromosomes in XX eutherian mammals. The inactive X chromosome results from multiple silencing pathways that act in concert to deposit chromatin changes, including DNA methylation and histone modifications. Yet over 15% of genes escape or variably escape from inactivation and continue to be expressed from the otherwise inactive X chromosome. To the extent that they have been studied, epigenetic marks correlate with this expression.

Results: Using publicly available data, we compared XCI status calls with DNA methylation, H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3 and H3K36me3. At genes subject to XCI, heterochromatic marks were enriched and euchromatic marks depleted on the inactive X compared with the active X. At genes escaping XCI, the marks were more similar between the active and inactive X. Using sample-specific XCI status calls, we found that some marks differed significantly with variable XCI status, but which marks were significant was not consistent between genes. A model trained to predict XCI status from these epigenetic marks obtained over 75% accuracy for genes escaping XCI and over 90% for genes subject to XCI. This model made novel XCI status calls for genes lacking the allelic differences or CpG islands that other methods require. Examining these calls across a domain of variably escaping genes, we saw XCI status vary gene by gene rather than at the domain level. Lastly, we compared XCI status calls with genetic polymorphisms and found multiple loci associated with XCI status changes at variably escaping genes, but none individually sufficient to induce an XCI status change.

Conclusion: The control of expression from the inactive X chromosome is multifaceted, but it is ultimately regulated at the individual gene level, with a detectable but limited impact of distant polymorphisms.
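The abstract reports accuracy separately for genes escaping XCI and genes subject to XCI. One natural reading is per-class accuracy: the share of genes with a given reference call that the model labels the same way. The minimal sketch below computes it under that assumption; the function name `per_class_accuracy` and the toy calls are hypothetical and not taken from the study.

```python
# Hypothetical helper; a sketch, not the authors' code.
from collections import Counter


def per_class_accuracy(true_calls, predicted_calls):
    """Fraction of genes in each reference class that the model predicts correctly.

    Assumption: "accuracy for genes escaping/subject" is read as per-class accuracy.
    Labels follow the supplementary tables: "E" escapes XCI, "S" subject to XCI.
    """
    if len(true_calls) != len(predicted_calls):
        raise ValueError("true and predicted calls must have the same length")
    totals = Counter(true_calls)
    correct = Counter(t for t, p in zip(true_calls, predicted_calls) if t == p)
    return {label: correct[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical toy calls, not data from the study.
    truth = ["E", "E", "E", "E", "S", "S", "S", "S", "S", "S"]
    preds = ["E", "E", "E", "S", "S", "S", "S", "S", "S", "E"]
    print(per_class_accuracy(truth, preds))  # E: 3/4 correct, S: 5/6 correct
```

Reporting each class separately matters here because genes subject to XCI far outnumber escape genes, so a single overall accuracy would be dominated by the larger class.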
On the inactive X, euchromatic marks are depleted and heterochromatic marks are enriched at silenced genes. Genes escaping inactivation show a less significant enrichment of heterochromatic marks and depletion of H3K27ac. Combining all examined marks improved XCI status prediction, particularly for genes without CpG islands or polymorphisms, because no single mark is consistently associated with silenced or expressed genes.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13072-021-00404-9.


The ratio of TSSs with significant differences between males and females for various epigenetic marks, using CEMT data. The denominator was the total number of informative TSSs for which we had data. For most marks, the signal was measured over the 500 bp upstream of the promoter; for H3K36me3, it was measured across exons, and unique transcripts were counted instead of unique TSSs. Marks significant in over 70% of informative TSSs are in bold. All of the H3K27me3 data from ENCODE were downloaded and used as a replication dataset. Chromosome 7 (chr7) was included as an example autosome.

Table S12: The percentage of genes predicted by our epigenetic predictor to variably escape that show significant differences in various epigenetic marks. Genes were counted as significant if Benjamini-Hochberg (BH) corrected p-values from t tests comparing samples predicted as subject to XCI with samples predicted as escaping XCI were less than 0.01. The "total number of genes" row gives the number of genes in each category. The variable escape across tissues and across TSSs categories each have two columns: the left column gives the percentage of variably escaping genes with significant differences between tissues/TSSs, and the right column gives the percentage of all X-chromosome genes with such differences. Marks highlighted in blue were significantly more likely to differ between tissues/TSSs at genes predicted to variably escape than at all X-linked genes (chi-square adjusted p-value < 0.01).

Table S15: The percentage of genes predicted by our epigenetic predictor to variably escape in the CREST dataset that show significant differences in various epigenetic marks. Genes were counted as significant if BH corrected p-values from t tests comparing samples predicted as subject to XCI with samples predicted as escaping XCI were less than 0.01. The "total number of genes" row gives the number of genes in each category.
The variable escape across tissues and across TSSs categories each have two columns: the left column gives the percentage of variably escaping genes with significant differences between tissues/TSSs, and the right column gives the percentage of all X-chromosome genes with such differences. Marks highlighted in blue were significantly more likely to differ between tissues/TSSs at genes predicted to variably escape than at all X-linked genes (chi-square adjusted p-value < 0.01).

There are separate sheets for association with Xi/Xa-based and 450k-based XCI status calls, and for comparison with all chromosomes and with chromosome X only. Adjusted p-values are calculated using the Benjamini-Hochberg method. For the sheet associating DNAme-based XCI status calls with loci on all chromosomes, we included all 610 significant loci instead of the top 100. We also include the number of samples with each XCI status (E, escapes XCI; S, subject to XCI) and each genotype (ref, homozygous reference; het, heterozygous; alt, homozygous alternate) (columns E-J). Columns M and N give the ratio of reference to alternate alleles in samples escaping and subject to XCI, respectively; column O is the ratio of these two columns, and column P equals O, or its reciprocal when O is less than 1, to make comparison easier. This enrichment column (P) shows the enrichment of the reference allele in samples with one XCI status over the other. For the DNAme allChr sheet, we have also included a column showing the attributable risk per allele.
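The statistics named in these captions can be sketched in Python as below. This is not the authors' code, and all function names are hypothetical. `bh_adjust` is a standard Benjamini-Hochberg adjustment. `chi2_2x2_pvalue` is an uncorrected Pearson chi-square test on a 2x2 table; R's `chisq.test` applies Yates' continuity correction to 2x2 tables by default, and the captions do not say which variant was used. `allele_enrichment` reproduces columns M-P under the assumption that allele counts are derived from genotype counts as 2*ref + het and 2*alt + het.

```python
# Hypothetical helpers; a sketch of the calculations, not the authors' code.
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def allele_enrichment(esc_counts, sub_counts):
    """Columns M, N, O, P from (n_ref, n_het, n_alt) genotype counts.

    Assumption: allele counts are 2*ref + het (reference) and 2*alt + het (alternate).
    """
    def ref_alt_ratio(n_ref, n_het, n_alt):
        ref_alleles = 2 * n_ref + n_het
        alt_alleles = 2 * n_alt + n_het
        if alt_alleles == 0:
            raise ValueError("no alternate alleles observed")
        return ref_alleles / alt_alleles

    m_col = ref_alt_ratio(*esc_counts)  # samples escaping XCI
    n_col = ref_alt_ratio(*sub_counts)  # samples subject to XCI
    if n_col == 0:
        raise ValueError("no reference alleles in samples subject to XCI")
    o_col = m_col / n_col
    p_col = o_col if o_col >= 1 else 1 / o_col  # fold change in either direction
    return m_col, n_col, o_col, p_col
```

Folding O into P means an enrichment of the reference allele in escaping samples and the same enrichment in samples subject to XCI get the same value, which is what makes column P easy to sort on.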
Table S17: The number of loci associated with each gene, and the genes associated with each locus. See additional files.

These are for the association between DNAme-based XCI status and genetic polymorphisms. See additional files.

These loci were independently tested as DNAmeQTLs in females and males, with some columns color coded by sex (pink, female; light blue, male). Further columns give the median and mean DNAme at the gene's CpG island for samples carrying the reference or alternate allele at each locus; these columns are color coded by whether the value falls in the range to escape XCI (DNAme < 0.01, blue) or in the range to be subject to XCI (DNAme > 0.15, orange). Mean and median columns are given for both males and females, but only the female columns are color coded by XCI status. Genes are boxed when the female median for one allele is in the range to escape XCI and the median for the other allele is in the range to be subject to XCI.
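The color coding and boxing rule for the allele-specific DNAme columns can be expressed as a small check. This is a sketch, assuming DNAme is a fraction between 0 and 1 and using the thresholds stated in the caption (below 0.01 for escape, above 0.15 for subject). How heterozygous samples were grouped is not stated, so the inputs are simply the values for the female samples assigned to each allele. The names `dnam_range` and `is_boxed` are hypothetical.

```python
# Hypothetical helpers; a sketch of the caption's rule, not the authors' code.
from statistics import median

# Thresholds as stated in the Table S17 caption (DNAme as a 0-1 fraction).
ESCAPE_MAX = 0.01   # below this: range to escape XCI (blue)
SUBJECT_MIN = 0.15  # above this: range to be subject to XCI (orange)


def dnam_range(value):
    """Classify one DNAme value into the caption's color-coded ranges."""
    if value < ESCAPE_MAX:
        return "escape"
    if value > SUBJECT_MIN:
        return "subject"
    return "intermediate"  # between the thresholds: not color coded


def is_boxed(ref_allele_dnam, alt_allele_dnam):
    """True if the female medians of the two alleles fall in opposite ranges.

    Assumption: each argument lists island DNAme values for the female samples
    grouped with that allele; heterozygote handling is not specified in the source.
    """
    ranges = {dnam_range(median(ref_allele_dnam)), dnam_range(median(alt_allele_dnam))}
    return ranges == {"escape", "subject"}


if __name__ == "__main__":
    # Hypothetical values for one gene/locus pair.
    print(is_boxed([0.004, 0.006, 0.009], [0.22, 0.30, 0.18]))
```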

Figure S5: Differences in epigenetic marks between samples found escaping vs subject to XCI at variably escaping genes called using DNAme. For most marks, the region 500 bp upstream of the promoter is used; for H3K36me3, the gene body is used. For each gene, the median value in samples found subject to XCI was subtracted from the median value in samples escaping XCI. This is shown for all genes found variably escaping across individuals by DNAme.

Figure S6: IGV view of DNAme bigWig tracks at two variably escaping genes. a) The CpG island at CITED1. b) The CpG island at NAA10. A broad representation of samples was sought: some hypomethylated, some hypermethylated and some inconsistent across the CpG island. Broad hypermethylation in males at these genes was rare but is included here as an extreme example.
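The per-gene quantity plotted in Figure S5 is a difference of medians. The sketch below computes it, together with a helper for a 500 bp window upstream of a TSS. Both names (`escape_minus_subject`, `upstream_window`) are hypothetical, and the 0-based, half-open coordinate convention and strand handling are assumptions, since the caption only says "500 bp upstream of the promoter".

```python
# Hypothetical helpers; a sketch, not the authors' code.
from statistics import median


def upstream_window(tss, strand, width=500):
    """Half-open, 0-based [start, end) window of `width` bp upstream of a TSS.

    Assumption: "upstream" excludes the TSS itself and follows the gene's strand.
    """
    if strand == "+":
        return max(0, tss - width), tss
    if strand == "-":
        return tss + 1, tss + 1 + width
    raise ValueError("strand must be '+' or '-'")


def escape_minus_subject(values_by_status):
    """Median mark value in escaping samples minus median in samples subject to XCI.

    values_by_status maps "E" and "S" to per-sample values for one gene.
    """
    return median(values_by_status["E"]) - median(values_by_status["S"])


if __name__ == "__main__":
    # Hypothetical gene with made-up per-sample values.
    gene = {"E": [0.8, 1.2, 1.0], "S": [0.2, 0.4]}
    print(upstream_window(1_000, "+"), escape_minus_subject(gene))
```

A positive difference means the mark is higher in samples escaping XCI; using medians rather than means keeps a single outlying sample from dominating the per-gene value.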
Figure S7: Average DNAme difference between adjacent CpGs per CpG island. Each point is the average DNAme difference between adjacent CpGs for an individual island, averaged again across samples. Islands are colored by the meta-status of the closest TSS within 2 kb (escapes XCI, PAR, subject to XCI, variably escapes from XCI, or no call). Chr7 was chosen as an autosomal control to show whether the differences are X-specific. Males and females from CEMT were used to check for sex specificity, and females from CREST were included to check for cancer specificity.
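A minimal sketch of the Figure S7 summary follows. It assumes the difference between adjacent CpGs is taken as an absolute value, that CpGs are ordered by genomic position, and that no values are missing; the caption specifies none of these. The names `mean_adjacent_diff` and `island_point` are hypothetical.

```python
# Hypothetical helpers; a sketch, not the authors' code.
from statistics import mean


def mean_adjacent_diff(island_dnam):
    """Mean absolute DNAme difference between consecutive CpGs of one island in one sample.

    Assumptions: CpGs are ordered by genomic position, there are no missing values,
    and differences are taken as absolute values.
    """
    if len(island_dnam) < 2:
        raise ValueError("an island needs at least two CpGs")
    return mean(abs(b - a) for a, b in zip(island_dnam, island_dnam[1:]))


def island_point(samples):
    """One Figure S7 point: per-sample means averaged again across samples."""
    return mean(mean_adjacent_diff(s) for s in samples)
```

A uniformly methylated or unmethylated island scores near 0, whereas an island whose CpGs alternate between methylated and unmethylated states, as in the "inconsistent" samples of Figure S6, scores higher.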

Figure S8: ROC curves for predictive models trained with each epigenetic mark. Each curve shows one random forest model trained per sample with one epigenetic mark as its input, along with the median value of that mark in similar males. Samples are colored by tissue (blood, brain, breast, colon, thyroid). The "all" category is a predictor using all six histone marks and DNAme. Black diagonal lines were added to ease comparison between figures.
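Figure S8 summarizes each model with a ROC curve. As a generic illustration, not the authors' plotting code, the sketch below builds ROC points from per-gene escape scores (for example, a random forest's predicted probability of escape), treating escape as the positive class, and integrates them with the trapezoid rule. The diagonal drawn in the figure corresponds to an area under the curve (AUC) of 0.5. The names `roc_points` and `auc` are hypothetical.

```python
# Hypothetical helpers; a generic ROC sketch, not the authors' code.


def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping a threshold down through the scores.

    labels: 1 = escapes XCI (positive class), 0 = subject to XCI.
    Genes with tied scores enter the curve together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed for a ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every gene whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Grouping tied scores matters for random forests, whose vote-fraction probabilities often tie; stepping through tied genes one at a time would make the curve depend on their input order.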
Figure S9: Accuracy when models trained in one sample are tested on other samples.

Figure S10: XIST expression compared with the number of escape genes predicted per sample. Predictions were made using a random forest model with all histone marks and DNAme.
Figure S11: Marks that were significantly different between samples predicted as escaping vs subject to XCI in a variably escaping region. Transcript IDs reflect the order in which the transcripts are located along the chromosome. A gene can have multiple transcripts, which may share the same TSS and therefore have the same data for all marks except H3K36me3. Vertical lines mark which transcripts belong to each gene.