Reference epigenome generation across EC types
To establish an epigenetic catalog for different EC types, we generated a total of 491 genome-wide datasets, consisting of 424 histone modification ChIP-seq and 67 paired-end RNA sequencing (RNA-seq) datasets, encompassing a total of 22.3 billion sequenced reads. ECs were maintained as primary cultures with a physiological concentration of vascular endothelial growth factor (VEGF) and a minimal number of passages (fewer than six). We generated genome-wide normalized coverage tracks and peaks for ChIP-seq data and estimated normalized gene expression values for RNA-seq data.
In this study, we selected a subset of 33 EC samples (131 datasets) as a representative set comprising nine types of vessels from the human body (Fig. 1a):
-
Human aortic endothelial cells (HAoECs),
-
Human coronary artery endothelial cells (HCoAECs),
-
Human endocardial cells (HENDCs),
-
Human pulmonary artery endothelial cells (HPAECs),
-
Human umbilical vein endothelial cells (HUVECs),
-
Human umbilical artery endothelial cells (HUAECs),
-
Human common carotid artery endothelial cells (HCCaECs),
-
Human renal artery endothelial cells (HRAECs), and
-
Human great saphenous vein endothelial cells (HGSVECs).
Details about the 33 EC samples are presented in Additional file 1: Table S1.
Among the structures lined by these nine EC types, a group of two aortic, six common carotid and three coronary arteries is known as the “systemic arteries” and harbors arterial blood with 100 mmHg of oxygen tension and blood pressure in a range from 140 to 60 mmHg. Data sets for each cell type comprise samples from multiple donors, all of which achieved high-quality values as evaluated below. Here, we focused on two histone modifications, H3K4me3 and H3K27ac (Fig. 1b), which are the key markers of active promoters and enhancers [16]. Because both H3K4me3 and H3K27ac exhibit strong, sharp peaks with ChIP-seq analysis, they are more suitable for identifying shared and/or unique features across EC cell types as compared with other histone modifications that show broad peaks, such as H3K9me3.
Quality validation
To evaluate the quality of obtained ChIP-seq data, we computed a variety of quality control measures (Additional file 2: Table S2), including the number of uniquely mapped reads, library complexity (the fraction of nonredundant reads), GC content of mapped reads, genome coverage (the fraction of overlapped genomic areas with at least one mapped read), the number of peaks, signal-to-noise ratio (S/N) based on the normalized strand coefficient [17], read-distribution bias measured by background uniformity [17], inter-sample correlation for each EC type and genome-wide correlation of read density across all-by-all pairs (Additional file 3: Figure S1). In addition, the peak distribution around several known positive/negative marker genes was visually inspected. Low-quality datasets were not used for further analyses.
To further validate the reliability of our data, we evaluated the consistency between the obtained peaks from ChIP-seq and the gene expression values from corresponding RNA-seq data. We applied a bivariate regression model [18] to estimate the expression level of all genes based on H3K4me3 and H3K27ac peaks and then calculated the Pearson correlation between the estimated and the observed expression levels from ChIP-seq and RNA-seq, respectively. We used data derived from IMR90 fibroblasts analyzed with the same antibodies as a negative control, and we confirmed that peak distribution of the ChIP-seq data was highly correlated with corresponding RNA-seq data for ECs, but not with IMR90 data (Fig. 1c, Additional file 3: Figure S2 for the full matrix). Therefore, our ChIP-seq data are likely to represent the histone modification states of ECs for annotation.
Identification of active promoter and enhancer sites
We used H3K4me3 and H3K27ac ChIP-seq peaks to define “active promoter (H3K4me3 and H3K27ac)” and “enhancer (H3K27ac only)” sites for each sample (Fig. 1b, left). Then we assembled them and defined the common sites among all samples of a given EC type as the reference sites, to avoid differences specific to individuals. Finally, the reference sites of all nine EC types were merged into a single reference set for ECs (Fig. 1b, right). We identified 9121 active promoter sites (peak width, 2840.8 bp on average) and 23,202 enhancer sites (peak width, 1799.4 bp on average). The averaged peak width became relatively wide due to the merging of multiple contiguous sites.
We compared the distribution of the reference sites with gene annotation information. As expected, active promoter sites were enriched in the transcription start sites (TSSs) of genes, whereas enhancer sites were more frequently dispersed in introns and intergenic regions (Additional file 3: Figure S3). Among the enhancers, 15,625 (67.3%) were distally located (more than 10 kbp away from the nearest TSSs). The number of enhancer sites was more varied among the nine EC types, whereas the number of active promoter sites was comparable across the EC types (Fig. 2a, upper panel). The large number of HUAEC enhancer sites is possibly due to the small number of samples (two) and the relatively small difference between the individuals (both samples were from newborns). We also evaluated the shared ratio of promoter and enhancer sites across all EC types (Fig. 2a, lower panel). We found that nearly 80% of the active promoter sites were shared among multiple EC types. In contrast, 57.7% of the enhancers were specific to up to two EC types, suggesting that their more diverse distribution across EC types relative to active promoter sites contributes to the EC type-specific regulatory activity. These observations are consistent with previous studies for other cell lines [16, 19].
Evaluation of enhancer sites by PCA
To investigate the diverse distribution of our reference enhancer sites, we used the principal component analysis (PCA) based on the H3K27ac read densities in the integrated EC enhancer sites with the 117 cell lines from the Roadmap Epigenomics Project [19]. We found that ECs were well clustered and separated from other cell lines (Fig. 2b). Remarkably, HUVECs represented in the Roadmap Epigenomics Project dataset, termed E122, were properly included in the EC cluster (red circle). In contrast, IMR90 cells from our study were included in the non-EC cluster (blue circle). This result supported the reliability of our EC-specific enhancer profiling. It should be noted, however, that the samples for each EC cell type (indicated by different colors) were not well clustered, possibly because the EC type-specific difference is minuscule and is overshadowed by differences at the level of the individual.
Identification of enhancer–promoter interactions by ChIA-PET
We sought to identify the corresponding gene for the reference enhancer sites and used chromatin loop data obtained from the Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) data using RNA Polymerase II (Pol II) in HUVECs. We identified 292 significant chromatin loops (false discovery rate [FDR] < 0.05), 49.3% (144 loops) of which connected promoter and enhancer sites. Even when we used all chromatin loops (at least one read pair), 27.4% (8782 of 31,997) of them linked to enhancer–promoter sites. Remarkably, 48.1% (4228 of 8782) of loops connected the distal enhancer sites. In total, we identified 2686 distal enhancer sites that are connected by chromatin loops. We also detected enhancer–enhancer (3136, 9.8%) and promoter–promoter (11,618, 36.3%) loops, suggesting physically aggregated chromatin hubs in which multiple promoters and enhancers interact [20]. As the ChIA-PET data are derived from RNA Pol II-associated loops in HUVECs, chromatin interactions in active genes could be detected.
Identification of EC-specific sites
Next, we identified EC-specific enhancer sites by excluding any sites from our reference sites that overlapped with those of our IMR90 cells and other cell types from the Roadmap Epigenomics Project, except HUVECs (E122). As a result, we obtained 3765 EC-specific enhancer sites (Additional file 4: Table S3), some of which were located around known marker genes of ECs with chromatin loops. One example is kinase insert domain receptor (KDR; Fig. 2c, left), which functions as the VEGF receptor, causing endothelial proliferation, survival, migration, tubular morphogenesis and sprouting [21]. The TSS of KDR was marked as an active promoter (enriched for both H3K4me3 and H3K27ac) and physically interacted with the EC-specific enhancer sites indicated by H3K27ac, ~ 50 kbp upstream and downstream of the TSS. Another example is intercellular adhesion molecule 2 (ICAM2; Fig. 2c, right), which is an endothelial marker and is involved in the binding to white blood cells that occurs during the antigen-specific immune response [22]. This gene has two known TSSs, both of which were annotated as active promoters in ECs and one of which that was EC specific (black arrow). This EC-specific TSS did not have a ChIA-PET interaction, and, likewise, the enhancer sites within the entire gene body did not directly interact with the adjacent promoter sites, implying the distinctive regulation of the two ICAM2 promoters.
Genome-wide association study (GWAS) enrichment analysis
To explore the correlation of EC-specific reference enhancer sites with sequence variants associated with disease phenotypes, we obtained reference GWAS single-nucleotide polymorphisms (SNPs) from the GWAS catalog [23] and identified significantly enriched loci by permutation analysis [24]. Notably, we identified 67 enhancer sites that markedly overlapped with GWAS SNPs associated with “heart”, “coronary” and “cardiac” (Z score > 5.0, Additional file 5: Table S4). The most notable region was around CALCRL and TFPI loci (chr2:188146468–188248446, Fig. 2d). The EC-specific enhancer region in these loci contained four GWAS risk variants (Fig. 2d, red triangles), three of which are associated with coronary artery/heart disease [25, 26]. Another example is the RSPO3 locus (Additional file 3: Figure S4). The upstream distal enhancer regions of that gene contained four GWAS SNPs that are associated with cardiovascular disease and blood pressure [27, 28].
Functional analysis of the reference sites
We next investigated whether any characteristic sequence feature is observed in the EC-specific enhancer sites, as well as in the active promoter and distal enhancer sites; the subgroup “all enhancers” was omitted because of its close similarity with the “distal enhancer” subgroup. We found many putative motifs in EC-specific enhancers and fewer motifs in the active promoter and distal enhancer subgroups. Some of these motifs had candidate transcription factors (TFs) assigned in the JASPAR database (Additional file 3: Figure S5). Of note, EC-specific enhancer sites had motifs similar to those in the homeobox genes bcd, oc, Gsc and PITX1-3 (Fig. 3), suggesting their involvement in orchestrating EC-specific gene expression. In fact, because most of the EC-specific enhancer sites consisted of enhancers in HGSVECs (47.0%), HRAECs (37.4%) and HUAECs (68.3%) (Additional file 3: Figure S6), the identified motifs might be involved mainly in the function of these EC types. Active promoter and distal enhancer sites had fewer candidate TFs as compared with EC-specific enhancer sites, possibly because they contain sites corresponding to more common genes (e.g., housekeeping genes). We found that active promoters (and EC-specific enhancers) have a motif similar to the canonical motif of the ETS2 repressor factor ERF (Additional file 3: Figure S5), which is consistent with previous studies [8, 13, 14] that reported that the ETS family motif is enriched in enhancers of several EC types.
We also looked into the Gene Ontology (GO) classifications under Biological Process for the enhancer sites using GREAT [29] and found that the enhancer sites (both all sites and EC-specific sites) have GO terms that are more specific to the vascular system (e.g., platelet activation, myeloid leukocyte activation and vasculogenesis), as compared with active promoter sites (e.g., mRNA catabolic process, Additional file 3: Figure S7). This also suggests that the enhancer sites are more likely to be associated with EC-specific functions, whereas promoter sites are also correlated with the more common biological functions.
Differential analysis and clustering across EC types
One important issue of this study is to clarify the epigenomic/transcriptomic diversity across EC cell types. To circumvent variances at the level of the individual in each cell type observed (Fig. 2c) and different S/N ratios, we fitted the value of peak intensity on the reference enhancer sites among samples using generalized linear models with the quantile normalization. By implementing a PCA, we confirmed that different cell samples in the same EC type were properly clustered (Fig. 4a). The PCA also showed that different EC types can be divided into two subgroups based on the epigenomic landscape, corresponding to upper-body (HAoEC, HCoAEC, HPAEC, HCCaEC and HENDC, purple circle) and lower-body (HUVEC, HUAEC, HGSVEC and HRAEC) origins. A PCA based on gene expression data showed similar results to that based on the H3K27ac profile, although in the gene expression analysis HUAECs were more similar to heart ECs (Additional file 3: Figure S8).
To further investigate this tendency, we implemented a multiple-group differential analysis with respect to H3K4me3, H3K27ac and gene expression data to obtain sites and genes whose values varied significantly between any of the nine cell types. With the threshold FDR < 1e−5, we identified 753 differential H3K4me3 sites (differential promoters, DPs; 8.3% from 9121 active promoter sites), 2979 differential H3K27ac sites (differential enhancers, DEs; 9.2% from 32,323 active promoter and enhancer sites) and 879 differentially expressed genes (DEGs; 2.1% from 41,880 genes). As expected, DPs and DEs were more enriched around DEGs, as compared with all genes. DPs were enriched within ~ 10 kbp from TSSs, whereas DEs were more broadly distributed (~ 100 kbp) (Additional file 3: Figure S9), indicative of the longer-range interactions between enhancers and their corresponding genes.
We then implemented k-means clustering (k = 6) to characterize the overall variability of DEGs, DEs (Fig. 4b) and DPs (Additional file 3: Figure S10). The clustering results are also summarized in Additional file 6: Table S5, Additional file 7: Table S6 and Additional file 8: Table S7. Although k = 6 was empirically defined and might not be biologically optimal to classify the nine EC types, the results did capture differential patterns among them. The upregulated genes were roughly categorized into upper- and lower-body-specific EC types (Fig. 4b), even though diverse expression patterns were observed overall. In particular, the expression patterns of the EC types around the heart (HCoAEC, HAoEC and HPAEC) were similar (cluster 3 of DP and DEG), consistent with the anatomical proximity of these ECs. HENDCs had uniquely expressed genes (cluster 6 of DEGs). Considering that most of the DEGs and DEs are cooperatively enriched in more than one EC type, these nine EC types may use distinct combinations of multiple genes, rather than exclusively expressed individual genes, for their specific phenotype.
DEGs that contribute to EC functions
Our clustering analysis identified important genes for EC functions as DEGs (Fig. 4b). For example, heart and neural crest derivatives expressed 2 (HAND2) and GATA-binding protein 4 (GATA4) were expressed in HAoECs, HENDCs and HPAECs (cluster 3). HAND2 physically interacts with GATA4 and the histone acetyltransferase p300 to form the enhanceosome complex, which regulates tissue-specific gene expression in the heart [30]. Another example is hes related family bHLH transcription factor with YRPW motif 2 (HEY2, also called Hrt2), a positive marker for arterial EC specification [31], which was grouped into cluster 5 and was expressed specifically in aorta-derived ECs but not in vein-derived ECs (HUVECs and HGSVECs). HRAECs showed uniquely upregulated genes, including cadherin 4 (CDH4); the protein product of this gene mediates cell–cell adhesion, and mutation of this gene is significantly associated with chronic kidney disease in the Japanese population [32]. Interestingly, at TSSs of the CDH4 and HEY2 loci, H3K4me3 was also enriched in some EC types in which the genes were not expressed, whereas the H3K27ac enrichment pattern at TSSs was correlated with the expression level of these genes (Fig. 4c). This variation in H3K4me3 with/without H3K27ac enrichment may reflect the competence of expression, which cannot be fully captured by gene expression analysis.
DEGs also contained several notable gene families. One example is the claudin family, a group of transmembrane proteins involved in barrier and pore formation [33]. Whereas CLDN5 has been reported as a major constituent of the brain EC tight junctions that make up the blood–brain barrier [34], we found that seven other genes belonging to the claudin family (CLDN1, 7, 10, 11, 12, 14 and 15) were expressed in ECs, and their expression pattern varied across EC types (Additional file 3: Figure S11). For example, in HUVECs, CLDN11 was highly expressed but CLDN14 was not, although the two claudins share a similar function for cation permeability [35]. These observations suggest that distinct usages of specific claudin proteins may result in different phenotypes with respect to vascular barrier function. Consequently, these DEGs are thus usable as a reference marker set for each EC type.
Homeobox genes are highly differentially expressed across EC types
We also found that DEGs identified in our analysis contained genes that were not previously acknowledged as relevant to the different EC types. Most strikingly, quite a few homeobox (HOX) genes were differentially expressed (cluster 1 in Figs. 4b and 5a). The human genome has four HOX clusters (HOXA, B, C and D), each of which contains 9–11 genes essential for determining the body axes during embryonic development, as well as regulating cell proliferation and migration in diverse organisms [36]. These genes are transcribed sequentially over both time and space, according to their positions within each cluster [37]. Figure 5a shows that genes in HOX clusters A, B and D were highly expressed in all EC types, except HENDCs, possibly because HENDCs are derived from cardiac neural crest, whereas the other EC types are derived from mesoderm [38]. HOXC genes were moderately expressed in HRAECs, HUVECs and HUAECs, but not in the upper-body ECs. More interestingly, perhaps, HOXD genes were not expressed in HPAECs, despite their similar expression pattern relative to other EC types around the heart (Fig. 4b). This result implies the distinct use of HOX paralogs, especially HOXD genes, in ECs. We also found that the 5′ HOX genes (blue bars in Fig. 5a) tended to be selectively expressed in EC types derived from the lower body (HGSVECs, HRAECs, HUAECs and HUVECs). Considering the collinearity of their activation during axial morphogenesis, it is conceivable that the type-specific expression of HOX clusters, especially in 5′ HOX genes, reflects the developmental origin of EC types and that distinct activation of HOX genes collectively maintains the diversity of the circulatory system.
It has been suggested that the more 3′ HOX genes tend to promote the angiogenic phenotype in ECs, whereas the more 5′ HOX genes tend to be inhibitory with respect to that phenotype [36]. For example, HOXD3 may promote wound healing and invasive or migratory behavior during angiogenesis in ECs [39]. In contrast, HOXD10 may function to inhibit EC migration by muting the downstream effects of other pro-angiogenic HOX genes (e.g., HOX3 paralogs), and thus human ECs that overexpress HOXD10 fail to form new blood vessels [40]. Figure 5a shows that HOXD10 was highly expressed in HGSVECs, which leads to the inhibition of the angiogenic phenotype as regulated by HOXD10 in this cell type.
In addition to HOX genes, multiple non-HOX homeobox genes were also differentially regulated across ECs. For example, cluster 3 in the DEGs contained NK2 homeobox 5 (Nkx-2.5), which is essential for maintenance of ventricular identity [41]; paired like homeodomain 2 (PITX2) and paired related homeobox 1 (PRRX1), which are both associated with the atrial fibrillation and cardioembolic ischemic stroke variants loci [42,43,44]; and Meis homeobox 1 (MEIS1), which is required for heart development in mice [45]. These were all associated with the GO term “blood vessel morphogenesis (GO:0048514)”. Interestingly, PITX3 was mainly expressed in HENDCs (cluster 6), unlike PITX2. Another example is Mesenchyme Homeobox 2 (MEOX2, also known as Gax; cluster 4), which regulates senescence and proliferation in HUVECs [46] and was also expressed in HUAECs and HGSVECs but not in other cell types. Taken together with the finding that some binding motifs of homeobox genes including PITX were identified among the EC-specific enhancer sites (Fig. 3), these data suggest that distinct combinations of proteins coded by HOX and non-HOX homeobox genes play a key role in mature human ECs for angiogenesis, vasculogenesis and wound healing, in addition to their function during the development and proliferation of ECs.
Enhancers in the telomeric domain were upregulated within the HOXD cluster
Lastly, we investigated the epigenomic landscape around the HOXD cluster (Fig. 5b). It has been reported that the mammalian HOXD cluster is located between two enhancer-rich topologically associating domains (TADs), the centromeric domain (C-DOM) and the telomeric domain (T-DOM), which are activated during limb and digit development, respectively [47]. By using public Hi-C (genome-wide chromosome conformation capture) data for HUVECs [48] to detect the T-DOM and C-DOM (bottom black bars), we observed the presence of EC enhancers in the T-DOM (Fig. 5b), as in early stages of limb development [47]. Of note, two long non-coding RNAs, LINC01117 (Hotdog) and LINC01116 (Twin of Hotdog), which physically contact the expressed HOXD genes and are activated during cecum budding [49], had ChIA-PET loops and showed similar expression patterns with HOXD genes in ECs (Fig. 5a, b). In the T-DOM, some enhancers are likely to be activated in most EC types (Fig. 5b, black arrow), whereas others are active only in ECs from the lower body (blue arrow), suggesting a physical interaction between these enhancers and each HOXD gene in a constitutive and a cell type-specific manner, respectively. A detailed view of the genomic region from HOXD1 to HOXD11 (Fig. 5c) shows that H3K4me3 and H3K27ac are specifically enriched within the HOXD10 locus in HGSVECs, which is consistent with their gene expression pattern. Because the HOXD10 locus did not have ChIA-PET loops in HUVECs, there might be HGSVEC-specific chromatin loops.