Comprehensive epigenome characterization reveals diverse transcriptional regulation across human vascular endothelial cells

Background Endothelial cells (ECs) make up the innermost layer throughout the entire vasculature. Their phenotypes and physiological functions are initially regulated by developmental signals and extracellular stimuli. The underlying molecular mechanisms responsible for the diverse phenotypes of ECs from different organs are not well understood. Results To characterize the transcriptomic and epigenomic landscape in the vascular system, we cataloged gene expression and active histone marks in nine types of human ECs (generating 148 genome-wide datasets) and carried out a comprehensive analysis with chromatin interaction data. We developed a robust procedure for comparative epigenome analysis that circumvents variations at the level of the individual and technical noise derived from sample preparation under various conditions. Through this approach, we identified 3765 EC-specific enhancers, some of which were associated with disease-associated genetic variations. We also identified various candidate marker genes for each EC type. We found that the nine EC types can be divided into two subgroups, corresponding to those with upper-body origins and lower-body origins, based on their epigenomic landscape. Epigenomic variations were highly correlated with gene expression patterns, but also provided unique information. Most of the deferentially expressed genes and enhancers were cooperatively enriched in more than one EC type, suggesting that the distinct combinations of multiple genes play key roles in the diverse phenotypes across EC types. Notably, many homeobox genes were differentially expressed across EC types, and their expression was correlated with the relative position of each organ in the body. This reflects the developmental origins of ECs and their roles in angiogenesis, vasculogenesis and wound healing. Conclusions This comprehensive analysis of epigenome characterization of EC types reveals diverse transcriptional regulation across human vascular systems. These datasets provide a valuable resource for understanding the vascular system and associated diseases.

Background The vasculature pervades almost all body tissues. It consists of arteries, veins and interconnecting capillaries and has numerous essential roles in physiology and disease [1]. Endothelial cells (ECs), which make up the innermost blood vessel lining of the body, express diverse phenotypes that affect their morphology, physiological function and gene expression patterns in response to the extracellular environment, including the oxygen concentration, blood pressure and physiological stress. In the kidney, for example, the vascular bed plays a role in the filtration of blood; in the brain, however, the vascular architecture protects the central nervous system from toxins and other components of the blood [2]. Endothelial heterogeneity is dependent both on the function of each organ and on the developmental lineage of different EC populations, which result in adaptation to the vascular microenvironment.
It is widely recognized that certain specific vessels are susceptible to pathological changes, which include those related to atherosclerosis and inflammation [3]. Atherosclerosis, which occurs in the muscular and elastic arteries, is a progressive disease characterized by the accumulation of macrophages, and this process is initiated by the expression of cell adhesion molecules, such as P-selectin. In mice, the expression level of P-selectin is higher in the lung and mesentery vesicles compared with the heart, brain, stomach and muscle [4]. In clinical practice, the thoracic, radial and gastroepiploic arteries are used for coronary bypass grafts because these arteries have no tendency toward atherosclerosis and hence are therapeutically advantageous in patients with coronary artery plaques [5]. In addition, the long-lasting results from coronary bypass graft surgery indicate that vessels transplanted to a new environment differ in their outcome based on their origin as an artery or vein [6,7]. Based on these observations, the elucidation of the molecular mechanisms underlying EC-type variability is critically important for understanding the development of vascular and circulation systems.
Several animal models have been developed to reveal the enhancer elements that function during the differentiation of various tissues, including ECs [8,9]. With respect to human ECs, analyses relying on cells in shortterm culture may represent good models [10,11]. Organspecific phenotypes at the microvascular level are most likely due to the intimate contact between ECs and the parenchymal cells of a particular organ-i.e., there is a substantial environmental effect [2,6]. In contrast, for macrovascular ECs, a large subset of their characteristics is most likely derived from their epigenetic status, which would have been set during development and can be retained even when the cells are isolated in culture. Increasing evidence supports the idea that certain sitespecific characteristics are epigenetically regulated [12,13]. For example, a previous study of four types of mouse ECs in culture demonstrated that site-specific epigenetic modifications play an important role in differential gene expression [14]. Moreover, our recent reports elucidated that there are different histone modifications present in the same genomic loci, such as GATA6, in human umbilical vein endothelial cells (HUVECs) and human dermal microvascular endothelial cells (HMVECs) [10,11]. Despite the discovery of these important insights, we still lack a systematic understanding of how the epigenomic landscape contributes to EC phenotype and heterogeneity. Therefore, there is a great demand for a comprehensive epigenomic catalog of the various EC types.
As a part of the International Human Epigenome Consortium (IHEC) project [15], we collected chromatin immunoprecipitation followed by sequencing (ChIPseq) data for the active histone modifications trimethylated H3 at Lys4 (H3K4me3) and acetylated H3 at Lys27 (H3K27ac) in EC DNA from nine different vascular cell types, eight of which were derived from macrovascular ones (both arteries and veins), from multiple donors. We implemented large-scale comparative ChIP-seq analysis of these datasets and collected gene expression data to understand how the diverse phenotypes of ECs are regulated by key genes. All datasets used in this study are publicly available and are summarized on our website (https ://rnaka to.githu b.io/Human Endot helia lEpig enome /). a physiological concentration of vascular endothelial growth factor (VEGF) and a minimal number of passages (fewer than six). We generated genome-wide normalized coverage tracks and peaks for ChIP-seq data and estimated normalized gene expression values for RNA-seq data.
In this study, we selected a subset of 33 EC samples (131 datasets) as a representative set comprising nine types of vessels from the human body (Fig. 1a): Details about the 33 EC samples are presented in Additional file 1: Table S1.
Among the structures lined by these nine EC types, a group of two aortic, six common carotid and three coronary arteries is known as the "systemic arteries" and harbors arterial blood with 100 mmHg of oxygen tension and blood pressure in a range from 140 to 60 mmHg. Data sets for each cell type comprise samples from multiple donors, all of which achieved high-quality values as evaluated below. Here, we focused on two histone modifications, H3K4me3 and H3K27ac (Fig. 1b), which are the key markers of active promoters and enhancers [16]. Because both H3K4me3 and H3K27ac exhibit strong, sharp peaks with ChIP-seq analysis, they are more suitable for identifying shared and/or unique features across EC cell types as compared with other histone modifications that show broad peaks, such as H3K9me3.

Quality validation
To evaluate the quality of obtained ChIP-seq data, we computed a variety of quality control measures (Additional file 2: Table S2), including the number of uniquely Fig. 1 Summary of the cell types and histone modifications analyzed in this project. a Schematic illustration of the cardiovascular system, nine EC types and 33 individual samples (indicated by the prefix "EC") used in this paper. The yellow and green boxes indicate EC types from the upper body and lower body, respectively. b Workflow to identify the reference sites for ECs. The active promoter and enhancer sites of each sample were identified. For each cell type, the shared sites across all samples were extracted as the reference sites. These were integrated into a single set of reference sites for ECs, which was used for the downstream analyses. ChIA-PET data were utilized to identify the corresponding gene for the reference enhancer sites. c Correlation between observed and expected (from ChIP-seq analysis using linear regression model) gene expression data. Left: example scatterplot of observed and expected gene expression level for genes (data from EC13). Right: Pearson correlation heatmap for representative samples of nine cell types and IMR90 cells (as a negative control) mapped reads, library complexity (the fraction of nonredundant reads), GC content of mapped reads, genome coverage (the fraction of overlapped genomic areas with at least one mapped read), the number of peaks, signalto-noise ratio (S/N) based on the normalized strand coefficient [17], read-distribution bias measured by background uniformity [17], inter-sample correlation for each EC type and genome-wide correlation of read density across all-by-all pairs (Additional file 3: Figure S1). In addition, the peak distribution around several known positive/negative marker genes was visually inspected. Low-quality datasets were not used for further analyses.
To further validate the reliability of our data, we evaluated the consistency between the obtained peaks from ChIP-seq and the gene expression values from corresponding RNA-seq data. We applied a bivariate regression model [18] to estimate the expression level of all genes based on H3K4me3 and H3K27ac peaks and then calculated the Pearson correlation between the estimated and the observed expression levels from ChIP-seq and RNA-seq, respectively. We used data derived from IMR90 fibroblasts analyzed with the same antibodies as a negative control, and we confirmed that peak distribution of the ChIP-seq data was highly correlated with corresponding RNA-seq data for ECs, but not with IMR90 data (Fig. 1c, Additional file 3: Figure S2 for the full matrix). Therefore, our ChIP-seq data are likely to represent the histone modification states of ECs for annotation.

Identification of active promoter and enhancer sites
We used H3K4me3 and H3K27ac ChIP-seq peaks to define "active promoter (H3K4me3 and H3K27ac)" and "enhancer (H3K27ac only)" sites for each sample (Fig. 1b, left). Then we assembled them and defined the common sites among all samples of a given EC type as the reference sites, to avoid differences specific to individuals. Finally, the reference sites of all nine EC types were merged into a single reference set for ECs (Fig. 1b, right). We identified 9121 active promoter sites (peak width, 2840.8 bp on average) and 23,202 enhancer sites (peak width, 1799.4 bp on average). The averaged peak width became relatively wide due to the merging of multiple contiguous sites.
We compared the distribution of the reference sites with gene annotation information. As expected, active promoter sites were enriched in the transcription start sites (TSSs) of genes, whereas enhancer sites were more frequently dispersed in introns and intergenic regions (Additional file 3: Figure S3). Among the enhancers, 15,625 (67.3%) were distally located (more than 10 kbp away from the nearest TSSs). The number of enhancer sites was more varied among the nine EC types, whereas the number of active promoter sites was comparable across the EC types (Fig. 2a, upper panel). The large number of HUAEC enhancer sites is possibly due to the small number of samples (two) and the relatively small difference between the individuals (both samples were from newborns). We also evaluated the shared ratio of promoter and enhancer sites across all EC types (Fig. 2a, lower panel). We found that nearly 80% of the active promoter sites were shared among multiple EC types. In contrast, 57.7% of the enhancers were specific to up to two EC types, suggesting that their more diverse distribution across EC types relative to active promoter sites contributes to the EC type-specific regulatory activity. These observations are consistent with previous studies for other cell lines [16,19].

Evaluation of enhancer sites by PCA
To investigate the diverse distribution of our reference enhancer sites, we used the principal component analysis (PCA) based on the H3K27ac read densities in the integrated EC enhancer sites with the 117 cell lines from the Roadmap Epigenomics Project [19]. We found that ECs were well clustered and separated from other cell lines (Fig. 2b). Remarkably, HUVECs represented in the Roadmap Epigenomics Project dataset, termed E122, were properly included in the EC cluster (red circle). In contrast, IMR90 cells from our study were included in the non-EC cluster (blue circle). This result supported the reliability of our EC-specific enhancer profiling. It should be noted, however, that the samples for each EC cell type (indicated by different colors) were not well clustered, possibly because the EC type-specific difference is minuscule and is overshadowed by differences at the level of the individual.

Identification of enhancer-promoter interactions by ChIA-PET
We sought to identify the corresponding gene for the reference enhancer sites and used chromatin loop data obtained from the Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) data using RNA Polymerase II (Pol II) in HUVECs. We identified 292 significant chromatin loops (false discovery rate [FDR] < 0.05), 49.3% (144 loops) of which connected promoter and enhancer sites. Even when we used all chromatin loops (at least one read pair), 27.4% (8782 of 31,997) of them linked to enhancer-promoter sites.
As the ChIA-PET data are derived from RNA Pol IIassociated loops in HUVECs, chromatin interactions in active genes could be detected.

Identification of EC-specific sites
Next, we identified EC-specific enhancer sites by excluding any sites from our reference sites that overlapped with those of our IMR90 cells and other cell types from the Roadmap Epigenomics Project, except HUVECs (E122). As a result, we obtained 3765 EC-specific enhancer sites (Additional file 4: Table S3), some of which were located around known marker genes of ECs with chromatin loops. One example is kinase insert domain receptor (KDR; Fig. 2c, left), which functions as the VEGF receptor, causing endothelial proliferation, survival, migration, tubular morphogenesis and sprouting [21]. The TSS of KDR was marked as an active promoter (enriched for both H3K4me3 and H3K27ac) and physically interacted with the EC-specific enhancer sites indicated by H3K27ac, ~ 50 kbp upstream and downstream of the TSS. Another example is intercellular adhesion molecule 2 (ICAM2; Fig. 2c, right), which is an endothelial marker and is involved in the binding to white blood cells that occurs during the antigen-specific immune response [22]. This gene has two known TSSs, both of which were annotated as active promoters in ECs and one of which that was EC specific (black arrow). This EC-specific TSS did not have a ChIA-PET interaction, and, likewise, the enhancer sites within the entire gene body did not directly interact with the adjacent promoter sites, implying the distinctive regulation of the two ICAM2 promoters.

Genome-wide association study (GWAS) enrichment analysis
To explore the correlation of EC-specific reference enhancer sites with sequence variants associated with disease phenotypes, we obtained reference GWAS single-nucleotide polymorphisms (SNPs) from the GWAS catalog [23] and identified significantly enriched loci by permutation analysis [24]. Notably, we identified 67 enhancer sites that markedly overlapped with GWAS SNPs associated with "heart", "coronary" and "cardiac" (Z score > 5.0, Additional file 5: Table S4). The most notable region was around CALCRL and TFPI loci (chr2:188146468-188248446, Fig. 2d). The EC-specific enhancer region in these loci contained four GWAS risk variants (Fig. 2d, red triangles), three of which are associated with coronary artery/heart disease [25,26]. Another example is the RSPO3 locus (Additional file 3: Figure S4). The upstream distal enhancer regions of that gene contained four GWAS SNPs that are associated with cardiovascular disease and blood pressure [27,28].

Functional analysis of the reference sites
We next investigated whether any characteristic sequence feature is observed in the EC-specific enhancer sites, as well as in the active promoter and distal enhancer sites; the subgroup "all enhancers" was omitted because of its close similarity with the "distal enhancer" subgroup. We found many putative motifs in EC-specific enhancers and fewer motifs in the active promoter and distal enhancer subgroups. Some of these motifs had candidate transcription factors (TFs) assigned in the JASPAR database (Additional file 3: Figure S5). Of note, EC-specific enhancer sites had motifs similar to those in the homeobox genes bcd, oc, Gsc and PITX1-3 ( Fig. 3), suggesting their involvement in orchestrating EC-specific gene expression. In fact, because most of the EC-specific enhancer sites consisted of enhancers in HGSVECs (47.0%), HRAECs (37.4%) and HUAECs (68.3%) (Additional file 3: Figure S6), the identified motifs might be The identified de novo motif from EC-specific enhancer sites. The two related canonical motifs derived from the JASPAR database are also shown involved mainly in the function of these EC types. Active promoter and distal enhancer sites had fewer candidate TFs as compared with EC-specific enhancer sites, possibly because they contain sites corresponding to more common genes (e.g., housekeeping genes). We found that active promoters (and EC-specific enhancers) have a motif similar to the canonical motif of the ETS2 repressor factor ERF (Additional file 3: Figure S5), which is consistent with previous studies [8,13,14] that reported that the ETS family motif is enriched in enhancers of several EC types. We also looked into the Gene Ontology (GO) classifications under Biological Process for the enhancer sites using GREAT [29] and found that the enhancer sites (both all sites and EC-specific sites) have GO terms that are more specific to the vascular system (e.g., platelet activation, myeloid leukocyte activation and vasculogenesis), as compared with active promoter sites (e.g., mRNA catabolic process, Additional file 3: Figure S7). This also suggests that the enhancer sites are more likely to be associated with EC-specific functions, whereas promoter sites are also correlated with the more common biological functions.

Differential analysis and clustering across EC types
One important issue of this study is to clarify the epigenomic/transcriptomic diversity across EC cell types. To circumvent variances at the level of the individual in each cell type observed (Fig. 2c) and different S/N ratios, we fitted the value of peak intensity on the reference enhancer sites among samples using generalized linear models with the quantile normalization. By implementing a PCA, we confirmed that different cell samples in the same EC type were properly clustered (Fig. 4a). The PCA also showed that different EC types can be divided into two subgroups based on the epigenomic landscape, corresponding to upper-body (HAoEC, HCoAEC, HPAEC, HCCaEC and HENDC, purple circle) and lower-body (HUVEC, HUAEC, HGSVEC and HRAEC) origins. A PCA based on gene expression data showed similar results to that based on the H3K27ac profile, although in the gene expression analysis HUAECs were more similar to heart ECs (Additional file 3: Figure S8).
To further investigate this tendency, we implemented a multiple-group differential analysis with respect to H3K4me3, H3K27ac and gene expression data to obtain sites and genes whose values varied significantly between any of the nine cell types. With the threshold FDR < 1e−5, we identified 753 differential H3K4me3 sites (differential promoters, DPs; 8.3% from 9121 active promoter sites), 2979 differential H3K27ac sites (differential enhancers, DEs; 9.2% from 32,323 active promoter and enhancer sites) and 879 differentially expressed genes (DEGs; 2.1% from 41,880 genes). As expected, DPs and DEs were more enriched around DEGs, as compared with all genes. DPs were enriched within ~ 10 kbp from TSSs, whereas DEs were more broadly distributed (~ 100 kbp) (Additional file 3: Figure S9), indicative of the longer-range interactions between enhancers and their corresponding genes.
We then implemented k-means clustering (k = 6) to characterize the overall variability of DEGs, DEs (Fig. 4b) and DPs (Additional file 3: Figure S10). The clustering results are also summarized in Additional file 6: Table S5, Additional file 7: Table S6 and Additional file 8: Table S7. Although k = 6 was empirically defined and might not be biologically optimal to classify the nine EC types, the results did capture differential patterns among them. The upregulated genes were roughly categorized into upperand lower-body-specific EC types (Fig. 4b), even though diverse expression patterns were observed overall. In particular, the expression patterns of the EC types around the heart (HCoAEC, HAoEC and HPAEC) were similar (cluster 3 of DP and DEG), consistent with the anatomical proximity of these ECs. HENDCs had uniquely expressed genes (cluster 6 of DEGs). Considering that most of the DEGs and DEs are cooperatively enriched in more than one EC type, these nine EC types may use distinct combinations of multiple genes, rather than exclusively expressed individual genes, for their specific phenotype.

DEGs that contribute to EC functions
Our clustering analysis identified important genes for EC functions as DEGs (Fig. 4b). For example, heart and neural crest derivatives expressed 2 (HAND2) and GATAbinding protein 4 (GATA4) were expressed in HAoECs, HENDCs and HPAECs (cluster 3). HAND2 physically interacts with GATA4 and the histone acetyltransferase p300 to form the enhanceosome complex, which regulates tissue-specific gene expression in the heart [30]. Another example is hes related family bHLH transcription factor with YRPW motif 2 (HEY2, also called Hrt2), a positive marker for arterial EC specification [31], which was grouped into cluster 5 and was expressed specifically in aorta-derived ECs but not in vein-derived ECs (HUVECs and HGSVECs). HRAECs showed uniquely upregulated genes, including cadherin 4 (CDH4); the protein product of this gene mediates cell-cell adhesion, and mutation of this gene is significantly associated with chronic kidney disease in the Japanese population [32]. Interestingly, at TSSs of the CDH4 and HEY2 loci, H3K4me3 was also enriched in some EC types in which the genes were not expressed, whereas the H3K27ac enrichment pattern at TSSs was correlated with the expression level of these genes (Fig. 4c). This variation in H3K4me3 with/without H3K27ac enrichment may reflect the competence of expression, which cannot be fully captured by gene expression analysis.
DEGs also contained several notable gene families. One example is the claudin family, a group of transmembrane proteins involved in barrier and pore formation [33]. Whereas CLDN5 has been reported as a major constituent of the brain EC tight junctions that make up the blood-brain barrier [34], we found that seven other genes belonging to the claudin family (CLDN1, 7, 10, 11, 12, 14 and 15) were expressed in ECs, and their expression pattern varied across EC types (Additional file 3: Figure S11). For example, in HUVECs, CLDN11 was highly expressed but CLDN14 was not, although the two claudins share a similar function for cation permeability [35]. These observations suggest that distinct usages of specific claudin proteins may result in different phenotypes with respect to vascular barrier function. Consequently, these DEGs are thus usable as a reference marker set for each EC type.

Homeobox genes are highly differentially expressed across EC types
We also found that DEGs identified in our analysis contained genes that were not previously acknowledged as relevant to the different EC types. Most strikingly, quite a few homeobox (HOX) genes were differentially expressed (cluster 1 in Figs. 4b and 5a). The human genome has four HOX clusters (HOXA, B, C and D), each of which contains 9-11 genes essential for determining the body axes during embryonic development, as well as regulating cell  proliferation and migration in diverse organisms [36]. These genes are transcribed sequentially over both time and space, according to their positions within each cluster [37]. Figure 5a shows that genes in HOX clusters A, B and D were highly expressed in all EC types, except HEN-DCs, possibly because HENDCs are derived from cardiac neural crest, whereas the other EC types are derived from mesoderm [38]. HOXC genes were moderately expressed  in HRAECs, HUVECs and HUAECs, but not in the upper-body ECs. More interestingly, perhaps, HOXD genes were not expressed in HPAECs, despite their similar expression pattern relative to other EC types around the heart (Fig. 4b). This result implies the distinct use of HOX paralogs, especially HOXD genes, in ECs. We also found that the 5′ HOX genes (blue bars in Fig. 5a) tended to be selectively expressed in EC types derived from the lower body (HGSVECs, HRAECs, HUAECs and HUVECs). Considering the collinearity of their activation during axial morphogenesis, it is conceivable that the type-specific expression of HOX clusters, especially in 5′ HOX genes, reflects the developmental origin of EC types and that distinct activation of HOX genes collectively maintains the diversity of the circulatory system. It has been suggested that the more 3′ HOX genes tend to promote the angiogenic phenotype in ECs, whereas the more 5′ HOX genes tend to be inhibitory with respect to that phenotype [36]. For example, HOXD3 may promote wound healing and invasive or migratory behavior during angiogenesis in ECs [39]. In contrast, HOXD10 may function to inhibit EC migration by muting the downstream effects of other pro-angiogenic HOX genes (e.g., HOX3 paralogs), and thus human ECs that overexpress HOXD10 fail to form new blood vessels [40]. Figure 5a shows that HOXD10 was highly expressed in HGSVECs, which leads to the inhibition of the angiogenic phenotype as regulated by HOXD10 in this cell type.
In addition to HOX genes, multiple non-HOX homeobox genes were also differentially regulated across ECs. For example, cluster 3 in the DEGs contained NK2 homeobox 5 (Nkx-2.5), which is essential for maintenance of ventricular identity [41]; paired like homeodomain 2 (PITX2) and paired related homeobox 1 (PRRX1), which are both associated with the atrial fibrillation and cardioembolic ischemic stroke variants loci [42][43][44]; and Meis homeobox 1 (MEIS1), which is required for heart development in mice [45]. These were all associated with the GO term "blood vessel morphogenesis (GO:0048514)". Interestingly, PITX3 was mainly expressed in HENDCs (cluster 6), unlike PITX2. Another example is Mesenchyme Homeobox 2 (MEOX2, also known as Gax; cluster 4), which regulates senescence and proliferation in HUVECs [46] and was also expressed in HUAECs and HGSVECs but not in other cell types. Taken together with the finding that some binding motifs of homeobox genes including PITX were identified among the ECspecific enhancer sites (Fig. 3), these data suggest that distinct combinations of proteins coded by HOX and non-HOX homeobox genes play a key role in mature human ECs for angiogenesis, vasculogenesis and wound healing, in addition to their function during the development and proliferation of ECs.

Enhancers in the telomeric domain were upregulated within the HOXD cluster
Lastly, we investigated the epigenomic landscape around the HOXD cluster (Fig. 5b). It has been reported that the mammalian HOXD cluster is located between two enhancer-rich topologically associating domains (TADs), the centromeric domain (C-DOM) and the telomeric domain (T-DOM), which are activated during limb and digit development, respectively [47]. By using public Hi-C (genome-wide chromosome conformation capture) data for HUVECs [48] to detect the T-DOM and C-DOM (bottom black bars), we observed the presence of EC enhancers in the T-DOM (Fig. 5b), as in early stages of limb development [47]. Of note, two long non-coding RNAs, LINC01117 (Hotdog) and LINC01116 (Twin of Hotdog), which physically contact the expressed HOXD genes and are activated during cecum budding [49], had ChIA-PET loops and showed similar expression patterns with HOXD genes in ECs (Fig. 5a, b). In the T-DOM, some enhancers are likely to be activated in most EC types (Fig. 5b, black arrow), whereas others are active only in ECs from the lower body (blue arrow), suggesting a physical interaction between these enhancers and each HOXD gene in a constitutive and a cell type-specific manner, respectively. A detailed view of the genomic region from HOXD1 to HOXD11 (Fig. 5c) shows that H3K4me3 and H3K27ac are specifically enriched within the HOXD10 locus in HGSVECs, which is consistent with their gene expression pattern. Because the HOXD10 locus did not have ChIA-PET loops in HUVECs, there might be HGSVEC-specific chromatin loops.

Discussion
In this study, we analyzed the epigenomic status of the active histone modifications H3K4me3 and H3K27ac in 33 samples from nine different EC types by ChIPseq, RNA-seq, ChIA-PET and Hi-C analyses. The EC donors in this study varied in age (newborn to 78 years old), sex, race (Afro-American, Arabic, Asian, Caucasian, Native American) and medical history, all features that may influence the epigenomic profiles of these isolated cells. Although it is difficult to completely avoid effects from these factors in a human study because of the limited number of donors [50], we did carry out multiple approaches to mitigate potential biases as much as possible. For instance, instead of carrying out all-by-all pairwise comparisons, we focused on a multiple-group comparison for the nine EC types, in which each group can 'borrow' information from the other groups and minimize the impact of inherent noise in each group.
The integrative ChIP-seq analysis described here, which was based on samples from the human tissues of multiple donors, is potentially hampered by both individual variation and technical noise derived from sample preparation under various conditions, relative to the smaller differences among EC types. To overcome this issue, at least in part, we developed a robust procedure for comparative epigenome analysis. We focused on two histone modifications, H3K4me3 and H3K27ac, which exhibit robust, sharp peaks and therefore are suitable for identifying shared and/or unique features across EC cell types, in contrast to more broad histone marks (e.g., H3K9me3). To ensure the robustness of this study, we used a rather conservative procedure for detecting EC-specific sites, which considered only the common sites among all samples that did not overlap with peaks in all 116 cell lines from the Roadmap Epigenomics Project. Therefore, there may be "true" enhancer sites among the filtered ones. In the small comparative analysis using a subset of our data (e.g., a pairwise comparison between an artery EC and a vein EC), more enhancer sites may be included for the downstream analysis. In addition, for motif analysis, the information concerning the precise position of TF binding might have been lost because multiple nearby sites were merged into a single large reference site. A more specialized scheme for each EC type may increase the power to identify additional candidates of binding TFs.
In the future, we aim to expand this analysis to other core histone marks including suppressive markers (e.g., H3K27me3 and H3K9me3) and apply semi-automated genome annotation methods [51]. Because this type of genome annotation strategy with its associated assembling of broad marks is more sensitive to noise, more stringent quality control of tissue data will be required.
We successfully identified 3765 EC-specific enhancer sites, 67 of which were highly significantly overlapping with GWAS SNPs. We also found variation in H3K4me3 enrichment at TSSs that may reflect the competence of gene expression, which cannot be fully captured by gene expression analysis (Fig. 4c). The PCA showed that EC types can be divided into those from the upper and from the lower body (Fig. 4a). Most of the DEGs and DEs are cooperatively enriched in more than one EC type. Furthermore, our analysis suggests the importance of the differential usage of genes in gene families for the diversity of the circulation system. For instance, we identified a regulatory motif enriched in EC-specific enhancers that is very similar to that of homeobox protein PITX (Fig. 3), whereas PITX2 and 3 are included in different DEG clusters (Fig. 4b). We also found the distinctive expression pattern of genes belonging to HOX clusters (Fig. 5a) and the claudin family across ECs (Additional file 3: Figure S11), even when they share a similar function. Consequently, the nine EC types tend to use distinct combinations of multiple genes, rather than exclusively expressed genes, for their specific phenotype.
Our results identified key marker genes that were differentially expressed across EC types, such as homeobox genes. The importance of several HOX genes for early vascular development and adult angiogenesis in pathological conditions has been reported [52]. However, a systematic analysis of the regulation and roles of homeobox genes in mature tissue cells has been lacking. A recent study indicates that the patterns of gene expression in HOX clusters in four different organs are consistent with their anterior-posterior positions within the mouse body [14]. Consistent with this, we found that the 5′ HOX genes tended to be selectively expressed in EC types derived from the lower body, which might reflect the developmental origin of EC types (Fig. 5a). We also found that the multiple non-HOX homeobox genes were also differentially regulated across ECs, indicative of the importance of differential usage of homeobox proteins beyond the four HOX cluster regions. Moreover, we identified distinct epigenome states and chromatin conformations of HOX gene clusters and flanking regions in different EC types. Although a low correlation between gene expression level and DNA methylation was reported in mouse ECs [14], the levels of active histone marks H3K4me3 and H3K27ac were correlated with the gene expression pattern in human ECs. Thus, in ECs HOX gene expression is likely to be regulated by histone modifications rather than DNA methylation. Taken together, our data suggest the distinct roles and combinatorial usage of proteins coded by HOX and non-HOX homeobox genes during development and in regulating EC phenotypes throughout the body.

Conclusions
The primary goal of the IHEC project is to generate highquality reference epigenomes and make them available to the scientific community [15]. To this end, we established an epigenetic catalog of various human ECs and implemented comprehensive analysis to elucidate the diversity of the epigenomic and transcriptomic landscape across EC types. The dataset presented in this study will be an important resource for future work on understanding the human cardiovascular system and its associated diseases.

Tissue preparation
ECs were isolated from the vasculature and maintained as primary cultures, as reported [53,54]. Briefly, HAoECs, HCoAECs, HENDCs, HPAECs and HUVECs were isolated from the various vessels by incubating the vessels with collagenase at 37 °C for 30 min. The aortic root was used for HAoEC isolation. Cells were plated in tissue culture flasks (Iwaki Glass Co. Ltd., cat. No. 3110-075-MYP) and cultured for one or more passages in modified VascuLife VEGF Endothelial Medium (Lifeline Cell Technology). A reduced concentration of VEGF that was lower than 5 ng/mL was tested in preliminary cell culture experiments and then optimized to be as low as possible considering cell growth and viability (data not shown). The VEGF concentration was lowered to 250 pg/ mL, which is lower than the standard culture conditions, to more closely replicate in vivo concentrations [55]. ECs were separated from non-ECs using immunomagnetic beads. Fibroblasts were first removed using antifibroblast beads and the appropriate magnetic column (Miltenyi Biotec). The remaining cells were then purified using Dynabeads and anti-CD31 (BAM3567, R&D Systems). When positive selection was used, the bead-bound cells were removed from the cell suspension prior to cryopreservation.
HCCaECs and HRECs were prepared by an explant culture method [54]. HGSVECs were isolated from discarded veins taken from patients at Saitama Medical University International Medical Center.
Quality control was performed using a sterility test (for bacteria, yeast and fungi), a PCR-based sterility test (for hepatitis B and C, HIV-I and -II and mycoplasma) and immunostaining-based characterization for von Willebrand factor (vWF) (> 95% cells are positively stained [56]) and alpha-actin, and viability was determined by both counting and trypan blue staining.

Cell culture
Purified ECs were cultured in VascuLife VEGF Endothelial Medium (Lifeline Cell Technology) with 250 pg/mL VEGF. Cells were maintained at 37 °C in a humidified 5% CO 2 incubator, and the medium was changed every 3 days. The cells used in the experiments were from passage 6 or less. The cryopreservation solution used consisted of VascuLife VEGF Endothelial medium, containing 250 pg/mL VEGF, 12% fetal bovine serum and 10% dimethylsulfoxide.

RNA-seq analysis
Poly(A)-containing mRNA molecules were isolated from total RNA and then converted to cDNA with oligo(dT) primers using a TruSeq RNA Sample Preparation kit v2 (Illumina) and were sequenced with a HiSeq 2500 system (Illumina). We applied sequenced paired-end reads to kallisto version 0.43.1 [57] with the "-rf-stranded -b 100" option, which estimates the transcript-level expression values as Transcripts Per Kilobase Million (TPM, Ensembl gene annotation GRCh37). These transcriptlevel expression values were then assembled to the gene-level by tximport [58]. We also obtained RNA-seq data from IMR90 cells from the Sequence Read Archive (SRA) (www.ncbi.nlm.nih.gov/sra) under accession number SRR2952390. The full list of gene expression data is available at the NCBI Gene Expression Omnibus (GEO) under the accession number GSE131953.

ChIP
For each EC sample, two million ECs were plated on a 15-cm culture plate and cultured until confluency. The cells were crosslinked for 10 min using 1% paraformaldehyde. After quenching using 0.2 M glycine, cells were collected using a scraper, resuspended in SDS lysis buffer (10 mM Tris-HCl, 150 mM NaCl, 1% SDS, 1 mM EDTA; pH 8.0) and fragmented by sonication (Branson; 10 min). Samples were stored at − 80 °C before use. To perform ChIP, antibodies against histone modifications (CMA304 and CMA309 for H3K4me3 and H3K27ac, respectively) [59] were used in combination with protein G Sepharose beads (GE Healthcare Bio-Sciences AB, Sweden). The prepared DNA was quantified using Qubit (Life Technologies/Thermo Fisher Scientific), and > 10 ng of DNA was processed, as described below. The primer sequences for ChIP-qPCR were as follows: for H3K4me3, KDR

ChIP-seq analysis
Sequencing libraries were made using the NEBNext ChIP-Seq Library Prep Master Mix Set of Illumina (New England Biolabs). Sequenced reads were mapped to the human genome using Bowtie version 1.2 [60] allowing two mismatches in the first 28 bases per read and outputting only uniquely mapped reads (-n2 -m1 option). Peaks were called by DROMPA version 3.5.1 [61] using the stringent parameter set (-sm 200 -pthre_internal 0.00001 -pthre_enrich 0.00001) to mitigate the effect of technical noise. The mapping and peak statistics are summarized in Additional file 2: Table S2.

Quality validation of ChIP-seq samples
We checked the quality of each sample based on the peak number, library complexity and GC content bias by DROMPA; the normalized strand coefficient and background uniformity by SSP [17]; inter-sample correlation (Jaccard index of peak overlap) by bedtools (https ://githu b.com/arq5x /bedto ols2); and the pairwise correlations of read coverage by deepTools version 2.5.0 [62] (Additional file 3: Figure S1).

Regression analysis of ChIP-seq data
To estimate the expression level of a gene from the level of its histone modifications, we implemented the linear regression analysis proposed by Karlic et al. [18] with minor modifications. We built a two-variable model to predict the expression level for each mRNA as follows: where x 1 and x 2 are the log-scale base pair coverage in a region of 4-kbp surrounding the TSSs covered by obtained peaks of H3K4me3 and H3K27ac, respectively. We used the level of protein-coding mRNA in autosomes as an estimation of the level of histone modifications. We then learned the parameters a, b 1 and b 2 using all of the EC samples and the IMR90 sample to minimize the differences between observed and expected values. Using the learned parameter set, we predicted the expression value for each mRNA and calculated the Pearson correlation between observed and expected values.

Definition of reference promoter and enhancer sites
As shown in Fig. 1b, we defined active promoters and enhancers as "H3K4me3 sites overlapping with H3K27ac sites by ≥ 1 bp" and "H3K27ac sites not overlapping with H3K4me3 sites", respectively, based on the annotation of the Roadmap Epigenomics consortium [19]. Peaks from sex chromosomes were excluded to ignore sex-specific differences. To avoid the effect of individual differences, the common sites among all samples were used as the reference sites for each cell type. Then the reference sites of all cell types were merged into the reference sites for ECs. Multiple sites that were within 100 bp of each other were merged to avoid multiple counts of large individual sites. The generated reference promoter and enhancer sites are available at the GEO under the accession number GSE131953.

Identification of EC-specific sites
We called peaks for H3K4me3 and H3K27ac for all 117 cell lines in the Roadmap Epigenomics Project by DROMPA with the same parameter set. We then excluded the sites in the reference promoter and enhancer sites of ECs that overlapped the H3K4me3 peaks (promoter sites) or H3K27ac peaks (enhancer sites) of all cells except for E122 (HUVECs) from the Roadmap Epigenomics Project. Similarly, we further excluded the sites that overlapped H3K27ac peaks of IMR90 cells generated by this study, to avoid the protocol-dependent false-positive peaks. The resulting sites were used as ECspecific sites. We also defined "distal enhancer sites" as those that are > 10 kbp from the nearest TSS. These sites are summarized in Additional file 4: Table S3.

GWAS enrichment analysis
We implemented GWAS enrichment analysis using a strategy similar to that of Lake et al. [24]. We obtained reference SNPs from the GWAS Catalog [23]. We then calculated the occurrence probability of GWAS SNPs associated with the terms "heart", "coronary" and "cardiac" in 100-kb regions centered on all EC-specific enhancer sites and investigated their statistical significance by random permutations. We extended each enhancer site to a 100-kb region to consider linkage disequilibrium with GWAS SNPs. We identified the enhancer sites with a Z-score > 5.0. We shuffled the enhancer sites randomly within each chromosome, ignoring the centromeric region, using bedtools shuffle command.

Differential analysis of multiple groups for histone modification and gene expression
We applied the ANOVA-like test in edgeR [63] based on the normalized read counts of H3K4me3 ChIP-seq data in active promoters and H3K27ac ChIP-seq data in active promoters and enhancers, as well as gene expression data, while fitting the values among samples to estimate dispersion using generalized linear models. For RNA-seq data, the count data were fitted using a generalized linear model, and the Z-score was calculated based on logged values. For ChIP-seq data, we also applied the quantile normalization to peak intensity in advance of the fitting because this model does not consider the different S/N ratios among samples [50]. This normalization assumes that the S/N ratio for most of the common peaks should be the same among all samples in which the same antibody was used. Additional file 3: Figure S12 shows the distribution patterns of the H3K27ac read density normalized for quantile normalization for all ECs.

Chromatin interaction analysis
We used ChIA-PET data mediated by RNA Pol II for HUVECs [64]. We acquired fastq files from the GEO under accession number GSE41553, applied Mango [65] with default parameter settings and identified the 943 significant interactions (1886 sites, FDR < 0.05). For Hi-C analysis, we acquired.hic files for HUVECs from the GEO under accession number GSE63525 and applied Juicer [48] to obtained the TAD structure (Fig. 5b).