Chromatin Accessibility Maps Provide Evidence of Multilineage Gene Priming in Hematopoietic Stem Cells

Hematopoietic stem cells (HSCs) have the capacity to differentiate into vastly different types of mature blood cells. The epigenetic mechanisms regulating the multilineage ability, or multipotency of HSCs are not well understood. To test the hypothesis that cis regulatory elements that control fate decisions for all lineages are primed in HSCs, we used ATAC-seq to compare chromatin accessibility of HSCs with five unipotent cell types. We observed the highest similarity in accessibility profiles between Megakaryocyte Progenitors and HSCs, whereas B cells had the greatest number of regions with de novo gain in accessibility during differentiation. Despite these differences, we identified cis regulatory elements from all lineages that displayed epigenetic priming in HSCs. These findings provide new insights into the regulation of stem cell multipotency, as well as a resource to identify functional drivers of lineage fate. HIGHLIGHTS HSCs have higher global chromatin accessibility than any unilineage progeny Megakaryocyte Progenitors are the most closely related unipotent cell type to HSCs B cell commitment involves de novo chromatin accessibility Evidence of cis element priming of lineage-specific genes in HSCs


INTRODUCTION
Multipotency is a key feature of hematopoietic stem cells (HSCs) and essential for their ability to produce all types of blood and immune cells in situ and upon therapeutic stem cell transplantation.
The mechanistic basis of multipotency is unclear, but previous studies have shown that the regulation of differentiation programs is achieved, in large part, through epigenetic remodeling of cis-regulatory elements (CREs) (Forsberg et al., 2000;Shivdasani et al., 1997;Wang et al., 2015).
Thus, HSC multipotency may be enabled by accessible non-promoter CREs that keep loci competent for transcription factor binding and gene activation without active expression. Such selective "CRE priming" may underlie the developmental competence of specific cell types, which is then acted upon by inductive signals to gradually specify fate (Waddington, 1940). When all CREs that drive differentiation and lineage choice are primed in stem cells, that stem cell is in a permissive state ( Figure 1A) and is competent to initiate differentiation into all mature lineages.
We sought to test two models of HSC multipotency that are based on regulation of chromatin organization: the "permissive fate model" and a "de novo activation model" (Figure 1A).
Supporting a role for the permissive model in stem cell lineage potential are observations of bivalent histone domains that maintain key developmental genes in embryonic stem cells (ESCs) poised for activation (Bernstein et al., 2006), and an overall accessible chromatin state in both ESCs and HSCs compared to lineage-restricted progenitors and mature cells (Gaspar-Maia et al., 2009Ugarte et al., 2015). When differentiation occurs, the genes poised for differentiation into the induced lineage are activated while CREs that would drive differentiation into alternative lineages are silenced into heterochromatin. This has been observed in ESCs and during differentiation of ESCs into endoderm (Wang et al., 2015;Xu et al., 2009). Our observation of global chromatin condensation and localization of H3K9me3-marked heterochromatin towards the nuclear periphery during HSC differentiation also support the permissive model (Ugarte et al., 2015). Inversely, in the de novo activation model ( Figure 1A), CREs that drive lineage fate are inaccessible in HSCs. Differentiation and lineage choice occur by "unlocking" these CREs.
Transcriptional and functional analyses of hematopoietic stem and progenitor cells support this de novo model, where lymphoid potential is gained in progenitor cells rather than being a consequence of CRE priming in HSCs (Boyer et al., 2019;Cabezas-Wallscheid et al., 2014;Forsberg et al., 2005;Månsson et al., 2007).
In order to interrogate these models and how they pertain to the regulation of competence in hematopoiesis, as well as gain a better understanding of the relationships between epigenetic, transcriptomic and functional observations, we mapped global chromatin accessibility using the Assay for Transposase Accessible Chromatin by High Throughput Sequencing (ATAC-seq) (Buenrostro et al., 2013). This assay allows assessment of high resolution, genome-wide chromatin accessibility throughout differentiation programs of rare cells. The dynamics of chromatin accessibility in erythro-megakaryopoiesis (Heuston et al., 2018) and granulocyte/macrophage development (Buenrostro et al., 2018) have been highly informative.
From these studies, the bulk observations gave us insight into the dynamics of lineage commitment during hematopoiesis, while single-cell analysis revealed the heterogeneity of epigenomic states and, therefore, lineage bias in progenitors throughout hematopoiesis. Based on those studies, as well as reports of global chromatin accessibility of embryonic (Bernstein et al., 2006;Bulut-Karslioglu et al., 2018;Gaspar-Maia et al., 2011) andhematopoietic (Cabal-Hierro et al., 2020;Ugarte et al., 2015) stem cells, we hypothesized that HSCs are in a permissive chromatin state where CREs that control fate decisions are primed in HSCs. Here, we tested this hypothesis by performing in-depth ATAC-seq investigation of HSCs and 5 unipotent lineage cell populations representing the five main hematopoietic lineages ( Figure 1B), as defined by previously published phenotypes (Boyer et al., 2011(Boyer et al., , 2019.

Mapping of chromatin accessibility in HSCs and unipotent lineage cells identified a tight association of MkPs to HSCs.
To determine the dynamics of genome accessibility throughout hematopoiesis, we sorted six primary hematopoietic cell types ( Figure 1B) and performed ATAC-seq. We identified 70,731 peaks in HSCs,47,363 peaks in MkPs,38,007 in EPs,30,529 in GMs,70,358 in B cells,and 51,832 in T cells (Table 1). From these peak-lists we combined and filtered the peaks to only the most significant peaks using the chromVAR package (Schep et al., 2017) and identified a total of 84,243 peaks, referred to as the master peak-list throughout the study (Table 1). To assess data quality, we analyzed replicate clustering and cell type relationships of all 6 cell types using principal component analysis and dimensionality reduction as a t-Distributed Stochastic Neighbor Embedding (tSNE) plot (Schep et al., 2017). All biological replicate samples closely associated with each other by tSNE analysis (Figure 1C), as well as by hierarchical clustering using the chromVAR output ( Figure 1D). We observed two primary clusters in Figure 1D: an HSC/MkP cluster and all other cell types. We also observed a distinct lymphoid cell subcluster containing only B and T cells, while GMs and EPs clustered independently. MkPs have the most similar accessibility to HSCs, with the ranking of the other cell types from most to least similar as EPs, GMs, Bs and then Ts. This is consistent with our tSNE analysis (Figure 1C), where HSCs and MkPs closely associated with each other, and with studies that have reported a close relationship of HSCs with the megakaryocyte lineage (Carrelha et al., 2018;Rodriguez-Fraticelli et al., 2018) and that erythropoiesis requires chromatin remodeling for differentiation to occur (Heuston et al., 2018).
Visualization and comparison of ATAC-seq data generated in this study correlated with known expression patterns of cell type-specific genes.
As another assessment of the quality and reproducibility of our ATAC-seq data, we visualized the ATAC-seq signals across promoters of genes with known cell type-specific expression patterns, plus a negative (expressed in none of the cell types) and a positive (expressed in all of the cell types) control, using the Gene Expression Commons (GEXC) expression database (Seita et al., 2012): Gapdh (expressed in all cell types), Fezf2 (not expressed in any cell type), Klf1 (EPs only), Gp6 (MkPs only), Ly6g (GMs only), CD19 (B cells only), and Ccr4 (T cells only) (Figure 2A,B).
Ly6g was not available in GEXC but is a well-known GM-selective gene (Hestdal et al., 1991).
We observed the expected accessibility peaks in each cell type, as well as a minimal signal from cell types without expression of those genes ( Figure 2B). The overall high level of reproducibility between independent sample replicates and clustering strategies ( Figure 1C,D), as well as the expected accessibility in cell type-specific genes (Figure 2), indicated that we had generated high-quality chromatin accessibility maps of these 6 cell types.
HSCs have greater global accessibility and undergo more extensive chromatin remodeling upon lymphoid differentiation.
Using a number of quantitative, but non-sequence-specific assays, we previously reported that chromatin is progressively condensed upon HSC differentiation into unilineage and mature cells (Ugarte et al., 2015). To test whether the ATAC-seq data recapitulated these findings, we quantified the total number of distinct peaks, as well as the cumulative read-counts for all peaks, for each cell type. First, we took each cell type's optimal peak-list from the Irreproducible Discovery Rate (IDR) analysis (Li et al., 2011) and reported the number of peaks. We observed the highest number of peaks in HSCs (Figure 3A), closely followed by B cells. In parallel, we quantified global accessibility by calculating the normalized average signal over the master peaklist for each cell type by generating histograms using HOMER (Heinz et al., 2010). We observed similar ordering compared to the peak number, with HSCs having the highest average signal and B cells the second highest ( Figure 3B). The low signal in EPs is possibly due to widespread transcriptional silencing as the next step towards becoming highly specialized red blood cells and ejection of nuclei . Although these measurements are not completely independent, HSCs displayed both the highest number of peaks and the greatest peak signal.
These results are consistent with our previous findings of progressive chromatin condensation upon HSCs differentiation (Ugarte et al., 2015).

Comparisons of peaks gained and lost as HSCs differentiate into unilineage cells revealed an overall gain of accessibility selectively for B cell differentiation.
To assess the number of peaks that changed upon HSCs differentiation, we took the IDR optimal peak-list for each cell type and performed pair-wise comparisons between HSCs and the five mature/unipotent cell types ( Figure 3C). We quantified the number of peaks gained and lost by the unipotent progenitors/mature cells compared to HSCs (Figure 3C-G). MkPs had the lowest number of peak changes (peaks gained plus lost; Figure 3D), and therefore have the greatest proportion of peaks in common with HSCs. This is primarily driven by the low percentage of peaks gained ( Figure 3E), as opposed to peaks lost ( Figure 3F) upon HSC differentiation into MkPs. In contrast, EPs had the highest percentage of total peaks changed ( Figure 3D) due to the greatest percentage of peaks lost ( Figure 3F). This could be driven by EPs starting to shut down transcription to become highly specialized and eject their nuclei, reflected by the overall low accessibility observed (Figure 3A Exclusively shared peaks between HSCs and unipotent cell types are primarily nonpromoter and are enriched for known cell-type specific transcription factors. We then turned our attention from peaks that were different between HSCs and their progeny to instead focus on elements with shared accessibility. We hypothesized that peaks that are exclusively shared between HSCs and one unipotent cell type contain elements that drive lineage commitment into that cell type. We filtered the peak lists of all 6 cell types against each other using the HOMER mergePeaks.pl tool and annotated the peak lists that each of unipotent lineage cell types exclusively shared with HSCs ( Figure 4A). We quantified the percentage of peaks that each unipotent cell type shared with HSCs ( Figure 4B). Consistent with the clustering profiles ( Figure 1C-D), MkPs had the highest percentage of peaks that were shared exclusively with HSCs. This similarity appeared to be primarily manifested in non-promoter elements: we annotated the exclusively shared peaks and categorized them as promoter or non-promoter peaks ( Figure 4C) and compared the distributions to the annotated peak lists for each cell type assayed ( Table 1). All of the exclusively shared peak lists had significant enrichment (p-value <0.001) of non-promoter peaks compared to the normal distribution of peaks in our dataset. Thus, non-promoter elements are shared between HSCs and their progeny significantly more frequently than promoter elements, especially with MkPs.
To determine what transcription factor binding sites were present within the exclusively shared peaks, we performed motif enrichment using the HOMER package and reported the top 10 results for each cell type, sorted by p-value ( Figure 4D-H). The peaks that HSCs shared with MkPs ( Figure 4D) or EPs ( Figure 4E) were primarily enriched for Gata family transcription factors and their inhibitor TRPS1. Notably, HSC/MkP peaks also had enrichment of ERG and Runx1, which are known drivers of hematopoiesis (Growney et al., 2005;Kruse et al., 2009). For HSC/EPs, Gata1 was the most enriched motif, with the Gata:SCL combination motif and NF-E2 and NFE2L motifs also scoring in the top ten. These factors are all known to be important in red blood cell differentiation, and NF-E2 is known to regulate SCL and Gata2 (Siegwart et al., 2020).
These motifs could be a reason for the overall high number of peaks observed in B cells ( Figure   3A,B), as 44.7% and 46.6% of the shared peaks contained CTCF or CTCFL motifs, respectively. HSC/T cell peaks were enriched for Tcf and Tbx family factors that are known to play a role in T cell development ( Figure 4H). Overall, all five HSC-shared peak lists had enrichment of transcription factors that are known to be important for normal differentiation for each lineage.

Evidence of cis element priming of lineage-specific genes in HSCs.
Previous work on understanding multipotency and developmental competence suggests a model where competence is conferred by transcriptional priming: being competent of transcription factor binding and gene expression, without active expression (Hu et al., 1997). One of the suggested regulators of transcriptional priming are cis-regulatory elements (CREs). This means that CREs that drive lineage fate for all lineages are accessible in HSCs in our permissive fate model and inaccessible in our de novo activation model. We hypothesized that CREs that are exclusively shared between HSCs and a unipotent lineage cell are potential drivers of that lineage cell. We  Figure 5K), and contained 5 out of the top 10 motifs, ZEB1/2, Slug, Ascl2, HEB, and E2A ( Figure   5L). In T cells, the gene Inducible T cell co-stimulator (Icos) is only expressed in T cells ( Figure   5M), a predicted linked CRE is accessible in both T cells and HSCs ( Figure 5N) and contains motifs for CTCF and WT1 ( Figure 5O). Taken together, these examples represent CRE priming in HSCs, along with the corresponding transcription factors that may act on each element to guide HSC fate.

MkPs and HSCs have the most similar accessibility profile.
Here, we compared the genome-wide accessibility by ATAC-seq of the multipotent HSCs and unipotent lineage cell types (EPs, MkPs, GMs, B, and T cells). Through hierarchical clustering analysis, we observed erythromyeloid and lymphoid relationships that are consistent with the classical model of hematopoiesis ( Figure 1D) (Boyer et al., 2011;Bryder et al., 2006;Laurenti and Göttgens, 2018;Seita and Weissman, 2010). By both PCA and hierarchical clustering, we observed that MkPs were the most similar to HSCs based on their accessibility profiles ( Figure   1). This relationship is reflected in a high level of overlap of peaks, as MkPs have the fewest peaks gained or lost from HSCs compared to the other cell types (Figure 3) and have the largest percentage of peaks exclusively shared with HSCs ( Figure 4B). These findings are in agreement with recent clonal studies of hematopoiesis that reported a megakaryocyte lineage bias of HSCs (Carrelha et al., 2018;Rodriguez-Fraticelli et al., 2018). According to hierarchal clustering, EPs had the second closest association to HSCs ( Figure 1D) possibly supporting erythropoiesis as the default fate for hematopoiesis (Boyer et al., 2019) under conditions where chromatin remodeling silences megakaryocyte driver elements (Heuston et al., 2018). On the other end of the spectrum, the least similar cell types to HSCs were the lymphoid cell types ( Figure 1D). This greater difference was primarily due to a high proportion of peaks gained ( Figure 3E) rather than lost ( Figure 3F) upon differentiation from HSCs, leading to a greater ratio of peaks gained:lost for lymphoid cells than for erythromyeloid lineages ( Figure 3G).

Evidence of multilineage priming in HSCs.
The priming of genes for transcription likely initiates within CREs, which can then drive the activation of promoter targets. These enhancers can act as drivers of lineage fate (Wang et al., 2015) and their accessibility is a putative regulator of competence in stem cells. We made the assumption that peaks that are exclusively shared between HSCs and the unipotent lineage cells contain CREs that are specific for driving differentiation into that lineage. We observed that the majority of exclusively shared peaks were non-promoter peaks ( Figure 4B) and were enriched for binding motifs of transcription factors known to be important for differentiation into each lineage ( Figure 4D-H). By using the GREAT tool, we made predictions for the target genes for the many CREs that were present in these exclusive lists. The examples shown in Figure 5 provide evidence that multi-lineage priming exists in HSCs.

Both permissive and de novo epigenetic mechanisms influence hematopoiesis.
Analogous to other stem cell systems, multipotent HSCs with the competence to differentiate into diverse cell types reside at the top of the blood cell hierarchy. We tested two potential models of the mechanism of multipotency, the permissive fate and de novo activation ( Figure 1A). We found evidence for both. Supporting the permissive fate model are the observations that HSCs had the highest global accessibility ( Figure 3A/B), that peaks were lost in every unipotent cell type from HSCs ( Figure 3F), that every unipotent cell type shared some peaks exclusively with HSCs ( Figure 4B), and that evidence of multilineage priming of CREs were found in HSCs ( Figure 5).
The de novo activation model was supported by the observation that new peaks were gained during differentiation into all five lineages (Figure 3E), and previous studies reporting progressive upregulation of lineage-specific genes as HSCs transition into progenitors (Forsberg et al., 2005;Terskikh et al., 2003). Thus, both mechanisms likely influence hematopoietic fate decisions.
Interestingly, we found evidence that the balance between the two models varies between lineages. For example, B cells, and to a lesser extent T cells, had a higher proportion of peaks gained than lost compared to erythromyeloid lineages ( Figure 3G). This may indicate that the megakaryocyte/erythroid lineage is in a more primed state in HSCs, whereas lymphopoiesis requires more extensive chromatin remodeling to both prime lymphoid CREs not accessible in HSCs and simultaneously shut down the megakaryocyte/erythrocyte trajectory. The cell output and kinetics from in vivo lineage tracing and reconstitution assays support these conclusions (Carrelha et al., 2018;Rodriguez-Fraticelli et al., 2018); (Boyer et al., 2011(Boyer et al., , 2012(Boyer et al., , 2019Yamamoto et al., 2013). Our identification of specific, putative regulatory CREs will enable functional testing of these elements.

ATAC-seq
ATAC-seq was performed as previously described (Buenrostro et al., 2013). Briefly, cells were collected after sorting into microcentrifuge tubes containing staining media (1xDPBS,1mM EDTA with 5% serum). They were centrifuged at 500xg for 5 minutes at 4˚C to pellet the cells. The supernatant was aspirated, and the cells were washed with ice-cold 1xDPBS. Cells were centrifuged and the supernatant was discarded. Cells were immediately resuspended in ice-cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2 and 0.1% IGEPAL CA-630) and centrifuged at 500xg for 10 minutes. The supernatant was aspirated, and pellets were resuspended in transposase reaction mix (25µL 2xTD Buffer, 2.5µL transposase (Illumina), and 22.5µL nuclease free water). The transposition reaction was carried out at 37˚C for 30 minutes at 600rpm in a shaking thermomixer (Eppendorf). Immediately after completion of the transposition reaction, the samples were purified using the MinElute Reaction Clean up kit (Qiagen) and eluted into 10 µL of EB. Samples were stored at -20˚C until PCR amplification step. PCR amplification was performed as previously described (Buenrostro et al., 2013) using custom Nextera primers.
After initial amplification, a portion of the samples were run on qPCR (ViiA7 Applied Biosystems) to determine the additional number of cycles needed for each library. The libraries were purified using the MinElute Reaction Clean up kit (Qiagen), eluted into 20 µL EB and then size selected using AmpureXP(Beckman-Coulter) beads at a ratio of 1

Data processing
Demultiplexed sequencing data was processed using the ENCODE ATAC-seq pipeline version Peak filtering, hierarchical clustering, and tSNE plot production was performed using the chromVAR package (https://github.com/GreenleafLab/chromVAR). First, the optimal peak-list from the IDR output for each cell type was concatenated and sorted, then used as the peak input for chromVAR. The blacklist filtered bam files for reach replicate was used as input along with the sorted peak file. The fragment counts in each peak for each replicate and GC bias was calculated, and then the peaks were filtered using filterPeaks function with the default parameters and nonoverlapping=TRUE. The master peak-list was extracted at this point, which contained 84,243 peaks, and used throughout the study. The deviations were calculated using every peak, and the tSNE and correlation functions were also performed using the deviations output and the default parameters.
Annotation of peaks, generation of histogram plot, merging of peaks, and motif enrichment was performed by HOMER (http://homer.ucsd.edu/homer/). Peaks were annotated using the annotatePeaks.pl function with the mm10 assembly and default parameters. Histogram was created by first shifting the bam files using DeepTools alignmentSieve.py with the flag -ATACshift.
Next, tag directories were made using the Tn5 shifted bam files using HOMER makeTagDirectory.
The histogram was made using the annotatePeaks.pl function with the default settings and the flags: -size -500,500 and -hist 5. Peak lists were compared using the mergePeaks.pl function with default settings and the flags -d given, -venn, and for the unique peak lists -prefix. Motif enrichment was performed using the findMotifsGenome.pl package with default parameters using the flag -size given and custom background peaks, which consisted of the combination of all the peaklists for the cell types not being analyzed. Instances of motifs in non-promoter peaks were found by using the annotatePeaks.pl function with the -m flag, using custom made motif files for each cell type containing the top 10 enriched motifs found.
The GREAT tool (http://great.stanford.edu/public/html/) was used to annotate nonpromoter peaks to target genes. The peak lists were reduced to BED4 files from the HOMER annotations output and used as input. The whole mm10 genome was used as the background regions, and the association rule settings were set as Basal plus extension, proximal window 2kb

B)
A cis-element predicted to be associated with F2rl2 by GREAT was accessible in both MkPs and HSCs, but not in any other unipotent cell type.
C) The F2rl2 CRE contained the transcription factor binding motifs for 9 out of the top 10 enriched motifs in MkPs. The only motif not present is Runx1.

D)
GEXC expression data reported expression of Pyruvate kinase liver and red blood cell (Pklr) in EPs, and not any other cell type.

E)
A cis-element predicted to be associated with Pklr by GREAT was accessible in both EPs and HSCs, but not in any other unipotent cell type.