Vertebrate GAGA factor associated insulator elements demarcate homeotic genes in the HOX clusters

Background Hox genes impart segment identity to body structures along the anterior-posterior axis and are crucial for the proper development of all organisms. Multiple regulatory elements, best defined in Drosophila melanogaster, ensure that Hox expression patterns follow the spatial and temporal colinearity reflected in their tight genomic organization. However, the precise mechanisms that regulate colinear patterns of Hox gene expression remain unclear, especially in higher vertebrates where it is not fully determined how the distinct activation domains of the tightly clustered Hox genes are defined independently of each other. Here, we report the identification of a large number of novel cis-elements at mammalian Hox clusters that can help in regulating their precise expression pattern. Results We have identified DNA elements at all four murine Hox clusters that show poor association with histone H3 in chromatin immunoprecipitation (ChIP)-chip tiling arrays. The majority of these elements lie in the intergenic regions segregating adjacent Hox genes; we demonstrate that they possess efficient enhancer-blocking activity in mammalian cells. Further, we find that these histone-free intergenic regions bear GA repeat motifs and associate with the vertebrate homolog of the GAGA binding boundary factor. This suggests that they can act as GAGA factor-dependent chromatin boundaries that create independent domains, insulating each Hox gene from the influence of neighboring regulatory elements. Conclusions Our results reveal a large number of potential regulatory elements throughout the murine Hox clusters. We further demarcate the precise location of several novel cis-elements bearing chromatin boundary activity that appear to segregate successive Hox genes. This reflects a pattern reminiscent of the organization of homeotic genes in Drosophila, where such regulatory elements have been characterized. Our findings thus provide new insights into the regulatory processes and evolutionarily conserved epigenetic mechanisms that control homeotic gene expression.


Background
Hox genes specify segment identity during development: the unique combination of their expression pattern provides correct positional identity along the anteriorposterior (AP) body axis. The distinct organization of the Hox genes was first defined in Drosophila melanogaster [1] and such features as their structure, function, clustered organization, and spatial colinearity of expression (their order along the chromosome corresponds to their order of expression along the AP embryonic axis) are highly conserved in all metazoans [2]. Specific features of the vertebrate Hox complexes, however, are distinct from those seen in invertebrates. Through genome duplication events, vertebrates have at least four Hox clusters that are more organized, with all the genes transcribed in the same orientation and arranged more compactly than those in lower organisms [3]. The order of genes within the complexes, though, continues to be colinear, with their domain of expression along the AP axis: this shows remarkable evolutionary constraint to maintain the clustered organization and precise segment-specific expression patterns of the Hox genes. Their temporal order of activation, however, shows a difference; in Drosophila, the expression pattern of all the Hox genes is set simultaneously, while mammals show temporal colinearity by expressing their 3′ Hox genes first and then the more 5′ Hox genes sequentially from anterior to posterior during development [4,5]. These observations suggest that Hox gene regulation is highly dependent on genomic organization. It remains unclear how the colinear pattern of Hox gene expression in mammals is governed to provide discrete segment identity during development. In particular, it is not fully determined how the distinct activation domains of successive Hox genes are defined to maintain this spatial and temporal colinearity of expression. Mouse transgenes bearing regulatory regions of Hox genes faithfully recapitulate their expression domains [6,7], indicating that elements that define the domains of Hox gene expression are probably located in close proximity to the genes themselves, interspersed between coding regions within the conserved clusters. This arrangement may also allow enhancer sharing and promoter competition between the vertebrate Hox genes, providing an additional layer of regulatory control [8]. However, the complex interplay of Hox regulatory elements has not been clearly mapped in organisms other than Drosophila, where distinct regulatory domains control gene expression; most importantly, boundary elements flank each regulatory domain and define its limits in the bithorax complex [9]. Chromatin boundaries or insulator elements are implicated in regulating genome-wide chromatin-mediated effects and are crucial for blocking inappropriate gene expression to provide functional autonomy of chromatin domains [10,11]. Homeotic gene regulation is thus highly dependent on genomic organization and is maintained by progressive transcriptional activation from a repressed state governed by the trithorax (trxG) and polycomb (PcG) groups of proteins [12]. Epigenetic mechanisms therefore play a crucial role in facilitating chromatin reorganization in an ordered manner to provide accessibility control at Hox clusters [13].
Here, we have explored the chromatin regulatory features across all four murine Hox clusters using custom-designed high-resolution chromatin immunoprecipitation (ChIP)-on-chip tiling arrays to reveal the presence of novel histone H3-free regions as potential regulatory elements. We show that most of these histonedepleted regions are associated with the vertebrate GAGA factor. Further, almost all intergenic regions at the Hox clusters show the presence of such histone-free regions that bear significant enhancer-blocking activity in human cells. These findings establish an association of the GAGA factor at specific intergenic chromatin structures that act as regulatory elements in mammalian Hox clusters. The observation that some of these regions can function as chromatin boundaries indicates a conserved mechanism of Hox gene expression regulated by distinct chromatin domains of each gene.

Results and discussion
Identification of histone-free regions at murine Hox loci To obtain a complete picture of the epigenetic organization of the four murine Hox clusters, we designed a ChIP-on-chip tiling array including all 39 Hox genes along with their intergenic as well as upstream and downstream flanking regions. Synchronized G0 cells from the mouse C2C12 cell line were used to obtain a homogenous assay system for chromatin analysis, minimizing possible effects of cell type or cell cycle distribution and potential variations in the epigenetic state. Chromatin immunoprecipitation (ChIP) experiments with histone H3 antibodies were then performed for hybridization to the custom-designed tiling arrays.
Strikingly, ChIP-on-chip analysis revealed a consistent pattern of histone depletion at intergenic intervals across all arrays. The H3 'pan' antibody against the carboxy terminus of histone H3 (an invariant region that escapes post-translational modifications) recognizes all forms of histone H3 and its modifications. We found that the binding of this antibody was depleted at many regions within the Hox clusters in a defined pattern, as outlined for 150 kb of each of the Hox clusters in Figure 1. These regions are likely to be devoid of histone H3, in contrast with the enriched regions in the pan histone H3 array that mark histone presence (Table 1), and were designated as histone-free regions (HFRs). Further, the HFRs showed consistent lack of enrichment with two additional modified histone antibodies specific for H3K4me3 and H3K27me3 (Additional file 1: Figure S1A). The four clusters displayed distinctly different patterns of peak enrichment with these two histone modification marks ( Figure S1B) [14]. Antibody binding across the arrays was, however, used to identify the 'unenriched regions' , marked by probes that showed consistently negative or poor enrichment values. Over 66% of the HFRs in the pan H3 array were found to be unenriched in all the three arrays (Additional file 1: Table S1). Thus, successive peaks of antibody enrichment were separated from each other by 'troughs' or 'gaps' that were not enriched for any of the antibodies. Most of the HFRs could be clearly visualized directly from the probe binding data by the presence of these large gaps in antibody enrichment (boxed regions in Figure 1 and Additional file 1: Figure S1A) suggesting an unequal histone organization throughout the Hox clusters, with the troughs of low histone H3 enrichment indicating regions of low occupancy.
Nucleosome organization governs the accessibility of chromatin to regulatory complexes, increasing the binding of nonhistone proteins to target sites. Genome-wide chromatin studies have established the presence of regions with altered nucleosome occupancy, which often coincide with known regulatory elements, such as gene promoters, enhancers and transcription start and termination sites [15]. The distribution of core histone H3 has been used previously as a readout of nucleosome distribution in yeast [16] and other organisms, including mouse and human [17,18]. To identify all genomic regions associated with poor histone H3 presence in the context of their location within our arrays, we used a bioinformatics approach, considering the individual histone enrichment at neighboring probes at high resolution to delineate the extent of all possible HFRs at the four clusters (see Methods). Using custom scripts, we could identify many Figure 1 ChIP-on-chip array reveals histone-free regions in the murine Hox clusters. Screenshots from the histone H3 array data visualized using the UCSC genome browser [43] showing normalized log ratios (NLRs) of probes (y axis) at the Hox clusters. Large stretches of unenriched probes showing negative NLR values are highlighted by boxed regions. The genomic scale bar is indicated at the top. A map of all the HFRs (green boxes) within each cluster identified by bioinformatics analysis is presented at the bottom in the context of the genic regions (blue boxes with transcriptional orientation indicated; the large intronic regions of Hoxa3 and Hoxd3 genes have been omitted for clarity). Red boxes correspond to locations of noncoding transcripts (multiple transcripts mapping to same location are clubbed for clarity) and dark green boxes correspond to CpG islands. more regions from our ChIP-on-chip array that showed poor enrichment and were likely to be part of HFRs at these clusters. The location of all the HFRs thus identified is indicated in Table 1 and in the HFR maps provided below each cluster in the context of genic regions in Figure 1 and Additional file 1: Figure S1A (green boxes). The HFRs were most frequent across the HoxA, HoxC, and HoxD clusters while the HoxB cluster showed the lowest occurrence of such regions. The HoxB cluster is different from the other Hox clusters in that the Hoxb13 gene is separated from the rest of the cluster by~70 kb of DNA, in contrast with the other homeotic genes, which are closely spaced in all four clusters. Although the HoxB cluster maintains colinear expression of all its genes along the A-P body axis, the distant location of Hoxb13 is believed to contribute to its loss of expression along the secondary body axes, presumably owing to the absence of cis-regulatory features in the large intervening region, which also contains highly repetitive DNA sequences [19]. This unique feature of the HoxB cluster could explain the absence of HFRs in this part of the complex. The reason behind the low frequency of HFRs in the cluster as a whole is not clear, although it may suggest a relatively low regulatory complexity for this cluster. Interestingly, we observed that many of the HFRs were positioned intergenically. Altogether, 93 HFRs were identified by bioinformatics analysis at the Hox clusters (including the 10 kb flanking upstream and downstream regions) of which only 9 were genic (Table 1). Of the intergenic HFRs, 9 were within 1 kb around a gene start and could be considered to overlap with putative promoter regions while 11 were located at the 3′ end of Hox coding regions. A large number of the intergenic HFRs (64 HFRs) were located at intermediate positions between successive Hox genes or downstream to them. Apart from the Hox coding regions, we also checked the positions of intergenic noncoding transcripts as well as all CpG islands known at the Hox clusters. While 16 of the intergenic HFRs were found within noncoding transcribed regions, the remainder showed no overlap with any of these known intergenic features (Figure 1), suggesting that many of these HFRs probably represent sites for other chromatin regulatory activities. In Drosophila, it has been shown that the binding of promoter-proximal stalling factors at inactive Hox genes helps organize insulator loop domains [20]. The finding of histone-depleted regions around mammalian Hox genes, therefore, suggests a potential for the binding of regulatory proteins at these sites to bring about such chromatin reorganization activities.

Histone-free regions at the Hox clusters function as enhancer blockers
Histone depletion is a feature of genomic sites involved in regulatory activities as it allows for binding of nonhistone target proteins. Given the distinct spatiotemporal activation of mammalian Hox genes during development, we hypothesized that the intergenic HFRs identified in our array represent sites for regulatory elements that could help define functional domains and, thereby, the precise expression patterns of these genes. Hox genes are organized in a cluster, yet they are uniquely regulated in a spatially and temporally restricted manner such that each gene can contribute to a unique segment identity. This necessitates the presence of functionally independent domains for each gene within the cluster defined by chromatin domain boundaries that can ensure unique expression patterns in each segment. In vertebrates, such elements have not yet been clearly defined at the Hox clusters.
To determine whether any of the newly identified HFRs marked potential boundary elements at the murine Hox clusters, we decided to test their enhancerblocking ability. We selected candidate HFRs that were located in intergenic regions and showed no overlap with any of the known regulatory features at the clusters as described. The average size of the test elements was about 1 kb; all of the test elements showed negligible enrichment with the histone H3 antibody. The HFRs from all four clusters were tested for enhancer-blocking activity in a human cell line using the number of neomycin resistant colonies formed as a readout of enhancer activity. A boundary element cloned between the locus control region (LCR) enhancer and the γ-neo gene providing antibiotic resistance blocks promoter activation (Figure 2 inset), thereby rendering the cells sensitive to G418 and decreasing the number of colonies obtained in this assay. Earlier reports have established that the presence of lambda insert between LCR and γ-neo does not have any effect on the expression of the neomycin resistance gene, ruling out any distance effect [21]. We used the vector alone as a negative control to indicate 100% survival of colonies in the absence of any blocking of the γ-neo gene and the well-known chicken β-globin boundary [21] as a positive control against which to compare the number of colonies obtained with the test regions from the Hox clusters. We tested 24 intergenic HFRs and, as a control for specificity, we also included 6 regions (from within Hox gene bodies) that showed high enrichment with the core histone H3 antibody. Strikingly, we observed that there was a significant decrease (P < 0.0001) in the number of G-418 resistant colonies when cells were transfected with the target HFRs ( Figure 2). The control chicken β-globin boundary element showed 53% survival of colonies in the assay. A fragment from the Hoxd13-Evx2 intergenic region that we had previously characterized as a potential boundary element [22] was also as efficient in enhancer blocking as the chicken β-globin element in our assay (52% survival). Over this background, we found that eight of the HFRs tested showed substantially fewer colonies (only 38% to 50% survival) than those obtained with both control boundaries in the same assay, indicating extremely efficient enhancer-blocking activity associated with these intergenic regions and suggesting their potential to act as strong boundary elements. Further, 12 more of the tested HFRs also displayed enhancer-blocking activity comparable to the control boundaries (50% to 65% survival). Only 4 regions out of the 24 intergenic fragments tested showed survival >65% (but still <75%). Thus, 20/24 intergenic HFRs tested behaved as efficient enhancer blockers in this assay, suggesting that these histone-free regions could act as boundary elements and demonstrating a correlation between the presence of HFRs and their role in demarcating Hox gene domains. In contrast, in the case of the six H3-enriched fragments derived from the Hox genic regions, consistently high survival of colonies could be observed (85% to 123% survival; red bars in Figure 2), similar to or greater than that seen with the vector control, indicating an absence of enhancer-blocking potential. Taken together, these results clearly show that the intergenic HFRs from the murine Hox clusters selectively block the LCR enhancer from acting on γ-neo reporter gene, indicating that these sequences can function as boundary elements. The disruption of histones exposes cis-regulatory elements and their binding motifs in contrast with the surrounding packed DNA. Boundary elements can thus be expected to be associated with a modified chromatin state and indeed, the scs and scs′ elements at the Hsp70 locus in Drosophila were originally characterized by their specialized chromatin structure [23]. Reduced nucleosome occupancy corresponding to histone replacement has since been reported at functional boundaries within the Drosophila bithorax complex and the fact that these are correlated with PcG and trxG protein binding suggests an increased accessibility of the PREs (polycomb response elements) and boundary elements [24]. In this context, our results suggest that mammalian Hox clusters also bear such regulatory elements and this study provides a large-scale identification of such potential elements marked by their modified histone occupancy pattern. Not all the elements tested displayed strong boundary potential and it is possible that the enhancer blocking observed in some cases may be the manifestation of some other activity in their native context, as there are many ways that enhancer-promoter interactions can be disrupted [10,25]. Based on the genomic context and the fact that some of the HFRs bear strong enhancer-blocking activity, we suggest the presence of chromatin domain boundaries at the intergenic locations identified here. This finding indicates a conserved mechanism to control the initiation and maintenance of collinear Hox gene expression in mammals.
Intergenic HFRs at the Hox clusters are associated with the vertebrate GAGA binding factor The observation that a large number of the HFRs act as enhancer blockers in the in vitro colony formation assay suggests their possible association with key regulatory factors at the murine Hox loci. We therefore subjected all the identified HFRs to sequence analysis and found multiple sites for the GAGA binding factor (GAF in Drosophila) at the histone-free regions, with the GAGAG motif appearing as the most frequent significant hit with a motif P < 0.0005 (Figure 3 inset). In all, 44 intergenic HFR sequences were found to contain putative GAF binding sites, many with multiple instances or repetitions of the GAGAG motif ( Figure 3A). A total of 113 GAF motifs could be identified throughout the HFRs at the HoxA, HoxC, and HoxD clusters, including 10 kb upstream and downstream flanking regions (Additional file 1: Table S2). Interestingly, the HoxB cluster showed a lack of association of its few HFRs with the predicted GAF motif. This suggests the existence of other non-GAF-dependent mechanisms, especially in the regulation of the HoxB cluster. Alterations in histone modifications and chromatin decondensation followed by a stepwise looping out of the active genes are implicated in temporal HoxB gene expression [26] but the factors involved in this reorganization are unknown.
GAF is a versatile DNA binding factor, known to be associated with nuclease hypersensitive sites, that binds to GAGAG stretches to perform complex multifunctional roles in genome regulation [27]. Although identified as a trxG component in the context of Hox regulation at active chromatin within regulatory elements in Drosophila [28], GAF was subsequently shown to be associated with enhancer-blocking activity as well and GAF-dependent boundaries mediate the activity of multiple homeotic enhancers in Drosophila [29,30]. To confirm whether the intergenic HFR sequences were associated with GAGA factor in mouse cells, we then performed ChIP-qPCR analysis using an antibody directed against Th-POK (the vertebrate homolog of dGAF). Enrichment was assayed using multiple primers designed across 18 HFRs, including 13 that had shown high enhancer-blocking activity, as described in Figure 2. The known Th-POK binding boundary region between Evx2-Hoxd13 characterized previously ( [22]; control in Figure 3B) was used as a reference and showed robust association with the Th-POK antibody, much greater than that observed for the IgG control in the ChIP assay. We found that most of the HFRs that had shown enhancer-blocking activity were also similarly highly enriched for Th-POK binding ( Figure 3B). Five additional intergenic HFRs that had not been tested for enhancer-blocking activity but showed low histone occupancy also showed significant Th-POK association, indicating that these are also likely to bear boundary potential. The high degree of enrichment indicates robust association with GAGA factor at these regions, correlating well with the presence of multiple GAF binding sites at the HFRs. On the other hand, six regions with high histone occupancy in the histone H3 array (H3-enriched or H3-positive regions), positioned within the Hox gene bodies, did not show similarly robust association with the Th-POK antibody ( Figure 3B, middle).
CTCF (CCCTC binding factor) is the best studied vertebrate boundary factor (a large number of CTCF binding sites have been found in the mammalian genome [31,32]) and also plays many roles in genome organization and regulation. At the HoxA cluster, CTCF has been shown to maintain higher-order architecture via barrier activity, and CTCF in conjunction with cohesin organizes chromosome looping to segregate inactive HoxA gene domains [33]. We therefore decided to check for CTCF binding site association with the HFRs at the Hox clusters. However, we could not find any evidence of preferential association of CTCF motifs within the intergenic HFRs by sequence analysis, as seen for the GAF binding sites. A comparison with CTCF binding sites identified by the ENCODE project also showed minimal overlap with HFRs (Additional file 1: Figure S2). We further checked for CTCF binding at 14 intergenic HFRs by ChIP-qPCR analysis. In contrast with the clear association of Th-POK, CTCF was not significantly enriched in comparison with the control IgG antibody at most of the boundary or intergenic HFRs tested ( Figure 3B, right). Thus, although CTCF is known to be the commonly associated vertebrate insulator factor, it is unlikely to be the mediator of the regulatory activity associated with these newly described HFRs in the Hox loci. These results are consistent with recent work demonstrating that CTCF does not act as a boundary element at the HoxD cluster [34]. CTCF can organize genome regulation through diverse activities, such as transcriptional activation and repression [35]. It is possible that CTCF binds to other sites not defined by low levels of H3-enrichment as assayed in this study, where it may play direct roles in the transcriptional regulation and chromatin compaction of the Hox genes [36].

HFRs overlap with the sites of DNaseI hypersensitivity at the Hox clusters
The association of HFRs observed here appears to be specifically with the vertebrate GAGA factor and could possibly be the outcome of the nucleosome-reorganizing potential of GAF and not solely a feature of boundary % input B A Figure 3 Vertebrate GAGA factor (Th-POK) is associated with intergenic HFRs. Sequence analysis of HFRs from Hox clusters reveals the presence of multiple GAF binding sites. (A) Schematic map (drawn to scale) depicting the location of all the GAGA motifs (vertical red lines). A few motifs appearing close together are merged. Blue boxes denote the Hox genes with transcriptional orientation (the large intronic regions of Hoxa3 and Hoxd3 genes have been omitted for clarity); green boxes indicate the HFRs. (B) Binding of Th-POK (vertebrate GAF) and CTCF across the Hox clusters was assessed by real time ChIP-qPCR assays using a panel of primers designed at the HFRs as well as some genic regions enriched for histone H3. The known Th-POK binding region at the Evx2-Hoxd13 boundary [22] was used as a reference (control). ChIP with Th-POK antibody (left) showed robust binding measured as percentage of input at the control and intergenic HFRs (blue bars) enriched above the background level of the nonspecific immunoglobulin G (IgG, red bars). Genic regions did not show such a high enrichment profile (middle). ChIP with the CTCF antibody (right) did not show any enrichment over the background at the HFRs. Results represent mean ± standard error from three independent experiments with each qPCR assay performed in triplicate. Inset depicts the motif discovered from the HFRs (left) and matched to the GAGA binding site (right) by MEME analysis.
activity. In Drosophila, many cis-regulatory elements including boundaries within the homeotic complex are bound by GAF, and evidence suggests that altering the nucleosome occupancy might be a feature of such binding [37]. To determine whether HFRs marked sites of histone displacement, we checked for the presence of DNaseI hypersensitive (HS) sites using published data from ENCODE. A large number of the total HFRs overlap with a range of HS sites identified in mesodermal tissue and skeletal muscle (tissue of origin) as well as embryonic stem cells (Additional file 1: Figure S3). 82% of the HFRs identified in this work mapped to an HS site in at least one of the cell types examined while 27 HFRs showed an overlap with HS site consistently across all the cells (Additional file 1: Table S3). DNaseI hypersensitivity serves as a measure of chromatin accessibility and is therefore a reliable marker for sites of histone disruption and associated regulatory activity. The presence of an HS site could be a dynamic feature when it marks regulation triggered by a transcriptional event. Consistent association with nuclease hypersensitivity across cell types suggests higher-order chromatin structure, as in the case of boundary elements, such as scs and scs′ in Drosophila, which remain associated with HS sites regardless of a heat-shock event triggering the activation of the locus [23].
We hypothesize that the vertebrate GAF-associated intergenic HFRs described here may be important in maintaining the local chromatin conformation of Hox genes. It remains to be determined if the histone depletion creates accessibility at these sites for enhancer-blocking function or if the GAF binding elements actively serve as anchors for directing the basic chromatin structure by histone disruption at the Hox complexes. dGAF plays a key role in directing chromatin architecture and driving nucleosome reorganization by recruiting remodeling complexes; its mutation has been shown to effect both histone replacement and associated boundary function at Hox complexes in Drosophila [38]. The presence of multiple GAF sites in the intergenic HFRs suggests the possibility of the vertebrate GAGA factor directing nucleosome reorganization and allowing binding of regulatory proteins at mammalian Hox clusters. Since HFRs can provide sites for a multitude of target proteins and GAF can bind to various proteins to form regulatory complexes, looking for interacting partners at these sites will help identify other regulatory features at the HFRs. Taken together, our findings suggest a correlation between GAGA factor association at the histone-depleted regions and chromatin accessibility, providing functional significance to the presence of HFRs in the intergenic regions of mammalian Hox clusters.

Conclusions
This study used an epigenetic approach to identify a novel set of DNA elements at murine Hox clusters potentially involved in higher-order chromatin regulatory mechanisms. In Drosophila, multiple elements, such as enhancers, insulators, and PREs, form regulatory units that determine and maintain the segment-specific expression pattern of the associated homeotic genes. In mammals, the regulatory elements driving colinear activation and maintenance of Hox expression have not been delineated. In this context, GAGA factor associated histone-depleted regions identified in the intergenic stretches between murine Hox genes suggest sites of potential regulatory function that could help demarcate their domains of expression. We have previously characterized a GA-rich intergenic tract between the Evx2 and Hoxd13 genes conserved among mouse, human, and zebrafish and demonstrated its functional conservation as an enhancer blocker in both transgenic flies and cultured human cells [22]. The region encompassing the GA site has also been shown to act as a chromatin domain boundary element at the murine HoxD locus [39]; we have further shown that specifically mutating the GAGA binding sequence abolishes its insulator function [22]. We have previously established Th-POK as the vertebrate homolog of dGAF and indicated its potential to bind and regulate Hox clusters [40]. In this work, we have described a marked association of HFRs across the murine Hox clusters with GAF recognition sites and Th-POK binding. These results indicate a strong correlation of HFRs associated with Th-POK binding as a general feature of mammalian Hox gene domain regulation and provide mechanistic insights into the possible role of the mammalian GAGA factor in nucleosome reorganization at the Hox clusters, thereby setting the stage for the binding of regulatory proteins to organize chromatin regulatory activities, including boundaries.
Finally, sequence comparison might not directly identify homologous regulatory elements of the Hox genes across species but delineating the conserved organizational features of the Hox clusters can offer remarkable insights into their regulation. In this study, we have been able to identify over 90 such novel regions in the Hox clusters as HFRs and demonstrate the potential of many of the intergenic HFRs to function as chromatin regulatory elements. Our work indicates a genome-wide association for such structures and demonstrates that a lack of enrichment of histone H3 at specific genomic sites in ChIP-on-chip binding assays can serve as a marker for the identification of novel regulatory elements in a context-specific manner.

Cell culture
C2C12 skeletal muscle myoblasts were cultured as described previously [41]. Undifferentiated myoblasts were maintained in growth medium containing DMEM with 20% FBS and synchronized by subjecting subconfluent cells to suspension culture at a density of 10 5 cells/ml in DMEM containing 1.3% methyl cellulose, 20% FBS, 10 mM HEPES, and 1 × Penicillin-Streptomycin. Suspended cells were harvested after 48 h, during which time 98% of cells were arrested at the quiescent G0 stage [42], by dilution with 1 × PBS and centrifugation at 2500 rpm for 30 min at room temperature. Cells were washed well with 1 × PBS at room temperature to remove all traces of methyl cellulose and pelleted for chromatin isolation.

Chromatin immunoprecipitation
Chromatin was crosslinked and isolated from 10 7 synchronized cells. Cells were harvested as described and fixed using 1% formaldehyde (Fisher Scientific) in growth medium for 10 minutes at 37°C and quenched with 0.125 M Glycine (Sigma). Fixed cells were washed well with 1 × PBS containing protease inhibitors at 4°C and resuspended in 2 ml lysis buffer supplemented with PMSF, DTT, and protease inhibitor cocktail (Roche). Following 15 min incubation on ice, the sample was sonicated using Bioruptor (Diagnode) to obtain fragments of average size 200 bp to 600 bp. The sonicated chromatin was divided into 200 μl aliquots for chromatin immunoprecipitation (ChIP) using the ChIP assay kit (Upstate, #17-295) according to the manufacturer's protocol. ChIP was performed on precleared chromatin from 10 6 cells using rabbit polyclonal antibodies against core histone H3 (Abcam, #ab1791), H3K4me3 (Millipore, #07-473), H3K27me3 (Millipore, #07-449), Th-POK (Abcam, #ab20985), or CTCF (Abcam, #ab70303). Unenriched input fraction was retained as control in each case. ChIP and input samples were subjected to array hybridization or real time qPCR assays to determine enrichment.

Tiling array design and hybridization
A custom mouse genome microarray chip representing 1.1 Mb of the mouse genome consisting of the four Hox clusters was designed with 60-mer probes (approximately 16,200 probes) tiled continuously with a 10 bp overlap between consecutive probes for high resolution (Agilent Technologies). Repeat masking was used to avoid repeat sequences and other probes with cross hybridizing potential. Dual-color microarray hybridization experiments were carried out according to the manufacturer's protocol. Briefly, samples recovered from the enriched fraction and unenriched whole cell extracts (input DNA) from two parallel experiments were independently pooled to obtain optimal yields for direct array hybridization, thus minimizing bias introduced during ligation-mediated PCR amplification of DNA fragments. The pooled enriched and input fractions were directly labeled with Cy5 and Cy3 dyes, respectively, using an Agilent Genomic DNA labeling kit (5190-0449) and hybridized onto the mouse custom arrays at 65°C for 40 h followed by washing. Scanned microarrays were submitted for background subtraction and data extraction to Agilent's Feature Extraction. Normalized enrichment values for the probes were identified using DNA Analytics software. Probe data were normalized using default blanks subtraction and intra-array dye-bias median normalization and P values were assigned to groups of neighboring probes using the Whitehead error model.
The ChIP-on-chip signals were normalized with input DNA signals hybridized to the same arrays and the NLRs of the probes thus obtained were visualized on the Mouse NCBI37/mm9 Assembly in the UCSC genome browser [43]. Additional file 2 includes the NLR values of all the probes from the Hox clusters. The tiling array data have been submitted to the GEO database ([44]; GEO:GSE42941).

Bioinformatic analysis
For data analysis and definition of all the histone unenriched gaps (HFRs), NLR values of the probes within the Hox clusters from the histone H3 tiling array were used to identify unenriched genomic stretches using an inhouse PERL script. The extent of each target region was determined based on a defined cut-off for the number of consecutive probes that showed a negative NLR value with the histone H3 antibody. Briefly, the probe dataset from each cluster was divided into groups of five contiguous probes and each was classified as a 'low enrichment' group if at least three probes in the group had an NLR <0. Continuous blocks of low enrichment groups were extended until two or more consecutive high enrichment groups were encountered or if a genomic distance >200 bp was encountered between successive probes. These poorly enriched genomic stretches were classified as histone H3-free regions or HFRs and subjected to further analysis for ChIP-qPCR (Primers are provided in Additional file 1: Table S4) and enhancer-blocking assays.
Genomic sequences of HFR stretches from each of the clusters were extracted from the mouse reference assembly MGSCv37-C57BL/6J of NCBI build 37.2 and subjected to motif search analysis using the MEME Suite of tools [45]. The HFRs were mapped in the context of the murine Hox genes and all noncoding transcripts (downloaded from the Tromer database) using Geneious [46].

Enhancer-blocking assays
HFRs and H3-enriched genic control regions selected from the Hox clusters were amplified as test fragments from mouse genomic DNA using primers designed with Xho 1 restriction sites at their 5′ ends (Additional file 1: Table S5). The PCR products were then purified by gel extraction and cloned into pCR-Blunt II Topo vector