Regulation of SETD2 stability by its intrinsically disordered regions maintains the fidelity of H3K36me3 deposition

The SET domain-containing protein SETD2 is the sole methyltransferase in mammals that can trimethylate histone H3 at lysine 36. H3K36me3 is known to be involved in transcription elongation, pre-mRNA splicing, DNA methylation, and DNA damage repair. However, knowledge of the regulation of the SETD2 enzyme itself is limited. Here we show that the poorly characterized N-terminal region of SETD2 plays a determining role in regulating the stability of SETD2. This stretch of amino acid residues (1-1403), which contains disordered regions, is targeted for degradation by the proteasome. In addition, the SETD2 protein is aggregate-prone and forms insoluble inclusion bodies in nuclei, especially upon proteasome inhibition. Removal of the N-terminal segment stabilizes SETD2 and leads to a marked increase in global H3K36me3 which, uncharacteristically, can happen in an RNA Pol II-independent manner. The spurious H3K36me3 is deposited in a non-canonical distribution, including reduced enrichment over gene bodies and exons. Increased SETD2 abundance leads to widespread changes in transcription and alternative splicing. Thus, the regulation of SETD2 levels through intrinsically disordered region-facilitated proteolysis is important to maintain the fidelity of transcription and splicing-related processes.


INTRODUCTION
The N-terminal tails of histones protrude from the nucleosome and are hotspots for the occurrence of a variety of post-translational modifications (PTMs) that play key roles in regulating epigenetic processes.
H3K36me3 is one such important functionally characterized PTM. In yeast, this mark suppresses cryptic transcription from within the coding region of genes by preventing histone exchange (Venkatesh et al, 2012). In mammalian cells, it is involved in the recruitment of DNA repair machinery, in splicing, and in establishing DNA methylation patterns by acting as a binding site for the enzyme DNMT3a (Qin et al, 2017) (Kolasinska-Zwierz et al, 2009) (Dhayalan et al, 2010) (Li et al, 2013) (Pfister et al, 2014). H3K36me3 marks actively transcribed genes and is important in preventing the spread of heterochromatin boundaries by antagonizing PRC2 activity and the H3K27me3 repressive mark (Yuan et al, 2011). Recent reports have emphasized the tumor-suppressive role of H3K36me3, especially in renal cancers, where the gene coding for the SETD2 histone methyltransferase is often deleted or mutated (Su et al, 2017) (Li et al, 2016).
In yeast, the SET domain-containing protein Set2 (ySet2) is the sole H3K36 methyltransferase (Strahl et al, 2002). ySet2 interacts with the large subunit of RNA polymerase II, Rpb1, through its SRI domain, and co-transcriptionally deposits H3K36me3 (Xiao et al, 2003). The deletion of the SRI domain from ySet2 abolishes both the Set2-RNA Pol II interaction and H3K36me3 deposition in yeast (Suzuki et al, 2016). H3K36 methylation is a highly conserved histone mark and Set2 homologs are found in more complex eukaryotes (McDaniel & Strahl, 2017). These homologs share conserved features such as the AWS (associated with SET), SET [Su(var)3-9, Enhancer-of-zeste and Trithorax] and Post-SET domains that are required for the catalytic activity of the enzyme, as well as protein-protein interaction regions such as the WW and SRI (Set2-Rpb1 Interaction) domains. However, compared to yeast, there are differences in mammals both in the manner of H3K36me3 deposition and in the enzyme itself. For instance, although SETD2 is the sole methyltransferase in mammals that can deposit the histone H3K36me3 mark, there are additional enzymes such as NSD1, NSD2, and ASH1L that can deposit H3K36me1 and me2 (Lucio-Eterovic et al, 2010) (Kuo et al, 2011). Notably, SETD2 has a long N-terminal segment that is not present in ySet2. The function of this region has remained obscure (McDaniel & Strahl, 2017).
Here we show that SETD2 is an inherently aggregate-prone protein and that its N-terminal region regulates its half-life. This, in turn, is important for the fidelity of H3K36me3 deposition and the processes of transcription and splicing. Although the regulation of SETD2 stability by the proteasome has been reported earlier, our findings reveal that SETD2 is regulated by its intrinsically disordered regions and highlight the importance of such sequences in governing appropriate protein function and activity.

Deletion of the N-terminal region of SETD2 leads to changes in transcription and splicing
SETD2 is a large protein of 2564 amino acids. The C-terminal region of the protein (residues 1404-2564) has characterized domains that are conserved with ySet2. However, SETD2 has an N-terminal segment (residues 1-1403) not found in ySet2, the function of which is not clear [Figure 1a]. Notably, the NCBI Conserved Domain Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) revealed that the SETD2 N-terminus lacks any known domains or motifs and did not yield any clue about its possible function. We hypothesized that if this region is functionally important, then rescue of setd2Δ cells with a truncated form of SETD2 that lacks the N-terminal region (amino acids 1-1403) would have a different effect on the transcriptome than rescue with full-length SETD2. To test this hypothesis, constructs were made to express full-length SETD2 (SETD2 FL) or the C-terminal segment of SETD2 (1404-2564, SETD2 N3) under the control of a CMV promoter. Recombinant expression of SETD2 has been previously used to investigate the function of the protein (Zhang et al, 2020) (Zhu et al, 2016) (Carvalho et al, 2014) (Chen et al, 2017a). The constructs were introduced in setd2Δ 293T (KO) cells in which exon 3 of both alleles of the endogenous SETD2 gene was disrupted using TALEN. Subsequently, high-throughput RNA-seq was performed.
Expression of SETD2 FL and SETD2 N3 led to transcriptome changes in the KO cells [supplementary information S1]. Strikingly, a comparison of SETD2 N3 versus SETD2 FL-expressing cells revealed that a total of 1234 genes exhibited significant differential expression [Fold change (FC) ...]. Gene Ontology Enrichment Analysis revealed that the upregulated genes are not enriched in any pathway. This suggests that the removal of the N-terminal region of SETD2 leads to widespread changes in transcription without affecting any specific pathway.
Analysis of our transcriptome data to look for possible differences in splicing also revealed widespread changes in cells expressing SETD2 N3 versus SETD2 FL. A total of 2726 differential alternative splicing events (FDR 5 ...). Collectively, our data suggest that removal of the N-terminal region of SETD2 leads to widespread changes in the transcriptome of mammalian cells and indicate a functional role of this region.

Removal of the N-terminal region stabilizes the SETD2 protein and leads to a marked increase in global H3K36me3 levels
To gain insights into how the N-terminal region might affect the function of SETD2, we first tested whether removal of this stretch affects the catalytic activity of SETD2. To check the activity of the exogenously introduced SETD2 constructs, KO 293T cells were used. Consistent with the role of SETD2 as the sole H3K36me3 depositor in humans, the H3K36me3 mark was not detected in whole-cell lysates of the KO cells by immunoblotting [supplementary information S2a]. Next, constructs to express Halo-tagged SETD2 FL or SETD2 N3 were introduced in KO cells by transfection. 72 hours post-transfection, whole-cell lysates were prepared and analyzed by western blotting. As expected, the expression of the empty vector (VC) did not rescue H3K36me3 [Figure 1d]. Strikingly, the expression of SETD2 N3 in KO cells led to a marked increase in the H3K36me3 level as compared to the rescue with SETD2 FL [Figure 1d]. The other two H3K36 methyl marks, H3K36me1 and H3K36me2, remained largely unchanged [Figure 1d]. Some methyltransferases are known to harbor autoinhibitory domains (Hou et al, 2019) (Qiao et al, 2011). It is unknown whether SETD2 contains such a domain, and whether its absence in SETD2 N3 affects the H3K36me3 levels. To distinguish between this possibility and the possibility that the difference in H3K36me3 is simply due to altered abundance of the truncated (N3) versus the full-length (FL) protein, western blotting of the whole-cell lysates was performed with an anti-Halo antibody. Strikingly, although Halo-SETD2 FL could rescue H3K36me3 in KO cells, it could not be detected using an anti-Halo antibody, suggesting that the expression level was very low [Figure 1e] (discussed in the Discussion section). A robust signal was obtained for both the Halo-Vector control (VC) and Halo-SETD2 N3 lanes, demonstrating that the expression of the smaller SETD2 fragment was considerably higher than that of SETD2 FL [Figure 1e].
RT-PCR using primers specific for SETD2 FL and SETD2 N3 transcripts confirmed that the transcripts were produced robustly [supplementary information S2b]. Also, the transcript levels of SETD2 FL and SETD2 N3 were comparable, which was also observed in our RNA-seq data, suggesting that the differences observed in protein abundance are not due to the differences in the transcription of the two constructs but rather, might be due to instability of the full-length SETD2 protein. Similar results were obtained with the GFP-tagged constructs of SETD2. H3K36me3 level was much higher on SETD2 N3 expression and the fluorescence intensity of GFP-SETD2 FL was considerably weaker than GFP-Vector control and GFP-SETD2 N3 [supplementary information S2c, d].
Thus, full-length SETD2 does not accumulate in human cells. The removal of the N-terminal segment leads to a marked increase in its concentration that is also manifested in the noticeable increase in the global H3K36me3 level. Also, these experiments demonstrate that in the absence of the N-terminal region, SETD2 retains its histone methyltransferase activity.

SETD2 is robustly degraded by the ubiquitin-proteasome pathway
Autophagy and the ubiquitin-proteasome system (UPS) are the major pathways for protein degradation in mammalian cells (Cooper, 2000). To investigate whether autophagy plays a role in SETD2 turnover, 293T cells expressing GFP-SETD2 FL were treated with increasing concentrations of the lysosome inhibitor chloroquine. Chloroquine treatment did not have an apparent effect on the GFP-SETD2 FL level. To test whether SETD2 is targeted for degradation by the UPS, SETD2 FL expression was checked after treating the cells with the proteasome inhibitor MG132. Proteasome inhibition led to an increase in the accumulation of SETD2. This was confirmed by western blotting of Halo-SETD2 FL-expressing cells as well as microscopy of GFP-SETD2 FL [supplementary information S3a, Figure 1f].
The addition of MG132 did not have a prominent effect on Halo or GFP expression [supplementary information S3b, c]. Also, RT-PCR did not reveal any change in the transcript abundance of either endogenous or exogenous SETD2 on MG132 treatment, suggesting that the increased SETD2 protein abundance observed is due to protein stabilization [Figure 1g]. Importantly, an increase in the accumulation of endogenous SETD2 was also observed by western blotting upon MG132 treatment of WT 293T cells, implying that degradation by the proteasome is not limited to the recombinant protein [Figure 1h].
To check whether the robust proteasomal degradation of SETD2 is unique to 293T cells, the expression of GFP-SETD2 FL was tested in HepG2 and HeLa cells. SETD2 behaved similarly in these cell lines. Very weak expression of GFP-SETD2 FL was observed, which increased upon proteasome inhibition by MG132 treatment [supplementary information S3d]. Our results suggest that the short half-life of SETD2 due to UPS-mediated decay is not cell line-specific and is a characteristic of the SETD2 protein.
Next, we turned our attention to the question of how the SETD2 protein is targeted for degradation. We checked the effect of the E3 ubiquitin ligase SPOP and the anaphase-promoting complex (APC) on SETD2 degradation, as these have been reported to promote degradation of the SETD2 protein (Zhu et al, 2016) (Dronamraju et al, 2018) [please see supplementary information S4 and the associated results for details]. Our results agreed with the previously published data that SPOP regulates SETD2 stability, but also indicated that the post-translational control of SETD2 abundance is more complicated than previously thought and may contain several redundancies. We also investigated the effect of PEST motifs on SETD2 degradation, as such motifs are known to lead to robust degradation (37). However, no apparent effect on SETD2 stability was observed on the deletion of a PEST motif [please see supplementary information S5 and the associated results for details].

SETD2 has long intrinsically disordered regions
Our experiments by mutating SPOP and APC binding motifs in SETD2 hinted that the control of SETD2 expression may be more complex than would be the case if it were solely mediated by a distinct domain(s) or motif(s). Besides the fact that the SETD2 N-terminus lacks any known domains or motifs, this segment has very little sequence similarity with other known protein sequences. Consequently, homology modeling of the structure using SWISS-MODEL (https://swissmodel.expasy.org/) failed.
Strikingly, the sequence is also 81% disordered based on Robetta prediction (http://robetta.bakerlab.org/). This was of interest because intrinsically disordered regions (IDRs) are known to regulate protein half-life (Tompa et al, 2008) (van der Lee et al, 2014). A study with 3,273 proteins in yeast, 4,502 in mouse, and 3,971 in humans revealed strong correlations between IDRs and protein half-life. The study revealed that long N-terminal and internal disordered segments contribute to short protein half-life in vivo (van der Lee et al, 2014). This prompted us to perform a deeper analysis of the disordered regions in the SETD2 protein.
Prediction of the disordered region in SETD2 was performed using IUPRED2 (https://iupred2a.elte.hu/plot) (Mészáros et al, 2018). IUPred provides a score between 0 and 1 for every residue, corresponding to the likelihood of the given residue being a part of a disordered region. Hence, a score of greater than 0.5 means that it is more likely that the residue is a part of a disordered region.
Strikingly, 1546 of the 2564 (60.29%) residues of SETD2 protein returned a score of >0.5 as compared to a Halo control (0%) [supplementary information S6]. Furthermore, 67.6% of residues in the N-terminus scored >0.5 as compared to 51.4% of residues of C-terminus.
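The per-residue thresholding described above can be illustrated with a minimal Python sketch. It assumes the IUPred scores have already been loaded into a list of floats; the function name `fraction_disordered` and the toy scores are hypothetical and are not SETD2 data.

```python
from typing import Sequence


def fraction_disordered(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of residues whose score is strictly above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s > threshold) / len(scores)


# Toy per-residue scores (made up for illustration; not SETD2 data).
toy_scores = [0.9, 0.8, 0.2, 0.6, 0.4]
print(f"{fraction_disordered(toy_scores):.1%}")  # 3 of 5 residues score > 0.5
```

Applying the same function separately to the score slices for residues 1-1403 and 1404-2564 would give the per-terminus percentages.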
Next, we looked for long disordered regions (>30 residues) in SETD2 in accordance with a previously published report (van der Lee et al, 2014). The length was based on findings that a critical minimum length of ~30 residues allows a disordered terminus of a ubiquitinated substrate to efficiently initiate proteasomal degradation (Da Fonseca et al, 2012) (Lasker et al, 2012b). Strikingly, 17 such stretches were found in SETD2, of which 11 were in the N-terminal region, 5 were in the C-terminal fragment, and 1 overlapped the N- and C-terminal segments [supplementary information S6].
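One way to sketch the search for long disordered stretches is shown below. This is a minimal illustration, not the procedure of the cited report: it assumes a stretch is a contiguous run of residues scoring above 0.5 with no gap tolerance or smoothing, and the names `long_disordered_stretches` and `classify_stretch` are hypothetical. The 1403/1404 boundary is the N3 truncation point given in the text.

```python
from typing import List, Sequence, Tuple


def long_disordered_stretches(
    scores: Sequence[float], threshold: float = 0.5, min_length: int = 31
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of contiguous runs
    scoring above the threshold that span at least min_length residues.
    min_length=31 encodes the ">30 residues" criterion (assumption: no gaps allowed)."""
    stretches: List[Tuple[int, int]] = []
    run_start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                stretches.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the last residue.
    if run_start is not None and len(scores) - run_start >= min_length:
        stretches.append((run_start + 1, len(scores)))
    return stretches


def classify_stretch(stretch: Tuple[int, int], n_term_end: int = 1403) -> str:
    """Assign a stretch to the N-terminal segment, the C-terminal segment, or both."""
    start, end = stretch
    if end <= n_term_end:
        return "N-terminal"
    if start > n_term_end:
        return "C-terminal"
    return "overlapping"


# Toy score profile (made up): a 35-residue run, a 10-residue run, a 31-residue run.
toy = [0.9] * 35 + [0.1] * 5 + [0.9] * 10 + [0.1] + [0.9] * 31
print(long_disordered_stretches(toy))  # only the 35- and 31-residue runs qualify
```

Counting the labels returned by `classify_stretch` over all stretches would yield a tally by segment of the kind reported above.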
Our findings regarding stability and disordered region prediction in SETD2 suggest the possibility that the disordered regions of SETD2 may govern its half-life.

Multiple disordered segments of SETD2 have a combined effect on its half-life
Interestingly, proteins with several internal disordered segments have shorter half-lives than proteins with only one such segment (van der Lee et al, 2014). SETD2 undergoes robust degradation and is predicted to have numerous disordered segments throughout. We speculated that if the multiple disordered segments of SETD2 collectively enhance its proteolysis, then the expression of the shorter fragments would be higher than that of the longer ones. To test this, a series of constructs were made to express Halo-tagged N- or C-terminal truncations of SETD2 in 293T cells [Figure 2a, b, c, d]. The use of truncation mutants to determine the region of protease sensitivity in a protein is a common approach that has been used successfully (Sano et al, 2007) (Zhu et al, 2016). Western blotting of whole-cell extracts with an anti-Halo antibody revealed that for both the N- and C-terminal truncations, SETD2 expression anti-correlated with the length of the fragment [Figure 2c, d]. Also consistent with our hypothesis, three non-overlapping fragments of Halo-SETD2 (C4, 504-1403 and N3) expressed robustly in 293T cells, whereas SETD2 FL could not be detected by anti-Halo western blotting [Figure 2e, f]. Notably, microscopy analyses revealed that fragment 504-1403 was cytoplasmic, unlike C4 and N3, which were nuclear [supplementary information S7a]. To test whether the localization of this fragment alters its stability, the c-Myc nuclear localization signal (NLS) was added to GFP-504-1403 (504-1403') and the expression was checked.
Microscopy revealed that the addition of NLS resulted in the nuclear translocation of 504-1403 and also, reduced its expression level [supplementary information S7b]. This was also confirmed by western blotting of the whole-cell lysates [supplementary information S7c]. Nevertheless, the expression of all the SETD2 fragments was very robust compared to SETD2 FL and continued to display sensitivity to MG132.
To confirm that the differences observed in the expression are indeed due to the shorter half-life of larger SETD2 fragments, a time-chase experiment was performed to monitor the expression of Halo-N1 and N4 on the treatment of cells with cycloheximide, a translation inhibitor. After 2 hours of cycloheximide treatment, Halo-N1 could no longer be detected by western blotting, whereas no appreciable decrease was observed for Halo-N4 even after 4 hours [Figure 2g, h]. The cycloheximide chase experiment for Halo-N4 was then extended to include later time points. Halo-N4 could be clearly detected even after 48 hours of treatment [Figure 2i], demonstrating that the shorter fragment N4 has a longer half-life than N1. Importantly, the observed differences in the half-life of the proteins, along with our RT-PCR and RNA-seq data, also validate that the dissimilarities in the expression of the SETD2 constructs are indeed due to differences in protein stability and are not due to experimental variations such as transfection efficiency.
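The chase experiments above are interpreted qualitatively. If band intensities were quantified, a half-life could be estimated along the lines of the following sketch, which assumes simple first-order decay, I(t) = I0 * exp(-k t), so that the half-life is ln(2)/k. The function name `half_life_hours` and the intensity values are hypothetical and are not measurements from this study.

```python
import math
from typing import Sequence


def half_life_hours(times_h: Sequence[float], intensities: Sequence[float]) -> float:
    """Estimate half-life by least-squares fitting ln(intensity) against time.
    Assumes first-order decay; all intensities must be positive."""
    if len(times_h) != len(intensities) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(v) for v in intensities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    if slope >= 0:
        raise ValueError("signal does not decay; half-life is undefined")
    return math.log(2) / -slope


# Made-up normalized band intensities that halve every hour.
print(half_life_hours([0, 1, 2, 4], [1.0, 0.5, 0.25, 0.0625]))
```

A fragment whose signal does not decline over the chase window, as described for Halo-N4 within 4 hours, would raise the error here, which reflects that its half-life cannot be estimated from those time points alone.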
From these experiments, no specific region emerged in the SETD2 protein that is particularly targeted for UPS-mediated decay and confirmed that the turnover of SETD2 is regulated by factors besides those that were previously reported. Furthermore, consistent with the report that IDRs have a combined effect on protein half-life, multiple regions of SETD2 were found to co-operatively regulate its proteolysis.

IDR-rich N-terminus can reduce the half-life of ySet2
In yeast, ySet2 is a well-characterized protein that is degraded by the proteasome (Fuchs et al, 2012).
Our data revealed that ySet2 is expressed at a much higher level than SETD2 in 293T cells [supplementary information S8b]. We speculated that the disparity in expression between ySet2 and SETD2 could be due to differences in disordered region abundance between the proteins. Disordered region prediction revealed that overall ySet2 is a well-ordered protein with a much lower proportion of residues (24.69%) predicted to be disordered as compared to its homolog SETD2 [supplementary information S6d]. The gain or loss of disordered segments may be an important contributor to the degradation rate of proteins during evolution (van der Lee et al, 2014). It is possible that similar evolutionary factors favored the development of the IDR-rich SETD2 to promote its degradation. We wanted to test whether an N-terminal segment of SETD2 can destabilize the ySet2 protein when fused to its N-terminus [Figure 3a].
First, the expression of GFP- or Halo-SETD2 FL/N3/ySet2 was compared in 293T cells by microscopy and western blotting of whole-cell extracts with an anti-Halo antibody. The results showed that the expression of ySet2 was significantly higher than that of SETD2 FL and is comparable to that of SETD2 N3 (Lasker et al, 2012a). Consistent with that, our data show that ySet2 responds to MG132 treatment in 293T cells, suggesting that it is targeted for degradation through the UPS in human cells too [supplementary information S8d and Figure 3c, e]. It is speculated that the observed relationships between IDRs and protein half-life may be evolutionarily conserved (van der Lee et al, 2014). Hence, we wondered whether the IDR-rich SETD2 N-terminal segment can enhance ySet2 proteolysis in yeast. To test this, the expression of FLAG-tagged WT and chimeric ySet2 proteins was checked in the yeast strain BY4741. The expression was scored by probing the whole-cell extracts with an anti-FLAG antibody.
Although the addition of SETD2 fragment γ (967-1403) did not have a large impact, the addition of β and α + β had a drastic destabilization effect on ySet2 protein [ Figure 3f].
Collectively, our results demonstrate that the IDR-rich N-terminus of the SETD2 protein has a destabilizing effect on both SETD2 and its yeast homolog ySet2. Importantly, they also suggest evolutionary conservation of the role of the N-terminal region of SETD2 in bringing about the degradation of a protein, which is key evidence supporting IDR-mediated degradation.

SETD2 forms puncta that are characteristic of intranuclear aggregates
Microscopy revealed that GFP-C4 is nuclear and formed puncta [supplementary information S7a]. This was surprising because this segment doesn't have an NLS with a significant score as per NLS mapper prediction (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi) (Kosugi et al, 2009). We decided to characterize the NLS of SETD2 to better understand the unexpected localization of the SETD2 fragments. To this end, the localization of a series of GFP-SETD2 fragments was checked using fluorescence microscopy (data not shown). Of all the SETD2 fragments tested, the data revealed the presence of a putative NLS in three fragments of SETD2: 967-1690, 1964-2263 and 2423-2564.
This was consistent with NLS mapper prediction that revealed the presence of NLS in each of these segments [supplementary information S9a]. To validate these NLS, site-directed mutagenesis was performed to mutate the K and R residues to A, and the disruption of the nuclear localization of the mutated GFP-SETD2 fragments was confirmed by microscopy [Figure 4a, supplementary information S9b].
Further, to validate that full-length SETD2 has only these three NLS, site-directed mutagenesis was performed to mutate these NLS one by one [ Figure 4a]. The cytoplasmic localization of the GFP-SETD2 FL mutant, in which all the three NLS were disrupted, confirmed that SETD2 has three NLS [ Figure 4a].
Importantly, this also shows that the SETD2 fragment C4 forms nuclear puncta without an NLS.
Also, treatment with MG132 caused nuclear translocation and formation of puncta by SETD2 504-1403 [supplementary information S7b] suggesting that SETD2 might be an aggregate-prone protein. To test whether the full-length SETD2 protein also behaves similarly, the localization of NLS mutants of SETD2 was tested on MG132 treatment. Strikingly, all the mutants exhibited the formation of nuclear puncta [ Figure 4a]. Interestingly, very similar to our observations with SETD2, truncated N-terminal fragments of huntingtin with expanded glutamine repeats are known to form nuclear aggregates in cell culture even without an NLS (Cooper et al, 1998).
To characterize the aggregate-prone tendency of SETD2 further, 293T cells expressing GFP-SETD2 truncations (described in Figure 2) were observed under the microscope with or without MG132 treatment. The microscopic approach corroborated the expression level data obtained from immunoblotting; confirming that smaller SETD2 fragments express more robustly than larger ones (data not shown). Remarkably, treatment with MG132 led to a marked increase in the formation of puncta by all the SETD2 fragments [ Figure 4b]. Notably, the completely cytoplasmic fragment C3 showed a pancellular distribution with a tendency to form puncta upon MG132 treatment similar to the segment 504-1403 and SETD2 FL NLS Mutant 3. Furthermore, puncta were also formed by WT GFP-SETD2 FL protein suggesting that it might not be an artifact caused due to protein misfolding resulting from the truncations or mutations [ Figure 4b]. To test whether endogenous SETD2 behaves similarly, immunofluorescence of 293T cells was performed with an anti-SETD2 antibody. Importantly, proteasomal inhibition caused speckle-like staining, revealing that endogenous SETD2 also forms puncta [ Figure 4b].
This observation, together with the fact that SETD2 FL is aggregate-prone despite very weak expression, suggests that aggregation is an intrinsic property of the SETD2 protein that is exacerbated by increased protein abundance.

SETD2 forms ubiquitinated insoluble aggregates
Exogenously expressed as well as endogenous SETD2 formed puncta, especially on MG132 treatment, that are reminiscent of inclusion bodies comprising aggregated proteins. The most striking pattern was observed with fragment C4, which formed distinct puncta even in the absence of proteasome inhibition. As aggregates are often polyubiquitinated, to confirm that the puncta formed by SETD2 are aggregated structures, the colocalization of RFP-ubiquitin with GFP-C4 was tested. Clear colocalization was observed, suggesting that C4 puncta are indeed ubiquitinated aggregate structures [Figure 5a]. GFP-C4 did not colocalize with RFP-Fibrillarin, indicating that the puncta are not nucleolar [Figure 5a]. To biochemically substantiate that SETD2 is ubiquitinated, Halo-N3 was affinity-purified from HEK293T cells co-expressing HA-ubiquitin with or without MG132 treatment. The purified proteins were then resolved on a gel and analyzed for the presence of ubiquitination. Western blotting with an anti-HA antibody revealed that SETD2 N3 was indeed ubiquitinated [Figure 5b].
To confirm further that SETD2 forms aggregates, we checked the solubility of Halo-SETD2 fragments C4, 504-1403 and N3. 293T cells expressing Halo-SETD2 fragments were lysed, their soluble and insoluble fractions separated and afterward analyzed by western blotting with an anti-Halo antibody.
Correlating with the microscopy observations, C4, which formed spontaneous puncta, was highly insoluble, as was N3 to a lesser extent [Figure 5c]. The segment 504-1403 was soluble [Figure 5c]. Interestingly, the data show that different regions of the same protein have different aggregation propensities under similar experimental conditions, indicating that aggregation is not merely a consequence of expression from a strong CMV promoter.
Collectively, our data show that SETD2 forms aggregated ubiquitinated puncta (discussed in the Discussion section).

At high cellular levels, SETD2 has a reduced RNA pol II dependency for H3K36me3 deposition
We found that the removal of the N-terminal region of SETD2 leads to the stabilization of the remaining portion that shares conserved domains with ySet2. This leads to a marked increase in global H3K36me3 levels as scored by western blotting [Figure 1d, supplementary information S2c]. Studies conducted in yeast have revealed that the deposition of the H3K36me3 mark is strictly dependent on the ySet2-Pol II association (Suzuki et al, 2016). We were curious whether the marked increase in the global H3K36me3 level that occurs on SETD2 N3 expression happens in an RNA Pol II-dependent manner.
To test this, Halo-SETD2 constructs without the SRI domain were introduced in setd2Δ 293T cells. Similar to the findings for ySet2 in yeast, removal of the SRI domain from full-length SETD2 protein (FLΔSRI) led to a marked decrease in H3K36me3 levels as compared to the FL [Figure 6a, supplementary information S10a]. Strikingly though, removal of the SRI domain from SETD2 N3 (N3ΔSRI) had only a marginal effect on the H3K36me3 levels [Figure 6a]. To confirm that the removal of the SRI domain abolishes the SETD2-Pol II interaction, Halo-FLAG-SETD2 N3 and Halo-FLAG-SETD2 N3ΔSRI were affinity-purified from 293T extracts using Halo ligand-conjugated magnetic resin.
Elution of proteins purified using this technique involves cleaving off the Halo tag with TEV protease, leaving the FLAG epitope which can be detected from the eluted bait by immunoblotting [ Figure 6b].
Immunoblotting with an anti-Pol II antibody confirmed that the deletion of the SRI domain from SETD2 leads to the abolishment of SETD2-Pol II interaction [ Figure 6b].
We wondered whether the decreased dependency on Pol II interaction for the H3K36me3 activity of SETD2 N3 is due to the loss of a possible autoinhibition by the N-terminal region of SETD2 or is due to the increased expression of SETD2 N3 fragment as compared to the full-length protein. To address these possibilities, Halo-SETD2 N3 constructs under the control of the CMVD2 promoter were introduced in setd2Δ cells. CMVD2 promoter is a truncated form of CMV and exhibits a much weaker activity. The weaker activity of the CMVD2 promoter was confirmed by RT-PCR [supplementary information S10b].
The reduced expression of SETD2 N3 under the regulation of CMVD2 promoter was verified by analyzing whole-cell extracts with an anti-Halo antibody [ Figure 6a]. Notably, analysis of H3K36me3 revealed that the RNA Pol II dependency of SETD2 N3 was restored at reduced expression level as SETD2 N3ΔSRI did not exhibit much activity when expressed using the CMVD2 promoter [ Figure 6a, supplementary information S10a].

We conclude that at high cellular levels, SETD2 has a reduced RNA pol II dependency for H3K36me3 deposition.

High levels of SETD2 lead to non-canonical H3K36me3 distribution
We wondered whether the increased SETD2 accumulation also leads to an altered distribution of the H3K36me3 mark. For this, first, spike-in normalized H3K36me3 ChIP-Seq of WT and KO cells was performed. Metagene analysis revealed clear enrichment of H3K36me3 within the coding region of the genes in the WT cells [supplementary information S10c]. As expected, a similar pattern was not observed in KO cells that lack SETD2 and hence, H3K36me3 [supplementary information S10c]. Additionally, metagene analysis of H3 normalized H3K36me3 of protein-coding genes highlighted that H3K36me3 is greatly enriched on highly expressed genes as compared to the lowly expressed ones, consistent with the idea that the mark is associated with transcriptionally active genes [supplementary information S10d].
Furthermore, a closer inspection of the H3 normalized H3K36me3 distribution within the coding region revealed that H3K36me3 is more enriched over exons than introns [supplementary information S10e].
Next, ChIP-Seq of H3K36me3 was performed post introduction of GFP-SETD2 constructs in KO cells and their distribution was compared. On rescue of KO cells with SETD2 constructs, the H3K36me3 mark continued to be enriched within the gene bodies. Notably, this was true even for SETD2 N3ΔSRI and is consistent with the reports that association with RNA Pol II is not required for Set2 association with highly transcribed genes (discussed in the Discussion section). Also, the H3K36me3 level correlated with gene expression [supplementary information S10f]. However, a closer inspection of the distribution reveals that it is skewed towards the 5' end of the genes in cells rescued with N3 as compared to FL [ Figure 6c]. Furthermore, analysis of global H3K36me3 peaks revealed that there is more deposition of H3K36me3 at intergenic regions post-rescue with SETD2 N3 as compared to FL [ Figure 6d]. In addition, there was an aberrant deposition of H3K36me3 on genes such as CLIC4 although no change in its expression was observed in our RNA-seq data [ Figure 6e].
The ratio of the average exon signal to the average intron signal was lower in N3-rescued cells than in FL-rescued cells [Figure 6f]. This suggests that with increased accumulation of SETD2, the enrichment of H3K36me3 over exons decreases. The 5' skewing and loss of enrichment over exons are exemplified in genes such as ARID1A [Figure 6g]. Furthermore, the genes showing differential alternative splicing events in N3 as compared to FL displayed opposite trends of exon enrichment [supplementary information S10g]. This kind of opposite trend was not observed in FL-expressing cells, in which the exon enrichment values were ~1.5 [supplementary information S10g]. Also, the H3K36me3 levels were higher on the genes that showed increased splicing in N3 versus FL [supplementary information S10h].
Thus, in the absence of the IDR-rich N-terminal region, there is reduced initiation of UPS-mediated decay of SETD2. This results in its high abundance and reduced Pol II dependency, and leads to global changes in the canonical distribution pattern of H3K36me3 [Figure 7].

DISCUSSION
In recent years many reports have highlighted the adverse effect of the loss of H3K36me3 that occurs due to SETD2 deletion. Here we show that the other end of the spectrum can also be detrimental as an excess of SETD2 might have inadvertent outcomes. In such a scenario, its IDRs play an important role in tuning and maintaining the requisite intracellular amount of the protein.

IDRs are common in proteins important in transcription pathways
Regulating protein half-life is critical for the wellbeing of cells. Altered protein half-life can lead to abnormal development and diseases such as cancer and neurodegeneration (Ciechanover, 2007).
Considering the role of H3K36me3 in a variety of important cellular processes, it is reasonable that regulating the activity of the methyltransferase responsible for the deposition of this mark is important.
Although the role of the proteasome in regulating SETD2 stability has been reported earlier, how SETD2 degradation occurs so robustly was not clear. Furthermore, it was not clear why the cellular half-life of SETD2 is so tightly regulated. We not only show that multiple regions of SETD2 are targeted by the proteasome, but also find that the disordered segments of SETD2 are important in this process. Importantly, our work clearly illustrates the importance of the N-terminal segment of SETD2 in governing its function, which has been a mystery. Strikingly, an analysis of the functional annotations of proteins with long disordered segments revealed enrichment for associations with regulatory and transcription functions (van der Lee et al, 2014). This suggests that the IDR-mediated regulation of protein half-life that we discovered for SETD2 might be a prevalent mechanism employed by cells to govern essential processes.

SETD2 IDRs possibly act as efficient sites for initiating proteasomal degradation
Disordered segments act directly to regulate protein half-lives by forming initiation sites for degradation by the proteasome (van der Lee et al, 2014) (Zhao et al, 2010) (Prakash et al, 2004) (Verhoef et al, 2009).
Consistent with this, we did not find any distinct segment of SETD2 that is degradation prone. We also did not see any apparent effect of the presence of D and KEN box motifs or PEST sequences on the half-life of SETD2. The catalytic residues are situated deep within the proteasome core particle and are only accessible through a long narrow channel. A terminal disordered segment of 30 residues or an internal disordered segment of at least 40 residues can span twice this distance and thus, could be cleaved by the core particle (van der Lee et al, 2014). Thus, proteins such as SETD2 that have 17 long disordered segments are expected to be processed quickly due to the efficient initiation of degradation. In addition, the disordered segments are spread across the length of the protein and cooperatively lead to the robust degradation of SETD2.
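The length criteria quoted above (a terminal disordered segment of 30 residues, or an internal one of at least 40 residues) can be illustrated with a minimal Python sketch. It is not the analysis used in this study: the per-residue disorder mask is a hypothetical input that would come from a disorder predictor of the reader's choice, the function name `initiation_candidates` is ours, and treating the terminal threshold as "at least 30" is an assumption.

```python
from itertools import groupby

TERMINAL_MIN = 30  # residues; terminal disordered segment (assumed to mean "at least 30")
INTERNAL_MIN = 40  # residues; internal disordered segment

def initiation_candidates(disordered):
    """Return (start, end, kind) for disordered runs long enough to act as
    putative proteasomal initiation sites under the length rule above.

    `disordered` is a hypothetical per-residue mask (True = disordered).
    Coordinates are 0-based and half-open.
    """
    n = len(disordered)
    hits = []
    pos = 0
    for is_disordered, run in groupby(disordered, key=bool):
        length = len(list(run))
        start, end = pos, pos + length
        pos = end
        if not is_disordered:
            continue
        # A run is terminal if it touches either end of the sequence.
        terminal = start == 0 or end == n
        threshold = TERMINAL_MIN if terminal else INTERNAL_MIN
        if length >= threshold:
            hits.append((start, end, "terminal" if terminal else "internal"))
    return hits

# Toy mask: a 30-residue N-terminal run, a 39-residue internal run (too short),
# a 40-residue internal run, and a 10-residue C-terminal run (too short).
mask = [True] * 30 + [False] * 20 + [True] * 39 + [False] * 10 \
     + [True] * 40 + [False] * 5 + [True] * 10
candidates = initiation_candidates(mask)
```

On the toy mask, only the N-terminal run and the 40-residue internal run pass the thresholds.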

Variations in the N-terminal region might tune SETD2 half-life
As IDRs have a role in protein stability, variation in IDRs might be a mechanism for the divergence of half-life among orthologous proteins. IDRs are not required to attain a specified three-dimensional conformation in order to exert their functional effect and hence, can undergo mutations without greatly affecting their functionality. Such variation in disordered segments may provide an evolutionary mechanism for fine-tuning protein turnover rates. These forces might lead to inter- as well as intra-species divergence of protein half-life. Analysis of Drosophila SETD2 revealed large and abundant disordered regions [supplementary information S6e]. The differences in the degree of disorder between Drosophila and human SETD2 proteins might lead to differences in their half-lives much like what we observed between SETD2 and ySet2. Such differences in half-lives between homologs might be needed to adjust for the differences in the mechanism of deposition of H3K36 methylation. For instance, unlike in yeast where ySet2 performs all three states of methylation of H3K36, SETD2 does not appear to be majorly responsible for me1 and me2 deposition. Therefore, even with a shorter half-life, SETD2 might be able to do the required H3K36me3 deposition. Some evidence for this is provided by studies on human cancers that show that total H3K36me3 levels are not significantly impacted by a monoallelic loss of SETD2 (Roberti et al, 2016).
In addition to inter-species differences, intra-species variation in the length of IDRs may arise through mechanisms such as repeat expansion, alternative splicing, and alternative transcription start sites. Interestingly, in the ENSEMBL database, SETD2 has 9 splice variants, of which three are protein-coding. One of those, with transcript ID ENST00000638947.1, codes for only a 591-residue segment of SETD2 that, importantly, contains the catalytic SET domain. This segment lacks the IDRs present in full-length SETD2 and might be expected to accumulate to a higher level with inadvertent consequences as discussed below. Furthermore, differences in protein turnover may disturb protein abundance and could lead to disease (Ciechanover, 2007) (Yang et al, 2012). Strikingly, missense and truncation mutations that can potentially alter half-life are found in SETD2 in cancers [supplementary information S11].

SETD2 is aggregate-prone
We found that SETD2 is an inherently aggregate-prone protein. Although aggregation and intrinsic disorder are considered independent properties, a positive correlation has been observed between the two (Zhang et al, 2019) (Carvalho et al, 2013) (Uemura et al, 2018) (Kirilyuk et al, 2012). Overaccumulation of SETD2 presents cells with challenges because of the aggregate-prone nature of the protein. The insolubility of proteins leads to their sequestration in inclusion bodies and inactivation. Besides its own inactivation, aggregation of SETD2 might lead to the co-aggregation of other proteins, leading to their inactivation and proteotoxic stress. Ubiquitinated aggregates like the ones formed by SETD2 can directly inhibit or clog proteasomes (Hipp et al, 2012) (Nonaka & Hasegawa, 2009). As aggregation is a concentration-dependent phenomenon, possibly the robust degradation mechanism in cells ensures that SETD2 is kept at levels that maintain its solubility and activity. This mechanism is often employed by cells for IDR-containing proteins (Babu, 2016). Interestingly, SETD2, also known as HYPB, was initially identified in a screen to find interactors of the aggregate-prone protein Huntingtin (Faber et al, 1998). It is possible that the reported Huntingtin-SETD2 interaction was due to the aggregation propensities of these proteins. In fact, the N-terminus of SETD2 (SETD2 C4) behaves very similarly to the polyQ-containing N-terminus of Huntingtin protein. Like mutant Htt, SETD2 C4 localizes to the nucleus in the absence of an NLS and forms spontaneous puncta characteristic of aggregated proteins. The nucleoplasm promotes the formation of such aberrant and insoluble protein aggregates due to the strong crowding forces from highly concentrated macromolecules (approximately 100 mg/ml) (Mikecz, 2009). It will be interesting to investigate in the future whether the aggregate-prone tendency of SETD2 enables it to form a part of RNA-granules and regulate transcription.
H3K36me3 deposition [supplementary information S11]. We focused on the changes in the H3K36me3 mark upon SETD2 overexpression. However, SETD2 also has non-histone targets like tubulin, the methylation of which is important for metaphase transition. Hence, the consequences of an increase in SETD2 abundance might not be limited to changes in histone methylation.

Pol II association is required for enhancing SETD2 activity but not for its recruitment
Our data show that when the expression level of SETD2 is high, it has a reduced dependency on RNA Pol II for H3K36me3 deposition. In fact, from the ChIP-Seq data of H3K36me3, it appears that RNA Pol II association is not required for chromatin recruitment of SETD2 as previously assumed, but rather this interaction is required for the activation of SETD2 enzymatic activity. When present in its normal cellular amounts, SETD2 is not active enough to lead to robust H3K36me3 deposition without the RNA Pol II association. Pol II association enhances SETD2 activity and hence, although SETD2 FL protein is barely detectable, the rescue of H3K36me3 can be readily seen in the setd2Δ cells. At high abundance, despite its low activity, a robust H3K36me3 deposition is seen, likely due to the increased copy number of SETD2 protein in cells. Recent studies in yeast have also challenged the notion that the Pol II association is required for chromatin recruitment of Set2. The study found that the engagement of Set2 and Pol II through the SRI domain is rather required for the activation of Set2. In fact, the Set2ΔSRI ChIP profile shows an enrichment on the coding sequence of genes that is skewed towards the 5' end, much like what we found for H3K36me3 when SETD2 N3ΔSRI is introduced in setd2Δ cells (Suzuki et al, 2016). Possibly, Set2/SETD2 can continue to engage with the transcription elongation complex even in the absence of its interaction with RNA Pol II. In future studies, it will be interesting to determine how the gene body enrichment of H3K36me3 and Set2/SETD2 occurs even without the Pol II association.

MATERIALS AND METHODS
Plasmids-SETD2-HaloTag® human ORF in pFN21A was procured from Promega. Deletion mutants of SETD2 were constructed by PCR (Phusion polymerase, NEB) using full-length SETD2 as a template and individual fragments were cloned. All constructs generated were confirmed by sequencing. SETD2-GFP, mRFP-Ub, pLenti puro HA-Ubiquitin, pTagRFP-C1-Fibrillarin and pCDNA3-ySet2 were procured from Addgene.
Cell line maintenance and drug treatment-The cell lines used in this study (HEK293T, HEPG2 and HELA) were procured from ATCC. Cells were maintained in DMEM supplemented with 10% FBS and 2 mM L-glutamine at 37 °C with 5% CO2. MG132 (Sigma) was added at a final concentration of 10 μM for 12 hours. Chloroquine (Sigma) treatment was done as indicated in the text. Cycloheximide (Sigma) was added at a final concentration of 10 μM. Transfections were performed at cell confluency of 40% using Fugene HD (Promega) using a ratio of 1:4 of the plasmid (µg) to transfection reagent (µl).
Histone isolation and immunoblot analysis-Histones were isolated and analyzed as described previously (Bhattacharya et al, 2017). For immunoblotting, histones were resolved on 15% SDS-polyacrylamide gel, transferred to PVDF membrane and probed with antibodies. Signals were detected by using the ECL plus detection kit (ThermoFisher).
Cell Fractionation-To prepare soluble and insoluble extracts, 293T cells were washed with 1xPBS, collected by centrifugation, and resuspended in lysis buffer (50 mM Tris, pH 7.5, 350 mM NaCl, 1% Triton-X 100, 0.1% Na-deoxycholate and a protease inhibitor mix). The lysed cells were centrifuged at 13,000 rpm for 20 min. The supernatant was collected as the soluble fraction. The pellet was washed with lysis buffer containing 600 mM NaCl. The remaining insoluble pellet following another centrifugation was resuspended in Laemmli buffer (Biorad) and solubilized by sonication on ice.
Antibodies-
Affinity purification-293T cells expressing the protein of interest were harvested in 1xPBS and collected by centrifugation. The cells were lysed by resuspending in lysis buffer (50 mM Tris, pH 7.5, 150 mM NaCl, 1% Triton-X 100, 0.1% Na-deoxycholate and a protease inhibitor cocktail). The lysed cells were centrifuged at 13,000 rpm for 20 min. The supernatant was collected and diluted 1:3 by adding dilution buffer (1x PBS, pH 7.5 with 1 mM DTT and 0.005% NP-40). The diluted lysate was added to 100 µl of pre-equilibrated Magne® HaloTag® Beads (Promega, G7282) and incubated overnight on a rotator at 4 °C.
Immunofluorescence-293T cells were plated onto glass coverslips in a 6-well plate. Cells were washed with 1x PBS and fixed in 4% paraformaldehyde for 20 min at 37 °C. Cells were then washed three times with cold 1x PBS and permeabilized for 5 mins with 1x PBS containing 0.2% Triton X-100. Permeabilized cells were then blocked for 30 min with blocking buffer (3% BSA and 0.1% Triton-X in 1x PBS). Cells were stained with primary antibodies against SETD2 (1:1,000; Abclonal) for 1 hr at room temperature. A secondary antibody conjugated with AlexaFluor 568 was applied for 1 hr at room temperature.
ChIP-Cells were cross-linked with 1% formaldehyde for 10 mins, and then quenched in 125 mM glycine for 5 mins. After washing with cold 1x PBS thrice, cells were harvested by scraping and pelleted by centrifugation. The cell pellet was resuspended in swelling buffer (25 mM HEPES pH 8, 1.5 mM MgCl2, 10 mM KCl, 0.1% NP40, 1 mM DTT, protease inhibitor cocktail), kept on ice for 10 mins and then dounced. The nuclear pellet was obtained by centrifugation and resuspended in sonication buffer (50 mM HEPES pH 8, 140 mM NaCl, 1 mM EDTA, 1% Triton X 100, 0.1% Na-deoxycholate, 0.1% SDS, protease inhibitor cocktail), followed by sonication on ice for 12 cycles (30% amplitude, 10 secs on / 60 secs off) using a Branson Sonicator. For spike-in normalization, the spike-in chromatin and antibody were added in the reaction as per the manufacturer's recommendation (Active Motif). The chromatin was incubated with antibodies at 4 °C overnight and then added to 30 μl of protein G-Dyna beads (Thermo Fisher Scientific) for an additional 2 hours with constant rotation. The beads were extensively washed, and bound DNA was eluted with elution buffer (50 mM Tris-HCl pH 8, 5 mM EDTA, 50 mM NaCl, 1% SDS) and reverse-crosslinked at 65 °C overnight. DNAs were purified using the QIAquick PCR purification kit (Qiagen) after the treatment of proteinase K and RNase A.
High throughput sequencing-Sequencing libraries were prepared using High Throughput Library Prep Kit (KAPA Biosystems) following the manufacturer's instructions. The library was sequenced on an Illumina HiSeq platform with paired reads of 75 bp for RNA-seq and single reads of 50 bp for ChIP-seq.
ChIP-seq analysis-Raw reads were demultiplexed into FASTQ format allowing up to one mismatch using Illumina bcl2fastq2 v2.18. Reads were aligned to the human genome (hg38) using Bowtie2 (version 2.3.4.1) with default parameters (Langmead & Salzberg, 2012). For samples with fly spike-in, reads were first mapped to the Drosophila melanogaster genome (dm6), and unmapped reads were then aligned to the human genome (hg38). Reads per million (RPM) normalized bigWig tracks were generated by extending reads to 150bp. For spike-in ChIP-seq data, we also generated spike-in normalized bigWig tracks (RPM normalization factor = 1E6 / number of reads aligned to hg38, and spike-in normalization factor = 1E6 / number of reads aligned to dm6).
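The two normalization factors defined above can be written out as a short Python sketch. This only illustrates the stated formulas and is not the pipeline code used in the study; the function names and the read counts in the example are hypothetical.

```python
def rpm_factor(n_hg38_reads: int) -> float:
    """RPM normalization factor = 1E6 / number of reads aligned to hg38."""
    if n_hg38_reads <= 0:
        raise ValueError("number of hg38-aligned reads must be positive")
    return 1e6 / n_hg38_reads

def spike_in_factor(n_dm6_reads: int) -> float:
    """Spike-in normalization factor = 1E6 / number of reads aligned to dm6."""
    if n_dm6_reads <= 0:
        raise ValueError("number of dm6-aligned reads must be positive")
    return 1e6 / n_dm6_reads

def scale_coverage(coverage, factor):
    """Multiply raw per-bin coverage values by a normalization factor."""
    return [value * factor for value in coverage]

# Hypothetical sample: 20 million human reads and 0.5 million fly spike-in reads.
raw_bins = [10, 0, 4]
rpm_track = scale_coverage(raw_bins, rpm_factor(20_000_000))
spike_track = scale_coverage(raw_bins, spike_in_factor(500_000))
```

Because the spike-in factor depends only on the number of fly reads, a sample that yields proportionally fewer spike-in reads (more human chromatin immunoprecipitated) is scaled up accordingly, which is what allows global differences in H3K36me3 to be compared between samples.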
Metagene Plots-14533 protein-coding genes (Ensembl 96 release) were selected with length ≥ 600bp and no other genes within -2Kb TSS and +2Kb TES regions. Metagene regions were from -2Kb TSS to +2Kb TES. The 2Kb regions upstream of the TSS and downstream of the TES were each grouped into 100 bins (20bp per bin). The gene body region was grouped into 300 bins (at least 2bp per bin since the minimum gene length is 600bp). In total, each gene was grouped into 500 bins. The average normalized (RPM or spike-in) H3K36me3 signals in each bin were plotted using the R package EnrichedHeatmap.
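The binning scheme described above (100 + 300 + 100 = 500 bins per gene) can be sketched in Python as follows. This is an illustrative reconstruction, not the EnrichedHeatmap implementation: the helper names are hypothetical, coordinates are assumed to be 0-based and half-open, and integer division is one possible way to place the gene-body bin edges.

```python
FLANK_BP = 2000    # 2Kb upstream of the TSS and downstream of the TES
FLANK_BINS = 100   # 20bp per bin in each flank
BODY_BINS = 300    # gene body is scaled to a fixed number of bins
MIN_GENE_LEN = 600

def _split(start, end, n_bins):
    """Split the half-open interval [start, end) into n_bins contiguous bins."""
    length = end - start
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))

def metagene_bins(gene_start, gene_end, strand="+"):
    """Return 500 (start, end) bins ordered from upstream of the TSS to
    downstream of the TES, for a gene at [gene_start, gene_end)."""
    if gene_end - gene_start < MIN_GENE_LEN:
        raise ValueError("gene shorter than the 600bp minimum")
    bins = (
        _split(gene_start - FLANK_BP, gene_start, FLANK_BINS)
        + _split(gene_start, gene_end, BODY_BINS)
        + _split(gene_end, gene_end + FLANK_BP, FLANK_BINS)
    )
    # On the minus strand the TSS is at the genomic end, so reverse the order.
    if strand == "-":
        bins.reverse()
    return bins

# A gene of exactly the minimum length gets 2bp gene-body bins.
example = metagene_bins(10_000, 10_600)
```

Scaling every gene body to the same 300 bins is what allows genes of different lengths to be averaged into a single metagene profile.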
H3K36me3 on exons and introns-Protein-coding genes were selected, and the longest transcript for each gene was chosen. Also, we removed any overlapping transcripts (ignoring strand). As a result, 15311 transcripts were used to calculate the H3K36me3 signal (RPM normalized) distribution on exons/introns. The average exon/intron signal is defined as the total H3K36me3 signals on all exons/introns divided by the total exon/intron length.
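The definition above, together with the exon-to-intron ratio reported in the Results, can be expressed as a small Python sketch. It is a minimal illustration under stated assumptions rather than the analysis code: each region is represented as a hypothetical (summed signal, length in bp) pair, and the numbers in the example are made up.

```python
def average_signal(regions):
    """Total signal over all regions divided by their total length.

    `regions` is an iterable of (signal_sum, length_bp) pairs, one per
    exon or intron (a hypothetical input format).
    """
    regions = list(regions)
    total_signal = sum(signal for signal, _ in regions)
    total_length = sum(length for _, length in regions)
    if total_length == 0:
        raise ValueError("total region length is zero")
    return total_signal / total_length

def exon_intron_ratio(exons, introns):
    """Average exon signal divided by average intron signal."""
    return average_signal(exons) / average_signal(introns)

# Made-up transcript: two exons and one intron.
exons = [(300.0, 100), (150.0, 50)]   # 450 signal over 150 bp -> 3.0 per bp
introns = [(1000.0, 500)]             # 1000 signal over 500 bp -> 2.0 per bp
ratio = exon_intron_ratio(exons, introns)
```

Pooling signal and length before dividing weights each base equally, so long exons or introns contribute more than short ones; a ratio above 1 indicates enrichment over exons.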
RNA-seq analysis-Raw reads were demultiplexed into FASTQ format allowing up to one mismatch using Illumina bcl2fastq2 v2.18. Reads were aligned to the human genome (hg38 and Ensembl 96 gene models) using STAR (version STAR_2.6.1c) (Dobin et al, 2013). TPM expression values were generated using RSEM (version v1.3.0) [5]. edgeR (version 3.24.3 with R 3.5.2) was applied to perform differential expression analysis, using only protein-coding and lncRNA genes (Robinson et al, 2009). To perform differential splicing analysis, we used rMATS (version 4.0.2) with default parameters starting from FASTQ files (Shen et al, 2014). An FDR cutoff of 0.05 was used to determine statistical significance.

ACCESSION NUMBERS
The data sets are available in the Gene Expression Omnibus (GEO) database under the accession number GSE147752.