Low-CG promoters are over-represented among RefSeq promoter annotations compared with promoters used in vivo. We reproduced a prior approach that categorizes promoters by their observed/expected CG dinucleotide frequency, which yields a bimodal distribution (gray) separating low-CG (LCG) and high-CG (HCG) groups. Promoters used in vivo in unexposed control HEK 293T cells were defined by ChIP-seq for RNAPII-Ser5 (P). These promoters showed the same bimodal CG dinucleotide distribution (blue) but a substantially lower proportion of LCG promoters, suggesting that the RefSeq annotation may include some inaccurate promoter predictions within the LCG subcategory. CG, cytosine-guanine dinucleotide; RNAPII-Ser5 (P), serine 5-phosphorylated RNA polymerase II.
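The observed/expected CG ratio behind this categorization can be illustrated with a minimal sketch. It assumes the widely used Gardiner-Garden and Frommer form, (CG count × sequence length) / (C count × G count). It also assumes a single cutoff separating LCG from HCG. The `threshold` value of 0.4 below is hypothetical and is not the cutoff used in the prior approach; in practice it would be placed at the trough of the bimodal distribution. The function names `cg_obs_exp` and `classify_promoter` are illustrative only.

```python
def cg_obs_exp(seq: str) -> float:
    """Observed/expected CG dinucleotide ratio, assuming the
    Gardiner-Garden & Frommer form: (CG * N) / (C * G)."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # Count "CG" dinucleotides at every position (a "CG" cannot overlap itself,
    # so this matches a plain substring count).
    cg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    if c == 0 or g == 0:
        return 0.0  # expected count is zero; treat the ratio as 0
    return cg * n / (c * g)


def classify_promoter(seq: str, threshold: float = 0.4) -> str:
    """Assign LCG or HCG by comparing the o/e ratio to a cutoff.
    The default threshold is HYPOTHETICAL and for illustration only."""
    return "HCG" if cg_obs_exp(seq) >= threshold else "LCG"


# Usage: a CG-rich toy sequence versus one with no CG dinucleotides.
print(cg_obs_exp("CGCGCGCG"))          # 2.0
print(classify_promoter("CGCGCGCG"))   # HCG
print(classify_promoter("CAGTCAGTCAGT"))  # LCG
```

Because the threshold is a parameter, the same function can be applied to both the RefSeq annotation set and the ChIP-seq-defined set. This allows their LCG proportions to be compared at an identical cutoff.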