About 1% of the genome consists of CpG-rich regions, 500-2000 bp long, known as CpG islands. About half of all CpG islands are located at the transcription start sites and promoters of expressed genes, and in active genes these regions tend to be unmethylated. In diseases such as cancer, these regions can become hypermethylated, resulting in abnormal silencing of the associated gene. Methylation of promoter regions is correlated with gene repression, whereas methylation of the gene body is correlated with expression.
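"CpG-rich" is usually made concrete with three numbers: sequence length, GC fraction, and the ratio of observed to expected CpG dinucleotides, computed as (CpG count × length) / (C count × G count). The sketch below computes these for a single DNA strand. The default thresholds (length ≥ 200 bp, GC ≥ 50%, observed/expected ≥ 0.6) are an assumption based on the commonly cited Gardiner-Garden and Frommer criteria; other definitions use stricter values. The function names are hypothetical and do not come from any library.

```python
def cpg_stats(seq: str) -> dict:
    """Length, GC fraction and CpG observed/expected ratio of one DNA strand."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # str.count counts non-overlapping matches; "CG" cannot overlap itself,
    # so this is the exact number of CpG dinucleotides.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Hypothetical helper: apply assumed CpG-island thresholds to a sequence."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_fraction"] >= min_gc
            and stats["cpg_obs_exp"] >= min_obs_exp)


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 150))     # 300 bp, CpG-dense
    print(looks_like_cpg_island("GGCCA" * 60))   # 300 bp, GC-rich but no CpG
```

The second example shows why GC content alone is not enough: `"GGCCA" * 60` is 80% G+C yet contains no CpG dinucleotides, so its observed/expected ratio is 0 and it is rejected. Real island callers also scan with a sliding window rather than scoring a whole sequence at once.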
DNA methylation occurs mainly at the 5-position (C5) of cytosine bases, predominantly within CpG dinucleotides, and is catalyzed by enzymes called DNA methyltransferases (DNMTs). 5-methylcytosine (5mC) can be further oxidized to 5-hydroxymethylcytosine (5hmC) by the Ten-Eleven Translocation family of enzymes (TET1-3). TET enzymes appear to drive the conversion of methylated cytosine back to unmethylated cytosine by further oxidizing 5hmC to 5-formylcytosine (5fC) and then to 5-carboxylcytosine (5caC). Both 5fC and 5caC can be excised by thymine DNA glycosylase (TDG), after which base excision repair restores an unmodified cytosine. Because 5hmC and the TET enzymes may be involved in tumorigenesis, they are key targets for epigenetics research aimed at fully elucidating the dynamic changes in the epigenome that accompany development and disease.
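The oxidation sequence above can be written down as a small transition table, which makes explicit that there are two routes back to unmodified cytosine: excision at the 5fC stage or at the 5caC stage. This is a toy bookkeeping sketch, assuming only the steps named in this paragraph; it is not a kinetic or mechanistic model, and `TRANSITIONS` and `demethylation_routes` are hypothetical names.

```python
from typing import Dict, List, Tuple

# state -> list of (next state, enzyme or process responsible)
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "5mC": [("5hmC", "TET oxidation")],
    "5hmC": [("5fC", "TET oxidation")],
    "5fC": [("5caC", "TET oxidation"), ("C", "TDG excision + base excision repair")],
    "5caC": [("C", "TDG excision + base excision repair")],
}


def demethylation_routes(state: str = "5mC", target: str = "C") -> List[List[str]]:
    """Enumerate every path of modification states from `state` to `target`."""
    if state == target:
        return [[target]]
    routes = []
    for product, _process in TRANSITIONS.get(state, []):
        for rest in demethylation_routes(product, target):
            routes.append([state] + rest)
    return routes


if __name__ == "__main__":
    for route in demethylation_routes():
        print(" -> ".join(route))
```

Running it prints the two routes, `5mC -> 5hmC -> 5fC -> 5caC -> C` and `5mC -> 5hmC -> 5fC -> C`, mirroring the statement that both 5fC and 5caC can be returned to unmodified cytosine.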
Regions of methylated DNA can be bound by a variety of methyl-CpG-binding domain (MBD) proteins. The presence or absence of these proteins is believed to determine the recruitment of histone-modifying enzymes (such as histone deacetylases) and other chromatin-associated proteins, which in turn either activate or repress gene expression.