Molecular, Cell and Cancer Biology Department
First Thesis Advisor
Nathan Lawson, PhD
Dissertations, UMMS; Bayes Theorem; Algorithms; Polyadenylation; RNA 3' Polyadenylation Signals; High-Throughput Nucleotide Sequencing
Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated region of a transcript, which is important for regulation by microRNAs and RNA binding proteins. Despite this importance, these sites have generally been poorly annotated. To identify 3’ ends, many techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can identify artifactual polyadenylation sites because the primer can anneal to homopolymeric stretches of adenines within a transcript (internal priming). Previously, simple heuristic filters based on the number of adenines in the genomic sequence downstream of a putative polyadenylation site have been used to remove these internally primed sites. Such filters may not remove all sites of internal priming and may also exclude true polyadenylation sites. Therefore, I developed a naïve Bayes classifier to label putative sites from oligo-dT primed 3’ end deep sequencing as true or false (internally primed). Notably, this algorithm uses a combination of sequence elements to distinguish between true and false sites. The resulting algorithm is highly accurate in multiple model systems and facilitates identification of novel polyadenylation sites.
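The contrast between a downstream-adenine filter and a naïve Bayes classifier can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the window size, the adenine threshold, the two binary features (an A-rich downstream window and an upstream AATAAA hexamer), the toy sequences, and all function and class names are assumptions chosen only for illustration.

```python
import math

def downstream_a_count(downstream_seq, window=10):
    """Count adenines in the first `window` genomic bases downstream of a site."""
    return downstream_seq[:window].upper().count("A")

def heuristic_is_internal_priming(downstream_seq, window=10, min_a=6):
    """Simple filter: flag a site when the downstream window is A-rich.
    `window` and `min_a` are illustrative values, not taken from the source."""
    return downstream_a_count(downstream_seq, window) >= min_a

def extract_features(upstream_seq, downstream_seq):
    """Two hypothetical binary sequence features for a putative site."""
    return {
        "a_rich_downstream": heuristic_is_internal_priming(downstream_seq),
        "upstream_hexamer": "AATAAA" in upstream_seq.upper(),
    }

class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, feature_dicts, labels):
        self.classes = sorted(set(labels))
        self.features = sorted(feature_dicts[0])
        self.log_prior = {}
        self.p_feature = {}
        for c in self.classes:
            rows = [f for f, y in zip(feature_dicts, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            self.p_feature[c] = {
                name: (sum(r[name] for r in rows) + 1) / (len(rows) + 2)
                for name in self.features
            }
        return self

    def log_scores(self, feats):
        """Unnormalized log posterior for each class, assuming independent features."""
        scores = {}
        for c in self.classes:
            total = self.log_prior[c]
            for name in self.features:
                p = self.p_feature[c][name]
                total += math.log(p if feats[name] else 1.0 - p)
            scores[c] = total
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)

# Toy training data (invented): (upstream, downstream, label).
HEX_UP, PLAIN_UP = "GGCAATAAAGTCTGC", "GGCCTGACGTCTGC"
A_RICH_DOWN, PLAIN_DOWN = "AAAAAAGAAATCG", "TGTGTTTCAGGCT"
training = (
    [(HEX_UP, PLAIN_DOWN, "true_site")] * 3
    + [(HEX_UP, A_RICH_DOWN, "true_site")]
    + [(PLAIN_UP, A_RICH_DOWN, "internal_priming")] * 3
    + [(PLAIN_UP, PLAIN_DOWN, "internal_priming")]
)
model = BernoulliNaiveBayes().fit(
    [extract_features(up, down) for up, down, _ in training],
    [label for _, _, label in training],
)

# A site with an A-rich downstream window but also an upstream hexamer.
query = extract_features(HEX_UP, A_RICH_DOWN)
filter_call = heuristic_is_internal_priming(A_RICH_DOWN)
bayes_call = model.predict(query)
```

In this toy setting the adenine filter alone discards the query site, while the classifier weighs the A-rich window against the upstream hexamer and keeps it. This mirrors the abstract's point that combining sequence elements can retain true sites that a single-feature filter would exclude.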
Sheppard, SE. Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation. (2013). University of Massachusetts Medical School. GSBS Dissertations and Theses. Paper 653. DOI: 10.13028/M20K68. http://escholarship.umassmed.edu/gsbs_diss/653
Rights and Permissions
Copyright is held by the author, with all rights reserved.