Interpretation, stratification and validation of sequence variants affecting mRNA splicing in complete human genome sequences. Genomics, Proteomics, and Bioinformatics, 11:77-85, 2013.
Ben C. Shirley, Eliseos J. Mucaki, Tyson Whitehead, Paul I. Costea, Pelin Akan, Peter K. Rogan.
Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present and validate the Shannon pipeline, software that predicts mRNA splicing mutations in genome-scale analyses. The individual information contents (in bits) of reference and variant splice sites are compared, and significant differences are annotated and prioritized. The software has been implemented for the CLC-Bio Genomics platform. Annotation indicates the context of novel mutations, as well as of common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splice sites are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in 3 cancer cell line genomes (U2OS, U251, and A431) and then validated experimentally by expression analyses. After filtering, the software predicts a tractable number of potentially deleterious variants, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6 to 17 inactivating mutations, 1 to 5 leaky mutations, and 6 to 13 cryptic splicing mutations. Predicted effects were confirmed by RNA-seq analysis of the U2OS, U251, and A431 cell lines, and by expression microarray analysis of SNPs in HapMap cell lines.
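The comparison of reference and variant individual information contents can be sketched as follows. This is a minimal illustration under stated assumptions, not the Shannon pipeline's implementation. It builds a per-position weight matrix Riw(b, l) = 2 + log2 f(b, l) from a few aligned sites, sums the weights to obtain Ri for a reference and a variant sequence, and reports the difference in bits. The toy training sites (`TOY_SITES`), the pseudocount used in place of a small-sample correction, the `min_functional` threshold, and the labels returned by `classify` are all hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical aligned donor-like sites, invented for illustration only.
TOY_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT", "CAGGTGAGT"]


def weight_matrix(sites, pseudocount=0.5):
    """Return per-position weights Riw(b, l) = 2 + log2 f(b, l), in bits.

    Assumption: a pseudocount stands in for the small-sample correction
    used by published information-theory models.
    """
    length = len(sites[0])
    total = len(sites) + len(BASES) * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append(
            {b: 2 + math.log2((counts[b] + pseudocount) / total) for b in BASES}
        )
    return matrix


def individual_information(matrix, seq):
    """Ri of one aligned sequence: the sum of its per-position weights."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the weight matrix")
    return sum(matrix[pos][base] for pos, base in enumerate(seq))


def classify(ri_ref, ri_var, natural_site, min_functional=0.0):
    """Hypothetical labels; min_functional is an illustrative threshold."""
    delta = ri_var - ri_ref
    if natural_site:
        if delta >= 0:
            return "no loss"
        # A weakened natural site: inactivated below the threshold, else leaky.
        return "null" if ri_var < min_functional else "leaky"
    # A non-natural site that gains information and crosses the threshold.
    if delta > 0 and ri_var >= min_functional:
        return "cryptic"
    return "no effect"


if __name__ == "__main__":
    matrix = weight_matrix(TOY_SITES)
    ri_ref = individual_information(matrix, "CAGGTAAGT")
    ri_var = individual_information(matrix, "CAGATAAGT")  # G>A at position 4
    delta = ri_var - ri_ref
    print(f"Ri ref={ri_ref:.2f} var={ri_var:.2f} delta={delta:.2f} bits")
    print(f"fold change ~{2 ** -delta:.1f}, label={classify(ri_ref, ri_var, True)}")
```

Because only one position differs, the change in Ri is just the difference of two matrix weights; in this toy matrix the G-to-A change lowers Ri by exactly log2(11) bits. In the information-theory literature a change of ΔRi bits is commonly read as roughly a 2^|ΔRi|-fold change in relative binding affinity, which is why the sketch prints that ratio.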