January 28, 2013. Platform presentation on Automated Dicentric Chromosome Identifier Software

“Automating Dicentric Chromosome Detection from Cytogenetic Biodosimetry Data” at the International EPRBioDose 2013 Conference in Leiden, Netherlands (March 24-28).

Authors: Peter Rogan(1,2), Akila Subasinghe(1), Asanka Wickramasinghe(1), Yanxin Li(1), Jagath Samarabandu(1), Joan Knoll(1,2), Ruth Wilkins(3), Farah Flegal(4); (1)University of Western Ontario, (2)Cytognomix Inc., (3)Health Canada, (4)Atomic Energy of Canada Ltd., Canada.

Abstract: We are developing a prototype software system with sufficient capacity and speed to estimate radiation exposures by counting dicentric chromosomes in metaphase cells from many individuals in the event of a mass casualty. Top-ranked metaphase images are segmented by defining chromosomes with an active contour gradient vector field (GVF), and by determining centromere locations along the centerline. The centerline is extracted by Discrete Curve Evolution (DCE) skeleton branch pruning and curve interpolation. Centromere detection minimizes the global width and DAPI-staining intensity profiles along the centerline. A second centromere is identified by reapplying this procedure after masking the first. Dicentrics can be identified by applying a support vector machine-based classification, which uses features that capture width and intensity profile characteristics as well as local shape features of the object contour at candidate pixel locations. The correct location of the centromere is also refined in chromosomes with sister chromatid separation. The overall algorithm has both high sensitivity (85%) and specificity (94%). Results are independent of the shape and structure of chromosomes in different cells, regardless of which laboratory protocol is followed or the specimen source. The requisite throughput is being achieved by recoding MATLAB software modules for different segmentation functions in C++/OpenCV, and integrating them in the prototype. Processing of numerous images is accelerated by both data and task software parallelization with the Message Passing Interface and Intel Threading Building Blocks as well as an asynchronous non-blocking I/O strategy. Relative to a serial process, metaphase ranking, GVF, and DCE are respectively 100- and 300-fold faster on an 8-core i7-based desktop and on a 64-core shared memory cluster computer. Extrapolation from these benchmarks to a 64-core system in which all of the software modules have been integrated indicates that it should be feasible to process metaphases for dicentric chromosomes from 1000 specimens in 20 hours.
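The centromere search described in the abstract (find the minimum of the width and intensity profiles along the centerline, then mask it and search again) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `find_centromere_candidates`, the equal weighting of the two normalized profiles, and the `mask_radius` parameter are all assumptions made for illustration, and the support vector machine classification step is not shown.

```python
import numpy as np

def find_centromere_candidates(width, intensity, mask_radius=5, n_candidates=2):
    """Return centerline indices of the lowest points of a combined profile.

    width, intensity: 1-D profiles sampled at the same points along the
    chromosome centerline (hypothetical inputs for this sketch).
    """
    width = np.asarray(width, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    def normalize(profile):
        # Rescale to [0, 1] so both profiles contribute on a common scale.
        span = profile.max() - profile.min()
        if span == 0:
            return np.zeros_like(profile)
        return (profile - profile.min()) / span

    # Assumption: the two profiles are simply summed with equal weight.
    combined = normalize(width) + normalize(intensity)

    candidates = []
    for _ in range(n_candidates):
        idx = int(np.argmin(combined))
        candidates.append(idx)
        # Mask the neighbourhood of this minimum before searching again.
        lo = max(0, idx - mask_radius)
        hi = min(len(combined), idx + mask_radius + 1)
        combined[lo:hi] = np.inf
    return candidates

# Synthetic profiles with constrictions at positions 10 and 30.
width = np.full(40, 10.0)
width[10], width[30] = 4.0, 5.0
intensity = np.full(40, 100.0)
intensity[10], intensity[30] = 60.0, 70.0
candidates = find_centromere_candidates(width, intensity)
```

A second candidate found this way is only a candidate; deciding whether it is a true second centromere is the job of the classifier described in the abstract.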

January 21, 2013. Paper about Shannon pipeline accepted for publication in Genomics, Proteomics, and Bioinformatics.

Interpretation, stratification and validation of sequence variants affecting mRNA splicing in complete human genome sequences. Genomics, Proteomics, and Bioinformatics, 11:77-85, 2013.

Ben C. Shirley, Eliseos J. Mucaki, Tyson Whitehead, Paul I. Costea, Pelin Akan, Peter K. Rogan.

Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present and validate the Shannon pipeline software to predict mRNA splicing mutations for genome-scale analysis. Individual information contents (in bits) of reference and variant splice sites are compared and significant differences are annotated and prioritized. The software has been implemented for the CLC-Bio Genomics platform as the Shannon pipeline. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in 3 cancer cell line genomes (U2OS, U251, and A431), and then experimentally validated by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6 to 17 inactivating, 1 to 5 leaky, and 6 to 13 cryptic splicing mutations. Predicted effects were confirmed by RNAseq analysis of U2OS, U251, and A431 cell lines, and expression microarray analysis of SNPs in HapMap cell lines.
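The comparison of individual information contents mentioned in the abstract can be sketched in Python as follows. The sketch uses the standard position weight form Riw(b, l) = 2 + log2 f(b, l) and sums it over the bases of a site. It is a toy illustration and not the Shannon pipeline code: the four training sequences are made up, the pseudocount stands in for the small-sample correction, and the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def weight_matrix(aligned_sites, pseudocount=1.0):
    """Build per-position weights Riw(b, l) = 2 + log2 f(b, l), in bits.

    Assumption: a pseudocount replaces the small-sample correction term.
    """
    length = len(aligned_sites[0])
    total = len(aligned_sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        matrix.append(
            {b: 2.0 + math.log2((counts[b] + pseudocount) / total) for b in BASES}
        )
    return matrix

def information_content(site, matrix):
    """Individual information Ri of one site: sum of weights for its bases."""
    return sum(matrix[pos][base] for pos, base in enumerate(site))

def compare_sites(reference, variant, matrix):
    """Return (Ri of reference, Ri of variant, change in Ri), all in bits."""
    ri_ref = information_content(reference, matrix)
    ri_var = information_content(variant, matrix)
    return ri_ref, ri_var, ri_var - ri_ref

# Made-up donor-like training sites; a real matrix is built from many sites.
training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
matrix = weight_matrix(training)

# The variant changes the conserved G at index 3 to A.
ri_ref, ri_var, delta = compare_sites("CAGGTAAGT", "CAGATAAGT", matrix)
```

A negative change in Ri indicates a weakened site. How large a change counts as significant, and how null and leaky mutations are separated, is decided by the pipeline itself and is not modeled here.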

 

January 11, 2013. New paper accepted for publication in Nucleic Acids Research

“Expanding probe repertoire and improving reproducibility in human genomic hybridization” by S. Dorman, B. Shirley, J. Knoll, and P. Rogan has been accepted for publication by the journal Nucleic Acids Research. In this paper, we use Cytognomix’s patented ab initio sc probe technology to develop and validate novel classes of FISH probes, genomic microarrays and oligonucleotide pools useful for targeted capture hybridization.

January 4, 2013. Paper: “Predicting mRNA transcript isoforms derived from splicing mutations”, ASSEDA server

Volume 34, Issue 4

“Prediction of mutant mRNA splice isoforms by information theory-based exon definition,” by Eliseos Mucaki, Ben Shirley and Peter Rogan has been accepted for publication by the journal Human Mutation.

Abstract. Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon-skipping isoforms in mRNA produced by splicing mutations from the combined information contents (Ri, which measures binding site affinity) and distribution of the splice sites defining these exons. The total information content of an exon (Ri,total) is the sum of the Ri values of its acceptor and donor splice sites, adjusted for the distance separating these sites, i.e., the gap surprisal. Differences between total exon information contents (ΔRi,total) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate non-conforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis server (ASSEDA; http://splice.uwo.ca). Predictions of splicing mutations were highly concordant (85.2%; n=61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.
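The exon definition calculation in the abstract can be illustrated with a small Python sketch. It assumes that the gap surprisal is the surprisal of the exon length, -log2 p(length), and that it is subtracted as a penalty from the summed splice site information; the abstract only says the sum is "adjusted" for distance, so the sign convention, the function names, and the length frequency table below are assumptions, not ASSEDA's implementation.

```python
import math

def gap_surprisal(exon_length, length_frequencies):
    """Surprisal (bits) of an exon length: -log2 of its observed frequency.

    length_frequencies is a hypothetical table mapping length -> frequency.
    """
    p = length_frequencies.get(exon_length, 0.0)
    if p <= 0.0:
        raise ValueError(f"no frequency available for exon length {exon_length}")
    return -math.log2(p)

def ri_total(ri_acceptor, ri_donor, exon_length, length_frequencies):
    """Ri,total = Ri(acceptor) + Ri(donor) - gap surprisal (assumed sign)."""
    return ri_acceptor + ri_donor - gap_surprisal(exon_length, length_frequencies)

# Made-up frequencies, for illustration only.
length_frequencies = {120: 0.25, 300: 0.5}

# Natural exon, then the same exon after a mutation weakens its donor site.
normal = ri_total(8.0, 7.0, 120, length_frequencies)
mutant = ri_total(8.0, 3.0, 120, length_frequencies)
delta_ri_total = mutant - normal
```

With these made-up values the natural exon scores 13.0 bits and the mutated exon 9.0 bits, so ΔRi,total is -4.0 bits, which under the abstract's reasoning would predict that the mutated exon is less abundant than the natural one.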

Update: The paper is now available online from the Journal website: DOI: 10.1002/humu.22277 and is cited on PubMed.

Update 2: John Mucaki has produced a Video Tutorial on using the ASSEDA server on YouTube.

Update 3: The accepted paper has now been copyedited, typeset and published online: http://onlinelibrary.wiley.com/doi/10.1002/humu.22277/abstract. Supplementary data are available as well. (2-21-2013)

Update 4: Annual subscriptions to the Automated Splice Site and Exon Definition server are available through Cytognomix (2-22-2013).

Update 5: The paper has been highlighted in the April 2013 issue of the Journal, where it appeared. Bing Yu, University of Sydney, authored the commentary (Vol 34[4], page v).

Update 6: Mucaki EJ, Shirley BC, and Rogan PK. “Prediction of Mutant mRNA Splice Isoforms by Information Theory-Based Exon Definition” has been published in print. Human Mutation, April 2013, Volume 34 (4), pages 557–565. The journal has made the paper FREE for anyone to download.