January 28, 2013. Platform presentation on Automated Dicentric Chromosome Identifier Software

“Automating Dicentric Chromosome Detection from Cytogenetic Biodosimetry Data” at the International EPRBioDose 2013 Conference in Leiden, Netherlands (March 24-28).

Authors: Peter Rogan(1,2), Akila Subasinghe(1), Asanka Wickramasinghe(1), Yanxin Li(1), Jagath Samarabandu(1), Joan Knoll(1,2), Ruth Wilkins(3), Farah Flegal(4); (1)University of Western Ontario, (2)Cytognomix Inc., (3)Health Canada, (4)Atomic Energy of Canada Ltd., Canada.

Abstract:  We are developing a prototype software system with sufficient capacity and speed to estimate radiation exposures by counting dicentric chromosomes in metaphase cells from many individuals in the event of a
mass casualty. Top-ranked metaphase images are segmented by defining chromosomes with an active contour gradient vector field (GVF), and by determining centromere locations along the centerline. The centerline is
extracted by Discrete Curve Evolution (DCE) skeleton branch pruning and curve interpolation. Centromere detection minimizes the global width and DAPI-staining intensity profiles along the centerline. A second
centromere is identified by reapplying this procedure after masking the first. Dicentrics can be identified by applying a support vector machine-based classification, which uses features that capture width and intensity
profile characteristics as well as local shape features of the object contour at candidate pixel locations. The correct location of the centromere is also refined in chromosomes with sister chromatid separation. The
overall algorithm has both high sensitivity (85%) and specificity (94%). Results are independent of the shape and structure of chromosomes in different cells, regardless of which laboratory protocol is followed or the
specimen source. The requisite throughput is being achieved by recoding MATLAB software modules for different segmentation functions in C++/OpenCV, and integrating them in the prototype. Processing of
numerous images is accelerated by both data and task software parallelization with the Message Passing Interface and Intel Threading Building Blocks as well as an asynchronous non-blocking I/O strategy. Relative
to a serial process, metaphase ranking, GVF, and DCE are respectively 100- and 300-fold faster on an 8-core i7-based desktop and on a 64-core shared memory cluster computer. Extrapolation from these benchmarks to
a 64-core system in which all of the software modules have been integrated indicates that it should be feasible to process metaphases for dicentric chromosomes from 1000 specimens in 20 hours.
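The centromere search described in the abstract can be illustrated with a minimal Python sketch. It is not the prototype's code: it assumes the width and DAPI intensity profiles have already been sampled along the centerline (with the tapered chromosome ends trimmed), scales each profile to the range 0 to 1, and takes the position with the lowest combined value. The function names and the mask_radius parameter are hypothetical. The second candidate is only a candidate; deciding whether it is a true centromere is left to the classifier mentioned above.

```python
def find_centromere(widths, intensities, masked=()):
    """Return the centerline index with the lowest combined width and intensity.

    Both profiles are min-max scaled so that neither dominates the sum
    (an assumption of this sketch, not a detail from the abstract).
    """
    def normalize(profile):
        lo, hi = min(profile), max(profile)
        span = (hi - lo) or 1.0  # avoid division by zero on flat profiles
        return [(v - lo) / span for v in profile]

    w = normalize(widths)
    s = normalize(intensities)
    candidates = [i for i in range(len(w)) if i not in masked]
    return min(candidates, key=lambda i: w[i] + s[i])


def find_two_centromeres(widths, intensities, mask_radius=3):
    """Find the best candidate, mask its neighbourhood, then search again."""
    first = find_centromere(widths, intensities)
    start = max(0, first - mask_radius)
    stop = min(len(widths), first + mask_radius + 1)
    masked = set(range(start, stop))
    second = find_centromere(widths, intensities, masked)
    return first, second


if __name__ == "__main__":
    # Toy profiles with constrictions at positions 3 and 8.
    widths = [10, 10, 9, 4, 9, 10, 10, 10, 6, 10, 10]
    intensities = [200, 200, 180, 90, 180, 200, 200, 200, 120, 200, 200]
    print(find_two_centromeres(widths, intensities, mask_radius=2))
```

Masking a neighbourhood rather than a single point keeps the second search from returning a position on the shoulder of the first constriction.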

January 21, 2013. Paper about Shannon pipeline accepted for publication in Genomics, Proteomics, and Bioinformatics.

Interpretation, stratification and validation of sequence variants affecting mRNA splicing in complete human genome sequences. Genomics, Proteomics, and Bioinformatics, 11:77-85, 2013.

Ben C. Shirley, Eliseos J. Mucaki, Tyson Whitehead, Paul I. Costea, Pelin Akan, Peter K. Rogan.

Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present and validate the Shannon pipeline software to predict mRNA splicing mutations for genome-scale analysis. Individual information contents (in bits) of reference and variant splice sites are compared and significant differences are annotated and prioritized. The software has been implemented for the CLC-Bio Genomics platform as the Shannon pipeline. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing mutations are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in 3 cancer cell line genomes (U2OS, U251, and A431), and then experimentally validated by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6 to 17 inactivating mutations, 1 to 5 leaky mutations, and 6 to 13 cryptic splicing mutations. Predicted effects were confirmed by RNAseq analysis of U2OS, U251, and A431 cell lines, and expression microarray analysis of SNPs in HapMap cell lines.
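The comparison of reference and variant splice sites can be sketched in a few lines of Python. This is an illustration, not the Shannon pipeline itself: it builds a weight matrix from a handful of made-up donor sites using weights of 2 + log2 f(b, l), substitutes a simple pseudocount for a proper small-sample correction, and sums the weights to get an individual information content in bits. The classify function and its thresholds (min_functional, min_change) are hypothetical values chosen for the example and are not taken from the abstract.

```python
import math

BASES = "ACGT"


def weight_matrix(sites):
    """Build per-position weights 2 + log2 f(b, l) from aligned sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        weights = {}
        for base in BASES:
            # Pseudocount of 1 per base avoids log2(0); an assumption of
            # this sketch in place of a small-sample correction.
            freq = (column.count(base) + 1) / (len(sites) + len(BASES))
            weights[base] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix


def individual_information(matrix, site):
    """Sum the weights of the bases in one site (bits)."""
    return sum(weights[base] for weights, base in zip(matrix, site))


def classify(ri_ref, ri_var, min_functional=1.6, min_change=1.0):
    """Label a variant by how far it weakens the site (illustrative cutoffs)."""
    if ri_ref - ri_var < min_change:
        return "no significant change"
    if ri_var < min_functional:
        return "inactivating"
    return "leaky"


if __name__ == "__main__":
    # Made-up donor sites, for illustration only.
    donors = ["CAGGTAAGT", "AAGGTAAGT", "CAGGTGAGT",
              "TAGGTAAGA", "CAGGTAAGG", "GAGGTGAGT"]
    matrix = weight_matrix(donors)
    ri_ref = individual_information(matrix, "CAGGTAAGT")
    ri_var = individual_information(matrix, "CAGATAAGT")
    print(round(ri_ref, 2), round(ri_var, 2), classify(ri_ref, ri_var))
```

A real analysis would use weight matrices derived from large sets of validated splice sites; the toy matrix here only shows how a single base change shifts the information content.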

 

January 11, 2013. New paper accepted for publication in Nucleic Acids Research

“Expanding probe repertoire and improving reproducibility in human genomic hybridization” by S. Dorman, B. Shirley, J. Knoll, and P. Rogan has been accepted for publication by the journal Nucleic Acids Research. In this paper, we use Cytognomix’s patented ab initio sc probe technology to develop and validate novel classes of FISH probes, genomic microarrays and oligonucleotide pools useful for targeted capture hybridization.

January 4, 2013. Paper: “Predicting mRNA transcript isoforms derived from splicing mutations”, ASSEDA server

Volume 34, Issue 4

“Prediction of mutant mRNA splice isoforms by information theory-based exon definition,” by Eliseos Mucaki, Ben Shirley and Peter Rogan has been accepted for publication by the journal Human Mutation.

Abstract. Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon skipping isoforms in mRNA produced by splicing mutations from the combined information contents (Ri, which measures binding site affinity) and distribution of the splice sites defining these exons. The total information content of an exon (Ri,total) is the sum of the Ri values of its acceptor and donor splice sites, adjusted for the distance separating these sites, i.e., the gap surprisal. Differences between total exon information contents (ΔRi,total) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate non-conforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis server (ASSEDA; http://splice.uwo.ca). Predictions of splicing mutations were highly concordant (85.2%; n=61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.
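The exon definition calculation in the abstract can be illustrated with a short Python sketch. It is not the ASSEDA implementation and rests on two assumptions that the abstract does not spell out: the gap surprisal is taken as -log2 of the frequency of exons of a given length and is subtracted as a penalty, and a difference of ΔRi,total bits is read as roughly a 2^ΔRi,total-fold difference in abundance. The function names and the length frequency table are hypothetical.

```python
import math


def gap_surprisal(exon_length, length_frequencies):
    """Surprisal (bits) of an exon length, given assumed length frequencies."""
    return -math.log2(length_frequencies[exon_length])


def ri_total(ri_acceptor, ri_donor, exon_length, length_frequencies):
    """Acceptor plus donor Ri, penalized by the gap surprisal (assumed sign)."""
    penalty = gap_surprisal(exon_length, length_frequencies)
    return ri_acceptor + ri_donor - penalty


def relative_abundance(ri_total_a, ri_total_b):
    """Fold difference implied by a difference in Ri,total (assumed 2**delta)."""
    return 2.0 ** (ri_total_a - ri_total_b)


if __name__ == "__main__":
    # Made-up frequencies of two exon lengths, for illustration only.
    length_frequencies = {120: 0.25, 300: 0.125}
    natural = ri_total(8.0, 9.0, 120, length_frequencies)
    cryptic = ri_total(8.0, 7.0, 300, length_frequencies)
    print(natural, cryptic, relative_abundance(natural, cryptic))
```

In this toy case the cryptic exon is weaker both because its donor site carries less information and because its length is less common, so both terms lower its predicted abundance relative to the natural exon.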

Update: The paper is now available online from the Journal website: DOI: 10.1002/humu.22277 and is cited on PubMed.

Update 2: John Mucaki has produced a Video Tutorial on using the ASSEDA server on YouTube.

Update 3: The accepted paper has now been copyedited, typeset and published online: http://onlinelibrary.wiley.com/doi/10.1002/humu.22277/abstract. Supplementary data are available as well. (2-21-2013)

Update 4: Annual subscriptions to the Automated Splice Site and Exon Definition server are available through Cytognomix (2-22-2013).

Update 5: The paper has been highlighted in the April 2013 issue of the Journal, where it appeared. Bing Yu, University of Sydney, authored the commentary (Vol 34[4], page v).

Update 6: Mucaki EJ, Shirley BC, and Rogan PK. “Prediction of Mutant mRNA Splice Isoforms by Information Theory-Based Exon Definition” has been published in print. Human Mutation, April 2013, Volume 34 (4), pages 557–565. The journal has made the paper FREE for anyone to download.

November 7, 2012. Presentation at 2012 meeting of the American Society of Human Genetics.

Strategy for Identification, Prediction, and Prioritization of Non-Coding Variants of Uncertain Significance in Heritable Breast Cancer
P. K. Rogan(1,2,4), E. J. Mucaki(1), A. Stuart(3), N. Bryans(2), E. Dovigi(1), B. C. Shirley(2), C. Viner(2), J. H. Knoll(3,4), P. Ainsworth(4). Departments of Biochemistry(1), Computer Science(2), and Pathology(3), Western University, and Cytognomix Inc.(4), London, ON N6A 2C1 Canada.

Poster presentation

High-throughput sequencing (HTS) of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Some non-coding sequence variants have been proven to significantly contribute to the phenotypes of high penetrance disorders. We develop an approach to predict pathogenicity of non-coding VUS based on comprehensive information analysis of changes in DNA and RNA sequences bound by regulatory factors. Using cleavable solution microarrays, we are capturing and enriching for non-coding variants in genes known to harbor mutations that increase breast cancer risk. Oligo baits covering ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2 and TP53 were synthesized for solution hybridization with a custom cleavable microarray spanning the complete coding and intergenic regions 10 kb upstream and downstream of each gene. Non-exonic sequences are densely populated with repetitive sequences that can affect short read assembly. A novel probe design method was used to capture both repeat-free and divergent repeat sequences that are effectively single copy. After SBS sequencing of 13 patient samples in our laboratory, information theory-based sequence analysis was used to prioritize non-coding variants which occurred within sequence elements recognized by proteins or protein complexes. The novel VUS identified are being investigated for effects on mRNA splicing, transcription factor-binding site (TFBS), and untranslated region (UTR) mutations. We have developed and applied information theory-based models for exon recognition, which predict the relative abundance of natural, cryptic, and mutant splice isoforms resulting from predicted mutations using the combined donor and acceptor site strengths of each mRNA species. We have applied a similar approach to detect mutations in the promoters of BRCA1 and BRCA2 that alter strengths of TFBS.
Information weight matrices were automatically computed by entropy minimization of ATF3, BATF, BCL3, BCLAF, c-Jun, c-Myc, CTCF, EGR1, EP300, ETS1, FOSL2, FOXA1, FOXM1, GABP, GATA3, GRP20, HSF1, IRF4, MEF2A, NFIC, NFkB, PU.1, RAD21, RXRA, TCF12, TCF7L2, and YY1 TFBS from the global set of ENCODE ChIP-seq regions embedded within DNase I hypersensitive domains. These models were then used to evaluate novel variants discovered by sequence analysis of breast cancer patients for alteration of TFBS binding strengths. This strategy more comprehensively covers non-coding regions in breast cancer genes than repeat masking, and introduces a unified framework for systematic interpretation of VUS that may affect expression.

October 17, 2012. Automating cytogenetic biodosimetry

Cytognomix has established partnerships to develop advanced high-throughput software and microscope systems to automate cytogenetic biodosimetry in the event of a mass casualty radiation event. Biodosimetry laboratory partners include Health Canada and Atomic Energy of Canada Ltd. Software development for image processing of chromosome images is being carried out at the Schools of Engineering and Schulich School of Medicine and Dentistry, Western University, Ontario, Canada. We are providing software to Huron Technologies International, Waterloo, Canada, to accelerate the detection of dicentric metaphase chromosomes using a modification of their MACROscope system. The project has been supported by a Western Innovation Grant, a CMCR pilot project (NIH subcontract from Dartmouth College), and by the Federal Development Agency of Southern Ontario.

Read this description of our project with Huron Technologies:   ApplicationNote_cytogenetic_biodosimetry

June 26, 2012. New US Patent

[Image: FISH of short cancer related genes]

US Patent 8,209,129 has issued on single copy DNA probes which include divergent repetitive sequences, thus significantly extending the portions of the genome that can be used for such probes beyond traditional single copy sequences.

The technology also increases the density of genomic DNA probes for higher resolution genetic analysis beyond what is used in FISH, genomic microarrays for array comparative genomic hybridization, and solution capture hybridization arrays for sequence enrichment in deep sequencing. It is licensed to Cytognomix.

April 24, 2012. Mutation interpretation in deep sequencing data

Cytognomix’s new software product, the Shannon pipeline for human splicing mutation analysis of NGS data, was presented at the BioIT World Conference and Expo (Apr 24-26, 2012), Boston.

Dr. Rogan describes our poster:

Large scale interpretation and stratification of non-coding sequence variants in the human genome. Ben Shirley(1), Eliseos Mucaki(2), and Peter Rogan(1,2,3). Depts. of Computer Science(1) and Biochemistry(2), and Cytognomix Inc.(3), London ON, Canada.