Clustered, information-dense transcription factor binding sites identify genes with similar tissue-wide expression profiles. BioRxiv, 2018.
On behalf of the Scientific Programme Committee of the European Conference of Human Genetics 2018 taking place in Milan, Italy from June 16 to June 19, 2018, we are pleased to inform you that the abstract entitled:
‘Comprehensive prediction of responses to chemotherapies by biochemically-inspired machine learning’
(Control No. 2018-A-2095-ESHG)
was among the best scored papers accepted for a poster presentation. Best Poster session takes place on Sunday, June 17, 2018 13:00 hrs, and consists of a 3 minute presentation followed by discussion at your electronic poster.
We have published a new approach to devise gene signatures to detect radiation exposure (human, murine), and to quantify levels of exposure (murine):
Manuscript describing accurate genomic signatures of radiation exposure will be published shortly by F1000Research.
Jonathan ZL Zhao, Eliseos J Mucaki, Peter K Rogan. Predicting Exposure to Ionizing Radiation by Biochemically-Inspired Genomic Machine Learning, F1000Research, in press.
Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches.
Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation, and the expression of genes implicated in the literature as responsive to radiation exposure (n=998) was then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets.
Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance within individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
Ali S, Li Y, Shirley B, Wilkins R, Flegal F, Rogan PK, Knoll JHM. Population scale biodosimetry with the Automated Dicentric Chromosome Identifier and Dose Estimator (ADCI) software system. [Platform]
Rogan PK, Zhao JZL, and Mucaki EJ. Predicting exposure to ionizing radiation by biochemically-inspired genomic machine learning. [Poster]
Li Y, Shirley B, Wilkins R, Flegal F, Knoll JHM, Rogan PK. Optimization of image selection in Automated Dicentric Chromosome Analysis. [Poster]
The Logic and Formulation of Exon Definition for Splice and Splicing Regulatory Sites with Negative Information Content. PK Rogan, EJ Mucaki
Update on: Mucaki EJ, 2013 and the Automated Splice Site and Exon Definition Analysis server (ASSEDA).
In Mucaki EJ, 2013, we described a method of predicting the overall strength of an exon by calculating its total information content (Ri,total) from the sum of the Ri values of its donor and acceptor splice sites, adjusted for their gap surprisal (the self-information of the distance between the two sites). Differences between ΔRi,total values are predictive of the relative abundance of these exons in distinct processed mRNAs.
Splice sites altered by mutations that prevent stable interaction with spliceosomes are said to be abolished. Information theory predicts abolition of binding below their minimum binding affinity, Ri,minimum, which is empirically derived. This value is slightly above zero bits, the theoretical minimum for binding at equilibrium (ΔG = 0; Schneider TD, 1997). Sites with Ri < 0 are not bound, because forming stable interactions would be endergonic (ΔG > 0). This raises the question of whether, when predicting the change in exon strength (ΔRi,total) due to a mutation that inactivates binding, mutant sites with varying degrees of negative information content are energetically distinguishable from one another.
The computation of Ri,total contains the sum of the Ri values of component binding sites, irrespective of their initial or final strengths. Thus, a mutated site with Ri << 0 would result in greater ΔRi,total compared to a site with Ri ~ 0. To assess whether the degree of unfavorable binding should be applied to the exon definition calculation, or if values below 0 bits should be computed similarly to a binding site at equilibrium (Ri ~ 0), we reevaluated experimentally validated natural and regulatory splicing mutations in our paper with both approaches. Ri,total was calculated for 10 variants from Supplementary Table 2, both including and excluding the negative information (i.e. Ri < 0 vs. Ri = 0) of inactivated splice sites. Mutation #2 of Supplementary Table 2 [ADA:g.43249658G>A] abolishes a natural donor site, from 8.8 to -9.9 bits. In applying the full decrease in strength (ΔRi,total: -18.7 bits), the natural exon strength decreases from 21.0 to 2.3 bits. When the negative information content is set to zero bits, the change is significantly smaller (21.0 -> 12.2 bits; ΔRi,total = -8.8 bits). When a weak natural splice site is abolished, the resulting ΔRi,total can be quite small (Mutation #9; -14.8 vs -3.1 bits). In the case of Mutation #38, the reduction in ΔRi,total leads to a partially discordant prediction where the abolished natural exon is weaker than the experimentally confirmed activated cryptic exon. Results for this mutation were concordant with the published version when the negative bit value of the mutated natural site was included in the calculation.
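The ADA example above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the function name and the keep_negative flag are hypothetical and not part of ASSEDA, and it assumes that ΔRi,total equals the change in Ri of the single mutated site, as in the figures quoted above.

```python
def exon_strength_after_mutation(ri_total_initial, site_ri_initial,
                                 site_ri_final, keep_negative=True):
    """Return (final Ri,total, delta Ri,total) in bits, rounded to 0.1 bit.

    keep_negative=True applies the full negative information of the
    mutated site; keep_negative=False treats any Ri < 0 as 0 bits.
    Hypothetical helper for illustration, not ASSEDA code.
    """
    effective_final = site_ri_final if keep_negative else max(site_ri_final, 0.0)
    delta = effective_final - site_ri_initial
    return round(ri_total_initial + delta, 1), round(delta, 1)


# Values quoted above for ADA:g.43249658G>A (donor site 8.8 -> -9.9 bits,
# initial exon strength 21.0 bits).
with_negative = exon_strength_after_mutation(21.0, 8.8, -9.9, keep_negative=True)
clamped_to_zero = exon_strength_after_mutation(21.0, 8.8, -9.9, keep_negative=False)
print(with_negative)    # (2.3, -18.7)
print(clamped_to_zero)  # (12.2, -8.8)
```

The two printed tuples correspond to the two approaches compared in the text: including the negative information of the inactivated site, or setting it to zero bits.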
The impact of mutations in splicing regulatory (SR) factors can also be predicted on ASSEDA, where the Ri of the SR binding site is added to the Ri,total, as well as a secondary gap surprisal value for the particular SR protein. These sites can also be abolished. But when an SR protein binding site is no longer active, should the SR gap surprisal still be applied, or is the SR gap surprisal no longer applicable?
We tested mutations from Mucaki EJ, 2013 (Supplementary Table 4), which abolish the splicing enhancer SF2/ASF with and without the SR protein gap surprisal when Ri of the SR site is < 0 bits. The removal of the gap surprisal term for Mutation #2 of Supplementary Table 4 leads to a discordant prediction, where the ΔRi is less than the SR gap surprisal at that distance and therefore the ΔRi,total is positive. As experimental evidence shows an increase in skipping, it is a discordant prediction. Therefore, the gap surprisal is still applied in the computation of both initial and final Ri,total values when the SR protein of interest is abolished as the site is naturally present and therefore expected for binding. Conversely, when we apply the gap surprisal to the initial Ri,total for a splicing factor that is being created, we are essentially applying a penalty for a site that does not normally exist. Therefore, we no longer apply the SR gap surprisal value to the initial Ri,total in these cases.
The revised Ri,total values of SR binding site mutations differ slightly from those reported in Mucaki EJ, 2013 (Supplementary Table 4). This is because the gap surprisal distributions were recomputed for the following factors: SF2/ASF, SC35 and SRp40, with updated versions of these models based on CLIP Seq data (Blin K, 2015, Khorshid M, 2011). This resulted in small changes to the distributions for SF2/ASF and SC35; however, the changes for SRp40 were significant, and its distribution now more closely resembles the other gap surprisal functions. The updated graphs of distance vs. gap surprisal are available at: http://splice.uwo.ca/gapsurprisals.html. While this should not significantly affect ΔRi,total values, it may affect the initial and final Ri,total values.
We would like to alert readers to the fact that information theory-based splicing mutation analysis has been used to analyze a wide range of variants (in/dels and SNVs) that affect splicing in introns and exons in peer reviewed studies. These tools have been used to analyze mutations that alter branchpoint recognition and mutations within introns in peer reviewed studies. The Automated Splice Site and Exon Definition Analysis server, ASSEDA (Mucaki EJ, 2013) analyzes mutations at branchpoints, within intronic sequences, at cryptic splice sites, and at splicing regulatory protein binding sites (“enhancer/silencer” sequences). We have also published the Shannon pipeline (Shirley BC, 2013), which carries out mutation analysis affecting splicing (and transcription factor binding sites; Lu R, 2017) on a genome scale. Veridical is software that validates splicing mutations found with the Shannon pipeline (or any other program) with RNASeq data from the same individual (Viner C, 2014, Dorman SN, 2014).
Our previous review article extensively describes the use of these tools for splicing mutation analysis by many other research groups, besides ourselves (Caminsky N, 2014).
We have analyzed this mutation with the Automated Splice Site and Exon Definition Analysis server (ASSEDA). The 1 nt deletion in the splice donor of exon 20 reduces the strength of this site from 11.5 -> 4.1 bits, a 7.4 bit decrease (100/2^7.4 = 0.6% of the original binding affinity).
The information theory-based approach used in ASSEDA predicts isoform abundance and computes the fold changes in binding affinity from mutations (Mucaki EJ, 2013), which corresponds to the degree of exon skipping in this case. The reduction in splice site strength is much greater than the estimates given by the ad hoc methods used in the paper. LOH was not complete; some of the observed expression may have been derived from the contaminating normal allele. In fact, had the loss of function in splice site recognition only been 25-40% according to the paper, it could have been classified as a variant of unknown significance, or possibly as benign (as we suggested in Mucaki EJ, 2011).
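The conversion from a change in information content to a fold change in binding affinity used in this comment can be checked directly, since each bit corresponds to a two-fold difference in affinity. This is a minimal sketch; the function name is a hypothetical label for illustration.

```python
def affinity_fold_change(ri_initial, ri_final):
    """Fold reduction in binding affinity for a drop from ri_initial to ri_final (bits)."""
    return 2.0 ** (ri_initial - ri_final)


# Donor site of exon 20: 11.5 -> 4.1 bits, a 7.4 bit decrease.
fold = affinity_fold_change(11.5, 4.1)
residual_percent = 100.0 / fold
print(f"{fold:.0f}-fold weaker, {residual_percent:.1f}% residual affinity")
# prints: 169-fold weaker, 0.6% residual affinity
```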
Rogan PK, Mucaki EJ. Comment on PMID 23169495: Analysis of the effects of rare variants on splicing identifies alterations in GABAA receptor genes in autism spectrum disorder individuals. In: PubMed Commons [Internet]. Bethesda (MD): National Library of Medicine; 2012 Nov 21 [cited 2017 Dec 12].
Regarding GABRQ:c.306G>C: Whereas none of the splicing analysis programs tested predicted the outcomes observed with the mini-gene construct shown in Figure 2A, information theory-based exon definition analysis using ASSEDA (Mucaki EJ, 2013) was completely concordant. A novel band 116nt longer than the product expected from the wild type exon is observed. The mutation reduces the strength of the natural donor splice site of exon 3 from 9.5 -> 4.5 bits (32 fold). The pre-existing intronic cryptic site 116 nt downstream (8.6 bits) is 17 fold stronger than the mutated splice site. ASSEDA indicates that the total exon information (Ri,total) of the wildtype exon is reduced (19.8 -> 14.8 bits) and the corresponding strength of the gap-surprisal adjusted cryptic exon significantly exceeds this (17.7 bits). The wildtype exon is predicted to be ~5-6 fold more abundant than the cryptic exon BEFORE mutation, and the cryptic exon is predicted to be ~8 fold more abundant AFTER mutation.
We have posted a comment in PubMed Commons about Baert et al. “Thorough in silico and in vitro cDNA analysis of 21 putative BRCA1 and BRCA2 splice variants and a complex tandem duplication in BRCA2, allowing the identification of activated cryptic splice donor sites in BRCA2 exon 11.” (2017) (doi: 10.1002/humu.23390). The updated comments can be found at: https://www.ncbi.nlm.nih.gov/pubmed/29280214#comments. They have been highlighted twice by PubMed Commons as a “Top Comment”.
Twenty one BRCA1 and BRCA2 mRNA splice site variants were analyzed by semi-quantitative RT-PCR, with commercial software that scores putative splice sites by ad hoc methods, and with bioinformatic models based on Adaboost and Random Forest, which are general machine learning approaches. The authors cited our review on interpretation of splicing mutations (Caminsky N, 2014), however the analytic approach described in that paper was not evaluated. As an update to our previous BRCA mutation study (Mucaki EJ, 2011), we carried out information theory-based splicing analysis of all potential splicing mutations listed in Supplemental Table S3. The splicing consequences of all variants were accurately predicted by information analysis. We also report results of exon definition-based mRNA splicing mutation analysis (Mucaki EJ, 2013), which infers relative abundance of wild type and mutated splice isoforms from total splicing information content of each prospective exon. Due to length limitations in PubMed Commons commenting system, detailed results for each variant are described in: https://doi.org/10.5281/zenodo.1146708
Also, during our analysis, some inconsistencies in mutation designation or interpretation were noted in the paper: (1) The complex BRCA2 duplication described in this article (c.425+415_4780dup[insGATCGCAGTGA]) is sometimes referred to as “c.426-415_4780dup[insGATCGCAGTGA]” (e.g. the title of Figure 5, and Suppl. Table S3); these are not congruent mutations. The true mutation is likely the former, as the Figure 5 legend describes an mRNA splice form that includes 293nt of intron 4. If the duplication were c.426-415_4780dup[insGATCGCAGTGA], the intron inclusion would only be 205nt long. (2) We report an additional inconsistency with regard to Figure 5: The legend of Figure 5E describes a splice form where a truncated exon 11 forms a junction with the aforementioned 11nt insertion. However, the diagram and the electropherogram in Figure 5E show exon 11 (ending at c.2398) sharing a junction with the beginning of exon 5. The latter is most likely the correct isoform, as an acceptor is not predicted at the junction between c.4780 and the 11nt insertion.
Mucaki et al. Predicting Response to Platin Chemotherapy Agents with Biochemically-inspired Machine Learning. bioRxiv.
Selection of effective drugs that accurately predict chemotherapy response could improve cancer outcomes. We derive optimized gene signatures for response to common platinum-based drugs, cisplatin, carboplatin, and oxaliplatin, and respectively validate each with bladder, ovarian, and colon cancer patient data. Initially, using breast cancer cell line gene expression and growth inhibition (GI50) data, we performed backwards feature selection with cross-validation to derive predictive gene sets in a supervised support vector machine (SVM) learning approach. These signatures were also verified in bladder cancer cell lines. Aside from published associations between drugs and genes, we also expanded these gene signatures using a systems biology approach. Signatures at different GI50 thresholds distinguishing sensitivity from resistance to each drug contrast the contributions of different genes at extreme vs. median thresholds. An ensemble machine learning technique combining different GI50 thresholds was used to create threshold independent gene signatures. The most accurate models for each platinum drug in cell lines consisted of cisplatin: BARD1, BCL2, BCL2L1, CDKN2C, FAAP24, FEN1, MAP3K1, MAPK13, MAPK3, NFKB1, NFKB2, SLC22A5, SLC31A2, TLR4, TWIST1; carboplatin: AKT1, EIF3K, ERCC1, GNGT1, GSR, MTHFR, NEDD4L, NLRP1, NRAS, RAF1, SGK1, TIGD1, TP53, VEGFB, VEGFC; and oxaliplatin: BRAF, FCGR2A, IGF1, MSH2, NAGK, NFE2L2, NQO1, PANK3, SLC47A1, SLCO1B1, UGT1A1. Recurrence in bladder urothelial carcinoma patients from the Cancer Genome Atlas (TCGA) treated with cisplatin after 18 months was 71% accurate (59% in disease-free patients). In carboplatin-treated ovarian cancer patients, predicted recurrence was 60.2% (61% disease-free) accurate after 4 years, while the oxaliplatin signature predicted disease-free colorectal cancer patients with 72% accuracy (54.5% for recurrence) after 1 year. 
The best performing cisplatin model most accurately predicted outcome for non-smoking TCGA bladder cancer patients (100% accuracy for recurrent, 57% for disease-free; N=19), while the median GI50 model (GI50 = 5.12) predicted outcome in smokers (79% accuracy for recurrent, 62% for disease-free; N=35). Cisplatin and carboplatin signatures comprised overlapping gene sets and GI50 values, which contrasted with models for oxaliplatin response.
MutationForecaster® is now available as a custom analysis service that we provide to you on your data. We’ve listened to you: let us assume the task of performing information theory-based analysis for you. We now offer a Bespoke service that allows you to get fully documented reports based on analysis of variants that you submit to us. Please click on the Learn More link below for more information. Our tools for non-coding variant interpretation utilize an information theory-based approach only available through CytoGnomix. No other service provides the patented, molecular diagnostic information that MutationForecaster® generates.
Collaboration with a French consortium to study non-coding variants in BRCA1 and BRCA2 in patients with a family history of breast and ovarian cancer:
Santana dos Santos, E, Caputo, S.M., Castera, L, Gendrot, M, Briaux, A., Breault, M, Krieger, S, Rogan, P.K, Mucaki, E.J., Bieche, I, Houdayer, C, Vaur, D, Stoppa-Lyonnet, D, Brown, M, Lallemand, F., Rouleau, E. Assessment of functional impact of germline BRCA1/2 variants located in noncoding regions in families with breast and or ovarian cancer predisposition, Breast Cancer Research and Treatment, in press.
The International Atomic Energy Agency (IAEA), within the recently initiated Coordinated Research Project: “Applications of Biological Dosimetry Methods in Radiation Oncology, Nuclear Medicine, Diagnostic and Interventional Radiology” (E35010) will develop clinical applications for biodosimetric techniques, in particular the dicentric assay. Many of the coordinating groups have developed the dicentric assay in their labs, but generally results are interpreted manually.
Clinical applications of biodosimetric techniques will only be routinely applied if less manpower-intensive techniques are employed. The Cytognomix ADCI software system addresses this critical need.
At the first meeting of the Research Coordinators of E35010, all respondents elected to receive evaluation versions of this software:
For immediate release
Ontario company contributes to radiation biodosimetry project at the International Atomic Energy Agency
Cytognomix accelerates estimation of radiation exposure by participating institutions of International Atomic Energy Agency (IAEA) Member States
October 30, 2017 London, Ontario, Canada Cytognomix Inc
Calibration of radiation exposure needs to be accurate for effective cancer treatment. Treatment of radiation overexposures depends on precise measurement of absorbed dose and the type of radiation received. Quantification of radiation exposure by biodosimetry testing needs to be timely for patients to benefit.
The IAEA is committed to encouraging and assisting research on, and development of practical applications of atomic energy for peaceful uses throughout the world. It has extended the opportunity to research institutes in Member States to participate in the Coordinated Research Project (CRP) E35010 entitled ‘Applications of Biological Dosimetry Methods in Radiation Oncology, Nuclear Medicine, and Diagnostic and Interventional Radiology.’
The IAEA is sponsoring CytoGnomix’s Research Project, entitled ‘Determination of Radiation Exposure by Fully Automated Dicentric Chromosome Analysis.’ This project will enable laboratories and research institutions of Member States to use Cytognomix’s technology to accelerate testing radiation exposure.
In this project, Cytognomix Inc. will use its systems to automatically analyze digital images of chromosomes exposed to radiation to estimate exposure. Results obtained by biodosimetry laboratories at Health Canada and Canadian Nuclear Laboratories suggest that the results are similar to traditional manual analyses, but are achieved considerably more quickly. This research will expand access to these systems by other laboratories participating in IAEA’s Coordinated Research Project.
Accuracy and speed of the automated system will be compared with previous results from collaborating CRP laboratories that were obtained by manual or computer-assisted DCA scoring. It is anticipated that cell image data obtained from test samples in prior or current international joint laboratory exercises or independent assay validation activities will be reused in this study. Each collaborating laboratory will also receive a demonstration software version containing their calibration curve and test sample data. Dose estimates obtained by CytoGnomix will be compared with results obtained by collaborators. If the previous results are comparable to those obtained with ADCI in different laboratories, this will establish the feasibility of undertaking larger scale, batch analysis of populations of individuals that have potentially received radiation exposure. A unique aspect of the proposed study will assess whether it enables greater standardization of results obtained by different laboratories, because all labs will use a common algorithm to process their data, while still allowing different labs to customize their own calibration curves for determining unknown radiation exposures, which addresses differences in chromosome preparation methods and radiation calibration sources between labs.
“The IAEA has recognized the critical need for faster approaches to accurately determine radiation exposure that address impending needs by its members. By sponsoring our project, CytoGnomix will have a unique opportunity to provide hands-on experiences to radiation biodosimetry laboratories and centres worldwide.”
Dr. Peter K. Rogan
President of Cytognomix Inc.
- Established in 2009, CytoGnomix Inc. is a biotechnology company that designs and markets advanced genomic reagents and software-based solutions. Its products personalize the diagnosis, evaluation and management of cancer, prenatal disorders and other genetic diseases.
- CytoGnomix’s ADCI software system selects high-quality cells from all types of digital images for analysis, identifies chromosome anomalies, builds biodosimetry calibration curves and estimates exposure in less than an hour.
- In 2017, IAEA is sponsoring 35 Cooperative Research Activities on diverse topics concerning the peaceful use of atomic energy. CRP E35010, which is focused on the biological effects of radiation, is one of 5 projects focused on human health. The decision to award a research contract or agreement is made after careful consideration of the technical merits of the proposal, the compatibility of the project with the IAEA’s own functions and approved programmes, the availability of appropriate facilities and personnel in the institution and previous research work related to the project. Where it is recognized that the award of a particular research or technical contract or research agreement would materially assist one of the IAEA’s programmes, an invitation is sent to those institutions believed to have the necessary facilities and personnel, and the Government of the Member State concerned is kept informed.
Follow us on Facebook
Corporate Communications, CytoGnomix
PgmNr 182: Splicing mutation risk analysis in hereditary breast and ovarian cancer exomes. (Platform)
Thurs, Oct 19. 11:00am -12:30pm. Session 40. Defining High Risk in Cancer. Room 230C – Level 2/Orlando Convention Center
E.J. Mucaki 1; B.C. Shirley 2; S.N. Dorman 1; P.K. Rogan 1,2 1) Biochemistry, University of Western Ontario, London, Ontario, Canada; 2) CytoGnomix Inc, London, Ontario, Canada
Genetic testing of patients with inherited cancer frequently reveals variants of unknown significance (VUS). We have presented an Information Theory (IT) framework to predict and prioritize coding and non-coding VUS in hereditary breast and ovarian cancer (BRCA) patients, including effects on mRNA splicing (1,2). We investigated the exome wide distribution of predicted mRNA splicing mutations in a large BRCA cohort. Predicted splicing mutations in IT-based splicing analysis of all variant data from AmbryShare BRCA exome (n=11,416; with 1.2 million VUS) and the control Genome Aggregation Database (gnomAD; n=138,632) were identified using the Shannon splicing mutation software pipeline (3). IT-flagged variant frequencies (decreasing Ri values [in bits] of either leaky or inactivated natural splice sites [∆Ri >4 bits and Ri ≤ 1.6] or strengthened cryptic splice sites with an Ri exceeding that of adjacent natural sites) were compared for each gene using odds ratios (OR). ORA is defined as the ratio of frequencies of the same flagged variants in a gene in AmbryShare relative to gnomAD. ORP is based on the ratio of frequencies of all flagged variants in a gene in AmbryShare relative to all flagged variants in that gene in gnomAD. A greater number of IT-flagged variants were present in AmbryShare than in gnomAD among 2012 genes with severe splicing mutations. Increasing the ∆Ri threshold disproportionately decreases the number of flagged variants in gnomAD due to fewer severe splicing mutations. Variants that abolish natural splice sites flagged known inherited breast cancer genes with respectively increased ORA and ORP in ATM (493, 407), BARD1 (407, 407), BRCA1 (19, 14), BRCA2 (54, 54), CDH1 (549, 549), MLH1 (303, 303), MUTYH (95, 11), and PALB2 (233, 116). Other flagged breast cancer-related genes with high OR include AAMP, C1QTNF6, CDK3, FOLR1, PRLR, RAD50, RING1, S100A2, SRGN, TMSB10, TYRO3, and VIM.
Notable highly mutated genes from other cancers include GKN1 (gastric), C1orf61 (hepatocellular), CREM (prostate), PNKP (multiple), PPP1CA (gastric) and ZFAND2B (myeloid). Flagged genes not known to be linked to cancer include ATP1A4, MFF, PACSIN1, PTS, and USH1C. Severe splicing mutations occur more frequently in inherited and somatic breast cancer genes as well as in other genes in BRCA populations.
1. Mucaki et al. BMC Med. Genom. 9:19, 2016; 2. Caminsky et al. Hum. Mut. 37:640, 2016; 3. Shirley et al. Genom. Prot. Bioinf. 11:75, 2013.
Keywords: Cancer; Bioinformatics; Genomics; Population genetics; Statistical genetics
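The flagging criteria and the cohort comparison described in the abstract above can be sketched in Python. This is not the Shannon pipeline: the function names are hypothetical, and the code assumes that "∆Ri >4 bits and Ri ≤ 1.6" refers to the size of the decrease and to the final Ri of the natural site, and that the per-gene comparison is a simple ratio of carrier frequencies between the two cohorts. The variant counts in the usage line are made up for illustration.

```python
AMBRYSHARE_N = 11_416   # cohort sizes quoted in the abstract
GNOMAD_N = 138_632


def flags_natural_site(ri_before, ri_after, min_drop=4.0, max_final=1.6):
    """Leaky or inactivated natural site: large decrease and low final Ri (bits)."""
    return (ri_before - ri_after) > min_drop and ri_after <= max_final


def flags_cryptic_site(cryptic_ri_before, cryptic_ri_after, natural_ri):
    """Strengthened cryptic site whose Ri now exceeds the adjacent natural site."""
    return cryptic_ri_after > cryptic_ri_before and cryptic_ri_after > natural_ri


def frequency_ratio(case_count, control_count,
                    case_n=AMBRYSHARE_N, control_n=GNOMAD_N):
    """Ratio of flagged-variant frequency in cases to that in controls."""
    if control_count == 0:
        raise ValueError("ratio is undefined when no control carries a flagged variant")
    return (case_count / case_n) / (control_count / control_n)


# Hypothetical counts: 5 flagged carriers among cases, 2 among controls.
example_ratio = frequency_ratio(5, 2)
```

Under this reading, the same function would serve for both ORA (counts restricted to variants seen in both cohorts) and ORP (counts of all flagged variants in the gene); only the inputs differ.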
PgmNr 1268/T: Accurate radiation biodosimetry through automation of metaphase cell image selection and chromosome segmentation. (Poster)
Thurs, Oct 19. 2:00pm – 4:00pm. Bioinformatics and Computational Approaches. Exhibit Hall, Level 1, Orlando Convention Center
Y. Li 1; J. Liu 2; B. Shirley 1; R. Wilkins 3; F. Flegal 4; J.H.M. Knoll 1,2; P.K. Rogan 1,2 1) CytoGnomix Inc, London, Ontario, Canada; 2) University of Western Ontario, London, Ontario Canada; 3) Health Canada, Ottawa, Ontario, Canada; 4) Canadian Nuclear Laboratories, Chalk River, Ontario, Canada
The dicentric chromosome (DC) assay is a standardized method that is recommended for determination of biologic radiation exposure (1,2). Software to fully automate this assay has been developed in our laboratory (3). This method relies on high quality microscope-derived images of metaphase cells to reduce the rate of false positive (FP) DCs. We present image processing methods to eliminate suboptimal metaphase cell images based on novel quality measures and to reclassify FPs by analyzing their morphological features. A set of chromosome segmentation thresholds selectively filtered out FPs, arising primarily from extended prometaphase chromosomes, sister chromatid separation and chromosome fragmentation. This reduced the number of FPs by 55% and was highly specific to the abnormal structures (≥97.7%). Image segmentation filters selectively remove images with consistently unparsable or incorrectly segmented chromosome morphologies, while image ranking sorts images according to their qualities and enables selection of optimal images in samples. Overall, these methods can eliminate at least half of the FPs detected by manual image review. By processing data to derive calibration curves and to assess samples of unknown exposures with the same image selection models, average dose estimation errors were reduced from 0.6 Gy to 0.3 Gy, without requiring manual review of DCs. During this presentation, we will use our software to demonstrate that metaphase image filtering and object selection constitute a reliable and scalable approach for biodosimetry, resulting in more accurate radiation dose estimates.
1. International Atomic Energy Agency. (2001) Cytogenetic Analysis for Radiation Dose Assessment, a Manual: Technical Reports Series. No. 405, International Atomic Energy Agency, Vienna.
2. International Atomic Energy Agency. (2011) Cytogenetic Dosimetry: Applications in Preparedness for and Response to Radiation Emergencies, International Atomic Energy Agency, Vienna.
3. Rogan, P. K., Li, Y., Wilkins, R. C., Flegal, F. N., and Knoll, J. H. M. (2016) Radiation Dose Estimation by Automated Cytogenetic Biodosimetry, Radiation Protection Dosimetry 172, 207-217.
Keywords: Bioinformatics; Centromere structure/function; Chromosomal abnormalities; Diagnostics; Public health
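To illustrate how a calibration curve is used to estimate the dose of an unknown sample, here is a minimal Python sketch. It assumes the linear-quadratic dose-response form that is standard for the dicentric assay (yield = c + alpha*D + beta*D^2); the abstract above does not state the curve form used by the software, and the function names and coefficients below are hypothetical.

```python
import math


def dicentric_yield(dose_gy, c, alpha, beta):
    """Expected dicentrics per cell at a given dose under a linear-quadratic curve."""
    return c + alpha * dose_gy + beta * dose_gy ** 2


def estimate_dose(yield_per_cell, c, alpha, beta):
    """Invert the calibration curve to get the dose (Gy) for an observed yield."""
    excess = yield_per_cell - c
    if excess <= 0:
        return 0.0  # at or below background: no measurable exposure
    if beta == 0:
        return excess / alpha  # purely linear curve
    # Positive root of beta*D^2 + alpha*D - excess = 0
    return (-alpha + math.sqrt(alpha ** 2 + 4 * beta * excess)) / (2 * beta)


# Hypothetical calibration coefficients, for illustration only.
C, ALPHA, BETA = 0.001, 0.03, 0.06
observed = dicentric_yield(2.0, C, ALPHA, BETA)
estimated = estimate_dose(observed, C, ALPHA, BETA)
```

Because each laboratory fits its own coefficients, the same inversion applies regardless of differences in chromosome preparation or radiation source.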
PgmNr 1288/W: Predicting exposure to ionizing radiation by biochemically-inspired genomic machine learning. (Poster)
Wed, Oct 18. 3:00pm – 4:00pm. Bioinformatics and Computational Approaches. Exhibit Hall, Level 1, Orlando Convention Center.
Analyzing gene expression in peripheral blood mononuclear cells reveals profiles that predict radiation exposure in humans and mice by logistic regression (PLoS Med. 4:e106; PLoS ONE. 3:e1912). Using biochemically-inspired methods (Mol. Onc. 10:85-100), we derive gene signatures to predict the level of radiation exposure with improved accuracies. DNA repair genes responsive or differentially expressed upon radiation exposure and orthologs highly expressed in species resilient to radiation exposure (n=998) were analyzed by two-sample t-tests comparing expression in individuals unexposed and exposed to radiation (150-200 cGy: humans or 50-1000 cGy: mice). Significance thresholds for including a gene in developing a signature were adjusted based on radiation dose, from p < 0.01 (50 cGy) to < 1E-14 (1000 cGy), equivalent to ~10% of genes. Support Vector Machine (SVM) signatures were derived by backward feature selection (BFS) or minimum-redundancy-maximum-relevance (mRMR) and validated using leave-one-out cross validation (LOOCV) and external datasets. GEO datasets GSE6874 and GSE10640 were used for training and testing. Signatures derived by BFS from the human patients of GSE6874 (n=78) included α) GADD45A, GTF3A, TNFRSF4, XPC and β) ATR, GADD45A, GTF3A, IL2RB, MYC, NEIL2, RBM15, SERPINB1, XPC, which both distinguished irradiated from unirradiated individuals with 98% sensitivity and 100% specificity in LOOCV. Validating these signatures on the human patients of GSE10640 (n=71) confirms that α and β are both 92% sensitive and, respectively, 94% and 96% specific. mRMR found the 10 “best” genes from the murine samples of GSE10640 (n=104) to create a signature at each radiation dose; several genes were common among signatures. Signature δ (50 cGy) included PHLDA3, BAX, NBN, CCT3, CDKN1A, CCNG1, POLK, ERCC5, GCDH, and RAMP1. Signature ε (200 cGy) included PHLDA3, LIMD1, CCT3, BAX, MS4A1, GLIPR2, BLNK, BCAR3, CDKN1A, and TFAM.
Signature ζ (1000 cGy) included CCT3, SUCLG2, EI24, CNBP, PHLDA3, TPST1, HEXB, FEN1, CDKN1A, and BLNK. When validated on the murine samples of GSE6874 (n=14), each signature correctly predicted the exposure status of all mice. Our approach produces signatures with higher accuracies in cross- and external validation datasets than prior logistic regression models, with significantly improved sensitivities in detecting radiation exposure in humans. This will be useful in identifying nearly all radiation-exposed individuals in a mass casualty.
Keywords: Bioinformatics; Diagnostics; Transcriptome; Computational tools; Hematopoietic system
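The combination of backward feature selection and leave-one-out cross-validation described in the abstract above can be sketched as follows. This is a simplified stand-in, not the method used in the study: a nearest-centroid classifier replaces the SVM so that the example needs only the Python standard library, the function names are hypothetical, and the expression values are made-up toy data in which only the first feature separates the two classes.

```python
def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids over the chosen features."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def sq_dist(center):
        return sum((x[f] - c) ** 2 for f, c in zip(features, center))

    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))


def loocv_accuracy(X, y, features):
    """Leave-one-out cross-validation accuracy for a given feature subset."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


def backward_feature_selection(X, y, min_features=1):
    """Greedily drop one feature at a time while LOOCV accuracy does not fall."""
    features = list(range(len(X[0])))
    best = loocv_accuracy(X, y, features)
    while len(features) > min_features:
        # Score each subset obtained by removing a single feature;
        # ties are broken in favor of dropping the higher-indexed feature.
        scored = [(loocv_accuracy(X, y, [g for g in features if g != f]), f)
                  for f in features]
        score, drop = max(scored)
        if score < best:
            break
        best = score
        features.remove(drop)
    return features, best


# Toy data: rows are samples, columns are genes; label 1 = irradiated.
X = [[2.0, 5.0, 5.0], [2.3, 5.4, 4.8], [1.8, 4.6, 5.3], [2.1, 5.2, 4.9],
     [10.0, 5.1, 5.0], [9.6, 4.7, 5.2], [10.4, 5.3, 4.7], [9.9, 4.9, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, accuracy = backward_feature_selection(X, y)
print(selected, accuracy)  # [0] 1.0
```

In the toy data the two uninformative columns are removed and the single discriminating column is retained; with real expression data the stopping rule and classifier would need to match those of the study.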
Hannouf MB, Winquist E, Mahmud SM, Brackstone M, Sarma S, Rodrigues G, Rogan PK, Hoch JS, Zaric GS. The clinical and economic impact of primary tumour identification in metastatic cancer of unknown primary tumour: a population-based retrospective matched cohort study, PharmacoEconomics, 2017 (doi:10.1007/s41669-017-0051-2)