February 19, 2018. Article on genomic signature of radiation exposure

A manuscript describing accurate genomic signatures of radiation exposure will be published shortly in F1000Research.

Jonathan ZL Zhao, Eliseos J Mucaki, Peter K Rogan. Predicting Exposure to Ionizing Radiation by Biochemically-Inspired Genomic Machine Learning, F1000Research, in press.

Abstract:

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches.

Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets.

Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance within individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.

Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.

February 7, 2018. Accepted presentations at EPR Biodose (Munich, June, 2018)

Ali S, Li Y, Shirley B, Wilkins R, Flegal F, Rogan PK, Knoll JHM. Population scale biodosimetry with the Automated Dicentric Chromosome Identifier and Dose Estimator (ADCI) software system. [Platform]

Rogan PK, Zhao JZL, Mucaki EJ. Predicting exposure to ionizing radiation by biochemically-inspired genomic machine learning. [Poster]

Li Y, Shirley B, Wilkins R, Flegal F, Knoll JHM, Rogan PK. Optimization of image selection in Automated Dicentric Chromosome Analysis. [Poster]

May 4, 2015. Comment on PMID 23348723. Prediction of mutant mRNA splice isoforms by information theory-based exon definition.

Peter Rogan 2015 May 04 6:14 p.m.

The Logic and Formulation of Exon Definition for Splice and Splicing Regulatory Sites with Negative Information Content. PK Rogan, EJ Mucaki

Update on: Mucaki EJ, 2013 and the Automated Splice Site and Exon Definition Analysis server (ASSEDA).

In Mucaki EJ, 2013, we described a method of predicting the overall strength of an exon by calculating its total information content (Ri,total) from the sum of the Ri values of its donor and acceptor splice sites, adjusted for their gap surprisal (the self-information of the distance between the two sites). Differences between Ri,total values (ΔRi,total) are predictive of the relative abundance of these exons in distinct processed mRNAs.

Splice sites altered by mutations that prevent stable interaction with spliceosomes are said to be abolished. Information theory predicts abolition of binding when the strength of a site falls below the minimum binding affinity, Ri,minimum, which is empirically derived. This value is slightly above zero bits, the theoretical minimum for binding at equilibrium (ΔG = 0; Schneider TD, 1997). Sites with Ri < 0 are not bound, because forming stable interactions would be endergonic (ΔG > 0). This raises the question, when predicting the change in exon strength (ΔRi,total) due to a mutation that inactivates binding, of whether mutant sites with varying degrees of negative information content are energetically distinguishable from one another.

The computation of Ri,total contains the sum of the Ri values of component binding sites, irrespective of their initial or final strengths. Thus, a mutated site with Ri << 0 would result in a greater ΔRi,total than a site with Ri ~ 0. To assess whether the degree of unfavorable binding should be applied to the exon definition calculation, or whether values below 0 bits should be treated like a binding site at equilibrium (Ri ~ 0), we reevaluated experimentally validated natural and regulatory splicing mutations in our paper with both approaches. Ri,total was calculated for 10 variants from Supplementary Table 2, both including and excluding the negative information (i.e. Ri < 0 vs. Ri = 0) of inactivated splice sites. Mutation #2 of Supplementary Table 2 [ADA:g.43249658G>A] abolishes a natural donor site, reducing it from 8.8 to -9.9 bits. When the full decrease in strength is applied (ΔRi,total: -18.7 bits), the natural exon strength decreases from 21.0 to 2.3 bits. When the negative information content is set to zero bits, the change is significantly smaller (21.0 -> 12.2 bits; ΔRi,total = -8.8 bits). When a weak natural splice site is abolished, the difference, expressed as ΔRi,total, can be quite small (Mutation #9; -14.8 vs -3.1 bits). In the case of Mutation #38, the reduction in ΔRi,total leads to a partially discordant prediction where the abolished natural exon is weaker than the experimentally confirmed activated cryptic exon. Results for this mutation were concordant with the published version when the negative bit value of the mutated natural site was included in the calculation.
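The two ways of handling negative information content can be illustrated with a minimal sketch that reproduces the arithmetic for Mutation #2 (ADA) above. This is not the ASSEDA implementation: the function name and the clamp_negative flag are hypothetical, and the sketch assumes that only the one mutated site changes while all other terms of Ri,total (including the gap surprisal) stay constant.

```python
def exon_total_after_mutation(ri_total_initial, ri_site_initial, ri_site_final,
                              clamp_negative=False):
    """Return (final Ri,total, delta Ri,total) in bits when one site changes strength.

    Hypothetical helper for illustration only. Assumes every other term of
    Ri,total (other sites, gap surprisal) is unchanged by the mutation.
    """
    if clamp_negative:
        # Treat a site with negative information as a site at equilibrium (0 bits).
        ri_site_final = max(ri_site_final, 0.0)
    delta = ri_site_final - ri_site_initial
    return round(ri_total_initial + delta, 1), round(delta, 1)


# Mutation #2: natural donor goes from 8.8 to -9.9 bits; exon starts at 21.0 bits.
full = exon_total_after_mutation(21.0, 8.8, -9.9)
clamped = exon_total_after_mutation(21.0, 8.8, -9.9, clamp_negative=True)

print(full)     # (2.3, -18.7): negative information included
print(clamped)  # (12.2, -8.8): negative information set to 0 bits
```

The two results match the values quoted in the text (21.0 -> 2.3 bits versus 21.0 -> 12.2 bits), which makes the size of the gap between the two conventions easy to see for a strongly negative mutant site.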

The impact of mutations in splicing regulatory (SR) factor binding sites can also be predicted on ASSEDA, where the Ri of the SR binding site is added to the Ri,total, as well as a secondary gap surprisal value for the particular SR protein. These sites can also be abolished. But when an SR protein binding site is no longer active, should the SR gap surprisal still be applied, or is the SR gap surprisal no longer applicable?

We tested mutations from Mucaki EJ, 2013 (Supplementary Table 4), which abolish binding sites of the splicing enhancer SF2/ASF, with and without the SR protein gap surprisal when Ri of the SR site is < 0 bits. Removing the gap surprisal term for Mutation #2 of Supplementary Table 4 yields a positive ΔRi,total, because the ΔRi is less than the SR gap surprisal at that distance. This prediction is discordant, as experimental evidence shows an increase in exon skipping. Therefore, the gap surprisal is still applied in the computation of both initial and final Ri,total values when the binding site of the SR protein of interest is abolished, as the site is naturally present and therefore expected to be bound. Conversely, when we apply the gap surprisal to the initial Ri,total for a splicing factor binding site that is being created, we are essentially applying a penalty for a site that does not normally exist. Therefore, we no longer apply the SR gap surprisal value to the initial Ri,total in these cases.

The revised Ri,total values of SR binding site mutations differ slightly from those reported in Mucaki EJ, 2013 (Supplementary Table 4). This is because the gap surprisal distributions were recomputed for the following factors: SF2/ASF, SC35 and SRp40, with updated versions of these models based on CLIP-Seq data (Blin K, 2015; Khorshid M, 2011). This resulted in small changes to the distributions for SF2/ASF and SC35. The changes for SRp40 were significant, however, and its distribution now more closely resembles the other gap surprisal functions. The updated graphs of distance vs. gap surprisal are available at: http://splice.uwo.ca/gapsurprisals.html. While this should not significantly affect ΔRi,total values, it may affect the initial and final Ri,total values.

Oct. 1, 2017. Comment on PubMed PMID 28949076: Rules and tools to predict the splicing effects of exonic and intronic mutations. In: PubMed Commons [Internet]. Bethesda (MD): National Library of Medicine; 2017 Sep 26

Peter Rogan 2017 Oct 01 8:57 p.m.

We would like to alert readers to the fact that information theory-based splicing mutation analysis has been used to analyze a wide range of variants (in/dels and SNVs) that affect splicing in introns and exons in peer-reviewed studies. These tools have also been used to analyze mutations that alter branchpoint recognition and mutations within introns. The Automated Splice Site and Exon Definition Analysis server, ASSEDA (Mucaki EJ, 2013), analyzes mutations at branchpoints, within intronic sequences, at cryptic splice sites, and at splicing regulatory protein binding sites (“enhancer/silencer” sequences). We have also published the Shannon pipeline (Shirley BC, 2013), which carries out analysis of mutations affecting splicing (and transcription factor binding sites; Lu R, 2017) on a genome scale. Veridical is software that validates splicing mutations found with the Shannon pipeline (or any other program) with RNASeq data from the same individual (Viner C, 2014; Dorman SN, 2014).

Our previous review article extensively describes the use of these tools for splicing mutation analysis by many research groups besides our own (Caminsky N, 2014).

Dec. 7, 2017. Rogan PK, Mucaki EJ. Comment on PMID 29185120: Characterization of a novel germline BRCA1 splice variant, c.5332+4delA. In: PubMed Commons [Internet]. Bethesda (MD): National Library of Medicine; 2017 Nov 28 [cited 2017 Dec 7].

Peter Rogan 2017 Dec 07 5:24 p.m.

We have analyzed this mutation with the Automated Splice Site and Exon Definition Analysis server (ASSEDA). The 1 nt deletion in the splice donor of exon 20 reduces the strength of this site from 11.5 to 4.1 bits, a loss of 7.4 bits (100/2^7.4 = 0.6% of the original binding affinity).
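The conversion from a change in information content to a change in binding affinity can be sketched as follows. The sketch assumes the usual information-theory relationship that each bit corresponds to a 2-fold difference in affinity, which is consistent with the fold values quoted in these comments; the function names are hypothetical and are not part of ASSEDA.

```python
def fold_change(ri_initial, ri_final):
    """Fold reduction in binding affinity for a drop from ri_initial to ri_final bits.

    Assumes 1 bit of information corresponds to a 2-fold change in affinity.
    """
    return 2.0 ** (ri_initial - ri_final)


def residual_binding_percent(ri_initial, ri_final):
    """Binding affinity of the mutant site as a percentage of the original site."""
    return 100.0 / fold_change(ri_initial, ri_final)


# BRCA1 exon 20 donor site: 11.5 -> 4.1 bits (a 7.4 bit decrease).
fold = fold_change(11.5, 4.1)
percent = residual_binding_percent(11.5, 4.1)

print(round(fold))        # 169 (approximately 2^7.4)
print(round(percent, 1))  # 0.6
```

A 7.4 bit decrease therefore corresponds to roughly a 169-fold reduction, leaving about 0.6% of the original affinity.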

The information theory-based approach used in ASSEDA predicts isoform abundance and computes the fold changes in binding affinity from mutations (Mucaki EJ, 2013), which correspond to the degree of exon skipping in this case. The reduction in splice site strength is much greater than the estimates given by the ad hoc methods used in the paper. LOH was not complete; some of the observed expression may have been derived from the contaminating normal allele. In fact, had the loss of function in splice site recognition been only 25-40%, as reported in the paper, the variant could have been classified as a variant of unknown significance, or possibly as benign (as we suggested in Mucaki EJ, 2011).

Dec 12, 2017. Comment on PubMed PMID 23169495: Analysis of the effects of rare variants on splicing identifies alterations in GABAA receptor genes in autism spectrum disorder individuals.

Rogan PK, Mucaki EJ. Comment on PMID 23169495: Analysis of the effects of rare variants on splicing identifies alterations in GABAA receptor genes in autism spectrum disorder individuals. In: PubMed Commons [Internet]. Bethesda (MD): National Library of Medicine; 2012 Nov 21 [cited 2017 Dec 12].

Peter Rogan 2017 Dec 12 09:53 a.m.

Regarding GABRQ:c.306G>C: Whereas none of the splicing analysis programs tested predicted the outcome observed with the mini-gene construct in Figure 2A, information theory-based exon definition analysis using ASSEDA (Mucaki EJ, 2013) was completely concordant. A novel band 116 nt longer than the product expected from the wild type exon is observed. The mutation reduces the strength of the natural donor splice site of exon 3 from 9.5 -> 4.5 bits (32 fold). The pre-existing intronic cryptic site 116 nt downstream (8.6 bits) is 17 fold stronger than the mutated splice site. ASSEDA indicates that the total exon information (Ri,total) of the wild type exon is reduced (19.8 -> 14.8 bits) and that the corresponding strength of the gap surprisal-adjusted cryptic exon (17.7 bits) significantly exceeds this. The wild type exon is predicted to be ~5-6 fold more abundant than the cryptic exon BEFORE mutation, and the cryptic exon is predicted to be ~8 fold more abundant AFTER mutation.