AI-based Mutation Predictions in SARS-CoV-2
Problem Formulation
SARS-CoV-2 emerged in November 2019, and the world has since been suffering from the COVID-19 pandemic, which is hard to contain because of the virus's frequent mutations and complex mechanism of action. The outbreak has become one of the biggest threats to the global economy and financial markets since World War II. SARS-CoV-2 is considered highly prone to mutation, as suggested by statistics from GISAID (a tool for real-time tracking of pathogen evolution): the virus is undergoing 25.2 substitutions per month, which may undermine the effectiveness of carefully planned medications. The World Health Organization (WHO) has reported 5775 distinct variants of SARS-CoV-2 from analysis of viral genomes from the USA, UK, Australia and Northern Ireland. Several SARS-CoV-2 vaccines are available on the market; they can provide short-term protection, but they cannot guarantee safety against future variations in the SARS-CoV-2 genome. It has therefore become pivotal to be well prepared for such mutations, which may pose a serious threat.
Solution Approach
These issues can be tackled with machine learning and deep learning models, which have already proven very successful in predicting mutations of the influenza virus. In our projects, deep learning models are used to predict mutations in the genome of SARS-CoV-2 under the umbrella of AIMPID. AIMPID focuses on predicting mutations in the structural and non-structural proteins encoded by the SARS-CoV-2 genome and on analyzing their significance on the basis of their position. This is achieved by developing deep neural network models such as recurrent neural networks (RNNs), encoder-decoder architectures and deep seq2seq models. These models are tested, optimized, and evaluated using the available sequence dataset. The project is carried out in collaboration with Eisbach Bio GmbH, which will develop small inhibitor molecules that target the virulent protein products of the predicted mutations and block their activity. This may help to control the spread of the pandemic, since frequent mutations in the SARS-CoV-2 genome (especially in potential vaccine targets) are rendering developed medications ineffective. Our research focuses on predicting mutations, and calculating their rates, in the least mutated proteins that are crucial to viral activity inside the host, with the aim of producing drugs that are long-lasting and effective.
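As a rough illustration of the data preparation that a seq2seq mutation predictor of this kind might need, the following minimal sketch turns a time-ordered list of aligned protein sequences into (parent, child) training pairs, one-hot encodes them, and lists the substitutions between two sequences. The function names, the 20-letter amino acid alphabet and the toy sequences are assumptions made for illustration; they are not taken from the AIMPID code base.

```python
from typing import List, Tuple

# Assumption: sequences are aligned protein strings over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[int]]:
    """Encode each residue as a 20-dimensional one-hot row.

    Raises KeyError for characters outside AMINO_ACIDS (e.g. gaps).
    """
    rows = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1
        rows.append(row)
    return rows


def substitutions(parent: str, child: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, parent residue, child residue) for each mismatch."""
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, p, c)
        for i, (p, c) in enumerate(zip(parent, child))
        if p != c
    ]


def training_pairs(sequences_by_time: List[str]) -> List[Tuple[str, str]]:
    """Pair each sequence with its successor in time: (model input, model target)."""
    return list(zip(sequences_by_time[:-1], sequences_by_time[1:]))


if __name__ == "__main__":
    # Hypothetical toy fragments, not real surveillance data.
    toy = ["MFVFLVLL", "MFVYLVLL", "MFVYLVLP"]
    for parent, child in training_pairs(toy):
        print(parent, "->", child, substitutions(parent, child))
```

An encoder-decoder model would then consume `one_hot(parent)` as input and be trained to emit `child`; the position information returned by `substitutions` is what a position-based significance analysis could build on.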
Although RNNs are quite effective at predicting mutations, they cannot explain biologically why and how these mutations benefit the virus. To answer such evolutionary questions, we are applying model-based reinforcement learning (RL) methods to determine which mutations are indeed beneficial for the survivability of the virus in another of our projects, DeepCor. To this end, we first develop a simple virus-host interaction model and represent it as a continuous dynamical system. The observables of this system include quantities such as the infection rate and the reproduction rate. Based on these observables, a suitable reward function is defined, which is then used to learn the types of mutations that tend to increase the survivability of the virus. The learning of favorable mutations is based on RL techniques, in which the rewards obtained from the dynamics resulting from the mutations are used to select the best set of mutations.
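The idea can be sketched with a standard target-cell-limited model of within-host infection, which is used here only as a stand-in because the actual DeepCor model is not specified in this description. Target cells T, infected cells I and free virus V evolve as dT/dt = -beta*T*V, dI/dt = beta*T*V - delta*I, dV/dt = p*I - c*V, and the basic reproduction number R0 = beta*p*T0/(delta*c) serves as the reward. All parameter values and candidate mutations below are hypothetical, and exhaustive scoring of the candidates replaces the RL agent for brevity.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Params:
    # Hypothetical rate constants (per day), chosen only for illustration.
    beta: float = 2e-7   # infection rate per virion per target cell
    delta: float = 1.0   # death rate of infected cells
    p: float = 10.0      # virion production rate per infected cell
    c: float = 5.0       # clearance rate of free virus


def reproduction_number(params: Params, t0_cells: float = 1e7) -> float:
    """Basic reproduction number of the target-cell-limited model."""
    return params.beta * params.p * t0_cells / (params.delta * params.c)


def peak_viral_load(params: Params, t0_cells: float = 1e7, v0: float = 1.0,
                    days: float = 10.0, dt: float = 0.001) -> float:
    """Integrate the ODEs with explicit Euler steps and return the peak of V."""
    T, I, V = t0_cells, 0.0, v0
    peak = V
    for _ in range(int(days / dt)):
        new_infections = params.beta * T * V
        dT = -new_infections
        dI = new_infections - params.delta * I
        dV = params.p * I - params.c * V
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
        peak = max(peak, V)
    return peak


def apply_mutation(params: Params, multipliers: Dict[str, float]) -> Params:
    """A 'mutation' is modeled as multiplicative changes to rate constants."""
    changes = {name: getattr(params, name) * f for name, f in multipliers.items()}
    return replace(params, **changes)


def best_mutation(params: Params, candidates: Dict[str, Dict[str, float]]) -> str:
    """Score every candidate by its reward (R0) and return the best one."""
    return max(candidates,
               key=lambda m: reproduction_number(apply_mutation(params, candidates[m])))


# Hypothetical candidate mutations and their assumed effects on the rates.
CANDIDATES = {
    "higher_infectivity": {"beta": 1.5},
    "faster_production": {"p": 1.2},
    "faster_clearance": {"c": 2.0},
}

if __name__ == "__main__":
    base = Params()
    print("baseline R0:", reproduction_number(base))
    print("baseline peak viral load:", peak_viral_load(base))
    print("best candidate:", best_mutation(base, CANDIDATES))
```

In a full RL setting, the agent would choose mutations sequentially and receive rewards from the simulated dynamics (for example `peak_viral_load`) rather than from a closed-form expression; the greedy scoring above only shows how a reward defined on the observables can rank mutations.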
Project Goals
- Development of deep learning based models for the prediction of mutations and their analysis in structural and non-structural proteins as two case studies
- Development of deep learning based models for the mutation rate calculation in the genome of SARS-CoV-2
- Testing, optimization and benchmarking of the designed algorithms on the genomic datasets of SARS-CoV-2 from GISAID
- Significance analyses of predicted mutations and mutation rates to explore the effects of the mutations on the virulence of the virus
- Protein stability analyses and computational development of inhibitor molecules
- Experimentation based on quantitative wet-lab assays to explore the behavior of the potential inhibitor molecules
- Testing the effect of the designed drugs on SARS-CoV-2 via human cell lines
- Development of a dynamical model for virus-host interaction
- Identification of mutation rates and linking them with the kinetic rates of the virus-host dynamical model
- Development of a closed-loop RL model
- Testing and verification of the RL model on real data
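As a simple baseline for the mutation rate calculation named in the goals above, a substitution rate can be estimated as the least-squares slope of substitution counts against sampling time. The sketch below is only such a baseline, not the deep learning approach pursued in the project; the function name and the toy data are assumptions.

```python
from typing import Sequence


def substitution_rate(months: Sequence[float], counts: Sequence[float]) -> float:
    """Ordinary least-squares slope of substitution count versus sampling time.

    months: sampling time of each genome, in months since a reference date
    counts: number of substitutions of each genome relative to a reference
    Returns the estimated rate in substitutions per month.
    """
    n = len(months)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_t = sum(months) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("sampling times must not all be identical")
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(months, counts))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy data, not real surveillance counts.
    print(substitution_rate([0, 1, 2, 3], [0, 2, 4, 6]))
```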
Project architecture
Keywords
- SARS-CoV-2
- Mutation Prediction
- Mutation Rates
- Artificial Intelligence
- Drug Designing
Funding
Time span
June 2021 - May 2023
Project partners
Contact
Prof. Dr.-Ing. Naim Bajcinca
Gottlieb-Daimler-Str. 42
67663, Kaiserslautern
+49 (0)631/205-3230
naim.bajcinca(at)mv.uni-kl.de