To predict drug potency against Covid-19, we’ll use a Gated Graph Sequence Neural Network (GGNN, originally introduced here https://arxiv.org/abs/1511.05493) trained on the data published here, which predicts the MM-GBSA-based binding free energy from chemical structure.
For the cost of synthesis, we’ll use the synthetic accessibility score proposed in ( https://www.ncbi.nlm.nih.gov/pubmed/20298526 ), which is available in RDKit ( http://rdkit.org/ ).
For our optimizer, we selected Evolutionary Algorithms (EAs) for two reasons: they are well known for handling multi-objective optimization problems, and they do not require gradient information (hence they can handle non-differentiable design spaces).
EAs utilize the main principles of Darwinian evolution, evolving better and better molecules as generations proceed. They do so by selecting parents based on their environmental fitness (demonstrated good behavior regarding the selected objectives) and breeding new individuals via crossover and mutation (see figure 2).
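To make the select / crossover / mutate loop concrete, here is a minimal sketch in plain Python. It is not the optimizer used for the results in this post: the bit-string genome, the tournament selection, the single scalar `fitness` function and all parameter values are assumptions chosen purely for illustration. A multi-objective EA would replace the scalar comparison with a dominance-based one, and the genome would encode a molecule rather than a list of bits.

```python
import random


def evolve(fitness, genome_len=16, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Toy evolutionary algorithm on bit-string genomes (illustrative only).

    Returns the best genome found and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    # Random initial population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    history = []

    def tournament():
        # Selection: pick two individuals at random, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        # Elitism: the best parent survives unchanged.
        children = [elite[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # One-point crossover: head of one parent, tail of the other.
            cut = rng.randint(1, genome_len - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children

    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy objective (an assumption): maximize the number of 1s in the genome.
best, history = evolve(fitness=sum)
print(best, history[0], "->", history[-1])
```

Because the best parent is always copied into the next generation, the best fitness in `history` can never decrease from one generation to the next, which is the "better and better" behavior described above.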
The result of every multi-objective optimization takes the form of a Pareto front: the set of all the best (non-dominated) compromises between the objectives, where a compromise is non-dominated if no other candidate is at least as good in every objective and strictly better in at least one. Below is the computed Pareto front approximation of the Covid-19 drug design problem (after 40 generations of evolution). For comparison, the potency of lopinavir (a drug currently undergoing clinical trials for Covid-19) is marked with the dotted line.
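As an aside, the non-dominated filter itself is only a few lines of code. The sketch below assumes both objectives are minimized (a lower binding free energy and a lower synthetic accessibility score are both better), and the candidate values are made-up numbers for illustration, not results from the optimization in this post.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (binding free energy, synthetic accessibility).
candidates = [
    (-9.0, 4.5),
    (-7.5, 2.0),
    (-8.0, 3.0),
    (-7.0, 3.5),  # dominated by (-8.0, 3.0): weaker and harder to make
    (-9.0, 5.0),  # dominated by (-9.0, 4.5): same energy, harder to make
]

print(pareto_front(candidates))
# -> [(-9.0, 4.5), (-7.5, 2.0), (-8.0, 3.0)]
```

This brute-force check compares every pair of candidates, so it scales quadratically with population size. That is fine for a sketch, but dedicated multi-objective EAs use faster non-dominated sorting schemes.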
We’ve seen that reframing drug design from a simple (but expensive) screening exercise to a multi-objective optimization problem could be beneficial.
We’ve also seen that methods borrowed from Machine Learning and numerical optimization could be used when undertaking such a task.
The purpose of this post was to present a different way of thinking about the drug design problem and NOT to design a new compound.
The results of the optimization (the Pareto front) rely heavily on the quality of the predictive models used, models whose accuracy I have no way of validating! In fact, I suspect that accurate predictive modeling would involve chemistry/physics/quantum simulations that would be vastly more computationally demanding than the simple GGNN used here.
To handle that extra computational cost, a very interesting next step would be to borrow a method typically used in aerospace engineering: Distributed Hierarchical Optimization. More about that in a following post.