Policy Webinar 6.
Disruptive methodologies: Artificial Intelligence, Machine Learning, and AMR
Dr John Stelling, WHONET, introduced the first presenter, Dr Jon Stokes, Assistant Professor in the Department of Biochemistry and Biomedical Sciences at McMaster University, Canada, where he established his research laboratory. His work focuses on understanding the relationships between antibiotic structure, bacterial cell physiology, and the extracellular environment. This work has the potential to improve patient outcomes from antibiotic therapy while simultaneously decreasing the global dissemination of resistance determinants. He is lead author of ‘A deep learning approach to antibiotic discovery’.
Dr Stokes described a timeline of antibiotic discovery, highlighting the ‘Golden era’ of antibiotic discovery from the 1940s to the 1960s. After this period, discovery became more difficult because the same antibiotics were being rediscovered over and over again, the problem that de-replication aims to address. In an attempt to overcome these difficulties, high-throughput screening of synthetic chemical libraries has been used, but without yielding new antibiotics. This has coincided with the global dissemination of resistance.
“What we need are new methods to discover novel antibiotics more rapidly and ideally less expensively than we have been, in order to outrun the global dissemination of resistance, and that’s what I hope ML [machine learning] has the capacity to help us with.”
Dr Stokes outlined how ML can greatly increase the number of molecules that can be screened compared with the conventional small-molecule screening approach. The ML-aided approach starts by gathering a training data set to train the model; this requires fewer molecules than a chemical screen. The model is first used to run predictions on a small scale and, once finalized, can run predictions on large chemical libraries, far larger than could be screened in a laboratory. This is faster and less costly. The predictions must then be validated in the laboratory to gauge how effective the model will be in the ‘real world’. In this work, the team took the molecules with the highest prediction scores (roughly a hundred) and tested them in the laboratory against E. coli to see how good the model was: 51 of them (51.5%) inhibited the growth of E. coli. They then examined which of these 51 growth-inhibiting chemicals were structurally different from known antibiotics, since finding structurally novel compounds is ultimately the aim.
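The screening workflow described above (rank a library by predicted score, test the top-ranked molecules, measure the hit rate, then keep only hits that are structurally unlike known antibiotics) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual pipeline: the prediction scores, laboratory results, fingerprints (represented here as sets of on-bit indices), and the 0.4 similarity cutoff are all hypothetical, and structural novelty is approximated with Tanimoto similarity to known antibiotics. A real pipeline would obtain scores from a trained model and fingerprints from a cheminformatics toolkit.

```python
from typing import Dict, List, Set

def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the IDs of the k molecules with the highest predicted scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def hit_rate(tested: List[str], inhibited: Set[str]) -> float:
    """Fraction of tested molecules that inhibited bacterial growth in the lab."""
    return sum(m in inhibited for m in tested) / len(tested)

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def novel_hits(hits: List[str], fingerprints: Dict[str, Set[int]],
               known_antibiotics: List[Set[int]], threshold: float = 0.4) -> List[str]:
    """Keep hits whose highest similarity to any known antibiotic is below the
    (hypothetical) threshold."""
    return [m for m in hits
            if max((tanimoto(fingerprints[m], k) for k in known_antibiotics),
                   default=0.0) < threshold]

# Toy example: made-up model scores for five hypothetical molecules.
scores = {"mol_a": 0.91, "mol_b": 0.15, "mol_c": 0.78, "mol_d": 0.66, "mol_e": 0.05}
tested = top_k(scores, 3)              # ["mol_a", "mol_c", "mol_d"]
inhibited = {"mol_a", "mol_d"}         # made-up laboratory results
rate = hit_rate(tested, inhibited)     # 2 of 3 tested molecules inhibited growth
hits = [m for m in tested if m in inhibited]

# Hypothetical fingerprints; one made-up fingerprint stands in for a known antibiotic.
fingerprints = {"mol_a": {1, 2, 3, 4}, "mol_d": {7, 8, 9}}
known_antibiotics = [{1, 2, 3, 5}]
novel = novel_hits(hits, fingerprints, known_antibiotics)
# ["mol_d"]: mol_a is excluded because its Tanimoto similarity is 3/5 = 0.6
```

The novelty filter is deliberately separate from the scoring step: a model can be very good at predicting growth inhibition and still mostly rediscover known antibiotic scaffolds, so novelty has to be checked explicitly.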
“This is kind of our first entry into the application of ML for antibiotic discovery and we’re continuing to grow this effort, specifically in the context of narrow spectrum antibiotic discovery. So we have now models trained across different pathogens… and we have molecules that are predicted to have activity against every combination of these four species [A. baumannii, E. coli, P. aeruginosa, S. aureus].”
Dr Stelling introduced the second presenter, Dr Brian Hie, Stanford Science Fellow at Stanford University School of Medicine. Dr Hie develops algorithms and machine learning methods, with a focus on biological applications. He is the first author of ‘Learning the language of viral evolution and escape’.
Dr Hie outlined his line of research on predicting the evolution of pathogens. It leverages algorithms called neural language models, originally developed for natural language, and applies them instead to learn the rules, or the ‘language’, of protein evolution. The basic question is: how predictable is evolution?
“There’s really a lot of predictability within evolution, and so this research is trying to push the boundaries of evolutionary predictability and, in particular, motivated by the problem of escape from immunity or from drugs.”
Dr Hie described research that aims to learn the rules by which viruses, and pathogens more generally, mutate to evade immunity, in order to predict their future evolution and escape. The key initial insight, central to learning the language of viral escape, was that small changes can have big semantic effects, just as changing a single word can alter the meaning of a sentence:
“Likewise for viral escape, where it’s been shown that even a single residue change, mutation to this viral protein sequence can induce escape from broadly neutralizing antibodies.”
Dr Hie described the work on building a model that can perform a constrained semantic change search (CSCS). This builds on the idea of a computational language model: a way to assign a probability to a given sequence of words. The same can be done with amino acid sequences, by training a neural network model that takes some sequence context as input and predicts the probability of a missing amino acid (analogous to a missing word) in the sequence. Such models can extract patterns of which amino acids go together and in what context. CSCS then looks for mutations that keep the sequence ‘grammatical’ (a proxy for viral fitness) while causing a large change in its ‘semantics’ (a proxy for antigenic change). To predict viral escape, language models can be trained on large sequence databases of viral proteins, with separate models for influenza and SARS-CoV-2, to see whether they learn the patterns that correspond to escape; the predictions can then be validated with laboratory experiments. The language model used by Dr Hie’s team had no knowledge of protein biochemistry or escape: it was trained only on unlabeled amino acid sequences and had to learn the concepts of fitness and antigenic change directly from sequence variation alone, and the results were promising. These ideas can also extend beyond viruses to other proteins, and to drug resistance.
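The CSCS idea of ranking candidate mutations by both semantic change and grammaticality can be sketched as follows. This is a simplified illustration, loosely following the rank-sum formulation described for CSCS and resting on stated assumptions: semantic change is taken as the L1 distance between the language model's embeddings of the original and mutated sequences, grammaticality is taken as the model's probability of the mutated amino acid, and the two are combined by adding their ranks with a weight `beta`. The embeddings, probabilities, and mutation labels below are hypothetical stand-ins for the output of a trained model.

```python
from typing import Dict, List

def l1_distance(a: List[float], b: List[float]) -> float:
    """Semantic change: L1 distance between two embedding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ascending_ranks(values: Dict[str, float]) -> Dict[str, int]:
    """Rank keys by value, smallest value -> rank 1 (ties keep insertion order)."""
    order = sorted(values, key=values.get)
    return {key: i + 1 for i, key in enumerate(order)}

def cscs_scores(wildtype_embedding: List[float],
                mutant_embeddings: Dict[str, List[float]],
                grammaticality: Dict[str, float],
                beta: float = 1.0) -> Dict[str, float]:
    """Add the semantic-change rank to beta times the grammaticality rank;
    a higher score marks a stronger escape candidate."""
    semantic_change = {m: l1_distance(wildtype_embedding, e)
                       for m, e in mutant_embeddings.items()}
    sem_rank = ascending_ranks(semantic_change)
    gram_rank = ascending_ranks(grammaticality)
    return {m: sem_rank[m] + beta * gram_rank[m] for m in mutant_embeddings}

# Toy inputs standing in for a trained protein language model (all values made up).
wildtype = [0.0, 0.0]
mutants = {"A1G": [0.1, 0.1], "K2E": [0.9, 0.4], "D3P": [1.5, 1.0]}
probs = {"A1G": 0.20, "K2E": 0.35, "D3P": 0.01}  # model probability of each mutant residue

scores = cscs_scores(wildtype, mutants, probs)
ranking = sorted(scores, key=scores.get, reverse=True)
# K2E ranks first: it combines a fairly large semantic change with high grammaticality,
# whereas D3P changes the most but is judged very unlikely ("ungrammatical").
```

Using ranks rather than raw values puts the two quantities, which live on different scales, on a common footing; `beta` controls how much weight grammaticality receives relative to semantic change.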
Another key insight was that complex, more global patterns across an entire fitness landscape can be approximated by local predictions made by a language model. The language models can be scaled up so that they learn generic patterns of protein evolution across all known proteins. Looking forward, whereas the original CSCS model was trained to predict viral escape, the approach can also be used to predict mutations that confer drug resistance, especially if those mutations are common in a population.
“Now that we have some additional ability to predict evolution and harness the power of evolution ourselves, we can do laboratory directed evolution, but guided by these large-scale language models, with an unprecedented knowledge of evolutionary biology.”
Key takeaways are that language models have the potential to improve our ability to model and predict evolution; that sufficient training data is important; and that successful implementation of these models requires interdisciplinary collaboration not only within academia, among natural language processing researchers, computer scientists, and biologists, but also with policy-makers.
During the Q&A session, Dr Stokes and Dr Hie highlighted the lack of training data: for viruses there is a lot of surveillance data available with which to train the models, but such data are less common for antibiotic resistance. They both further emphasized that machine learning and artificial intelligence can be accessible to anyone, including researchers in low- and middle-income countries; however, data sets must be available for training the models.