As we celebrate the first anniversary of influential AI products like ChatGPT, Midjourney, and GitHub Copilot, we pause to reflect on their transformative effect on numerous domains, notably those whose output can be digitised. Fields from copywriting to software engineering have been dramatically reshaped. This article examines machine learning's impact on drug discovery and explains how Generative AI can guide the future of therapeutics.
The Nexus of Computational Biology and Drug Discovery
The essence of computational biology lies in decoding the fundamental codes of life encapsulated within nucleic acid and amino acid sequences. Advances in machine learning, riding on waves of digitisation, have given us a deeper understanding of these biological structures and processes. The logical next step is to use this knowledge to generate novel structures that follow the same latent motifs as the model's input.
Traditional drug discovery relied on labour-intensive experimentation to explore the chemical properties of novel compounds. This approach has gradually given way to computational analyses that employ increasingly sophisticated techniques at ever larger scale. Enter Generative AI, the latest advancement in this evolution. It allows us to train an AI model on known therapeutics and their properties, and then use this model to propose novel therapeutic compounds.
Outcomes can vary significantly with the sophistication of the model and the depth of the data: from a refined version of an existing drug with fewer side effects, to entirely novel therapies with higher success rates and more desirable attributes.
Generative AI explained
Pioneering generative AI techniques emerged around 2013 and 2014, building on much earlier probabilistic foundations such as Markov models. Two primary methodologies enabling Generative AI are Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Each adopts a distinct approach to achieve a similar objective.
The art of forgery: GANs
GANs apply game theory to machine learning. The original framework casts the Generator as a forger in an adversarial relationship with the Discriminator, which tries to tell forged data from real data. Competition between these two models sharpens both until the Generator produces data that the Discriminator cannot distinguish from the real thing. Training is framed as a two-player minimax game: the Discriminator tries to maximise a value function measuring how well it separates real from generated data, while the Generator tries to minimise it. Minimax reasoning, which aims to minimise the worst-case loss, is also used in chess algorithms.
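The minimax game can be made concrete with a small sketch. It assumes the value function from the original GAN formulation, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which this article does not spell out. The function name `gan_value` and the discriminator scores below are hypothetical, chosen only for illustration.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: Discriminator scores (probability of "real") on real samples.
    d_fake: Discriminator scores on samples produced by the Generator.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical scores: a Discriminator that still spots the forgeries.
confident = gan_value(d_real=[0.9, 0.95, 0.85], d_fake=[0.1, 0.05, 0.2])

# Hypothetical scores: a Discriminator that can no longer tell the
# difference and outputs 0.5 for everything.
fooled = gan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

# The Discriminator wants this value high; the Generator wants it low.
print(confident, fooled)
```

When the Discriminator is fully fooled, the value drops to -log 4. In this formulation that is the lowest value the Generator can force against an optimal Discriminator.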
Same but different: VAEs
VAEs, on the other hand, build upon the principle of autoencoders. Autoencoders are simple neural networks designed to represent data in a reduced space. An encoder maps data from the original space to a lower-dimensional latent space, and a decoder maps it back. Training iterates this round trip, adjusting both networks until the reconstruction closely matches the input.
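As a rough illustration of that compress-and-reconstruct loop, here is a minimal sketch of a linear autoencoder written in plain Python. It is not a real neural network library and not a drug discovery model: the toy data (2-D points lying on a line), the single latent dimension, the learning rate, and all function names are assumptions made for this example.

```python
def encode(x, w):
    # Encoder: compress a 2-D point into a single latent number.
    return w[0] * x[0] + w[1] * x[1]

def decode(z, v):
    # Decoder: expand the latent number back into a 2-D point.
    return (v[0] * z, v[1] * z)

def reconstruction_loss(data, w, v):
    # Mean squared error between each point and its reconstruction.
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w), v)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, w, v, lr=0.05, steps=300):
    # Plain gradient descent on the reconstruction loss.
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(x, w)
            x_hat = decode(z, v)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            # Gradient of the squared error with respect to the latent z.
            dz = 2.0 * (err[0] * v[0] + err[1] * v[1])
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / n
                grad_w[i] += dz * x[i] / n
        w = [w[i] - lr * grad_w[i] for i in range(2)]
        v = [v[i] - lr * grad_v[i] for i in range(2)]
    return w, v

# Toy data: points on the line y = 2x, so one latent number is enough.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w0, v0 = [0.1, 0.1], [0.1, 0.1]

loss_before = reconstruction_loss(data, w0, v0)
w, v = train(data, w0, v0)
loss_after = reconstruction_loss(data, w, v)
reconstructed = decode(encode((0.5, 1.0), w), v)
print(loss_before, loss_after, reconstructed)
```

Because the toy data has only one real degree of freedom, a one-number latent space is enough to rebuild each point almost exactly. Real molecular data would need a far richer encoder and decoder.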
VAEs deviate from vanilla autoencoders by regularising the latent space, which prevents overfitting to the given data. As a result, VAEs can create variations of an input while preserving most of its key attributes.
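The article does not say how the latent space is regularised. In the standard VAE formulation, which this sketch assumes, the encoder outputs a mean and a log-variance per latent dimension, a KL-divergence penalty pulls that distribution towards a standard normal, and new variations are drawn by sampling around the mean. The function names and the example values below are hypothetical.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over dimensions.

    This is the regularisation term added to the reconstruction loss.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )

def sample_latent(mu, log_var, rng):
    """Reparameterisation: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical encoder output for one input: a 2-D latent distribution.
mu, log_var = [0.5, -1.0], [0.0, -2.0]

penalty = kl_to_standard_normal(mu, log_var)
variations = [sample_latent(mu, log_var, rng) for _ in range(3)]
print(penalty, variations)
```

Each sampled latent vector would be passed to the decoder to produce a slightly different output. The penalty is zero only when the encoded distribution already matches the standard normal.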
Molecular Structures: The Final Frontier
These techniques can inform us about the mechanisms that make certain molecules effective or ineffective against specific diseases, and subsequently generate new molecules with comparable or superior efficacy. With numerous companies and academic institutions actively researching this field, the future of drug discovery looks promising.
Success stories
A few companies are already seeing significant time savings from the technology. For example, in January, researchers explained how they used the AI-powered protein structure prediction model AlphaFold to discover a novel CDK20 small molecule inhibitor in 30 days, publishing their results in Chemical Science.
Similarly, biotech Evotec announced a phase 1 clinical trial for a novel anticancer compound it developed together with pharmatech Exscientia, which uses AI for small-molecule drug discovery. Using Exscientia’s ‘Centaur Chemist’ AI design platform, the companies identified their drug candidate in 8 months. For reference, the traditional discovery process often takes between 4 and 5 years, as Nature reports.
Another example is biotech Insilico Medicine, which announced in April that it had discovered a potent, selective and orally available small molecule inhibitor of CDK8 for cancer treatment using a structure-based generative chemistry approach. The company used the Chemistry42 multi-modal generative reinforcement learning platform in its research, which was published in the American Chemical Society’s Journal of Medicinal Chemistry.
At Omniscope, we have been watching the advent of generative AI with interest and have a number of ideas on how to apply it to our specific research. Watch this space!