Breakthrough A.I. Makes Huge Leap Toward Solving 50-Year-Old Problem in Biology
Proteins are vital biological molecules, and it can require years of lab-based experiments to tease out the 3-D shape of just one
Life on Earth relies on microscopic machines called proteins that are vital to everything from holding up the structure of each cell, to reading genetic code, to carrying oxygen through the bloodstream. With meticulous lab work, scientists have figured out the precise, 3-D shapes of about 170,000 proteins—but there are at least 200 million more to go, Robert F. Service reports for Science magazine.
Researchers have been trying to find efficient ways to estimate the shape of proteins since at least the 1970s, reports Will Douglas Heaven for MIT Technology Review. Now, the artificial intelligence company DeepMind, which is owned by Google’s parent company, Alphabet, has developed a tool that can predict the 3-D shapes of most proteins with accuracy comparable to lab experiments, Cade Metz reports for the New York Times. While lab experiments can take years to tease out a protein structure, DeepMind’s tool, called AlphaFold, can come up with a structure in just a few days, per Nature’s Ewen Callaway. The tool could help speed up research in drug development and bioengineering.
Molecular biologists want to know the structures of proteins because the shape of a molecule determines what it’s able to do. For instance, if a protein is causing damage in the body, scientists could study its structure to find another protein that fits it like a puzzle piece and neutralizes it. AlphaFold could accelerate that process.
“This is going to empower a new generation of molecular biologists to ask more advanced questions,” says Max Planck Institute evolutionary biologist Andrei Lupas to Nature. “It’s going to require more thinking and less pipetting.”
DeepMind tested AlphaFold by entering it in a biennial challenge called Critical Assessment of Structure Prediction, or CASP, for which Lupas was a judge. CASP provides a framework for developers to test their protein-prediction software. It has been running since 1994, but the recent rise of machine learning in protein structure prediction has pushed participants’ accuracy to new levels. AlphaFold first competed in 2018 and scored about 15 percent better than the other entries, per Science magazine. This year, a new computational strategy helped AlphaFold leave the competition in the dust.
Proteins are made of chains of molecules called amino acids that are folded up into shapes, like wire sculptures. There are 20 kinds of amino acids, each with its own chemical characteristics that affect how it interacts with others along the strand. Those interactions determine how the strand folds up into a 3-D shape. And because these chains can have dozens or hundreds of amino acids, predicting how a strand will fold based only on its sequence of amino acids is a challenge.
But that’s exactly what CASP asks participants to do. CASP assessors like Lupas have access to the answer key: 3-D structures of proteins that have been determined in a lab but not yet published. AlphaFold’s entries were anonymized as “group 427,” but after that group solved structure after structure, Lupas guessed that it was DeepMind’s, he tells Nature.
“Most atoms are within an atom diameter of where they are in the experimental structure,” says CASP co-founder John Moult to the New York Times. “And with those that aren’t, there are other possible explanations of the differences.”
AlphaFold’s results were so good that the organizers posed an extra challenge to make sure nothing fishy was going on. Lupas’ lab had been studying a protein for over a decade but had been unable to interpret its results, because the protein’s structure made it difficult to study with standard lab techniques. CASP gave the amino acid sequence of that protein to DeepMind, and AlphaFold came up with a predicted structure. With that prediction in hand, Lupas was able to interpret his lab’s results in about 30 minutes.
“It’s almost perfect,” Lupas tells Science magazine. “They could not possibly have cheated on this. I don’t know how they do it.”
AlphaFold isn’t perfect, and there’s still work to be done in predicting how proteins fold. Repetitive sequences threw off the program, for example. And many proteins work in groups called protein complexes, whose combined structures computers still can’t reliably predict.
“This isn’t the end of something,” says Janet Thornton, the European Bioinformatics Institute’s director emeritus, to Science magazine. “It’s the beginning of many new things.”
CASP requires participants to share enough information about their methods for other scientists to reproduce their work, reports Science. Experts tell the Guardian’s Ian Sample that they hope to use AlphaFold and similar technologies to make progress on designer medicines, bioengineered crops, and new ways to break down plastic pollution. DeepMind tells the Guardian that it has partnered with groups studying malaria, sleeping sickness, and leishmaniasis.
“I think it’s fair to say this will be very disruptive to the protein-structure-prediction field,” says Columbia University computational biologist Mohammed AlQuraishi to Nature. “…It’s a breakthrough of the first order, certainly one of the most significant scientific results of my lifetime.”