Artificial intelligence has been a game-changer for biochemists like Baker. Seeing what DeepMind could do with AlphaFold made it clear that deep learning would be a powerful tool for their work.
“It’s just that with all these problems that used to be really hard, we’re now much more successful with generative AI methods. We can do much more complex things,” says Baker.
Baker is already busy at work. He says his team is focused on designing enzymes, which carry out all the chemical reactions that living things rely on to exist. His team is also working on drugs that act only at the right time and in the right place in the body.
But Baker hesitates to call this a watershed moment for artificial intelligence in science.
There is a saying in AI: garbage in, garbage out. If the data fed into AI models is not good, the results will not be dazzling either.
The power of the AI tools behind this year's Nobel Prize in Chemistry lies in the Protein Data Bank (PDB), a rare treasure trove of high-quality, curated and standardized data. This is exactly the kind of data AI needs to do anything useful. However, the current trend in AI development is to train ever-larger models on the entire content of the internet, which is increasingly full of AI-generated material. That material is in turn sucked into the datasets and contaminates the results, leading to bias and errors. This is simply not good enough for rigorous scientific discovery.
“If there were many databases as good as the PDB, I would say, yes, this (award) is probably just the first of many, but it’s a unique database of its kind in biology,” Baker says. “It’s not just about methods, it’s about data. And there aren’t that many places where we have that kind of data.”