New AI Model 'DeCAF-Pearl' Accelerates Drug Discovery by Making Million-Molecule Screening Practical
Researchers have developed a new AI framework that dramatically speeds up the process of predicting how drugs bind to proteins. By using 'flow maps' instead of traditional diffusion, the model achieves a fivefold speedup, unlocking large-scale virtual screening for pharmaceutical research.
By Factlen Editorial Team
- Computational Biologists: Focuses on the architectural leap of flow maps over diffusion models to reduce inference costs.
- Pharmaceutical Innovators: Prioritizes the practical throughput of screening millions of molecules and generating synthetic data.
- Systems Analysts: Views the breakthrough as part of a broader shift toward scalable, accessible AI in molecular biology.
What's not represented
- Clinical Trial Regulators
- Patient Advocacy Groups
Why this matters
The discovery of new life-saving medications is currently bottlenecked by the immense computing power required to simulate molecular interactions. This breakthrough slashes those computational costs, allowing researchers to screen vast libraries of potential drugs in hours rather than days, which could eventually help shorten the path to new treatments.
Key points
- Researchers developed DeCAF-Pearl, an AI model that predicts how drugs bind to proteins 5x faster than the full diffusion model it was distilled from.
- The model replaces slow, iterative diffusion processes with 'flow maps' that cross the generation trajectory in a handful of large jumps rather than hundreds of small steps.
- The efficiency gains allow a standard 64-GPU cluster to virtually screen one million potential drug molecules in just 18 hours.
- By slashing computational costs, the breakthrough enables smaller biotech firms and academic labs to conduct massive virtual drug screens.
The intersection of artificial intelligence and pharmaceutical research has reached a critical new milestone with the introduction of DeCAF-Pearl, a biomolecular AI model that drastically reduces the computing power required to design new drugs. Developed by a collaborative team including researchers from Imperial College London, Carnegie Mellon University, and the Genesis Research Team, the framework addresses one of the most severe bottlenecks in modern computational biology: the immense time and hardware costs of simulating molecular interactions at the atomic level. By rethinking the underlying mathematics of how AI generates three-dimensional structures, the team has achieved a fivefold increase in processing speed without sacrificing predictive accuracy.[1][2][3]
At the heart of this breakthrough is a process known as "cofolding." In drug discovery, it is not enough to simply know the shape of a target protein; researchers must simultaneously predict the precise three-dimensional structure of the protein and how a small drug molecule will bind to it. This lock-and-key mechanism dictates whether a potential medication will effectively neutralize a disease or fail entirely. For years, the pharmaceutical industry relied on slow, physical laboratory testing to find these matches, a trial-and-error process that contributes to the decade-long timeline and billion-dollar price tag of bringing a new drug to market.[4]
More recently, the industry has turned to AI-driven virtual screening, simulating these interactions digitally. The current state-of-the-art tools, including AlphaFold 3, Boltz-2, and Genesis Therapeutics' own Pearl model, rely on "diffusion" architectures. Similar to the AI used to generate hyper-realistic images, these diffusion models start with a cloud of digital noise and gradually refine it into a highly accurate molecular structure. While this approach yields unprecedented fidelity, it requires the AI to take hundreds of tiny, incremental "denoising" steps for every single molecule it evaluates.[2][3]
This iterative process is computationally exhausting. When a pharmaceutical company wants to screen a library of millions of potential chemical compounds against a new cancer target, the computing costs of running full diffusion simulations become prohibitive. The sheer volume of Neural Function Evaluations (NFEs)—the number of times the AI must process data to complete a single task—creates a hard ceiling on how many drugs can be virtually tested in a reasonable timeframe. The industry needed a way to maintain the accuracy of diffusion models while bypassing the slow, step-by-step generation process.[4]

Enter the Denoiser Cofolding All-Atom Flowmap, or DeCAF. Instead of forcing the AI to walk step-by-step along the generation trajectory, the researchers built DeCAF on a mathematical framework called "flow maps." This approach teaches the model the average direction of travel between any two points on the path from the starting noise to the final molecular structure. By learning this broader trajectory, the flow map allows the AI to jump directly from one point to another, traversing the entire generation process in a handful of leaps rather than hundreds of microscopic steps.[1][2]
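To make the flow-map idea concrete, here is a toy sketch in Python. It is not DeCAF's implementation: it replaces the real all-atom model with a hypothetical one-dimensional problem in which noise drawn from N(0, 1) flows toward "data" drawn from N(MU, SIGMA²), chosen because the exact trajectory has a closed form. A diffusion-style sampler integrates the instantaneous velocity in many small steps, one model call (NFE) each; a flow map instead uses the average velocity between two times, so one jump can land on the same endpoint. In DeCAF a neural network has to learn that average velocity; here it is computed exactly, and every name (`velocity`, `average_velocity`, `MU`, `SIGMA`) is an illustrative assumption.

```python
import math

# Hypothetical toy setup (not DeCAF's model): noise ~ N(0, 1), "data" ~ N(MU, SIGMA^2),
# joined by the linear path x_t = (1 - t) * noise + t * data for t in [0, 1].
MU, SIGMA = 3.0, 0.5


def mean(t: float) -> float:
    return t * MU


def std(t: float) -> float:
    return math.sqrt((1 - t) ** 2 + (t * SIGMA) ** 2)


def velocity(x: float, t: float) -> float:
    """Instantaneous velocity at time t: the per-step quantity a diffusion-style sampler integrates."""
    # (d std / dt) / std: how fast the spread of the distribution is changing at time t
    rate = (-(1 - t) + t * SIGMA**2) / std(t) ** 2
    return MU + rate * (x - mean(t))


def average_velocity(x: float, t: float, s: float) -> float:
    """Average velocity from time t to time s: the quantity a flow map learns.

    Closed form here only because the toy marginals are Gaussian: the exact
    trajectory keeps the z-score (x - mean) / std fixed.
    """
    x_s = mean(s) + std(s) / std(t) * (x - mean(t))
    return (x_s - x) / (s - t)


def euler_sample(x0: float, steps: int) -> float:
    """Diffusion-style sampling: many small steps, one velocity call (NFE) per step."""
    x, h = x0, 1.0 / steps
    for i in range(steps):
        x += h * velocity(x, i * h)
    return x


def flow_map_sample(x0: float, jumps: int) -> float:
    """Flow-map sampling: a few large jumps along the same trajectory."""
    times = [i / jumps for i in range(jumps + 1)]
    x = x0
    for t, s in zip(times, times[1:]):
        x += (s - t) * average_velocity(x, t, s)
    return x


if __name__ == "__main__":
    x0 = 1.0  # one fixed noise sample; its exact endpoint is MU + SIGMA * x0 = 3.5
    for n in (2, 40, 200):
        print(f"Euler, {n:3d} steps: {euler_sample(x0, n):.4f}")
    print(f"Flow map, 1 jump:  {flow_map_sample(x0, 1):.4f}")
```

With the starting noise fixed at 1.0, the exact endpoint is 3.5: the 2-step Euler sampler stops short at 3.2, more steps creep toward 3.5, and the exact flow map reaches it in one jump. A learned model only approximates the average velocity, so in practice a handful of jumps, rather than exactly one, is the regime the article describes.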
The computational savings are massive. Where a full simulation using a standard diffusion model requires 200 or more Neural Function Evaluations to accurately cofold a protein and a ligand, DeCAF-Pearl accomplishes the same task using only 40 NFEs. This 80 percent reduction in compute load translates directly to a 5x inference speedup. For laboratories and pharmaceutical companies renting time on expensive cloud supercomputers, this efficiency fundamentally changes the economics of early-stage drug discovery.[2]
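The NFE arithmetic in this paragraph can be checked in a few lines. The sketch assumes, as the article's framing implies, that inference cost is proportional to the number of NFEs and that every evaluation costs the same; any fixed per-molecule overhead (input preparation, a one-time pass through other parts of the network, I/O) would make the real-world speedup smaller than the raw NFE ratio. The function name `nfe_savings` is hypothetical.

```python
def nfe_savings(baseline_nfe: int, new_nfe: int) -> tuple[float, float]:
    """Return (fraction of NFEs removed, speedup factor).

    Assumes cost is proportional to NFE count, with no fixed overhead.
    """
    if baseline_nfe <= 0 or new_nfe <= 0:
        raise ValueError("NFE counts must be positive")
    reduction = 1 - new_nfe / baseline_nfe
    speedup = baseline_nfe / new_nfe
    return reduction, speedup


reduction, speedup = nfe_savings(baseline_nfe=200, new_nfe=40)
print(f"{reduction:.0%} fewer NFEs -> {speedup:.0f}x faster")  # 80% fewer NFEs -> 5x faster
```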
To prove the model's real-world viability, the research team benchmarked DeCAF-Pearl on a massive virtual screening task. Using a cluster of 64 graphics processing units (GPUs), the model screened one million distinct molecules against a protein target in approximately 18 hours. At the reported 5x speedup, executing a screen of this magnitude with full diffusion-based inference would have taken roughly 90 hours, or nearly four days, on the same hardware, or would have required a costly expansion of computing infrastructure.[1][3]
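A back-of-envelope check of these figures, assuming the million molecules are split evenly across the 64 GPUs and that the 18 hours are spent entirely on inference (no queueing, data loading, or failed jobs). The helper `screening_budget` is hypothetical. Under those assumptions each molecule costs a little over 4 GPU-seconds, and reversing the reported 5x speedup puts the same screen at about 90 hours with full diffusion.

```python
SECONDS_PER_HOUR = 3600
REPORTED_SPEEDUP = 5  # from the article; assumed to apply to the whole screen


def screening_budget(n_molecules: int, n_gpus: int, hours: float) -> dict[str, float]:
    """Back-of-envelope throughput for an evenly sharded screen (hypothetical helper)."""
    gpu_seconds = n_gpus * hours * SECONDS_PER_HOUR
    return {
        "molecules_per_gpu": n_molecules / n_gpus,
        "gpu_seconds_per_molecule": gpu_seconds / n_molecules,
        "molecules_per_hour": n_molecules / hours,
    }


budget = screening_budget(n_molecules=1_000_000, n_gpus=64, hours=18)
for name, value in budget.items():
    print(f"{name}: {value:,.2f}")  # 15,625.00 / 4.15 / 55,555.56

# The same job at 5x the per-molecule cost (the reported speedup, inverted):
print(f"projected full-diffusion wall clock: {18 * REPORTED_SPEEDUP} hours")
```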

Crucially, this speed does not come at the expense of quality. When tested against a rigorous benchmark of 196 complex protein-ligand structures that the AI had never seen before, DeCAF-Pearl matched the success rate of its slower "teacher" model, Pearl. Furthermore, despite using a fraction of the computational steps, the flow-map model actually outperformed several other leading full-simulation tools, including AlphaFold 3 and Boltz-2, in generating physically valid and accurate poses.[2]
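The article does not define "success rate" for the 196-structure benchmark. A convention common in protein-ligand pose-prediction benchmarks, assumed here, counts a prediction as a success when the predicted ligand's RMSD to the experimental pose is below 2 Å. The plain-Python sketch below computes that metric; it assumes the predicted and reference poses are already in the same frame (for example, after superposing the protein) with atoms listed in matching order, and it ignores the symmetry-equivalent atom mappings and physical-validity checks that real evaluations typically add. The function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]  # x, y, z in Å


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD between matched ligand atoms, with no further superposition."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must list the same non-empty set of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def success_rate(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of complexes whose ligand RMSD falls below `threshold` Å."""
    if not rmsds:
        raise ValueError("need at least one RMSD")
    return sum(r < threshold for r in rmsds) / len(rmsds)


# Example: shifting every atom 1 Å along z gives an RMSD of exactly 1 Å.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
predicted = [(x, y, z + 1.0) for x, y, z in reference]
print(ligand_rmsd(predicted, reference))   # 1.0
print(success_rate([1.0, 2.5, 0.8, 3.1]))  # 0.5
```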
Beyond simply screening existing libraries of chemicals, the 5x speedup unlocks a second, equally vital capability: scalable synthetic data generation. Modern AI models are notoriously data-hungry, and in the realm of structural biology, high-quality examples of proteins binding to drugs are in short supply. Researchers rely on AI to generate synthetic examples to train downstream models, such as affinity predictors that estimate how tightly a drug will bind to its target.[3]
Because DeCAF-Pearl operates so efficiently, research teams can now generate roughly five times as much synthetic training data per unit of compute. And because the model maintains strict structural accuracy, this synthetic data preserves the delicate physical signals that downstream models depend on to learn. This creates a compounding effect, where faster generation leads to better training data, which in turn produces smarter, more specialized AI tools for the pharmaceutical pipeline.[3]
Dr. Joey Bose, an Assistant Professor at Imperial College London's Department of Computing and a senior author of the study, emphasized that the field is undergoing a fundamental transition. According to Bose, the era of simply training massive foundation models to prove that AI can understand biology is ending. The new frontier is scaling inference—making these models fast and cheap enough to generate the millions of optimized samples required to actually invent new medicines.[1]

This push for efficiency mirrors broader trends across the biotechnology sector, where institutions like MIT are deploying specialized language models to optimize the genetic sequences of protein drugs, aiming to slash manufacturing costs. As AI moves from theoretical research into industrial application, the focus has shifted increasingly toward practical deployment, ensuring that breakthroughs in the digital realm can survive the economic realities of physical drug development.[6]
Alongside parallel efforts such as gene set foundation models for human cells, flow-map architectures signal a maturing landscape where AI is no longer a blunt instrument, but a highly tuned scientific apparatus. By integrating diverse biological contexts and streamlining the underlying mathematics, researchers are working toward unified, accessible representations of human biology that can be queried at unprecedented speeds.[5]
Ultimately, the true impact of DeCAF-Pearl lies in its democratizing potential. By slashing the computational barrier to entry, the flow-map framework allows smaller biotech startups, academic institutions, and independent researchers to perform the kind of massive virtual screening that was once the exclusive domain of multinational pharmaceutical giants. As these tools become faster and more accessible, the timeline between a biological discovery and a life-saving treatment stands to shrink dramatically.[4]
How we got here
2020–2021
Early AI models like AlphaFold 2 revolutionize biology by accurately predicting single protein structures.
Late 2023
The industry shifts toward 'cofolding' models that can predict how proteins interact with other molecules and potential drugs.
2024–2025
Diffusion-based models like AlphaFold 3 and Pearl achieve state-of-the-art accuracy but face severe computational bottlenecks.
June 2026
Researchers introduce DeCAF-Pearl, utilizing flow maps to achieve a 5x speedup and making million-molecule virtual screening practical.
Viewpoints in depth
The Computational Biology View
A focus on overcoming the mathematical bottlenecks of diffusion models.
For researchers building the underlying architecture of AI models, the primary hurdle has been the iterative nature of diffusion. Models like AlphaFold 3 achieve remarkable accuracy by taking hundreds of tiny 'denoising' steps to refine a molecular structure. However, computational biologists argue this brute-force approach is unsustainable for large-scale applications. By implementing 'flow maps'—a mathematical framework that allows the model to jump directly across the generation trajectory—researchers view DeCAF-Pearl as a necessary evolution from simply proving an AI can fold proteins to making the process computationally elegant and efficient.
The Pharmaceutical Industry View
An emphasis on high-throughput screening and pipeline acceleration.
From the perspective of drug developers, the value of an AI model is measured by its throughput and its ability to identify viable drug candidates ('hits') from massive chemical libraries. Pharmaceutical innovators see the 5x speedup not just as a technical optimization, but as a pipeline revolution. Screening a million molecules in 18 hours on a 64-GPU cluster means that virtual screening can narrow vast libraries down to a short list of promising candidates, sparing much of the expensive, trial-and-error laboratory work that would otherwise be needed. Furthermore, the ability to rapidly generate synthetic data allows these companies to train even more specialized downstream models, creating a compounding advantage in drug discovery.
The Systems & Accessibility View
A focus on how lower compute costs democratize advanced biotech research.
Systems analysts and editorial observers highlight the democratizing effect of reduced inference costs. When state-of-the-art molecular modeling requires massive supercomputers, only the largest pharmaceutical companies and tech giants can participate. By slashing the required compute power by 80%, models like DeCAF-Pearl allow smaller biotech startups, university labs, and independent researchers to run world-class virtual screens. This camp argues that the true legacy of the flow-map breakthrough will be a more decentralized and highly competitive landscape for global medical research.
What we don't know
- While virtual screening is now significantly faster, it remains to be seen if this computational speedup will directly translate to higher success rates in physical human clinical trials.
- The long-term governance and open-source availability of highly capable molecular design models remain an ongoing debate within the AI safety community.
Key terms
- Cofolding: The simultaneous computational prediction of a protein's 3D structure and the precise position of a drug molecule binding to it.
- Diffusion Model: An AI architecture that generates data by starting with pure noise and gradually refining it through hundreds of tiny steps, commonly used in image generation and biology.
- Flow Map: A mathematical framework that allows an AI model to skip incremental steps and jump directly across a generation trajectory, drastically reducing compute time.
- Virtual Screening: The use of computer simulations to rapidly evaluate large libraries of chemical compounds to identify those most likely to bind to a drug target.
- Neural Function Evaluation (NFE): One run of a neural network; the number of NFEs needed to produce a result is a common measure of computational cost, and fewer NFEs mean less processing power per result.
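The Virtual Screening entry above describes, at its core, a score-and-rank loop. A minimal Python sketch, assuming a hypothetical `score` callable that stands in for a cofolding model plus whatever pose or affinity scoring a team uses (higher meaning more likely to bind), streams the library and keeps only the best `top_k` compounds in a min-heap, so memory stays small even for million-molecule libraries. The `toy_score` function is a placeholder with no chemical meaning.

```python
import heapq
from typing import Callable, Iterable


def screen(
    library: Iterable[str],
    score: Callable[[str], float],
    top_k: int,
) -> list[tuple[float, str]]:
    """Score each compound once and return the top_k hits, best first."""
    if top_k <= 0:
        return []
    best: list[tuple[float, str]] = []  # min-heap: the weakest kept hit sits at best[0]
    for smiles in library:
        s = score(smiles)  # in a real screen, the expensive model call
        if len(best) < top_k:
            heapq.heappush(best, (s, smiles))
        elif s > best[0][0]:
            heapq.heapreplace(best, (s, smiles))  # evict the weakest kept hit
    return sorted(best, reverse=True)


def toy_score(smiles: str) -> float:
    """Hypothetical placeholder score (string length); not a binding estimate."""
    return float(len(smiles))


hits = screen(["CCO", "c1ccccc1", "CC(=O)O", "CCN"], toy_score, top_k=2)
print(hits)  # [(8.0, 'c1ccccc1'), (7.0, 'CC(=O)O')]
```

In a production screen the `score` calls would be batched and sharded across GPUs; the ranking logic would stay the same.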
Frequently asked
What is molecular cofolding?
Cofolding is the process of using AI to simultaneously predict the three-dimensional shape of a protein and how a small molecule (like a drug) binds to it.
Why are diffusion models slow for drug discovery?
Diffusion models, like AlphaFold 3, generate structures by taking hundreds of tiny, incremental 'denoising' steps, which requires massive amounts of computing power and time.
How does DeCAF-Pearl achieve its speedup?
It uses a mathematical concept called 'flow maps' that allows the AI to learn the average direction of travel and jump directly to the final structure in just a few steps.
Does the faster speed reduce the model's accuracy?
Not according to the reported benchmarks. On a test set of 196 protein-ligand structures, DeCAF-Pearl matches the success rate of its slower 'teacher' model, Pearl, and outperforms several other leading tools.
Sources
[1] Imperial College London (Computational Biologists). "Researchers develop AI model that makes large-scale molecular screening practical for the first time."
[2] arXiv (Computational Biologists). "Few-step Cofolding with All-Atom Flow Maps."
[3] Genesis Research (Pharmaceutical Innovators). "Genesis Model Distillation: DeCAF-Pearl."
[4] Factlen Editorial Team (Systems Analysts). "Synthesis by Factlen editorial team."
[5] Cell Patterns (Systems Analysts). "Gene set foundation models for human cells."
[6] MIT News (Pharmaceutical Innovators). "New AI model could cut the costs of developing protein drugs."