How AI's 'Biology Foundation Models' Are Moving Beyond Text to Cure Disease
A new wave of artificial intelligence models is demonstrating an unprecedented ability to understand genes, proteins, and small molecules simultaneously, promising to drastically accelerate drug discovery and disease diagnosis.
By Factlen Editorial Team
- Computational Biologists
- Argue that unifying disparate biological data into single foundation models is the key to unlocking new cures.
- Clinical Practitioners
- Focus on how AI tools must translate into reliable, everyday diagnostics that improve patient outcomes.
- Biosecurity Advocates
- Warn that democratizing biological design tools requires strict oversight to prevent catastrophic misuse.
What's not represented
- Pharmaceutical Executives
- Patient Advocacy Groups
Why this matters
Artificial intelligence is moving beyond chatbots to decode the fundamental language of human biology. By predicting how drugs and genes interact before physical lab tests are run, these new 'foundation models' promise to drastically reduce the time and cost of curing diseases.
Key points
- AI in medicine is transitioning from narrow, single-task algorithms to comprehensive 'biology foundation models.'
- The new MAMMAL model processes genes, proteins, and small molecules simultaneously to predict drug efficacy.
- Harvard's PopEVE system has set a new benchmark for identifying disease-causing genetic mutations.
- Diagnostic AI is already widely deployed in clinics to read X-rays, MRIs, and retinal scans with high accuracy.
- Experts warn that the democratization of biological AI tools requires robust governance to prevent the design of bioweapons.
The public narrative around artificial intelligence has largely been dominated by chatbots writing emails and image generators creating digital art. But away from the consumer spotlight, a far more profound revolution is unfolding in the world's laboratories. By mid-2026, the biomedical sector has crossed a threshold: the transition from narrow, single-task algorithms to comprehensive "biology foundation models." These systems are not just reading text; they are learning the fundamental language of life.[7]
For decades, the pharmaceutical industry has been plagued by a staggering inefficiency. Roughly 90 percent of all new drugs fail in clinical trials, often after years of development and billions of dollars in investment. This high attrition rate stems from the sheer complexity of human biology—a drug that works perfectly in a petri dish might prove toxic in a human liver, or fail to reach its intended target altogether. Modern medicine has desperately needed a way to predict these failures before a physical experiment is ever run.[4]
Enter models like MAMMAL, a newly detailed AI architecture that represents a massive leap forward from earlier breakthroughs like AlphaFold. While previous systems were highly specialized—AlphaFold, for instance, was designed specifically to predict the 3D shapes of proteins—MAMMAL is a unified foundation model. It has been trained simultaneously on billions of antibody sequences, nearly every known protein structure, millions of small molecule chemicals, and vast datasets of gene expression.[4][6]
To achieve this, researchers had to solve a fundamental translation problem. A small molecule like aspirin looks nothing like a sequence of DNA, which in turn looks nothing like a complex immune antibody. The breakthrough came by forcing all of these disparate biological domains into a single, unified format: sequences of characters. By using formats like SMILES strings, which write out a molecule's atoms and bonds as a single line of text, the AI can process chemistry and genetics using the same underlying architecture that powers large language models.[4]
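To make the idea of a shared character format concrete, here is a minimal sketch of how a small molecule and a protein could both be turned into integer token sequences from one vocabulary. It is an illustration only, not MAMMAL's actual tokenizer: the regular expression covers just a simplified subset of SMILES, and the domain tags (`<smiles>`, `<protein>`), the `domain:token` prefixing, and the toy peptide `MKTAYC` are all assumptions made up for this example. The aspirin string is the standard SMILES for acetylsalicylic acid.

```python
import re

# Simplified SMILES token pattern (an assumption of this sketch, not a full
# SMILES grammar): bracket atoms, two-letter halogens, two-digit ring
# closures, organic-subset atoms, aromatic atoms, bond/branch symbols, digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#\-+\\/()@.:]|\d"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the simplified pattern skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def tokenize_protein(sequence: str) -> list[str]:
    """Treat each one-letter amino acid code as one token."""
    return list(sequence)


def to_tagged_sequence(domain: str, tokens: list[str]) -> list[str]:
    """Prefix tokens with a (hypothetical) domain tag so that 'C' as carbon
    and 'C' as cysteine map to different vocabulary entries."""
    return [f"<{domain}>"] + [f"{domain}:{tok}" for tok in tokens] + ["<eos>"]


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Assign one integer id per distinct token across all domains."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Aspirin as SMILES, plus a made-up six-residue peptide for illustration.
aspirin = to_tagged_sequence("smiles", tokenize_smiles("CC(=O)OC1=CC=CC=C1C(=O)O"))
peptide = to_tagged_sequence("protein", tokenize_protein("MKTAYC"))

# One vocabulary serves both domains, so one sequence model can read either.
vocab = build_vocab([aspirin, peptide])
aspirin_ids = [vocab[tok] for tok in aspirin]
peptide_ids = [vocab[tok] for tok in peptide]
```

Once both inputs are lists of integers drawn from the same vocabulary, a language-model-style architecture can consume them without knowing in advance whether it is reading chemistry or biology. The domain tag at the start of each sequence is one simple way to tell it which.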

The results have been unprecedented. In fundamental classification tasks, such as labeling specific immune cell types based purely on their genetic activity, the MAMMAL model achieved a 7.5 percent improvement over the previous state-of-the-art. More importantly, it has demonstrated a remarkable ability to predict drug toxicity and efficacy, effectively narrowing the focus of laboratory validation and making the entire drug discovery pipeline faster and vastly cheaper.[3][4]
This foundation-model approach is also transforming our understanding of genetics. A research team from Harvard Medical School and the Centre for Genomic Regulation recently introduced PopEVE, a generative AI system published in Nature Genetics. PopEVE is designed to evaluate whether a specific genetic variant is benign, disease-causing, or linked to early versus later-life mortality.[1]
By fusing evolutionary data spanning hundreds of thousands of species with large-scale human datasets like the UK Biobank, PopEVE has set a new benchmark in genomic analysis. In nearly all cases where a causal mutation was already known to science, the AI correctly ranked that mutation as the most damaging in the genome. This capability is particularly critical for tackling the rare disease crisis, where patients often spend years on a "diagnostic odyssey" searching for the genetic root of their illness.[1]
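The claim that a known causal mutation is "ranked as the most damaging in the genome" describes a top-1 ranking test, which can be sketched in a few lines. The snippet below is a hedged illustration of that evaluation, not PopEVE's code or data: the variant names and scores are invented placeholders, and the convention that a higher score means more damaging is an assumption of this sketch.

```python
def causal_rank(scores: dict[str, float], causal: str) -> int:
    """Return the 1-based rank of the known causal variant when one patient's
    variants are sorted from most to least damaging.

    Assumes higher score = more damaging (a convention chosen for this sketch).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(causal) + 1


def top1_rate(cases: list[tuple[dict[str, float], str]]) -> float:
    """Fraction of cases in which the causal variant is ranked first."""
    hits = sum(causal_rank(scores, causal) == 1 for scores, causal in cases)
    return hits / len(cases)


# Invented toy cases: (per-variant scores for one patient, known causal variant).
toy_cases = [
    ({"variant_a": 0.91, "variant_b": 0.12, "variant_c": 0.40}, "variant_a"),
    ({"variant_d": 0.30, "variant_e": 0.85}, "variant_e"),
    ({"variant_f": 0.70, "variant_g": 0.65, "variant_h": 0.10}, "variant_g"),
]

rate = top1_rate(toy_cases)
```

In the third toy case the causal variant ranks second rather than first, so it counts as a miss. A real evaluation would run the same check over thousands of candidate variants per patient, which is what makes a high top-1 rate meaningful.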
Beyond broad foundation models, highly targeted AI tools are uncovering the hidden mechanisms of some of humanity's most intractable diseases. At UC San Diego, bioengineers recently used AI to model the 3D structures of proteins and discovered that a gene known as PHGDH—long thought to be merely a biomarker for Alzheimer's disease—is actually one of its root causes. The AI revealed that the gene has a hidden "moonlighting" role, disrupting how brain cells switch other genes on and off, which fuels the progression of the disease.[2]
Similar breakthroughs are happening in infectious disease. Tuberculosis remains one of the deadliest infectious killers globally, infecting more than 10 million people annually, with drug-resistant strains making treatment increasingly difficult. Researchers have developed MycoBCP, an AI-powered tool that pairs bacterial cytological profiling with deep learning. The system can detect subtle changes in tuberculosis cells that escape the human eye, revealing exactly how potential new antibiotics act on the pathogen.[2]

The clinical application of these tools is already saving lives. While generative drug design represents the frontier of research, diagnostic AI is firmly established in daily practice. Deep-learning radiology tools are currently reading X-rays, CT scans, and MRIs at scale to flag fractures, strokes, and tumors, frequently matching or exceeding the sensitivity of human radiologists. FDA-cleared systems are autonomously detecting diabetic retinopathy from retinal photos and supporting breast cancer detection in mammography.[3]
The democratization of these tools is also accelerating. A team at Stanford Medicine has developed CRISPR-GPT, a large-language-model-powered tool that acts as an AI "copilot" for scientists designing gene-editing experiments. By generating experiment designs, suggesting guide RNA sequences, and forecasting potential problems, the tool streamlines what typically takes months of planning into a matter of days, significantly lowering the barrier to entry for developing potential gene therapies.[1]
However, the convergence of artificial intelligence and biology is not without profound risks. Dr. Pardis Sabeti, a computational biologist at the Broad Institute of MIT and Harvard, notes that while AI could help humanity outpace disease, it also opens the door to catastrophic misuse. The same systems that can engineer a highly effective, targeted cancer therapy could, in the wrong hands, be used to design novel biological threats or bioweapons.[5]

Navigating this "AI-shaped technological adolescence" will require unprecedented cooperation. The challenge for policymakers and the scientific community is to build governance frameworks that prevent mutual destruction while still allowing these powerful tools to democratize medical knowledge and lower overall healthcare costs.[5]
Ultimately, we are witnessing the transformation of biology from a science of observation into an engineering discipline. By unlocking the ability to read, simulate, and write the code of life with machine-learning precision, the biomedical community is moving toward a future where diseases are not just treated, but systematically mitigated or eradicated at their source.[5][7]
How we got here
2020
DeepMind's AlphaFold solves the 50-year-old grand challenge of predicting 3D protein structures.
2024
The FDA clears multiple AI systems for autonomous detection of diabetic retinopathy and for supporting breast cancer detection in mammography.
Late 2025
Harvard researchers publish PopEVE, setting a new benchmark for pinpointing disease-causing genetic variants.
May 2026
The MAMMAL foundation model demonstrates the ability to process genes, proteins, and small molecules simultaneously.
Viewpoints in depth
Computational Biologists
Argue that unifying disparate biological data into single foundation models is the key to unlocking new cures.
Researchers building models like MAMMAL and PopEVE believe that the historical siloing of biological data—treating chemistry, genetics, and protein structures as separate disciplines—has held back medical progress. By translating all of these domains into a unified 'language' that large AI models can process, they argue we can finally predict complex biological interactions that are invisible to the human eye, drastically reducing the time it takes to bring life-saving drugs to market.
Clinical Practitioners
Focus on how AI tools must translate into reliable, everyday diagnostics that improve patient outcomes.
For doctors and radiologists on the front lines, the excitement around generative drug design is secondary to the immediate impact of diagnostic AI. They emphasize that tools capable of reading MRIs, flagging strokes, and identifying rare genetic variants are already solving critical bottlenecks in patient care. Their primary concern is ensuring these tools are integrated seamlessly into clinical workflows without introducing new biases or diagnostic errors.
Biosecurity Advocates
Warn that democratizing biological design tools requires strict oversight to prevent catastrophic misuse.
Experts in biosecurity acknowledge the immense positive potential of biology foundation models, but they warn of a severe 'dual-use' dilemma. The exact same AI capabilities used to design a highly targeted cancer therapy could theoretically be used by bad actors to engineer novel pathogens or bioweapons. These advocates are pushing for international governance frameworks and restricted access to the most powerful models to ensure the technology is used exclusively for healing.
What we don't know
- How quickly regulatory bodies like the FDA will adapt to approve drugs designed entirely by generative AI models.
- Whether the cost savings from AI-accelerated drug discovery will be passed down to patients in the form of cheaper medications.
- How global governance frameworks will effectively police the open-source distribution of powerful biological design tools.
Key terms
- Biology Foundation Model
- An AI system trained on massive amounts of diverse biological data, capable of understanding and predicting how genes, proteins, and chemicals interact.
- SMILES string
- A notation that writes a molecule's atoms and bonds as a single line of text so that AI models can process chemical structures like language.
- PopEVE
- An AI model developed to evaluate whether specific genetic mutations are benign or likely to cause disease.
- CRISPR-GPT
- An AI tool that acts as a copilot for scientists, helping them design and execute gene-editing experiments more efficiently.
Frequently asked
What is a biology foundation model?
It is an AI system trained on massive amounts of biological data—like DNA, proteins, and chemical structures—capable of understanding and predicting how these elements interact.
How does AI help discover new drugs?
AI can simulate how a potential drug will interact with a target protein and predict its toxicity, eliminating thousands of unviable options before expensive lab testing begins.
Is this technology currently being used in hospitals?
Yes. While generative drug design is still entering clinical trials, diagnostic AI is already widely used to read X-rays, detect cancers, and analyze genetic variants.
Sources
[1] Alation (Clinical Practitioners). AI Healthcare Breakthroughs 2025: 10 Innovations Reshaping Patient Care
[2] UC San Diego Today (Clinical Practitioners). Nine Breakthroughs Made Possible by AI
[3] Reddit Science (Clinical Practitioners). Where are the amazing AI breakthroughs in medicine and science?
[4] AI Search (Computational Biologists). The biggest AI breakthrough in medicine & drug discovery
[5] TIME (Biosecurity Advocates). The Science and Health Breakthroughs Shaping a New American Era
[6] Nature (Computational Biologists). MAMMAL: A foundation model for mammalian biology
[7] Factlen Editorial Team (Biosecurity Advocates). Synthesis by Factlen editorial team