Medical AI · Evidence Explainer · Jun 17, 2026, 9:11 PM · 4 min read · #3 of 3 in science

Autonomous AI Agents Can Now Manage Patient Workflows, Outperforming Doctors in Simulations

Twin studies published in Nature demonstrate that new medical AI agents can autonomously order tests, interpret results, and propose treatments within electronic health records.

By Factlen Editorial Team

Clinical AI Developers 40% · Medical Practitioners 35% · Healthcare Skeptics 25%

Clinical AI Developers: Advocates for rapid integration of AI agents to solve healthcare bottlenecks.
Medical Practitioners: Focus on the potential to reduce administrative burden while emphasizing human supervision.
Healthcare Skeptics: Focus on the limitations of simulations and the risks of over-testing in real clinical environments.

What's not represented

· Patient Advocacy Groups
· Hospital Administrators & Insurers

Why this matters

If successfully deployed, autonomous AI agents could drastically reduce hospital wait times, eliminate administrative bottlenecks, and provide doctors with highly accurate diagnostic support, fundamentally changing how patients experience medical care.

Key points

Two independent studies in Nature demonstrate AI agents that can autonomously manage patient care workflows.
The MIRA agent achieved 88% diagnostic accuracy in simulated emergency cases, compared to 78% for human doctors.
Unlike previous chatbots, these agents can actively order lab tests, interpret results, and propose treatments within electronic health records.
Independent experts caution that the models were tested in simulations and ordered twice as many blood tests as human physicians.
Developers envision the AI as a clinical "co-pilot" to reduce administrative burden, not as a replacement for doctors.

MIRA diagnostic accuracy: 88%
Human doctor accuracy: 78%
Simulated patient cases: 500+
Blood tests ordered vs humans: approximately 2x

For years, artificial intelligence in healthcare has functioned primarily as a passive oracle—a tool that doctors could query for differential diagnoses or use to draft administrative notes. But a pair of landmark studies published today in Nature signals a profound shift in how machine learning could operate in the clinic. Researchers have successfully demonstrated autonomous medical AI "agents" that do not merely answer questions, but actively manage patient care workflows.[1][2][3]

The two systems—MIRA, developed by a team including researchers at TU Dresden, and AMIE, developed by Google—represent a leap from conversational models to action-oriented clinical co-pilots. Rather than waiting for a doctor to provide all the facts, these agents operate within simulated electronic health record (EHR) systems. They can autonomously take patient histories, order laboratory tests, interpret the incoming results, formulate diagnoses, and propose treatment plans.[1][3][5]

The evidence for MIRA’s efficacy comes from a rigorous retrospective simulation. Researchers fed the agent more than 500 real-world emergency department cases, placing the AI in a sandboxed environment where it had to navigate the diagnostic process from scratch. The results were striking: MIRA achieved a diagnostic accuracy of nearly 88%, outperforming a panel of human physicians who scored 78% on the same set of cases.[1][5][7]

MIRA's architecture allows it to mimic the iterative reasoning of a human doctor. When presented with a patient's initial symptoms, the agent does not immediately guess the disease. Instead, it issues structured commands to the EHR to request specific blood panels, microbiological cultures, or imaging studies.[1][3][5]

In a retrospective simulation of more than 500 emergency cases, the MIRA agent outperformed a panel of human physicians.

Only after the simulated system returns those results does MIRA synthesize the data. It then finalizes a diagnosis and drafts a guideline-compliant prescription or admission order. This ability to convert reasoning into structured clinical actions—rather than just generating a paragraph of text—is what separates an "agent" from a standard large language model.[5][7]
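The order-then-synthesize loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MIRA's actual implementation: the `Order` schema, the `run_agent` function, and the toy rules standing in for the model and the EHR are all hypothetical names invented here, and the single lipase rule is a placeholder, not clinical guidance.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Order:
    """A structured EHR command (hypothetical schema, not taken from the paper)."""
    kind: str  # e.g. "lab", "culture", "imaging"
    name: str  # e.g. "lipase"


@dataclass
class CaseState:
    """What the agent knows about the patient so far."""
    symptoms: list[str]
    results: dict[str, str] = field(default_factory=dict)


def run_agent(
    state: CaseState,
    propose_orders: Callable[[CaseState], list[Order]],
    ehr_lookup: Callable[[Order], str],
    diagnose: Callable[[CaseState], str],
    max_rounds: int = 3,
) -> tuple[str, list[Order]]:
    """Request tests, wait for their results, and only then diagnose."""
    ordered: list[Order] = []
    for _ in range(max_rounds):
        # Skip anything whose result is already on file.
        pending = [o for o in propose_orders(state) if o.name not in state.results]
        if not pending:
            break  # nothing new to ask for, so move on to synthesis
        for order in pending:
            state.results[order.name] = ehr_lookup(order)
            ordered.append(order)
    return diagnose(state), ordered


# --- Toy stand-ins for the model and the simulated EHR (assumptions) ---
def toy_propose(state: CaseState) -> list[Order]:
    if "abdominal pain" in state.symptoms:
        return [Order("lab", "lipase")]
    return []


def toy_ehr(order: Order) -> str:
    return {"lipase": "elevated"}.get(order.name, "normal")


def toy_diagnose(state: CaseState) -> str:
    if state.results.get("lipase") == "elevated":
        return "pancreatitis"
    return "undetermined"


diagnosis, ordered = run_agent(
    CaseState(symptoms=["abdominal pain"]), toy_propose, toy_ehr, toy_diagnose
)
print(diagnosis, [o.name for o in ordered])
```

The point of the sketch is the shape of the output: the agent returns a diagnosis together with a list of structured orders, rather than a paragraph of free text. The `max_rounds` cap is one simple way to bound how many rounds of testing an agent may request.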

Simultaneously, the second Nature paper details Google's AMIE, a conversational clinical reasoning model evaluated against 21 primary care physicians across 100 multi-visit scenarios. AMIE was tested using a format similar to medical board exams, interacting with trained actors who simulated patients with specific conditions.[2][4]

The evidence showed that AMIE matched or surpassed human doctors in treatment accuracy, the appropriateness of the tests it requested, and its adherence to established clinical guidelines. By maintaining context across multiple simulated visits, the model demonstrated an ability to track disease progression and adjust its management plans accordingly.[3][4]

The medical community’s reaction has been a mix of intense optimism and necessary caution. Experts note that the most disruptive aspect of these models is their integration potential. By formatting their outputs as actionable EHR commands, systems like MIRA represent the closest the industry has come to a deployable clinical co-pilot that fits into existing hospital IT infrastructure.[5][7]

However, the evidence comes with significant caveats regarding real-world readiness. Independent evaluators emphasize that both studies rely entirely on simulations, using either retrospective data from old patient records or actors following detailed scripts.[4][7]

Neither system has been tested on real patients in the chaotic, real-time environment of an actual hospital ward. Clinical practice is messy, complex, and deeply human; patients often provide contradictory histories, and hospital systems frequently suffer from missing or delayed data. It remains entirely unproven how these agents will handle the friction of a live clinical setting.[4][7]

Furthermore, the AI's superior accuracy may come with a hidden cost. In the MIRA study, the AI agent requested approximately twice as many blood tests as the human doctors did. Because gathering more data tends to produce more accurate diagnoses, the head-to-head comparison with human performance is not entirely fair.[4][7]

The AI's higher accuracy came at a cost: it ordered approximately twice as many blood tests as human doctors.

Over-testing in a real healthcare system could drive up financial costs, strain laboratory resources, and subject patients to unnecessary procedures. A true clinical co-pilot must balance diagnostic certainty with resource stewardship—a nuance that current AI agents have yet to fully master.[4][7]
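To put the trade-off in numbers, the short Python sketch below works only from the figures reported in this article (roughly 88% versus 78% accuracy, and about twice as many blood tests). The function name `compare` and its output keys are made up for illustration, and the inputs are rounded, so the results are approximate.

```python
def compare(ai_acc: float, human_acc: float, test_ratio: float) -> dict[str, float]:
    """Summarize an accuracy gain against the extra testing used to get it.

    ai_acc, human_acc: diagnostic accuracy as fractions (0.88 means 88%).
    test_ratio: tests ordered by the AI divided by tests ordered by humans.
    """
    gain_pp = (ai_acc - human_acc) * 100                  # percentage points
    error_cut = 1 - (1 - ai_acc) / (1 - human_acc)        # share of errors removed
    return {
        "gain_pp": round(gain_pp, 1),
        "error_reduction_pct": round(error_cut * 100, 1),
        "extra_tests_pct": round((test_ratio - 1) * 100, 1),
    }


# Figures as reported in the article (rounded, hence approximate).
summary = compare(ai_acc=0.88, human_acc=0.78, test_ratio=2.0)
print(summary)
```

On these inputs the gain is about 10 percentage points, or roughly 45% fewer missed diagnoses, obtained with about 100% more blood tests. The calculation says nothing about whether that exchange is worthwhile, which depends on costs and harms the studies did not measure.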

The performance gap between AI and humans also varied significantly by condition. MIRA excelled at identifying complex, data-heavy conditions like appendicitis and pancreatitis, where synthesizing dozens of lab values quickly gives a machine an edge. But for common emergency room complaints like pneumonia or urinary tract infections, the AI and the human doctors performed equally poorly, with minimal difference in their success rates.[7]

Despite these limitations, the trajectory of medical AI is clear. The developers envision these agents not as replacements for human physicians, but as "co-pilots" that handle the heavy lifting of data gathering and routine clinical pathways. Ultimate responsibility and final sign-off would remain with the attending physician.[5]

Developers emphasize that AI agents will serve as assistants to handle routine data gathering, leaving final decisions to human doctors.

By automating the administrative and preliminary diagnostic steps, these systems could eventually allow doctors to focus entirely on complex decision-making and patient communication. As hospitals globally grapple with severe staffing shortages and administrative burnout, the arrival of autonomous, evidence-based AI co-pilots offers a highly promising solution to modern healthcare's most pressing bottlenecks.[3][5][6]

How we got here

Early 2023
Large language models demonstrate the ability to pass medical licensing exams, acting as passive question-answering tools.
April 2024
Early conversational AI models show promise in drafting clinical notes and summarizing patient histories.
June 2026
Nature publishes twin studies on MIRA and AMIE, marking the shift from passive AI chatbots to autonomous clinical agents.

Viewpoints in depth

Clinical AI Developers

Advocates for rapid integration of AI agents to solve healthcare bottlenecks.

Developers argue that the transition from passive chatbots to active agents is the key to unlocking AI's value in medicine. By allowing systems like MIRA to autonomously query electronic health records and order tests, hospitals can drastically reduce the administrative burden on physicians. They view the current simulated successes as proof-of-concept that AI can handle routine diagnostic pathways, freeing human doctors to focus on complex cases and patient empathy.

Independent Medical Evaluators

Experts urging caution regarding the gap between simulated success and real-world deployment.

While acknowledging the technical leap, independent evaluators stress that retrospective simulations do not capture the friction of actual clinical practice. Patients in the real world provide contradictory histories, and hospital IT systems are notoriously fragmented. Furthermore, evaluators point out that MIRA achieved its high accuracy partly by ordering twice as many blood tests as human doctors—a practice that could overwhelm laboratory resources and inflate healthcare costs if deployed without strict guardrails.

What we don't know

How these AI agents will perform when interacting with real patients who may provide confusing or contradictory medical histories.
Whether the increased rate of lab testing ordered by the AI would be financially sustainable in a real-world healthcare system.
How quickly regulatory bodies will approve autonomous agents that actively issue medical orders.

Key terms

Electronic Health Record (EHR): A digital version of a patient's paper chart, containing medical history, diagnoses, medications, and test results.
Autonomous Agent: An artificial intelligence system that can independently execute a sequence of actions—such as ordering tests or scheduling procedures—to achieve a specific goal.
Retrospective Simulation: A testing method that uses historical data from past events, like old patient records, to evaluate how a new system would have performed.

Frequently asked

What is an autonomous medical AI agent?

Unlike a standard chatbot that just answers questions, an AI agent can actively interact with hospital software to take patient histories, order lab tests, and propose treatments.

Did the AI really beat human doctors?

In a simulated environment using retrospective data, the MIRA agent achieved 88% diagnostic accuracy compared to 78% for human doctors. However, it also ordered twice as many blood tests to reach that conclusion.

Are these AI agents being used in real hospitals right now?

No. Both MIRA and AMIE have only been tested in sandboxed simulations or with actors. They require extensive real-world clinical trials before they can be deployed safely.

Sources

[1] Nature (Clinical AI Developers): Towards autonomous medical artificial intelligence agents
[2] Nature (Clinical AI Developers): Towards Conversational AI for Disease Management
[3] Financial Times (Medical Practitioners): Studies: Mira, an AI medical tool developed by researchers in Germany, and Google's Amie matched or surpassed doctors on diagnostic and treatment decisions
[4] Science Media Centre (Healthcare Skeptics): Expert reaction to presentation of two new medical AI models for patient management (MIRA and AMIE)
[5] TU Dresden (Clinical AI Developers): KI-Agent MIRA unterstützt als Co-Pilot klinische Abläufe in elektronischen Patientenakten [AI agent MIRA supports clinical workflows in electronic patient records as a co-pilot]
[6] The Atlantic (Medical Practitioners): AI Is Taking Over Hospitals
[7] El País (Healthcare Skeptics): Dos modelos de IA muestran su utilidad para el manejo de pacientes con simulaciones y datos reales [Two AI models show their usefulness for patient management with simulations and real data]
