Autonomous AI Agents Match Doctors in Simulated Emergency Rooms
A new generation of medical AI can autonomously interview patients, order tests, and propose treatments within electronic health records. While one of the systems outperformed human doctors in diagnostic accuracy during simulations, experts caution that the technology is not yet ready for real-world hospitals.
By Factlen Editorial Team
- Medical AI Developers
- Researchers building these systems emphasize their unprecedented ability to handle complex clinical reasoning autonomously.
- Clinical Practitioners
- Doctors and medical experts caution that simulated success does not immediately translate to the chaotic reality of a real hospital.
- Healthcare System Analysts
- System-level observers focus on the regulatory, ethical, and educational hurdles of deploying autonomous agents.
What's not represented
- Patient Advocacy Groups
- Hospital Administrators
- Medical Insurance Providers
Why this matters
The transition of medical AI from passive chatbots to autonomous agents capable of ordering tests and proposing treatments could eventually ease the administrative burnout crisis in healthcare. However, understanding the limitations of these simulations is crucial before trusting algorithms with real-world patient care.
Key points
- Two new AI systems, MIRA and AMIE, demonstrate the ability to autonomously manage patient care in simulated environments.
- Operating within a sandboxed electronic health record, MIRA achieved nearly 88% diagnostic accuracy on historical emergency room cases.
- Human doctors scored 78% on the same cases, though the AI ordered approximately twice as many blood tests to reach its conclusions.
- Experts emphasize these systems are not yet ready for real hospitals, as they have not been tested on live patients.
- The ultimate goal is for AI to serve as a 'clinical co-pilot,' reducing administrative burdens while humans retain final decision-making authority.
For years, artificial intelligence in medicine has functioned primarily as a passive oracle. Doctors could feed an algorithm an X-ray to highlight potential tumors, or type symptoms into a large language model to generate a list of possible diagnoses. But the AI was always a bystander, waiting for a human to pull the levers of clinical care. Today, that paradigm is shifting. A new generation of AI systems is stepping out of the advisory role and into the driver's seat, demonstrating the ability to autonomously navigate electronic health records, order tests, and formulate treatment plans.[6]
In a pair of landmark papers published this week in the journal Nature, researchers unveiled two advanced AI agents that represent a qualitative leap forward for medical technology. The systems—one developed by a German academic consortium and the other by Google DeepMind—were tested in rigorous simulations designed to mimic the high-stakes environment of a hospital.[1][2]
The most striking results come from a system called MIRA (Medical Intelligence for Reasoning and Action), developed by a team led by researchers at TU Dresden and the University Hospital Heidelberg. Unlike previous models that simply answer medical questions, MIRA is designed to act as an autonomous agent within a hospital's digital infrastructure.[1][3]
To test MIRA, the researchers created a secure, sandboxed version of an electronic health record system. They equipped the AI with a toolkit of eleven clinical instruments and gave it access to more than 85,000 possible actions. This meant MIRA could do almost everything a human doctor does at a computer terminal: request specific laboratory panels, order microbiological cultures, schedule imaging like CT scans or MRIs, prescribe medications, and recommend hospital admissions.[1][3]
The research team fed MIRA more than 500 real-world patient cases drawn from historical emergency room records. The AI was tasked with managing these cases from start to finish. It had to review the initial triage notes, decide which questions to ask to gather a complete medical history, order the appropriate diagnostic tests based on those answers, interpret the incoming results, and finally, propose a definitive diagnosis and treatment plan.[1][3]
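The workflow described above (review the triage note, gather history, order tests, interpret results, propose a plan) follows the general shape of a tool-using agent loop. The sketch below is only an illustration of that pattern, not MIRA's implementation: the tool names `ask_patient` and `order_lab`, the `CaseState` structure, the example question and lab panel, and the rule-based `toy_policy` standing in for the model are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CaseState:
    """Everything the agent has learned about one simulated case so far."""
    triage_note: str
    history: list[str] = field(default_factory=list)
    results: dict[str, str] = field(default_factory=dict)


# Hypothetical tool signature: each tool reads and updates the case state.
Tool = Callable[[CaseState, str], str]


def ask_patient(state: CaseState, question: str) -> str:
    """Hypothetical tool: record a simulated answer to a history question."""
    answer = f"(simulated answer to: {question})"
    state.history.append(answer)
    return answer


def order_lab(state: CaseState, panel: str) -> str:
    """Hypothetical tool: record a simulated result for a lab panel."""
    result = f"(simulated result for: {panel})"
    state.results[panel] = result
    return result


TOOLS: dict[str, Tool] = {"ask_patient": ask_patient, "order_lab": order_lab}


def toy_policy(state: CaseState) -> tuple[str, str]:
    """Rule-based stand-in for the model: choose the next (tool, argument)."""
    if not state.history:
        return "ask_patient", "When did the symptoms start?"
    if not state.results:
        return "order_lab", "complete blood count"
    return "finish", "placeholder diagnosis and treatment plan"


def run_case(triage_note: str, max_steps: int = 10) -> tuple[str, list[str]]:
    """Pick an action, execute it, log it; stop when the policy finishes."""
    state = CaseState(triage_note)
    log: list[str] = []
    for _ in range(max_steps):
        tool, arg = toy_policy(state)
        if tool == "finish":
            return arg, log
        TOOLS[tool](state, arg)
        log.append(f"{tool}: {arg}")
    return "no conclusion within step budget", log
```

Calling `run_case("example triage note")` walks through one history question and one lab order before returning the placeholder plan together with the action log. A real agent would replace `toy_policy` with a model choosing among far more actions, and the step budget is one simple way to keep such a loop bounded.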
The results favored the AI. In the simulation, MIRA achieved a diagnostic accuracy of nearly 88%. When a panel of experienced human doctors was given the exact same cases and the same digital toolkit, their accuracy was 78%. The AI not only identified the correct illnesses more frequently but also adhered closely to established clinical guidelines when proposing therapies.[1][3][5]

However, a closer look at the data reveals a crucial caveat to the AI's superior performance. MIRA achieved its high accuracy partly by being significantly more thorough—and resource-intensive—than its human counterparts. During the simulation, the AI ordered approximately twice as many blood tests as the human doctors did.[4][5]
Independent experts point out that access to more diagnostic information naturally tends to raise accuracy. A human doctor in a busy emergency room must constantly balance the need for diagnostic certainty against the costs, delays, and patient discomfort associated with ordering exhaustive lab panels. MIRA, operating in a simulation without those pressures, faced no comparable constraint on how many tests it ordered.[4][5]
"More information could itself explain higher accuracy, so this is not quite a level comparison," noted Dr. Dominic Oliver, an expert commenting on the findings through the Science Media Centre. He emphasized that while the technical achievement is profound, the AI's approach to resource utilization would need to be carefully calibrated before it could be deployed in a real healthcare system where budgets and laboratory capacities are finite.[4]
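To put the headline comparison in perspective, the reported figures can be turned into a few rough quantities. The sketch below uses only numbers stated in this article, rounded: "nearly 88%" is treated as 0.88, "approximately twice" as a ratio of 2.0, and "more than 500" cases as exactly 500. The baseline number of blood tests per case is not reported here, so it is left as a parameter the reader must supply; the function names are illustrative.

```python
# Figures as reported in the article, rounded; every output is approximate.
AI_ACCURACY = 0.88      # "nearly 88%"
DOCTOR_ACCURACY = 0.78  # reported for the panel of human doctors
AI_TEST_RATIO = 2.0     # AI blood tests relative to doctors (doctors = 1.0)
N_CASES = 500           # the article says "more than 500"; 500 is a floor


def accuracy_gap_points() -> float:
    """Gap between AI and doctors in percentage points."""
    return round((AI_ACCURACY - DOCTOR_ACCURACY) * 100, 1)


def extra_correct_cases() -> int:
    """Roughly how many more cases the AI diagnosed correctly."""
    return round((AI_ACCURACY - DOCTOR_ACCURACY) * N_CASES)


def relative_gain_percent() -> float:
    """AI improvement relative to the doctors' accuracy, in percent."""
    return round((AI_ACCURACY - DOCTOR_ACCURACY) / DOCTOR_ACCURACY * 100, 1)


def extra_tests_per_extra_correct(baseline_tests_per_case: float) -> float:
    """Additional blood tests ordered per additional correct diagnosis.

    baseline_tests_per_case is the doctors' average and is NOT given in
    the article; it is a hypothetical input supplied by the caller.
    """
    extra_tests = (AI_TEST_RATIO - 1.0) * baseline_tests_per_case
    return round(extra_tests / (AI_ACCURACY - DOCTOR_ACCURACY), 1)


print(accuracy_gap_points())    # 10.0
print(extra_correct_cases())    # 50
print(relative_gain_percent())  # 12.8
```

On these rounded inputs the gap is about 10 percentage points, or roughly 50 additional correct diagnoses across 500 cases. Because the AI ordered about twice as many tests, each additional correct diagnosis corresponds to about ten times the doctors' per-case test volume in extra tests, which is the sense in which the comparison is not quite level.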

The second Nature paper introduces AMIE, a conversational AI system optimized for managing chronic diseases across multiple patient visits. While MIRA excels at the acute, structured decision-making required in an emergency room, AMIE is designed to handle the nuanced, ongoing dialogue of outpatient care. It can conduct complex clinical conversations, track a patient's symptoms over time, and adjust management plans as the disease progresses.[2][4]
Together, MIRA and AMIE represent what experts are calling the next generation of medical AI. Alfonso Valencia, Director of Life Sciences at the Barcelona Supercomputing Centre, observed that these systems move beyond simple pattern recognition. By integrating directly with electronic health records and executing clinical actions, they represent the closest step yet toward a true "clinical co-pilot."[5]
The vision for this technology is not to replace human doctors, but to fundamentally change how they spend their time. The modern physician is often overwhelmed by administrative tasks, spending hours clicking through drop-down menus and synthesizing fragmented data from different hospital departments. An autonomous agent like MIRA could take over the routine data-gathering and preliminary reasoning, presenting the human doctor with a neatly organized summary and a proposed plan of action.[3][5][6]
"Far from suggesting the replacement of healthcare professionals, the authors believe that the most promising role for these technologies will be to support doctors," Valencia noted. In this model, the AI handles the repetitive administrative and analytical heavy lifting, while the human physician retains ultimate responsibility for clinical supervision, final decision-making, and the empathetic relationship with the patient.[5]

Despite the optimism, researchers are quick to emphasize the limitations of the current studies. Both MIRA and AMIE were tested in highly controlled, retrospective simulations. They did not interact with real, unpredictable patients, nor did they have to navigate the chaotic, real-time environment of an actual hospital ward.[4][6]
A simulated emergency room does not capture the nuances of a patient who is unable to communicate clearly, the sudden deterioration of vital signs, or the complex interpersonal dynamics of a medical team. "It cannot tell us yet how this would perform in an actual hospital," Oliver cautioned. Moving from a sandboxed simulation to a live clinical environment will require overcoming massive regulatory, ethical, and safety hurdles.[4]
There is also the question of how evenly such systems perform across conditions. In the MIRA study, the AI outperformed doctors on average, but its advantage largely disappeared for certain common presentations. In cases of pneumonia or urinary tract infections, two of the most frequent reasons for emergency room visits, both the AI and the human doctors performed poorly, with minimal difference between them.[5]
Furthermore, the integration of autonomous agents raises profound questions about the future of medical training. If an AI system handles the routine diagnostic workups, how will junior doctors gain the experience necessary to develop their own clinical intuition? The medical community will need to design new training paradigms that teach physicians how to supervise and critically evaluate AI agents, rather than simply competing with them.[6]

For now, the Nature publications serve as a powerful proof of concept. They demonstrate that the fundamental building blocks of clinical reasoning—gathering history, ordering tests, and synthesizing data into a diagnosis—can be successfully encoded into an autonomous artificial intelligence.[1][2][6]
The challenge for the next decade will be translating this technical brilliance into practical, safe, and equitable healthcare tools. As these systems move from the laboratory toward the clinic, they promise to alleviate the administrative burdens crushing modern healthcare systems, allowing doctors to return their focus to where it belongs: the patient.[3][6]
How we got here
Early 2020s
Medical AI primarily functions as passive diagnostic aids, such as image recognition software for radiology.
2023-2025
Large language models demonstrate the ability to pass medical licensing exams and answer complex clinical questions.
June 2026
Nature publishes landmark studies on MIRA and AMIE, demonstrating AI agents capable of autonomous clinical reasoning and action within simulated health records.
Viewpoints in depth
Medical AI Developers
Researchers building these systems emphasize their unprecedented ability to handle complex clinical reasoning autonomously.
For the teams behind MIRA and AMIE, the breakthrough lies in moving AI from a passive advisory role to an active agent. By embedding the AI within a sandboxed electronic health record and giving it access to tens of thousands of clinical actions, developers have shown, at least in simulation, that large language models can navigate the multi-step logic required to diagnose a patient. They view the high accuracy rates as evidence that AI can soon handle the administrative and routine analytical burdens that currently overwhelm hospital staff, ultimately serving as a tireless clinical co-pilot.
Clinical Practitioners
Doctors and medical experts caution that simulated success does not immediately translate to the chaotic reality of a real hospital.
While impressed by the technical achievements, practicing physicians point out significant caveats. The AI's higher accuracy in the MIRA study was achieved partly by ordering twice as many blood tests as human doctors—a strategy that is impractical in real-world healthcare systems with finite budgets and laboratory capacities. Furthermore, clinicians emphasize that a retrospective simulation using historical data cannot replicate the unpredictable nature of live patients, complex interpersonal dynamics, or the sudden emergencies that define actual hospital wards. Human supervision remains non-negotiable.
Healthcare System Analysts
System-level observers focus on the regulatory, ethical, and educational hurdles of deploying autonomous agents.
Analysts looking at the broader healthcare landscape stress that integrating autonomous AI will require a fundamental redesign of medical infrastructure. Before systems like MIRA can be deployed, hospitals must establish rigorous safety guardrails and clear lines of legal liability for AI-generated treatment plans. Additionally, analysts raise concerns about the future of medical education: if AI takes over routine diagnostic workups, teaching hospitals will need to develop new methods for junior doctors to build their clinical intuition and learn how to effectively supervise their algorithmic counterparts.
What we don't know
- How these autonomous agents will perform when interacting with live, unpredictable patients in a real-time hospital environment.
- Whether the increased rate of test ordering by the AI would overwhelm hospital laboratories and budgets in practice.
- How legal liability will be structured if an autonomous AI agent recommends an incorrect treatment plan that harms a patient.
Key terms
- Autonomous Agent
- An artificial intelligence system that can independently execute a sequence of actions—such as navigating software or ordering tests—to achieve a specific goal, rather than just answering single prompts.
- Electronic Health Record (EHR)
- A digital version of a patient's paper chart, containing their medical history, diagnoses, medications, treatment plans, immunization dates, and test results.
- Retrospective Simulation
- A study design that uses historical data (like past patient records) to test how a new system or intervention would have performed, rather than testing it on live, current patients.
- Clinical Co-pilot
- A proposed model for medical AI where the system assists a human doctor by handling data gathering and preliminary analysis, while the human retains ultimate decision-making authority.
Frequently asked
What is MIRA?
MIRA (Medical Intelligence for Reasoning and Action) is an autonomous AI agent developed to operate within electronic health records, capable of taking patient histories, ordering tests, and proposing diagnoses.
Did the AI really beat human doctors?
In a retrospective simulation using historical emergency room cases, MIRA achieved nearly 88% diagnostic accuracy compared to 78% for a panel of human doctors, though it ordered approximately twice as many blood tests to do so.
Is this AI being used in real hospitals right now?
No. The current studies were conducted in highly controlled, sandboxed simulations using historical data. The AI did not interact with live patients or operate in a real-time clinical setting.
What is the difference between MIRA and AMIE?
While MIRA is designed for acute, structured decision-making in an emergency room setting, AMIE is a conversational AI optimized for managing chronic diseases and conducting complex clinical dialogues across multiple patient visits.
Will AI replace human doctors?
Experts and developers agree that the goal is not replacement, but support. The AI is envisioned as a 'clinical co-pilot' that handles routine administrative and analytical tasks, leaving final decisions and patient relationships to human physicians.
Sources
[1] Nature (Medical AI Developers). Towards autonomous medical artificial intelligence agents.
[2] Nature (Medical AI Developers). Towards Conversational AI for Disease Management.
[3] TU Dresden (Medical AI Developers). MIRA: Autonomous medical AI agents in clinical practice.
[4] Science Media Centre UK (Clinical Practitioners). Expert reaction to two studies on medical AI agents.
[5] Science Media Centre Spain (Clinical Practitioners). Dos modelos de IA muestran su utilidad para el manejo de pacientes [Two AI models show their usefulness for patient management].
[6] Factlen Editorial Team (Healthcare System Analysts). Synthesis by Factlen editorial team.