In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors

TechCrunch (Business)

Key Points:

  • A new study published in Science by Harvard Medical School and Beth Israel Deaconess Medical Center found that OpenAI’s large language models (o1 and 4o) performed as well as or better than internal medicine attending physicians in diagnosing emergency room patients based on electronic medical records.
  • In a test involving 76 ER patients, the o1 model provided exact or very close diagnoses in 67% of triage cases, outperforming two physicians who achieved 55% and 50% accuracy, respectively; the model particularly excelled at the initial triage stage, when only limited patient information was available.
  • The study emphasized that the AI models were tested without data pre-processing and with text-based inputs only, noting current limitations in reasoning over non-text data and cautioning that AI is not yet ready for real-world life-or-death medical decisions.
  • Researchers called for prospective trials to evaluate AI in clinical settings and highlighted the lack of formal accountability frameworks for AI diagnoses, with experts stressing the importance of human oversight in critical medical decisions.
  • Some medical professionals criticized media coverage of the study as overhyped, noting the comparison was made against internal medicine physicians rather than emergency medicine specialists, and underscored that ER doctors prioritize identifying life-threatening conditions over pinpointing exact diagnoses on first assessment.
