AI Just Beat Doctors at Diagnosing ER Patients. Don't Get Too Excited
Key Points:
- Researchers at Harvard and Beth Israel Deaconess Medical Center tested OpenAI’s reasoning large language model (LLM), o1-preview, on diagnosing emergency room patients; it achieved 67.1% accuracy, outperforming two expert physicians, who scored 55.3% and 50.0%.
- On complex clinical vignettes, o1-preview included the correct diagnosis in 78.3% of cases and suggested helpful diagnoses in 97.9%, surpassing both GPT-4 and human physician baselines.
- The study’s authors emphasize that AI is not a replacement for doctors but a collaborative tool, one that requires rigorous testing to ensure it improves patient outcomes while clinicians retain oversight and accountability.
- The reasoning model still struggles with multimodal inputs such as medical images and audio, a key area for future research into improving diagnostic capabilities.
- Experts caution that AI limitations such as hallucinations and potential manipulation make AI safety essential, urging a “trust, but verify” approach when integrating AI into clinical practice.