ChatGPT Health performance in a structured test of triage recommendations

Nature • February 24, 2026 • health

Key Points:

ChatGPT Health, launched in January 2026, was evaluated using 60 clinician-authored vignettes across 21 clinical domains, resulting in 960 triage responses under varied conditions.
The system showed an inverted U-shaped performance pattern, with the most dangerous triage errors occurring in non-urgent cases (35% failure) and emergency conditions (48% failure).
It under-triaged 52% of gold-standard emergency cases, such as diabetic ketoacidosis and impending respiratory failure, often recommending delayed evaluation instead of immediate emergency care.
Triage recommendations were significantly influenced by anchoring bias from family or friends minimizing symptoms, leading to less urgent care suggestions in edge cases.
Crisis intervention messages triggered