ChatGPT Health performance in a structured test of triage recommendations

Nature health

Key Points:

  • ChatGPT Health, launched in January 2026, was evaluated using 60 clinician-authored vignettes across 21 clinical domains, resulting in 960 triage responses under varied conditions.
  • Performance followed an inverted U-shaped pattern: the most dangerous triage errors clustered at the two extremes of acuity, in non-urgent cases (35% failure) and emergency conditions (48% failure).
  • It under-triaged 52% of gold-standard emergency cases, such as diabetic ketoacidosis and impending respiratory failure, often recommending delayed evaluation instead of immediate emergency care.
  • Triage recommendations shifted significantly when family or friends minimized symptoms (an anchoring bias), leading to less urgent care suggestions in edge cases.
  • Crisis intervention messages triggered
