Testing suggests Google's AI Overviews tells millions of lies per hour

Ars Technica business

Key Points:

  • Google’s AI Overviews, powered by Gemini, answers correctly about 90% of the time according to a New York Times analysis; at the scale of Google Search, the remaining errors still amount to millions of incorrect answers per day.
  • The accuracy test used, SimpleQA, comprises over 4,000 verifiable questions and showed accuracy improving from 85% with Gemini 2.5 to 91% with Gemini 3, though some wrong answers are still delivered with full confidence.
  • Example errors include incorrect or contradictory information about the date of Bob Marley’s museum and Yo-Yo Ma’s induction into the Classical Music Hall of Fame, highlighting the ongoing challenge of AI factuality.
  • Google disputes the SimpleQA test’s validity, claiming it contains inaccuracies and does not reflect typical user queries, preferring its own vetted SimpleQA Verified benchmark instead.
  • Evaluating AI models is inherently complex: outputs are non-deterministic, and Google serves different Gemini model versions for speed and cost reasons, so answer quality in AI Overviews varies from query to query.
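The headline arithmetic can be sketched in a few lines: even a small error rate becomes a large absolute number at Google's scale. Note that the search volume and the share of searches showing an AI Overview below are illustrative assumptions, not figures from the article; only the ~90% accuracy comes from the cited analysis.

```python
# Back-of-envelope sketch: how a ~10% error rate yields "millions of
# lies per hour". The volume figures are assumptions for illustration.

searches_per_day = 14_000_000_000   # ASSUMED daily Google searches
overview_share = 0.20               # ASSUMED fraction showing an AI Overview
accuracy = 0.90                     # ~90% correct, per the NYT analysis cited

wrong_per_day = searches_per_day * overview_share * (1 - accuracy)
wrong_per_hour = wrong_per_day / 24

print(f"{wrong_per_day:,.0f} wrong answers per day")   # 280,000,000
print(f"{wrong_per_hour:,.0f} wrong answers per hour")  # ~11.7 million
```

Under these assumed inputs the error count lands comfortably in the millions per hour, which is the scale the headline refers to; the exact figure shifts with whatever search volume and Overview share one plugs in.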
