20 points (91.7% liked)
submitted 30 Oct 2024 by Dot@feddit.org to c/fuck_ai@lemmy.world
  • A new OpenAI study using their SimpleQA benchmark shows that even the most advanced AI language models fail more often than they succeed when answering factual questions, with OpenAI's best model achieving only a 42.7% success rate.
  • The SimpleQA test contains 4,326 questions across science, politics, and art, with each question designed to have one clear correct answer. Anthropic's Claude models performed worse than OpenAI's, but smaller Claude models more often declined to answer when uncertain (which is good!).
  • The study also shows that AI models significantly overestimate their capabilities, consistently giving inflated confidence scores. OpenAI has made SimpleQA publicly available to support the development of more reliable language models (a minimal scoring sketch follows below).
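Since the summary above describes how SimpleQA grades answers (one clear correct answer per question, with declining to answer counted separately from getting it wrong), here is a minimal sketch of how such a tally could be computed. This is an assumption-laden illustration, not OpenAI's released harness: `Item`, `ask_model`, and the substring-matching `grade_answer` below are hypothetical stand-ins (the real benchmark uses a model-based grader), and loading the 4,326 published question/answer pairs is left out.

```python
# Hypothetical sketch of a SimpleQA-style tally; not OpenAI's actual grader.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Item:
    question: str     # a short factual question with one clear answer
    gold_answer: str  # the single correct answer


def grade_answer(predicted: str, gold: str) -> str:
    """Stand-in grader: returns 'correct', 'incorrect', or 'not_attempted'.
    The real benchmark grades with a language model; substring match is a placeholder."""
    if not predicted or not predicted.strip():
        return "not_attempted"
    return "correct" if gold.strip().lower() in predicted.lower() else "incorrect"


def evaluate(items: Iterable[Item], ask_model: Callable[[str], str]) -> dict:
    """Ask the model every question and split results into the three buckets."""
    counts = {"correct": 0, "incorrect": 0, "not_attempted": 0}
    items = list(items)
    for item in items:
        counts[grade_answer(ask_model(item.question), item.gold_answer)] += 1
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "accuracy": counts["correct"] / len(items),        # the headline 42.7%-style number
        "declined": counts["not_attempted"] / len(items),  # abstentions (the behaviour praised above)
        "accuracy_when_attempted": counts["correct"] / attempted if attempted else 0.0,
    }
```

Tracking abstentions separately is what makes "smaller Claude models more often declined to answer" visible as a distinct, and arguably desirable, behaviour rather than just another wrong answer.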
kboy101222@sh.itjust.works 10 points 2 weeks ago

> Anthropic's Claude models performed worse than OpenAI's, but smaller Claude models more often declined to answer when uncertain (which is good!).

It's right there, bud
