TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

LLMs average <5% on 2025 Math Olympiad; award each other 20x points (arxiv.org)

submitted 2 weeks ago by slop_as_a_service@awful.systems to c/techtakes@awful.systems

44 comments

"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as "trivial", even when their validity was crucial."

[–] swlabr@awful.systems 34 points 2 weeks ago* (last edited 2 weeks ago) (7 children)

“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

LLMs achieve reasoning level of average rationalist

[–] V0ldek@awful.systems 17 points 2 weeks ago (3 children)

This is actually an accurate representation of most "gifted olympiad laureate attempting to solve a freshman CS problem on the blackboard" students I went to uni with.

Jumps to the front 5 seconds after the task is assigned, bluffs that the problem is trivial, tries to salvage their reasoning for 5 minutes when questioned by the tutor, turns out the theorem they said was trivial is actually false, sits down having wasted 10 minutes of everyone's time.

[–] swlabr@awful.systems 7 points 2 weeks ago (1 children)

This needed a TW jfc (jk, uh, sorta)

[–] V0ldek@awful.systems 7 points 2 weeks ago

TW: contains real chuds
