In the ‘Medium’ difficulty category, OpenAI’s o4-mini-high model scored the highest at 53.5%.
This fits my observation of such models. o4-mini-high is able to help me with 80-90% of the problems at work. For the remaining problems, it comes up with a nonsensical solution, and no matter how much I prompt it, it tunnel-visions on that specific approach. It can never second-guess itself, realise that its initial solution is completely off the mark, and try an entirely different approach. That's where I usually step in and do the work myself.
It still saves me time with the trivial stuff though.
I can't say the same for the rest of the LLMs. They are simply no good at coding and just waste my time.