With 128GB of RAM on a Mac, GLM 4.5 Air is going to be one of your best options. You could run it anywhere from Q5 to Q8 depending on how you want to balance speed against quality.
I have a different system that likely runs it slower than yours will, and I get 5 T/s generation (using Q8), which is just about the speed I read at.
I do hear that Ollama may be having issues with that model though, so you may have to wait for an update.
I use llama.cpp and llama-swap with Open WebUI, so if you want any tips on switching over I'd be happy to help. llama.cpp is usually one of the first projects to add support for new models when they come out.
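If you do go that route, llama-swap just takes a small YAML config mapping a model name to the llama-server command it should launch. A rough sketch of what mine looks like for this model (the model path and quant are placeholders for whatever file you actually download, and `${PORT}` is a llama-swap macro it fills in itself):

```yaml
models:
  "glm-4.5-air":
    cmd: >
      llama-server
      --model /path/to/GLM-4.5-Air-Q8_0.gguf
      --port ${PORT}
```

Then you point Open WebUI at llama-swap's endpoint as an OpenAI-compatible API, and it loads/unloads models on demand.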
Edit: just reread your post. I was thinking it was a newer Mac lol. This may be a slow model for you, but I do think it'll be one of the best you can run.