Local AI kind of sucks right now, and everything is massively over-branded as "AI-ready" these days.
There aren't a lot of compelling local use cases, and the memory constraints of local hardware mean you end up running fairly weak models.
You need a high-end, high-memory local setup to get decent token rates, and right now I'm finding 30-70B models are the minimum viable size (the rough memory math is sketched below).
Even then, it doesn't compare with the speed of hosted models running on GPUs that cost more than luxury cars.
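To make the memory point concrete, here's a back-of-the-envelope sketch. The bytes-per-parameter figures are my own rough assumptions (roughly 2 bytes at FP16, 1 at 8-bit, 0.5 at 4-bit quantization), and it ignores KV cache and runtime overhead, so treat the numbers as ballpark only:

```python
# Rough memory math for holding local LLM weights in RAM/VRAM.
# Assumptions (mine, not measured): ~2 bytes/param at FP16, ~1 at 8-bit,
# ~0.5 at 4-bit quantization. KV cache and runtime overhead are ignored.

QUANT_BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def weight_memory_gb(params_billions: float, quant: str) -> float:
    """Approximate GB needed just to hold the model weights."""
    total_bytes = params_billions * 1e9 * QUANT_BYTES_PER_PARAM[quant]
    return total_bytes / (1024 ** 3)

for size in (7, 30, 70):
    row = ", ".join(
        f"{quant}: {weight_memory_gb(size, quant):.0f} GB"
        for quant in ("fp16", "q8", "q4")
    )
    print(f"{size}B params -> {row}")

# 70B params -> fp16: ~130 GB, q8: ~65 GB, q4: ~33 GB.
# Even aggressively quantized, the top of that 30-70B range
# wants workstation-class memory before you type a single prompt.
```

A 7B model fits on a midrange consumer GPU, but it's exactly the "fairly weak" tier the memory constraints push you toward; the 30-70B range that feels minimally viable is what forces the high-end setup.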