New technique to run 70B LLM Inference on a single 4GB GPU
(ai.gopubby.com)
Yeah, I'm not sure how they pull that off, but if you want to run a model in-house, as many people would prefer, this would let you run much more capable models on consumer-grade hardware and save money compared to buying more expensive kit. Many people already have decent hardware, and this extends what they can run before they need to fork out for an upgrade.
I know, I'm guessing.
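For what it's worth, the usual trick behind claims like this is layer-by-layer (streamed) inference: a transformer executes its layers sequentially, so you only ever need one layer's weights on the GPU at a time. You load a layer from disk, run it, free it, and move on. I haven't verified that's exactly what the article does, but here's a minimal PyTorch-style sketch of the general idea; the weight paths, layer count, and `load_layer` helper are all hypothetical:

```python
import torch

# Sketch of layer-by-layer inference: keep only one transformer
# layer's weights on the GPU at a time, streaming them from disk.
# All names and paths here are hypothetical, not from the article.

NUM_LAYERS = 80   # 70B-class models have roughly 80 transformer layers
DEVICE = "cuda"

def load_layer(i: int) -> torch.nn.Module:
    # Hypothetical helper: deserialize layer i's weights from disk
    # straight onto the GPU.
    return torch.load(f"weights/layer_{i:02d}.pt", map_location=DEVICE)

def forward(hidden: torch.Tensor) -> torch.Tensor:
    # Run the layers sequentially; at any moment only one layer's
    # weights occupy GPU memory (on the order of 1-2 GB for a 70B
    # model split ~80 ways, before any quantization).
    for i in range(NUM_LAYERS):
        layer = load_layer(i)
        with torch.no_grad():
            hidden = layer(hidden)
        del layer                  # drop this layer's weights...
        torch.cuda.empty_cache()   # ...and return the memory to the pool
    return hidden
```

The trade-off is speed: every token now pays for reading the whole model off disk, so this makes big models possible on small GPUs rather than fast.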