The Rule (lemmy.ml)
submitted 3 months ago by roon@lemmy.ml to c/196@lemmy.blahaj.zone
[-] Sabata11792@ani.social 6 points 3 months ago* (last edited 3 months ago)

Some apps allow you to offload to the GPU and CPU while loading only the active part of the model. I have an old SSD set up as swap that gives me 500 GB of "usable" RAM.

It is horrendously slow and pointless, but you can do it. I got about 2 tokens in 10 minutes on a 70B model with a 1080 Ti before I gave up.

[-] AeonFelis@lemmy.world 5 points 3 months ago

Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger - so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
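
The back-of-the-envelope estimate above can be sketched as a few lines of Python. This is only a rough sketch: it assumes the time per token scales linearly with model size, which is a simplification, and the function names are made up for illustration.

```python
def minutes_per_token(tokens: float, minutes: float) -> float:
    """Average time spent generating one token."""
    return minutes / tokens


def scaled_minutes_per_token(base_minutes: float, size_ratio: float) -> float:
    """Assumption: time per token grows linearly with model size."""
    return base_minutes * size_ratio


# Figures from the thread: 2 tokens in 10 minutes on the 70B model,
# and a model that is "almost 6 times bigger".
base = minutes_per_token(tokens=2, minutes=10)
estimate = scaled_minutes_per_token(base, size_ratio=6)

print(f"70B model: {base:.0f} min/token")
print(f"~6x bigger model: about {estimate:.0f} min/token")
```

Under that linear assumption the two figures line up: 5 minutes per token on the smaller model becomes roughly 30 minutes per token on the larger one, before accounting for any difference in hardware.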

[-] Sabata11792@ani.social 4 points 3 months ago

I would have to use an entire 1 TB drive for swap, but I'm sure I could manage 1 token before the heat death of the universe.

[-] AeonFelis@lemmy.world 4 points 3 months ago

I'd worry less about the heat death of the universe and more about your hardware's heat from all that load.

this post was submitted on 25 Jul 2024
657 points (100.0% liked)
