Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs (www.tomshardware.com)

submitted 6 days ago by ThorrJo@lemmy.sdf.org to c/localllama@sh.itjust.works

[–] mindbleach@sh.itjust.works 5 points 5 days ago (1 children)

It's trinary, and I understand why they instead say "1-bit," but it still bugs me that they call it "1-bit."

I'd love to see how low they can push this and still get spooky results. Something with ten million parameters could fit on a Macintosh Classic II - and if it ran at any speed worth calling interactive, it'd undercut a lot of loud complaints about energy use. Training takes a zillion watts. Using the model is like running a video game.
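
A rough sketch of the arithmetic behind those two points, for anyone curious. A weight with three possible values carries log2(3) ≈ 1.58 bits, which is where the "1.58-bit" label comes from, and ten million such weights come to roughly 2 MB if packed tightly. The function names here are made up for illustration, and the five-weights-per-byte packing is just one simple scheme, not necessarily what bitnet.cpp does.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of one weight drawn from {-1, 0, +1}."""
    return math.log2(3)


def model_size_bytes(n_params: int, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, ignoring any other overhead."""
    return n_params * bits_per_weight / 8


def pack5(trits):
    """Pack five ternary weights into one byte, since 3**5 = 243 <= 256."""
    assert len(trits) == 5
    value = 0
    for t in trits:
        assert t in (-1, 0, 1)
        value = value * 3 + (t + 1)  # map -1/0/+1 to base-3 digits 0/1/2
    return value


def unpack5(byte):
    """Inverse of pack5: recover the five ternary weights from one byte."""
    digits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        digits.append(digit - 1)  # map digits 0/1/2 back to -1/0/+1
    return digits[::-1]


if __name__ == "__main__":
    print(f"bits per weight: {bits_per_ternary_weight():.3f}")
    # Five weights per byte works out to 1.6 bits per weight.
    print(f"10M params at 1.6 bits: {model_size_bytes(10_000_000, 1.6) / 1e6:.2f} MB")
```

Packing five weights per byte costs 1.6 bits each, slightly above the 1.58-bit ideal, in exchange for very simple decoding.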

[–] milicent_bystandr@lemm.ee 1 points 3 days ago (1 children)

Can someone tell me what's meant by this?

The repository describes bitnet.cpp as offering “a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU”

Does it mean you need to run your OS with a specific kernel from bitnet.cpp? Or is it a different kind of 'kernel'?

[–] mindbleach@sh.itjust.works 2 points 3 days ago* (last edited 3 days ago)

It's a different kind of "kernel," nothing to do with your OS. I think they mean whatever's handling the model's math: code into which you feed this inherently restricted format, written to take advantage of those limitations in order to run more efficiently.

Like if every number's magnitude is 1 or 0, you don't need to do floating-point multiplication.
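
To make that concrete, here is a minimal sketch of a dot product over ternary weights that only adds and subtracts. The names `ternary_dot` and `ternary_matvec` are hypothetical, and this is not how bitnet.cpp is actually written: I'd expect real kernels to work on packed weights with vectorized instructions and to apply a scale factor afterwards, all of which this ignores.

```python
def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or +1.

    No multiplication is needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it entirely.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0 contributes nothing
    return total


def ternary_matvec(rows, activations):
    """Matrix-vector product built from ternary_dot, one row at a time."""
    return [ternary_dot(row, activations) for row in rows]


if __name__ == "__main__":
    w = [1, 0, -1, 1]
    x = [0.5, 2.0, 1.5, 3.0]
    print(ternary_dot(w, x))  # 0.5 - 1.5 + 3.0
```

The zero weights are also free sparsity: any activation paired with a 0 never has to be touched at all.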