LLaMA Now Goes Faster on CPUs (justine.lol)

submitted 7 months ago by agilob@programming.dev to c/performance@programming.dev

My kernels go 2x faster than MKL for matrices that fit in L2 cache, which makes them a work in progress, since the speedup works best for prompts having fewer than 1,000 tokens.

this post was submitted on 01 Apr 2024

17 points (90.5% liked)
