this post was submitted on 05 Aug 2025
Free Open-Source Artificial Intelligence
VRAM vs RAM:
VRAM (Video RAM): Dedicated memory on your graphics card/GPU
- Used specifically for graphics processing and AI model computations
- Much faster for GPU operations
- Critical for running LLMs locally

RAM (System Memory): Main system memory used by the CPU and general operations
- Slower for the GPU to access
- Can be used as a fallback, but with a performance penalty
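If you're not sure how much VRAM your card actually has, `nvidia-smi` on the command line will tell you, or you can query it from Python. A minimal sketch, assuming an NVIDIA GPU and a CUDA-enabled PyTorch install:

```python
# Report how much VRAM the first GPU exposes (assumes NVIDIA + CUDA-enabled PyTorch).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected - you're limited to CPU-only inference.")
```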
So, for running a basic 7B-parameter LLM locally, you typically need:

Minimum: 8-12 GB VRAM
- Can run basic inference/tasks
- May require quantization (4-bit/8-bit)

Recommended: 16+ GB VRAM
- Smoother performance
- Handles larger context windows
- Runs without heavy quantization
Quantization means reducing the precision of the model's weights and calculations to use less memory. For example, instead of storing numbers with full 32-bit precision, they're compressed to 4-bit or 8-bit representations. This significantly reduces VRAM requirements but can slightly reduce model quality and accuracy.
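As a back-of-the-envelope check, the memory needed for the weights alone is roughly parameter count × bits per weight ÷ 8 (the KV cache for the context window and framework overhead come on top of that). A quick sketch, with an illustrative helper name:

```python
# Rough VRAM needed just to hold a model's weights at a given precision.
# Ignores the KV cache, activations, and framework overhead, which add more on top.
def estimate_weight_vram_gb(n_params_billion: float, bits_per_weight: int) -> float:
    bytes_per_weight = bits_per_weight / 8
    return n_params_billion * 1e9 * bytes_per_weight / 1024**3

for bits in (32, 16, 8, 4):
    print(f"7B model at {bits}-bit: ~{estimate_weight_vram_gb(7, bits):.1f} GB of weights")
# 32-bit: ~26 GB, 16-bit: ~13 GB, 8-bit: ~6.5 GB, 4-bit: ~3.3 GB
```

That's why a 7B model that won't fit in 8 GB at full precision becomes workable once it's quantized to 8-bit or 4-bit.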
Options if you have less VRAM:
- CPU-only inference (very slow)
- Model offloading to system RAM (see the sketch below)
- Use smaller models (3B, 4B parameters)
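For the offloading option, one common approach is to keep only part of the model's layers in VRAM and run the rest from system RAM. A minimal sketch using llama-cpp-python (the GGUF filename and layer count are just placeholders to illustrate the idea):

```python
# Split a quantized model between VRAM and system RAM with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # example: a 4-bit quantized GGUF file
    n_gpu_layers=20,  # keep 20 layers in VRAM; -1 = all layers on GPU, 0 = CPU-only
    n_ctx=4096,       # context window; larger windows need more memory
)

print(llm("Q: What is VRAM? A:", max_tokens=64)["choices"][0]["text"])
```

Lowering n_gpu_layers trades speed for a smaller VRAM footprint, so you can tune it until the model fits on your card.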