| Format | Description |
|---|---|
| Safetensors | Safe storage format for tensors (plain tensor data, no arbitrary code execution on load) |
| GGUF | Georgi Gerganov's universal format from the GGML/llama.cpp ecosystem; a single file can mix tensors of various precisions |
| PT | Native PyTorch checkpoint format (pickle-based) |
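For a quick orientation, the sketch below reads weights from each of these formats in Python. The file names are placeholders, and it assumes the `safetensors`, `torch`, and `gguf` packages are installed.

```python
# Minimal sketch: loading weights from each format (file names are placeholders).
import torch
from safetensors.torch import load_file
from gguf import GGUFReader

# Safetensors: a flat dict of tensor name -> tensor, loaded without running pickled code.
state = load_file("model.safetensors")

# PT: a pickled PyTorch checkpoint (only load files from sources you trust).
checkpoint = torch.load("model.pt", map_location="cpu")

# GGUF: metadata plus (possibly quantized) tensors in a single file.
reader = GGUFReader("model.gguf")
for tensor in reader.tensors:
    print(tensor.name, tensor.tensor_type, tensor.shape)
```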
Legacy Quantizations (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1): simple, fast block-wise schemes that tend to have higher quantization error than the newer types (see the Q8_0 sketch after this list).
K-Quantizations (Q2_K, Q3_K, Q4_K, Q5_K, Q6_K): Introduced in llama.cpp PR #1684, these use super-blocks for smarter bit allocation, reducing quantization error.
I-Quantizations (IQ2_XXS, IQ3_S, etc.): State-of-the-art for low-bit widths, using lookup tables for improved accuracy but potentially slower on older hardware.
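To make the block-wise idea concrete, here is a rough NumPy sketch of the simplest legacy scheme, Q8_0: values are grouped into blocks of 32, and each block stores one scale plus 8-bit integers. This is an illustration of the principle, not llama.cpp's actual implementation.

```python
import numpy as np

QK8_0 = 32  # block size used by Q8_0

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D float array into per-block int8 values plus fp16 scales."""
    blocks = x.reshape(-1, QK8_0)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                                 # avoid division by zero
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

x = np.random.randn(4096).astype(np.float32)
q, s = quantize_q8_0(x)
print("max abs error:", np.abs(dequantize_q8_0(q, s) - x).max())
```

K-quants refine this by grouping blocks into super-blocks and spending extra bits on per-block scales and minima, which is where the lower quantization error comes from.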
GGUF Q8_0: Very close to FP16 (perplexity 7.4933), indicating minimal accuracy loss.
GGUF Q4_K_M: Slightly higher perplexity (7.5692), still usable for most tasks.
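Perplexity here is the usual exponential of the average negative log-likelihood over the evaluation text (llama.cpp includes a perplexity example for measuring it). The sketch below shows the formula itself on a hypothetical array of per-token log-probabilities; the numbers are illustrative, not the measurements quoted above.

```python
import numpy as np

def perplexity(token_logprobs: np.ndarray) -> float:
    """exp of the mean negative log-likelihood over the evaluated tokens."""
    return float(np.exp(-np.mean(token_logprobs)))

# Hypothetical per-token natural-log probabilities from a quantized model:
logprobs = np.array([-2.01, -1.95, -2.03, -2.02])
print(perplexity(logprobs))  # exp(2.0025), roughly 7.41
```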
by Stability AI https://stability.ai
by Black Forest Labs https://bfl.ai
Self-attention Transformer as a text encoder
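A self-attention transformer text encoder can be sketched in a few lines of PyTorch. The vocabulary size, depth, width, and sequence length below are placeholders, not the actual configuration of any particular model.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Toy self-attention transformer text encoder (all sizes are illustrative)."""
    def __init__(self, vocab_size=49408, dim=512, depth=6, heads=8, max_len=77):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Parameter(torch.zeros(max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.token_emb(token_ids) + self.pos_emb[: token_ids.shape[1]]
        return self.encoder(x)                          # (batch, seq_len, dim)

tokens = torch.randint(0, 49408, (1, 77))
print(TextEncoder()(tokens).shape)                      # torch.Size([1, 77, 512])
```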