Table of Contents

Neural Networks, Tensors and other stuff

Formats

Safetensors Safe store for tensors; unlike pickle-based formats, loading it cannot execute arbitrary code
GGUF Georgi Gerganov's universal format from the llama.cpp/ggml project (a single file can mix tensors of various precisions)
PT PyTorch's native checkpoint format, based on Python pickle (loading untrusted files is unsafe)
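The safetensors container is simple enough to sketch by hand: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then the raw tensor bytes. A minimal illustration (the function names are my own, not the official `safetensors` library API):

```python
import json
import struct

def save_tensor(path, name, values):
    """Write one float32 tensor into a safetensors-style file (sketch)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": [len(values)],
                     "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)
        f.write(data)

def load_tensor(path, name):
    """Read it back: parse the JSON header, then slice the payload bytes."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    start, end = header[name]["data_offsets"]
    n = header[name]["shape"][0]
    return list(struct.unpack(f"<{n}f", payload[start:end]))
```

Because the header is plain JSON and offsets are explicit, a reader can inspect or memory-map individual tensors without deserializing arbitrary objects, which is the safety win over pickle.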

Quantization

Legacy Quantizations (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1): These are simpler, faster methods but may have higher quantization error compared to newer types.
K-Quantizations (Q2_K, Q3_K, Q4_K, Q5_K, Q6_K): Introduced in llama.cpp PR #1684, these use super-blocks for smarter bit allocation, reducing quantization error.
I-Quantizations (IQ2_XXS, IQ3_S, etc.): State-of-the-art for low-bit widths, using lookup tables for improved accuracy but potentially slower on older hardware.
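The core idea behind the simplest of these, Q8_0, can be sketched in a few lines (illustrative Python, not llama.cpp's actual C code): weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers.

```python
BLOCK = 32  # Q8_0 block size in llama.cpp

def quantize_q8_0(xs):
    """Return (scale, int8 codes) pairs, one per block of 32 floats."""
    blocks = []
    for i in range(0, len(xs), BLOCK):
        chunk = xs[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax else 0.0  # map max magnitude to 127
        qs = [round(v / scale) if scale else 0 for v in chunk]
        blocks.append((scale, qs))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct floats: each code times its block's scale."""
    return [q * scale for scale, qs in blocks for q in qs]
```

K-quants refine this by grouping blocks into super-blocks that share second-level scales, spending more bits where the weight distribution needs them.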


GGUF Q8_0: Very close to FP16 (perplexity 7.4933), indicating minimal accuracy loss.
GGUF Q4_K_M: Slightly higher perplexity (7.5692), still usable for most tasks.
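Perplexity is exp of the mean negative log-likelihood over the evaluated tokens, so Q8_0's 7.4933 vs Q4_K_M's 7.5692 means the 4-bit model assigns slightly lower probability to the test text. A minimal computation (token probabilities here are made up for illustration):

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) over per-token probabilities."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))
```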

Open-source models

Stable Diffusion 1

Stable Diffusion XL

https://stability.ai/news/stable-diffusion-sdxl-1-announcement

Stable Diffusion 3

by Stability AI https://stability.ai

FLUX.1 [dev]

by Black Forest Labs https://bfl.ai

HiDream I1

by HiDream AI https://hidream.org

CLIP (Contrastive Language-Image Pre-training)


Uses a self-attention Transformer as its text encoder
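CLIP trains the text and image encoders jointly with a contrastive objective. A toy sketch of that objective (illustrative, not OpenAI's implementation): compute pairwise cosine similarities between image and text embeddings, then apply a symmetric cross-entropy that pulls matching pairs together and pushes mismatched pairs apart.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def clip_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric contrastive loss: pair k's image should match pair k's text."""
    n = len(img_embs)
    logits = [[cosine(i, t) / temperature for t in txt_embs] for i in img_embs]
    loss = 0.0
    for k in range(n):
        # image -> text direction: correct text is index k
        row = logits[k]
        loss += -row[k] + math.log(sum(math.exp(v) for v in row))
        # text -> image direction: correct image is index k
        col = [logits[j][k] for j in range(n)]
        loss += -col[k] + math.log(sum(math.exp(v) for v in col))
    return loss / (2 * n)
```

The low temperature sharpens the softmax, so even modest similarity gaps between matched and mismatched pairs produce a strong training signal.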