Formats

Safetensors: safe, zero-copy tensor storage format (loading cannot execute code)
GGUF: Georgi Gerganov's universal format, used by llama.cpp (a single file can mix tensors of various precisions)
PT: native PyTorch checkpoint format (pickle-based); loading sketches for all three follow below
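
A minimal loading sketch in Python, assuming the safetensors and torch packages and placeholder file names; GGUF files are normally consumed by llama.cpp itself, though the gguf package from the llama.cpp repo can inspect them.

import torch
from safetensors.torch import load_file

# Safetensors: raw tensor data plus a JSON header; loading cannot run code.
weights = load_file("model.safetensors")         # dict of name -> torch.Tensor

# PT: pickle-based, so untrusted files are a risk; weights_only=True
# restricts unpickling to plain tensor data.
state_dict = torch.load("model.pt", weights_only=True)

# GGUF: usually read by llama.cpp directly; from Python, metadata and
# tensors can be inspected with:
# from gguf import GGUFReader
# reader = GGUFReader("model.gguf")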

Legacy quantizations (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1): simple per-block schemes (one or two scale factors per block of 32 weights); fast, but with higher quantization error than the newer types. Q8_0 is sketched after this list.
K-quantizations (Q2_K, Q3_K, Q4_K, Q5_K, Q6_K): introduced in llama.cpp PR #1684; they use super-blocks whose sub-block scales are themselves quantized, allocating bits more intelligently and reducing quantization error.
I-quantizations (IQ2_XXS, IQ3_S, etc.): state of the art at low bit widths; they use lookup tables (codebooks) for improved accuracy, but can be slower on older hardware.
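
To make the block idea concrete, here is a minimal NumPy sketch of the Q8_0 scheme: blocks of 32 weights, one scale per block, int8 quants. Illustrative only; the real implementation is C code in ggml.

import numpy as np

BLOCK = 32

def quantize_q8_0(x: np.ndarray):
    blocks = x.reshape(-1, BLOCK).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = amax / 127.0                      # one scale per block (stored as fp16)
    scale[scale == 0] = 1.0                   # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

x = np.random.randn(4096).astype(np.float32)
q, s = quantize_q8_0(x)
err = np.abs(x - dequantize_q8_0(q, s)).mean()
print(f"mean abs error: {err:.6f}")           # small, hence Q8_0's near-FP16 quality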


Example perplexity figures (lower is better; the formula is sketched below):

GGUF Q8_0: very close to FP16 (perplexity 7.4933), indicating minimal accuracy loss.
GGUF Q4_K_M: slightly higher perplexity (7.5692), but still usable for most tasks.
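
Perplexity is the exponential of the average negative log-likelihood per token, so lower means the model is less "surprised" by the test text. A minimal sketch, assuming per-token log-probabilities (natural log) have already been collected, e.g. by llama.cpp's perplexity tool:

import math

def perplexity(token_logprobs: list[float]) -> float:
    # exp of the mean negative log-likelihood per token
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy check: a model giving every token probability 0.5 has perplexity 2
print(perplexity([math.log(0.5)] * 100))      # -> 2.0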

Open-source models

Stable Diffusion 1

Stable Diffusion XL

Stable Diffusion 3

by Stability AI https://stability.ai

FLUX.1 [dev]

by Black Forest Labs https://bfl.ai

HiDream-I1

CLIP (Contrastive Language-Image Pre-training)

Uses a self-attention Transformer as its text encoder, trained contrastively against an image encoder (see the Wikipedia article on CLIP).
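
A minimal sketch of CLIP's typical use, zero-shot image-text matching, via the Hugging Face transformers API; the image path is a placeholder.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Image-text cosine similarities (scaled by a learned temperature),
# softmaxed over the candidate texts
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))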
