KV Cache Calculator

Add Custom Model

KV Cache Bytes per Token

KV Cache Size vs Sequence Length

Max Requests per GPU

Model Details

Model | Type | Layers | KV Heads | Head Dim | BF16 B/tok | FP8 B/tok | 128K BF16 | 128K FP8
MLA: Multi-head Latent Attention | MHA: Multi-Head Attention | SWA/Hybrid-SWA: Sliding Window Attention
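
The quantities named in the headings above (bytes per token, cache size at a given sequence length, max requests per GPU) can be sketched as follows. This is a minimal sketch, not the calculator's actual implementation. It assumes the standard MHA/GQA layout, where every layer stores one K and one V vector per KV head; MLA and sliding-window models use different cache layouts and are not covered. The function names, the example model (32 layers, 8 KV heads, head dim 128), the reading of "128K" as 131,072 tokens, and the 40 GiB free-memory budget are all hypothetical values chosen for illustration.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV cache bytes for one token, assuming a plain MHA/GQA layout.

    The factor 2 accounts for storing both K and V.
    bytes_per_elem: 2 for BF16, 1 for FP8.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(bytes_per_token: int, seq_len: int) -> int:
    """Total KV cache bytes for one request of seq_len tokens."""
    return bytes_per_token * seq_len


def max_requests(free_bytes: int, bytes_per_token: int, seq_len: int) -> int:
    """Whole requests of seq_len tokens that fit in free_bytes of GPU memory."""
    return free_bytes // kv_cache_bytes(bytes_per_token, seq_len)


# Hypothetical model, not taken from the calculator's table.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
SEQ_LEN = 128 * 1024          # assumption: "128K" means 131,072 tokens
FREE_BYTES = 40 * GIB         # assumption: memory left after weights

bf16_per_tok = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=2)
fp8_per_tok = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=1)

print("BF16 B/tok:", bf16_per_tok)
print("FP8 B/tok:", fp8_per_tok)
print("128K BF16 GiB:", kv_cache_bytes(bf16_per_tok, SEQ_LEN) / GIB)
print("128K FP8 GiB:", kv_cache_bytes(fp8_per_tok, SEQ_LEN) / GIB)
print("Max BF16 requests:", max_requests(FREE_BYTES, bf16_per_tok, SEQ_LEN))
print("Max FP8 requests:", max_requests(FREE_BYTES, fp8_per_tok, SEQ_LEN))
```

Halving the element size from BF16 to FP8 halves the bytes per token, so under these assumptions the same memory budget holds roughly twice as many full-length requests.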