Abstract: The scale of large language models (LLMs) has steadily increased over time, leading to enhanced performance in multimodal understanding and complex reasoning, but with significant execution ...
Abstract: In recent years, extreme quantization methods, particularly one-bit quantization, have garnered significant attention in signal processing and data acquisition systems. While one-bit ...
LatticeQuant: E₈ Lattice Quantization with Entropy Coding for LLM KV Cache Compression
LatticeQuant is a research framework for KV cache compression in large language models, combining lattice ...
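To make the core primitive concrete, here is a minimal sketch of nearest-point quantization onto the E₈ lattice using the standard Conway–Sloane construction E₈ = D₈ ∪ (D₈ + ½); the function names `quantize_e8` and `_nearest_d8` are illustrative assumptions, not taken from the LatticeQuant codebase, and the entropy-coding stage that would follow is omitted.

```python
import numpy as np

def _nearest_d8(x):
    """Nearest point in D8 = {v in Z^8 : sum(v) even}.

    Round each coordinate; if the coordinate sum is odd, re-round the
    coordinate with the largest rounding error in the other direction.
    """
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        k = int(np.argmax(np.abs(x - f)))      # coordinate farthest from its rounding
        f[k] += 1.0 if x[k] > f[k] else -1.0   # flip it toward the other integer
    return f

def quantize_e8(x):
    """Nearest point in E8 = D8 ∪ (D8 + 1/2), per Conway & Sloane."""
    half = np.full(8, 0.5)
    cand0 = _nearest_d8(x)                 # candidate in the D8 coset
    cand1 = _nearest_d8(x - half) + half   # candidate in the D8 + 1/2 coset
    d0 = np.sum((x - cand0) ** 2)
    d1 = np.sum((x - cand1) ** 2)
    return cand0 if d0 <= d1 else cand1

# Illustrative use: quantize one 8-dimensional block of a (scaled) KV cache vector.
block = np.random.randn(8)
print(quantize_e8(block))
```

In a KV-cache setting, cache vectors would typically be split into 8-dimensional blocks and scaled before quantization; the chosen lattice points can then be entropy-coded, which is the stage the snippet leaves out.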
Add RDQuant submission: 11L QAT + per-layer R-D quantization
11L x 512 MLP3x with QAT (STE int6), SmearGate, BigramHash, SWA, orthogonal init, sliding-window eval, per-layer R-D quantization with dead ...
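As a hedged sketch of what "QAT (STE int6)" usually denotes, the snippet below fake-quantizes weights to a symmetric signed 6-bit grid in the forward pass while passing gradients straight through the rounding; the names `RoundSTE` and `fake_quant_int6`, and the fixed per-tensor `scale`, are assumptions for illustration rather than the submission's actual code.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round to nearest in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

def fake_quant_int6(w, scale):
    """Symmetric int6 fake-quantization: quantize to [-32, 31], then dequantize."""
    q = RoundSTE.apply(w / scale).clamp(-32, 31)
    return q * scale

# Illustrative use inside a training step: weights stay float, but the forward
# pass sees int6-grid values, so the network adapts to quantization noise.
w = torch.randn(512, 512, requires_grad=True)
loss = fake_quant_int6(w, scale=0.05).pow(2).sum()
loss.backward()  # gradients flow through the rounding unchanged
```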