Abstract: The scale of large language models (LLMs) has steadily increased over time, leading to enhanced performance in multimodal understanding and complex reasoning, but with significant execution ...
Abstract: In recent years, extreme quantization methods, particularly one-bit quantization, have garnered significant attention in signal processing and data acquisition systems. While one-bit ...
LatticeQuant: E₈ Lattice Quantization with Entropy Coding for LLM KV Cache Compression
LatticeQuant is a research framework for KV cache compression in large language models, combining lattice ...
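To make the core primitive concrete, here is a minimal sketch of nearest-point quantization onto the E₈ lattice using the standard Conway–Sloane construction E₈ = D₈ ∪ (D₈ + ½); the function names `quantize_e8` and `_nearest_d8` are illustrative assumptions, not taken from the LatticeQuant codebase, and the entropy-coding stage that would follow is omitted.

```python
import numpy as np

def _nearest_d8(x):
    """Nearest point in D8 = {v in Z^8 : sum(v) even}.

    Round each coordinate; if the coordinate sum is odd, re-round the
    coordinate with the largest rounding error in the other direction.
    """
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        k = int(np.argmax(np.abs(x - f)))      # coordinate farthest from its rounding
        f[k] += 1.0 if x[k] > f[k] else -1.0   # flip it toward the other integer
    return f

def quantize_e8(x):
    """Nearest point in E8 = D8 ∪ (D8 + 1/2), per Conway & Sloane."""
    half = np.full(8, 0.5)
    cand0 = _nearest_d8(x)                 # candidate in the D8 coset
    cand1 = _nearest_d8(x - half) + half   # candidate in the D8 + 1/2 coset
    d0 = np.sum((x - cand0) ** 2)
    d1 = np.sum((x - cand1) ** 2)
    return cand0 if d0 <= d1 else cand1

# Illustrative use: quantize one 8-dimensional block of a (scaled) KV cache vector.
block = np.random.randn(8)
print(quantize_e8(block))
```

In a KV-cache setting, cache vectors would typically be split into 8-dimensional blocks and scaled before quantization; the chosen lattice points can then be entropy-coded, which is the stage the snippet leaves out.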
Add RDQuant submission: 11L QAT + per-layer R-D quantization
11L x 512 MLP3x with QAT (STE int6), SmearGate, BigramHash, SWA, orthogonal init, sliding-window eval, per-layer R-D quantization with dead ...
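As a hedged sketch of what "QAT (STE int6)" usually denotes, the snippet below fake-quantizes weights to a symmetric signed 6-bit grid in the forward pass while passing gradients straight through the rounding; the names `RoundSTE` and `fake_quant_int6`, and the fixed per-tensor `scale`, are assumptions for illustration rather than the submission's actual code.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round to nearest in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

def fake_quant_int6(w, scale):
    """Symmetric int6 fake-quantization: quantize to [-32, 31], then dequantize."""
    q = RoundSTE.apply(w / scale).clamp(-32, 31)
    return q * scale

# Illustrative use inside a training step: weights stay float, but the forward
# pass sees int6-grid values, so the network adapts to quantization noise.
w = torch.randn(512, 512, requires_grad=True)
loss = fake_quant_int6(w, scale=0.05).pow(2).sum()
loss.backward()  # gradients flow through the rounding unchanged
```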