Abstract: Vision Transformers (ViTs) leverage the transformer architecture to effectively capture global context, demonstrating strong performance in computer vision tasks. A major challenge in ViT ...