Weighted Sliding Attention: Adaptive Gaussian Decay for Context-Sensitive Local Transformers


Abstract

Weighted Sliding Attention (WSA) is a lightweight attention mechanism that introduces a learnable Gaussian decay within a fixed-size window. Unlike traditional sliding window methods that apply uniform attention to all neighboring tokens, WSA learns to adjust its attention span dynamically through a single trainable parameter: sigma (σ). This allows the model to focus more narrowly in noisy contexts or to broaden its view when patterns are stable and well-formed. We evaluate WSA on three synthetic benchmarks—gradient trends, symmetrical palindromes, and noisy distractor sequences—and observe that the learned σ parameter adapts meaningfully to the structure of each task. Our results demonstrate that WSA not only performs competitively but also exhibits interpretable behavior, making it a promising alternative for resource-constrained or cognitively informed transformer models. This work explores how dynamic attention width, guided by learned trust in local context, can improve both robustness and transparency in modern attention-based architectures.
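To make the mechanism concrete, the following is a minimal sketch of the idea described in the abstract, not the authors' reference implementation. The class and parameter names (WeightedSlidingAttention, window_size, log_sigma) are illustrative assumptions; only the fixed-size window and the single learnable Gaussian decay parameter σ come from the text.

```python
# Hypothetical sketch of Weighted Sliding Attention: dot-product attention whose
# logits are biased by a Gaussian decay over token distance, with sigma learned
# and a hard cutoff at a fixed window size. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedSlidingAttention(nn.Module):
    def __init__(self, d_model: int, window_size: int = 8, init_sigma: float = 2.0):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.window_size = window_size
        # Learn sigma in log-space so it remains positive during training.
        self.log_sigma = nn.Parameter(torch.log(torch.tensor(init_sigma)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Standard scaled dot-product scores: (B, T, T)
        scores = q @ k.transpose(-2, -1) / D ** 0.5

        # Absolute token distance |i - j| for every query/key pair.
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).abs().float()

        # Gaussian decay (in log space) inside the window, -inf outside it,
        # so the softmax ignores tokens beyond the fixed window.
        sigma = self.log_sigma.exp()
        decay = -(dist ** 2) / (2 * sigma ** 2)
        decay = decay.masked_fill(dist > self.window_size, float("-inf"))

        attn = F.softmax(scores + decay, dim=-1)
        return attn @ v


# Usage: a small sigma sharpens attention around each token; training can
# increase sigma when the surrounding context proves reliable.
layer = WeightedSlidingAttention(d_model=32, window_size=4)
out = layer(torch.randn(2, 16, 32))  # -> shape (2, 16, 32)
```

Parameterizing σ in log-space is one simple way to keep the decay width strictly positive while leaving it free to grow or shrink with the task, which matches the adaptive behavior the abstract attributes to the learned σ.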
