Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting

Published at ICML 2025 (Poster), PMLR 267:40848–40867.

Interprets linear attention through the lens of vector autoregression (VAR) and reorganizes the Transformer stack (attention, MLP, and input–output flow) to match the autoregressive forecasting objective, yielding multivariate time series forecasting (TSF) models that are both more interpretable and competitive. The sketch below makes the VAR reading of linear attention concrete.
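The core observation can be illustrated with a few lines of code. What follows is a minimal sketch, not the authors' implementation: the function name `linear_attention_as_var`, the unnormalized causal form, and the weight shapes are illustrative assumptions. Each output of causal linear attention is a weighted sum of past value vectors, which is structurally a VAR step whose lag coefficients are data-dependent rather than fixed matrices.

```python
import torch

def linear_attention_as_var(x, Wq, Wk, Wv):
    """Causal (unnormalized) linear attention over a series x of shape (T, d).

    Each output o_t = sum_{s<=t} (q_t . k_s) * v_s, i.e. a weighted sum of
    past (projected) observations -- a VAR update whose per-lag weights
    (q_t . k_s) are computed from the data instead of fixed A_k matrices.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv               # (T, d) each
    scores = q @ k.T                               # (T, T) pairwise q_t . k_s
    causal = torch.tril(torch.ones_like(scores))   # zero out future positions
    return (scores * causal) @ v                   # (T, d) outputs

# Toy usage with hypothetical shapes
T, d = 8, 4
x = torch.randn(T, d)
Wq, Wk, Wv = (torch.randn(d, d) / d**0.5 for _ in range(3))
out = linear_attention_as_var(x, Wq, Wk, Wv)
print(out.shape)  # torch.Size([8, 4])
```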