Free Energy Mixer
Published in ICLR 2026, 2026
Free Energy Mixer (FEM) replaces the head-level convex combination of values in attention with a free-energy-based readout that combines the prior attention distribution with value evidence, giving attention channel-wise selectivity. FEM is a plug-and-play module for standard and linear attention, RNNs, and SSMs; it improves expressiveness and performance while preserving computational efficiency.
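
The summary above does not spell out the readout formula. As a rough illustration only, the sketch below assumes a per-channel log-sum-exp (free-energy) form, `(1/beta) * log(sum_j prior[j] * exp(beta * values[j][c]))`, and compares it with the usual convex combination. The function names, the inverse-temperature parameter `beta`, and the toy numbers are assumptions made for this example and are not taken from the paper.

```python
import math

def convex_readout(prior, values):
    """Standard readout: one weight per position, shared by every channel."""
    n_channels = len(values[0])
    return [sum(p * v[c] for p, v in zip(prior, values)) for c in range(n_channels)]

def free_energy_readout(prior, values, beta):
    """Assumed free-energy readout, computed independently for each channel c:
    (1/beta) * log(sum_j prior[j] * exp(beta * values[j][c]))."""
    n_channels = len(values[0])
    out = []
    for c in range(n_channels):
        # Work in log space and shift by the max term for numerical stability.
        # Positions with zero prior mass contribute nothing and are skipped.
        terms = [math.log(p) + beta * v[c] for p, v in zip(prior, values) if p > 0.0]
        m = max(terms)
        out.append((m + math.log(sum(math.exp(t - m) for t in terms))) / beta)
    return out

# Toy example: 3 positions, 2 value channels (hypothetical numbers).
prior = [0.5, 0.3, 0.2]                           # attention weights, sum to 1
values = [[1.0, -2.0], [0.0, 3.0], [-1.0, 0.5]]   # values[j][c]

mean_read = convex_readout(prior, values)
soft_read = free_energy_readout(prior, values, beta=1e-4)   # near-averaging regime
sharp_read = free_energy_readout(prior, values, beta=50.0)  # near-selection regime
```

Under this assumed form, a small `beta` stays close to the ordinary weighted average, while a large `beta` lets each channel lean toward a different position (here channel 0 toward position 0 and channel 1 toward position 1), which is one way to picture the channel-wise selectivity that a single shared set of weights cannot express.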
