Jiecheng Lu
About
I am a Ph.D. student in Machine Learning at Georgia Tech (ISyE), advised by Prof. Shihao Yang. My research focuses on developing sequence-model architectures that improve expressivity under practical compute constraints, spanning attention mechanisms, linear attention, dynamic-MLP views of sequence modeling, and time series foundation models.
News
- May 2026
- Jan 2026
- Nov 2025
- May 2025
- Jan 2025
- Jul 2024
Talks
ML PhD seminar talk at Georgia Tech on scaling laws, expressivity-efficiency tradeoffs, and the role of architecture in sequence modeling.
Invited online talk hosted by Tsinghua University on HyperMLP and an integrated view of sequence modeling.
PhD seminar talk at Georgia Tech ISyE on scaling laws, expressivity-efficiency tradeoffs, and the role of architecture in sequence modeling.
Selected Publications
Presents an integrated dynamic-MLP perspective on sequence modeling, reinterpreting attention heads through context-instantiated MLP computation and learnable sequence-space mixing.
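A minimal sketch of this reading, in illustrative notation rather than the paper's (the function and weight names below are assumptions): a single attention head can be written as a two-layer MLP whose hidden and output weights are instantiated from the context itself.

```python
import torch

def attention_as_context_mlp(x, Wq, Wk, Wv):
    """Read one attention head as a context-instantiated two-layer MLP.

    x: (N, d) token representations; Wq, Wk: (d, d); Wv: (d, d_v).
    Names and shapes are illustrative assumptions, not the paper's notation.
    """
    q = x @ Wq                           # (N, d) queries
    K = x @ Wk                           # (N, d): context supplies the "hidden-layer" weights
    V = x @ Wv                           # (N, d_v): context supplies the "output-layer" weights
    h = torch.softmax(q @ K.T, dim=-1)   # hidden activations over context positions
    return h @ V                         # standard attention output, read as an MLP forward pass
```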
Introduces adaptive time series forecasting via symplectic attention, developed through mentored undergraduate research with Jiecheng Lu as corresponding author.
Introduces Free Energy Mixer (FEM), which interprets (q,k) attention scores as a prior and performs a log-sum-exp free-energy readout to reweight values at the channel level, enabling a smooth transition from mean aggregation to selective channel-wise retrieval without increasing asymptotic complexity.
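One hedged way to realize such a readout, assuming only what the summary above states (the inverse temperature `beta` and the normalization are assumptions): treating normalized (q,k) scores as a log-prior and taking a per-channel log-sum-exp interpolates between a prior-weighted mean at small beta and channel-wise max retrieval at large beta.

```python
import torch

def free_energy_readout(scores, values, beta=1.0):
    """Channel-wise log-sum-exp ("free energy") readout for one query.

    scores: (N,) raw q·k scores; values: (N, d).
    Illustrative sketch only; the paper's exact formulation may differ.
    """
    log_prior = torch.log_softmax(scores, dim=0)   # attention scores read as a prior
    # (1/beta) * logsumexp_i( log p_i + beta * v_{i,c} ) for every channel c:
    out = torch.logsumexp(log_prior.unsqueeze(-1) + beta * values, dim=0) / beta
    return out  # beta -> 0 recovers sum_i p_i v_i (mean); large beta approaches a per-channel max
```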
Introduces Zero‑Sum Linear Attention (ZeroS), which removes the uniform zero‑order term and reweights residuals to enable stable positive/negative attention weights, allowing contrastive operations within a single layer while retaining O(N) complexity.
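A rough sketch of the zero-sum idea as I read the summary (the rescaling step and the exact linear-time form are assumptions): centering the first-order scores removes the uniform zero-order term and leaves signed, zero-sum weights.

```python
import torch

def zero_sum_attention_sketch(q, k, v, eps=1e-6):
    """Illustrative zero-sum attention weights (not the paper's exact form).

    q, k: (N, d); v: (N, d_v). The explicit (N, N) weight matrix is for clarity;
    centering acts on the keys (k_i - mean_j k_j), which is what keeps a
    linear-attention-style factorization available for O(N) computation.
    """
    w = q @ k.T                                         # first-order scores q_t · k_i
    w = w - w.mean(dim=-1, keepdim=True)                # drop the uniform zero-order term
    w = w / (w.abs().sum(dim=-1, keepdim=True) + eps)   # assumed rescaling; weights stay signed and zero-sum
    return w @ v                                        # contrast-style mixing of values
```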
Shows that a linear attention layer can be interpreted as a dynamic vector autoregression (VAR); proposes SAMoVAR to realign multi‑layer Transformers with autoregressive forecasting for improved interpretability and accuracy.
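One way to spell out the correspondence, in illustrative notation rather than the paper's:

```latex
% A causal linear attention layer
\[
  y_t \;=\; \sum_{i \le t} \bigl(q_t^\top k_i\bigr)\, W_V\, x_i ,
  \qquad q_t = W_Q x_t ,\; k_i = W_K x_i ,
\]
% groups into a vector autoregression whose coefficient matrices are generated
% from the data rather than fixed:
\[
  y_t \;=\; \sum_{i \le t} A_{t,i}\, x_i ,
  \qquad A_{t,i} \;=\; \bigl(q_t^\top k_i\bigr)\, W_V .
\]
```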
Adds ARMA structure to autoregressive attention via a weighted varying gate, decoupling long‑range and local effects and improving time series forecasting (TSF) quality without increasing asymptotic complexity.
Reformulates TSF as in‑context learning by constructing tokens of (lookback, future) task pairs, enabling Transformers to adapt predictors from context without parameter updates.
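A minimal sketch of the task-pair construction as described (window stride, token layout, and the final query pair are assumptions; the paper's construction may differ):

```python
import numpy as np

def build_task_pair_tokens(series, lookback, horizon):
    """Chop a 1-D series into (lookback, future) task-pair tokens.

    Earlier pairs act as in-context examples; the last pair's future part is
    what the Transformer is asked to complete without parameter updates.
    """
    step = lookback + horizon
    tokens = []
    for start in range(0, len(series) - step + 1, step):
        ctx = series[start:start + lookback]          # "task input"
        fut = series[start + lookback:start + step]   # "task output"
        tokens.append(np.concatenate([ctx, fut]))     # one task-pair token
    return np.stack(tokens)                           # (num_pairs, lookback + horizon)

# Example: a sine wave split into context pairs plus the final pair to complete.
series = np.sin(np.linspace(0, 20, 400))
tokens = build_task_pair_tokens(series, lookback=48, horizon=16)
```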
Constructs Auxiliary Time Series (ATS) as exogenous inputs to capture inter‑series relations; identifies continuity, sparsity, and variability principles; improves multivariate TSF even with simple predictors.
Presents ARM with AUEL, Random Dropping, and multi‑kernel local smoothing to better capture series‑wise patterns and inter‑series dependencies for long‑term multivariate TSF.
