Jiecheng Lu
About
I am a Ph.D. student in Machine Learning at Georgia Tech (ISyE), advised by Prof. Shihao Yang. My research focuses on developing sequence-model architectures that improve expressivity under practical compute constraints, spanning attention mechanisms, linear attention, dynamic-MLP views of sequence modeling, and time series foundation models.
News
- May 2026
- Jan 2026
- Nov 2025
- May 2025
- Jan 2025
- Jul 2024
Talks
ML PhD seminar talk at Georgia Tech on scaling laws, expressivity-efficiency tradeoffs, and the role of architecture in sequence modeling.
Invited online talk hosted by Tsinghua University on HyperMLP and an integrated view of sequence modeling.
PhD seminar talk at Georgia Tech ISyE on scaling laws, expressivity-efficiency tradeoffs, and the role of architecture in sequence modeling.
Selected Publications
Presents an integrated dynamic-MLP perspective on sequence modeling, reinterpreting attention heads through context-instantiated MLP computation and learnable sequence-space mixing.
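A minimal sketch of this reading, in illustrative notation rather than the paper's (the function and weight names below are assumptions): a single attention head can be written as a two-layer MLP whose hidden and output weights are instantiated from the context itself.

```python
import torch

def attention_as_context_mlp(x, Wq, Wk, Wv):
    """Read one attention head as a context-instantiated two-layer MLP.

    x: (N, d) token representations; Wq, Wk: (d, d); Wv: (d, d_v).
    Names and shapes are illustrative assumptions, not the paper's notation.
    """
    q = x @ Wq                           # (N, d) queries
    K = x @ Wk                           # (N, d): context supplies the "hidden-layer" weights
    V = x @ Wv                           # (N, d_v): context supplies the "output-layer" weights
    h = torch.softmax(q @ K.T, dim=-1)   # hidden activations over context positions
    return h @ V                         # standard attention output, read as an MLP forward pass
```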
Introduces adaptive time series forecasting via symplectic attention, developed through mentored undergraduate research with Jiecheng Lu as corresponding author.
Introduces Free Energy Mixer (FEM), which interprets (q,k) attention scores as a prior and performs a log-sum-exp free-energy readout to reweight values at the channel level, enabling a smooth transition from mean aggregation to selective channel-wise retrieval without increasing asymptotic complexity.
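One hedged way to realize such a readout, assuming only what the summary above states (the inverse temperature `beta` and the normalization are assumptions): treating normalized (q,k) scores as a log-prior and taking a per-channel log-sum-exp interpolates between a prior-weighted mean at small beta and channel-wise max retrieval at large beta.

```python
import torch

def free_energy_readout(scores, values, beta=1.0):
    """Channel-wise log-sum-exp ("free energy") readout for one query.

    scores: (N,) raw q·k scores; values: (N, d).
    Illustrative sketch only; the paper's exact formulation may differ.
    """
    log_prior = torch.log_softmax(scores, dim=0)   # attention scores read as a prior
    # (1/beta) * logsumexp_i( log p_i + beta * v_{i,c} ) for every channel c:
    out = torch.logsumexp(log_prior.unsqueeze(-1) + beta * values, dim=0) / beta
    return out  # beta -> 0 recovers sum_i p_i v_i (mean); large beta approaches a per-channel max
```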
Introduces Zero‑Sum Linear Attention (ZeroS), which removes the uniform zero‑order term and reweights residuals to enable stable positive/negative attention weights, allowing contrastive operations within a single layer while retaining O(N) complexity.
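A rough sketch of the zero-sum idea as I read the summary (the rescaling step and the exact linear-time form are assumptions): centering the first-order scores removes the uniform zero-order term and leaves signed, zero-sum weights.

```python
import torch

def zero_sum_attention_sketch(q, k, v, eps=1e-6):
    """Illustrative zero-sum attention weights (not the paper's exact form).

    q, k: (N, d); v: (N, d_v). The explicit (N, N) weight matrix is for clarity;
    centering acts on the keys (k_i - mean_j k_j), which is what keeps a
    linear-attention-style factorization available for O(N) computation.
    """
    w = q @ k.T                                         # first-order scores q_t · k_i
    w = w - w.mean(dim=-1, keepdim=True)                # drop the uniform zero-order term
    w = w / (w.abs().sum(dim=-1, keepdim=True) + eps)   # assumed rescaling; weights stay signed and zero-sum
    return w @ v                                        # contrast-style mixing of values
```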
Shows that a linear attention layer can be interpreted as a dynamic vector autoregression (VAR); proposes SAMoVAR to realign multi‑layer Transformers with autoregressive forecasting for improved interpretability and accuracy.
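One way to spell out the correspondence, in illustrative notation rather than the paper's:

```latex
% A causal linear attention layer
\[
  y_t \;=\; \sum_{i \le t} \bigl(q_t^\top k_i\bigr)\, W_V\, x_i ,
  \qquad q_t = W_Q x_t ,\; k_i = W_K x_i ,
\]
% groups into a vector autoregression whose coefficient matrices are generated
% from the data rather than fixed:
\[
  y_t \;=\; \sum_{i \le t} A_{t,i}\, x_i ,
  \qquad A_{t,i} \;=\; \bigl(q_t^\top k_i\bigr)\, W_V .
\]
```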
Adds ARMA structure to autoregressive attention via a weighted varying gate, decoupling long‑range and local effects and improving time series forecasting (TSF) quality without increasing asymptotic complexity.
Reformulates TSF as in‑context learning by constructing tokens of (lookback, future) task pairs, enabling Transformers to adapt predictors from context without parameter updates.
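A minimal sketch of the task-pair construction as described (window stride, token layout, and the final query pair are assumptions; the paper's construction may differ):

```python
import numpy as np

def build_task_pair_tokens(series, lookback, horizon):
    """Chop a 1-D series into (lookback, future) task-pair tokens.

    Earlier pairs act as in-context examples; the last pair's future part is
    what the Transformer is asked to complete without parameter updates.
    """
    step = lookback + horizon
    tokens = []
    for start in range(0, len(series) - step + 1, step):
        ctx = series[start:start + lookback]          # "task input"
        fut = series[start + lookback:start + step]   # "task output"
        tokens.append(np.concatenate([ctx, fut]))     # one task-pair token
    return np.stack(tokens)                           # (num_pairs, lookback + horizon)

# Example: a sine wave split into context pairs plus the final pair to complete.
series = np.sin(np.linspace(0, 20, 400))
tokens = build_task_pair_tokens(series, lookback=48, horizon=16)
```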
Constructs Auxiliary Time Series (ATS) as exogenous inputs to capture inter‑series relations; identifies continuity, sparsity, and variability principles; improves multivariate TSF even with simple predictors.
Presents ARM with AUEL, Random Dropping, and multi‑kernel local smoothing to better capture series‑wise patterns and inter‑series dependencies for long‑term multivariate TSF.
