Research Overview - by PI
My research is driven by low-dimensional principles of intelligence, seeking the foundational blueprints that enable scalable, generalizable AI. This perspective (see a "philosophical" piece I wrote for the CPAL conference) has guided my work across many domains, aiming to move beyond brute-force scaling toward more structured and principled learning systems.
At the university level, we have recently focused on how training dynamics can inherently discover low-dimensional inductive biases, such as sparsity, low-rank manifolds, and algebraic symmetries, shaping more efficient paradigms for pre-training, supervised fine-tuning, and reinforcement learning. We further study how these structures manifest at test time as self-organized, compositional mechanisms that support inference, reasoning, and agentic planning. Finally, we are interested in how such structural invariants translate into real-world deployment, enabling resource-aware and trustworthy AI in high-stakes domains.
Historically, my work has included deep learning theory, learning-augmented optimization, sparse coding & inverse problems, and visual restoration & understanding, areas in which I remain (somewhat) engaged.
In parallel, my industry experience has opened several new, distinct research directions inspired by practical deployments and large-scale systems: geometric deep learning and graphs (with Amazon), video generation (with Picsart), and foundation model training for trading (with XTX Markets).
University Research



Training Dynamics with Low-Dimensional Inductive Bias.
pre-training, SFT, & RL
Selected Recent Work:
- "Neon: Negative Extrapolation From Self-Training Improves Generation", ICLR 2026 [Oral]
- "The path not taken: RLVR provably learns off the principals", arXiv 2025
- "APOLLO: SGD-like Memory, AdamW-level Performance", MLSys 2025 (Outstanding Paper Honorable Mention)
- "Drag-and-drop LLMs: Zero-shot prompt-to-weights", NeurIPS 2025
- "Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning", NeuS 2025 (DARPA Disruptive Idea Paper Award)
- "From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications", ICML 2025
- "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection", ICML 2024 [Oral]
Test-Time Behaviors with Efficiently Self-Organized and Composite Mechanisms.
inference, reasoning, & agentic planning
Selected Recent Work:
- "Beyond Test Time Training: Learning to Reason via Hardware-Efficient Optimal Control", ICML 2026
- "MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games", ICML 2026
- "∇-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space", ICLR 2026
- "Sherlock: Reliable and Efficient Agentic Workflow Execution", arXiv 2025
- "SEAL: Steerable Reasoning Calibration of Large Language Models for Free", COLM 2025
- "Flextron: Many-in-One Flexible Large Language Model", ICML 2025
- "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", NeurIPS 2024 [Spotlight]
- "H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models", NeurIPS 2023






High-Stakes Real-World Deployment.
robotics, medicine, & more
Selected Recent Work:
- "UNCAP: Uncertainty-Guided Neurosymbolic Planning Using Natural Language Communication for Cooperative Autonomous Vehicles", AAMAS 2026 (Best Paper Finalist)
- "PanEcho: Complete AI-enabled echocardiography interpretation with multi-task deep learning", JAMA 2025 (selected for AHA 2024 Late-Breaking Science Presentation)
- "Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework", MLSys 2025
- "A Multimodality Video-Based AI Biomarker for Aortic Stenosis Development and Progression", JAMA Cardiology 2024
- "LLM-PBE: Assessing Data Privacy in Large Language Models", VLDB 2024 (Best Paper Finalist)
- "Severe Aortic Stenosis Detection by Deep Learning Applied to Echocardiography", EHJ 2023 (selected for Editor’s Choice)
Past Research Trajectory
/ 2022-2024
Theoretical properties of sparse neural networks and transformers
Selected work: [ICLR 2025], [NeurIPS 2024], [JMLR 2024a], [JMLR 2024b]
/ 2016-2021
"Learning-to-optimize" algorithms with theoretical guarantees
Selected work: [JMLR 2022], [ICLR 2022], [ICLR 2019], [NeurIPS 2018]
/ 2016-2021
Image and video generative models for restoration and enhancement
Selected work:
[IEEE TIP 2021] (IEEE SPS Young Author Best Paper Award 2024)
[NeurIPS 2021] (covered by Quanta Magazine)
[ICCV 2019] (implemented by many open-source toolboxes)
/ 2012-2019
Robust perception in visually degraded environments
Selected work: [IEEE TIP 2018], [ICCV 2017], [CVPR 2016]
/ 2010-2018
Compressive sensing, dictionary learning, and low-rank representations
Selected work: [NeurIPS 2018], [AAAI 2016]
My students often pursue broader research interests than my own, reflecting the diverse perspectives within the VITA group. I encourage you to explore their profiles for more details.
Industry Research

01 / XTX Markets (2024–present)
Leading the development of large-scale foundation models for high-frequency trading data. We don't publish here :)
02 / Picsart (2022–2024)
Directing the generative AI initiative and contributing influential open-source video generation models. Selected Work:


03 / Amazon (2021–2022)
Improving cold-start performance in recommendation systems for emerging retail markets using geometric deep learning. Selected Work: