Emergent Neurosymbolic Reasoning and Agentic Planning
Uncovering and exploiting the symbolic structures that arise inside neural agents to enable steerable reasoning, verifiable safety, and coordinated multi-agent workflows.
01
Why Neural Network Can Discover Symbolic Structures with
Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
-
NeuS 2025 DARPA Disruptive Idea Paper Award
This work explains, for the first time, why ordinary neural networks trained with gradient descent can learn discrete symbolic structure instead of just statistical patterns. We develop a new theoretical framework that views training as a continuous “flow” in a high-dimensional space and shows how, under geometric constraints like symmetry, the dynamics naturally separate and contract toward low-dimensional solutions that act like symbolic rules. These symbolic structures behave like algebraic building blocks that can be composed, giving a principled basis for neurosymbolic reasoning — where neural networks gain the ability to perform logical, compositional tasks. This theory also links the emergence of symbolic solutions to data scale and guides practical design of architectures that combine continuous learning with discrete reasoning.
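In illustrative notation (not the paper's exact formalism), the core picture is a gradient flow whose loss is invariant under a symmetry group, so trajectories that start on the low-dimensional fixed-point set stay on it and can contract toward it:

```latex
\dot{\theta}(t) = -\nabla_{\theta} L\bigl(\theta(t)\bigr),
\qquad L(g \cdot \theta) = L(\theta) \quad \forall g \in G,
\qquad \Theta^{G} = \{\, \theta : g \cdot \theta = \theta \ \ \forall g \in G \,\}.
```

Because the loss is $G$-invariant, its gradient is tangent to the fixed-point subspace $\Theta^{G}$; contraction of the flow toward such subspaces is one way low-dimensional, rule-like solutions can emerge from ordinary training.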

02
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?
-
COLM 2025
-
Will support NeurIPS 2025 MindGames Challenge
SPIN-Bench stress-tests today's chatbots on what makes strategy games hard for people: thinking many moves ahead and reading (or bluffing) other players. It bundles classic planning puzzles with board games such as Chess, the cooperative card game Hanabi, and negotiation-heavy Diplomacy, then shows that even top LLMs stumble once action trees widen or alliances shift, often scoring well below novice humans in social settings.

03
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
-
COLM 2025
SEAL breaks a language model’s chain-of-thought into execution, reflection, and transition steps, then builds a single “steering vector” in latent space that dampens the meandering reflection/transition phases and keeps the model on the direct execution path—no fine-tuning required. Plugging this vector in at inference boosts accuracy by up to 11% on math, coding, and logic tasks while cutting the number of generated tokens (and therefore time and GPU memory) by 12–50%.
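A minimal sketch of the difference-of-means steering construction, assuming hidden states for execution and reflection steps have already been extracted; the array names and the scaling factor `alpha` are illustrative, not SEAL's actual API:

```python
import numpy as np

def steering_vector(exec_states, refl_states):
    """Unit direction pointing from reflection-phase toward execution-phase activations."""
    v = exec_states.mean(axis=0) - refl_states.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha=1.0):
    """Nudge a hidden state along the execution direction at inference time."""
    return hidden + alpha * v

# toy demonstration with synthetic 8-dimensional activations
rng = np.random.default_rng(0)
exec_states = rng.normal(1.0, 0.1, size=(16, 8))   # execution-step activations
refl_states = rng.normal(-1.0, 0.1, size=(16, 8))  # reflection-step activations
v = steering_vector(exec_states, refl_states)
steered = steer(rng.normal(size=8), v, alpha=2.0)
```

Because the vector is computed once from a handful of traces and applied additively at inference, no gradients or fine-tuning are needed, which is the "for free" in the title.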

04
UNCAP: Uncertainty-Guided Neurosymbolic Planning Using Natural Language Communication for Cooperative Autonomous Vehicles
-
AAMAS 2026
UNCAP enables cooperative autonomous vehicles to coordinate safely using concise natural-language messages instead of heavy sensor streams. Each vehicle first identifies which nearby agents actually matter, then exchanges lightweight symbolic messages that explicitly encode perception uncertainty. By selectively fusing only the most informative messages, UNCAP avoids over-communication while accounting for uncertainty during planning. Across diverse driving scenarios, this approach cuts communication bandwidth by 63%, reduces decision uncertainty by 61%, improves safety scores by 31%, and increases near-collision distance margins by 4×, making large-scale cooperative driving both safer and more scalable.
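A toy sketch of the selective, uncertainty-aware fusion idea, with each peer message modeled as an (estimate, variance) pair; the variance threshold and inverse-variance weighting are illustrative assumptions, not UNCAP's exact protocol:

```python
def fuse_messages(messages, max_var=1.0):
    """Fuse scalar estimates from peer vehicles, ignoring high-uncertainty reports.

    messages: list of (estimate, variance) pairs, variance > 0.
    Returns the inverse-variance weighted mean of the kept estimates,
    or None if every message is too uncertain to use.
    """
    kept = [(est, var) for est, var in messages if var <= max_var]
    if not kept:
        return None
    weights = [1.0 / var for _, var in kept]
    total = sum(w * est for (est, _), w in zip(kept, weights))
    return total / sum(weights)

# two confident peers agree roughly; one noisy peer is discarded
print(fuse_messages([(1.0, 0.5), (3.0, 0.5), (10.0, 5.0)]))  # 2.0
```

Dropping high-variance messages before fusion is what saves bandwidth, while the explicit variances let the planner weigh what remains instead of trusting every report equally.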

05
Know Where You're Uncertain When Planning With Multimodal Foundation Models: A Formal Framework
-
MLSys 2025
We introduce a principled way to separate and manage the two main kinds of uncertainty that arise when multimodal foundation models plan for robots: uncertainty about what the robot sees (perception) and uncertainty about the plans it generates (planning). Instead of treating uncertainty as a single black-box score, we disentangle the two, calibrating perception uncertainty with conformal prediction and quantifying planning uncertainty with a formal-methods verification technique. Based on these separate measures, the system dynamically re-observes uncertain scenes and fine-tunes the model on confident data to improve reliability. In real and simulated robotic navigation tasks, this approach cuts output variability by up to 40% and improves task success rates by about 5%, demonstrating that knowing which uncertainty matters enables targeted fixes and more robust autonomous behavior.
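The standard split-conformal calibration step that underlies the perception side can be sketched as follows; the re-observation trigger and variable names are illustrative assumptions, not the paper's exact interface:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile with finite-sample correction: new
    nonconformity scores above this bound fall outside the (1 - alpha)
    coverage set."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return float(np.quantile(cal_scores, level, method="higher"))

def should_reobserve(score, threshold):
    """Trigger re-observation when perception uncertainty exceeds the calibrated bound."""
    return score > threshold

# calibration nonconformity scores from held-out perception data
cal = np.arange(1, 101, dtype=float)
t = conformal_threshold(cal, alpha=0.1)
```

The appeal of conformal calibration here is that the coverage guarantee holds without distributional assumptions, so the "re-observe" decision rests on a statistically valid bound rather than a tuned heuristic.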

06
Fine-Tuning Language Models Using Formal Methods Feedback: A Use Case in Autonomous Systems
-
MLSys 2024
This work turns formal verification into a drop-in training signal for language models controlling autonomous systems. The model generates multiple candidate control programs, each of which is automatically checked against safety and performance specifications. These verification scores replace human feedback and are used to directly fine-tune the model. Applied to autonomous driving tasks, this approach raises spec-satisfaction rates from roughly 60% to over 90%, showing that formal methods can reliably steer language models toward safer, constraint-aware decision making—without manual supervision.
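The verification-as-feedback loop can be sketched as scoring each candidate program against a set of executable checks and ranking by score; the string-matching "specs" below are stand-ins for a real model checker, not the paper's toolchain:

```python
def spec_score(program, specs):
    """Fraction of formal specifications the candidate program satisfies."""
    return sum(1 for check in specs if check(program)) / len(specs)

def rank_candidates(candidates, specs):
    """Order candidates by verification score; the top ones become
    fine-tuning targets in place of human preference labels."""
    return sorted(candidates, key=lambda p: spec_score(p, specs), reverse=True)

# stand-in specs: "brakes before the stop line", "respects the speed limit"
specs = [lambda p: "brake" in p, lambda p: "limit" in p]
candidates = ["cruise", "brake", "brake limit"]
ranked = rank_candidates(candidates, specs)  # best-verified program first
```

Because the checker is automatic, this loop can generate as much labeled preference data as training needs, which is what lets verification scores substitute for human feedback.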
