UMD Researchers Take on AI’s Hardest Problem: Teaching Machines to Reason

May 11, 2026
A golden-orange 3D human brain encased in a glowing blue digital lattice of nodes and lines, resting on a reflective circuit-patterned surface.

At the University of Maryland, researchers aren’t just building smarter AI—they’re trying to teach machines how to reason.

Led by computer science professor Mohammad Hajiaghayi, a UMD team is working toward an ambitious goal: AI systems that can generate, test and refine mathematical ideas on their own. Backed by a $2.6 million Defense Advanced Research Projects Agency (DARPA) grant, the project aims to compress years of research into days—and potentially reshape how knowledge is produced.

UMD computer science Ph.D. students (from left) Iman Gholami, Arshia Soltani, and Danny Mittal are collaborating on a project to build AI systems that don’t just solve problems—but learn to reason through them.

At the center of the effort are three computer science Ph.D. students—Arshia Soltani, Iman Gholami and Danny Mittal—each tackling a different piece of what could become a new kind of machine: one capable of reasoning. Their selection, Hajiaghayi says, reflects the project’s ambition.

“I’ve had the privilege of working with some of the brightest students at UMD on this project—many medalists in math and computer Olympiad competitions, including Arshia, Danny and Iman,” Hajiaghayi says.

The students’ first task on the DARPA-funded project is to tackle a fundamental obstacle, Hajiaghayi adds: before an AI system can prove anything, it must understand the problem, a step where things often break down.

“The hardest part isn’t solving the problem,” says Gholami. “It’s making sure the AI even understands what the problem is.”

A fourth-year Ph.D. student in theoretical computer science, Gholami focuses on formalization—translating human mathematical intent into a form machines can interpret. If an AI misreads a definition or constraint, the entire chain of reasoning collapses.
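
The article doesn’t name the team’s tools, but proof assistants such as Lean are a common target for this kind of formalization. As a purely hypothetical illustration, the informal claim “the sum of two even integers is even” might be rendered machine-checkable like this:

```lean
import Mathlib

-- Hypothetical example: formalizing "the sum of two even integers is even."
-- In Mathlib, `Even a` unfolds to `∃ r, a = r + r`, so the proof supplies
-- the witness m + n and checks the arithmetic with `ring`.
theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) : Even (a + b) := by
  obtain ⟨m, hm⟩ := ha   -- a = m + m
  obtain ⟨n, hn⟩ := hb   -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```

A single misplaced quantifier or wrong type in the statement line would make the theorem assert something other than what the mathematician intended, which is exactly the failure mode formalization work has to guard against.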

His work centers on structuring how AI “thinks.” By building step-by-step reasoning pipelines—using prompts, evaluation methods and feedback loops—he guides systems through problems the way a mathematician would: carefully, with constant verification.
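
The team’s code isn’t published, but the general shape of such a pipeline can be sketched. In this minimal Python illustration, every name, from `call_model` to the prompts and the accept-or-retry rule, is a hypothetical stand-in rather than the team’s implementation:

```python
# A minimal sketch of a step-by-step reasoning pipeline with verification
# and feedback. All names here are hypothetical stand-ins.

def call_model(prompt: str) -> str:
    """Placeholder for a call to any large language model API."""
    raise NotImplementedError

def verify_step(problem: str, step: str) -> tuple[bool, str]:
    """Asks the model to audit a single reasoning step."""
    critique = call_model(
        f"Problem: {problem}\nProposed step: {step}\n"
        "Does this step follow logically? Answer VALID, or explain the flaw."
    )
    return critique.strip().startswith("VALID"), critique

def solve(problem: str, max_steps: int = 10) -> list[str]:
    """Builds a solution one verified step at a time. Rejected steps never
    enter the solution; their critiques steer the next proposal."""
    steps: list[str] = []
    feedback = ""
    for _ in range(max_steps):
        proposal = call_model(
            f"Problem: {problem}\nAccepted steps so far: {steps}\n"
            f"{feedback}Propose the single next step, or reply DONE."
        )
        if proposal.strip() == "DONE":
            break
        ok, critique = verify_step(problem, proposal)
        if ok:
            steps.append(proposal)
            feedback = ""
        else:
            feedback = f"Your previous attempt was rejected: {critique}\n"
    return steps
```

The design choice doing the work here is that a rejected step never enters the solution; its critique only shapes the next proposal, echoing the constant verification described above.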

But even a well-structured reasoning process raises a second question: when should it be trusted?

That’s where Mittal comes in.

A second-year Ph.D. student, Mittal studies the limits of AI—less how models reason, and more where that reasoning actually works.

“Humans and AI have very different skillsets,” he says. “We need to understand that boundary.”

Rather than treating AI as a universal problem-solver, Mittal maps its strengths and weaknesses. One key insight: AI systems are often better at comparing problems—deciding which is harder—than assigning absolute difficulty.
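
To make the distinction concrete, here is a hedged sketch of a ranking built from relative judgments alone: the model is never asked to score a problem’s difficulty on an absolute scale, only to say which of two problems is harder. `call_model` is again a hypothetical placeholder:

```python
# A sketch of difficulty ranking built purely from pairwise judgments.
from functools import cmp_to_key

def call_model(prompt: str) -> str:
    """Placeholder for a call to any large language model API."""
    raise NotImplementedError

def compare_difficulty(a: str, b: str) -> int:
    """Returns 1 if the model judges problem `a` harder than `b`, else -1."""
    answer = call_model(
        f"Problem A: {a}\nProblem B: {b}\n"
        "Which problem is harder? Answer with the single letter A or B."
    )
    return 1 if answer.strip().upper().startswith("A") else -1

def rank_by_difficulty(problems: list[str]) -> list[str]:
    """Orders problems from easiest to hardest using only pairwise calls."""
    return sorted(problems, key=cmp_to_key(compare_difficulty))
```

In practice, single pairwise answers are noisy, so a real system would aggregate repeated comparisons, for instance with a Bradley-Terry or Elo-style rating, rather than trust one sort.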

That shift toward relative judgment offers a clearer signal of where AI is likely to succeed, and in turn where machines can lead and where human expertise remains essential.

Together, Gholami and Mittal define the project’s intellectual backbone: one ensures the system can reason correctly; the other determines where that reasoning applies.

If theory sets the rules, Soltani is building the system that runs them.

A first-year Ph.D. student, Soltani is developing a multi-agent architecture that functions less like a single AI model and more like a small research team.

“I implement the system,” he says.

In practice, that means coordinating multiple AI agents with distinct roles—generating candidate proofs, checking them and deciding what to try next.

The challenge is orchestration: how agents communicate, when one overrides another and what happens when they disagree.

“These details matter a lot,” Soltani says. “They can completely change the outcome.”

His work targets one of AI’s biggest weaknesses: verification. Large language models are good at producing plausible answers—but not necessarily correct ones. By distributing responsibility across agents and embedding checking mechanisms, Soltani’s system adds a layer of scrutiny that single-model approaches lack.
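
The architecture itself isn’t published, but the division of labor the article describes can be outlined in miniature. In this hypothetical Python sketch, a generator agent proposes proofs, a checker agent can veto them, and a controller decides what happens when they disagree:

```python
# A miniature, hypothetical outline of a generate/check multi-agent loop.
# The agents, prompts, and disagreement rule are illustrative assumptions,
# not the team's published architecture.

def call_model(prompt: str, role: str) -> str:
    """Placeholder for a role-conditioned call to a language model API."""
    raise NotImplementedError

def generator(problem: str, critiques: list[str]) -> str:
    """Proposes a candidate proof, informed by earlier rejections."""
    return call_model(
        f"Problem: {problem}\nEarlier critiques: {critiques}\n"
        "Write a complete candidate proof.",
        role="prover",
    )

def checker(problem: str, proof: str) -> tuple[bool, str]:
    """Audits a candidate proof and reports SOUND or a specific flaw."""
    verdict = call_model(
        f"Problem: {problem}\nCandidate proof: {proof}\n"
        "Find a concrete flaw, or answer SOUND.",
        role="critic",
    )
    return verdict.strip().startswith("SOUND"), verdict

def controller(problem: str, budget: int = 5) -> str | None:
    """Coordinates the agents. When they disagree, the checker wins and
    its critique is fed back to the generator on the next attempt."""
    critiques: list[str] = []
    for _ in range(budget):
        proof = generator(problem, critiques)
        ok, verdict = checker(problem, proof)
        if ok:
            return proof       # a proof the checker could not fault
        critiques.append(verdict)
    return None                # budget exhausted without a verified proof
```

Here the disagreement rule is deliberately simple: the checker always wins, and its critique feeds the generator’s next attempt. As Soltani notes, details like this can completely change the outcome.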

The result is not just automation, but interaction. Users can step in, guide the process and refine outcomes.

Despite their distinct roles, the three students work in tight coordination. Gholami’s reasoning frameworks shape Soltani’s system, while Mittal’s analysis determines which problems it should attempt. Ideas move quickly—from proposal to implementation to testing.

“Every meeting, we come up with a new idea,” Gholami says. “Then we try it.”

That rapid cycle—idea, experiment, refinement—mirrors the process of scientific discovery itself, now being encoded into a machine.

Access to large-scale computing and GPU resources at the University of Maryland Institute for Advanced Computer Studies (UMIACS), where Hajiaghayi has an appointment, allows the team to test ideas quickly—an advantage he says has accelerated several research directions.

The implications extend beyond speed.

If AI systems can reliably understand problems, reason through them and verify solutions, they could reshape the structure of research. Instead of producing one-off answers, these systems could revisit questions, refine ideas over time and contribute continuously—less like tools, more like collaborators.

The project is high-risk. Core questions about reasoning, formalization and reliability remain unresolved, and there’s no guarantee AI can meet the rigor required for true mathematical discovery.

But that uncertainty is part of the point, the researchers say.

For this team, the goal isn’t just better AI—it’s a different role for it in science. If they succeed, research may no longer be a solitary pursuit, but a collaboration between human intuition and machine reasoning.

And in that shift, the line between tool and thinker may begin to blur.

—Story by Stratis Aloimonos, UMIACS communications group
