We are conditioned to look for subtle cues—a professional tone, steady confidence, and outward politeness—to decide if a stranger is trustworthy. But when that “stranger” is an AI chatbot, these deeply ingrained social instincts can backfire.
This is the mechanism behind the “charisma trap,” a psychological blind spot where the sheer competence of a machine’s delivery masks the potential unreliability of its data.
This phenomenon is at the heart of a research project led by Michelle Mazurek, an associate professor of computer science who serves as director of the Maryland Cybersecurity Center and holds an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), and Adam Aviv, an associate professor of computer science at George Washington University (GW).
Supported by a $150,000 inaugural seed grant from the Institute for Trustworthy AI in Law & Society (TRAILS)—where both researchers are members—this work is the culmination of a two-year investigation into how human intuition often fails when navigating AI.
The team’s work suggests that the way we currently engage with these models may be lulling us into a false—and potentially dangerous—sense of security.
Jan Tolsdorf, who spearheaded the study’s initial phase as a postdoctoral associate at GW and TRAILS and is now a postdoctoral scholar at the Max Planck Institute for Security and Privacy, found that trust in an AI actually increases after a user tries, and fails, to “break” it.
In a study of 254 users who engaged in 551 open-ended conversations, many participants treated a session of trick questions as a personal audit: if the AI didn’t glitch, they assumed it was reliable. Crucially, the researchers found that perceptions of trustworthiness increased regardless of whether a user actually detected an issue.
However, Tolsdorf explains that lay users often lack the effective strategies needed to systematically probe a model’s limitations. Professional red teaming, a structured process in which experts deliberately try to provoke an AI into failing in order to expose its weak spots, is a powerful way to reveal issues; the average person’s trial-and-error approach is far less rigorous.
“Lay users seem to initially lack effective strategies for systematically probing model limitations,” Tolsdorf says. “This is likely particularly problematic when it comes to recognizing more subtle harm, such as implicit biases or social manipulation.”
The research found that users often look for “obvious” problems, like political favoritism or instructions for illegal acts. But because the AI uses “trust anchors”—such as a professional tone, fast response times, and even the mere presence of source citations—users often mistake competence for reliability. This creates a blind spot: a user might verify that a chatbot won’t provide a recipe for a bomb, yet completely overlook implicit social biases simply because they aren’t as glaring as a factual error.
“The key challenge is that users must understand that chatbots are not neutral judges, all-knowing machines, or experts,” Tolsdorf says.
The study suggests that trust builds so rapidly during helpful interactions that even when the AI makes a mistake, many participants still trust the system more after the interaction than they did before.
This creates a major hurdle for standardized safety ratings. The study suggests that for most people, fairness isn’t a data point; it’s a feeling. If an AI provides a helpful, seamless experience, a user is likely to trust their own gut over a third-party warning that the system is biased.
Ultimately, these insights are pushing the researchers to rethink how AI providers can better assist users.
Joining Mazurek, Aviv, and Tolsdorf in this effort was Mahmood Sharif, a senior lecturer at Tel Aviv University’s School of Computer Science and AI, who contributed his machine learning expertise. The team was also supported by student researchers Alan Luo, a fifth-year doctoral student in computer science at Maryland; Monica Kodwani, a third-year doctoral student in computer science at GW; and Junho Eum, a second-year doctoral student in computer science at GW.
The team’s results were recently accepted for publication and presentation at the ACM Conference on Fairness, Accountability, and Transparency—a premier international venue for research on the social impact of algorithms—which will be held June 25–28 in Montreal.
The researchers’ goal is no longer just to inform users that bias exists, but to encourage a more critical engagement with the technology. Future systems might need to be designed to prompt reflection—perhaps even by intentionally breaking their charismatic tone or manipulating outputs to force users to verify information.
By bridging the gap between human psychology and algorithmic behavior, the team aims to ensure that a polite response is never mistaken for an unquestionable one.
—Story by Melissa Brachfeld, UMIACS communications group