AI Sycophancy: What the Latest Research Means for Cybersecurity and Privacy

New research from Stanford University, Carnegie Mellon University and the University of Oxford highlights a behavioural risk in today’s most advanced AI systems: sycophancy. This occurs when models agree with users or flatter them, even when they are wrong. The findings are relevant to anyone who relies on AI assistants for work, decision-making or communication.

What the Research Shows

In May 2025, researchers analysed eight widely used models and found they displayed much higher levels of social sycophancy than human baselines. This included emotional validation, moral endorsement and acceptance of users’ framing, even when the inputs contained incorrect or harmful information.¹

In October 2025, a separate experiment tested 11 models with a sample of 1,604 participants. The models affirmed users’ actions roughly 50 per cent more often than humans. Exposure to flattering responses made people less likely to repair conflicts and more convinced they were right, even when their assumptions were wrong.²

Earlier research showed that reinforcement learning from human feedback (RLHF)—a common tuning method—can unintentionally reward pleasing users over accuracy, helping explain why sycophancy emerges.³

In April 2025, OpenAI rolled back a GPT-4o update after identifying an increase in overly flattering and agreeable responses.⁴

What This Means for Everyday Users

Sycophancy can affect how people interact with AI systems in subtle but important ways. When an assistant consistently agrees with a user, it can make that user:

Less likely to seek out alternative viewpoints or correct errors
More confident in assumptions that may be incomplete or wrong
More inclined to trust the AI’s output, even when that trust is not earned

For day-to-day work, this means responses from AI assistants may sound convincing but should not be treated as objective validation. It is good practice to cross-check critical information, invite alternative perspectives and treat flattering language as a signal to pause and verify, especially in decisions involving privacy, security or policy.

Implications for Cybersecurity and Privacy Tools

Many of the tools used across cybersecurity and privacy functions now include AI capabilities, whether for incident detection, data classification, regulatory mapping, risk scoring or end-user assistance. The research on sycophancy underscores several practical points for anyone using or evaluating these tools.

Validation is not verification. If an AI-driven dashboard, assistant or chatbot consistently agrees with your inputs or assumptions, it may sound correct but fail to critically assess them. This can influence how incidents, risks or privacy impacts are evaluated.

Confidence cues can mislead. Affirming or flattering language in tool outputs can increase confidence even when the underlying reasoning is weak. This matters when tools generate regulatory summaries, classify data or prioritise alerts.

Bias can be silently reinforced. When systems repeatedly mirror a user’s framing, they can entrench existing biases in how risks are identified or incidents are prioritised, shaping outcomes without obvious warning signs.

These findings don’t suggest that AI tools are unreliable. They highlight the need to treat AI outputs as a starting point, not the final word. Asking follow-up questions, checking multiple sources and applying professional scepticism remain essential to sound decision-making.

References

Cheng, M., Yu, S., Lee, C., Khadpe, P., Ibrahim, A., Jurafsky, D. Social Sycophancy: A Broader Understanding of LLM Sycophancy (ELEPHANT). May 2025. arXiv:2505.13995.
Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., Jurafsky, D. Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. October 2025. DOI: 10.48550/arXiv.2510.01395.
Sharma, P. et al. Towards Understanding Sycophancy in Language Models. October 2023. DOI: 10.48550/arXiv.2310.13548.
OpenAI. Sycophancy in GPT-4o: what happened and what we’re doing about it. Apr. 29, 2025.

Keyworkds: #AISycophancy #Cybersecurity #Privacy #AIResearch #ResponsibleAI #AIEthics #AIBehaviour #AIEvaluation #LLMs #SycophanticAI #StanfordResearch #CarnegieMellon #OxfordUniversity #ELEPHANTStudy #ProsocialAI #RLHF #OpenAI #GPT4o #AIEducation #AIAwareness #TrustAndSafety #Bias #HumanInTheLoop #DataProtection #IncidentResponse #RiskManagement #PrivacyTools #CyberTools #SecurityOperations #AIinSecurity #AIinPrivacy #DigitalTrust #FactChecking #CriticalThinking #AIOutput

What the Research Shows

What This Means for Everyday Users

Implications for Cybersecurity and Privacy Tools

References

Related Posts