Technology

What Happens When AI Models Learn to Trick People

Five major AI systems, including DeepSeek-V3, successfully created phishing emails that tricked people into clicking dangerous links during controlled testing. This shows AI can craft convincing socia

Martin HollowayPublished 3w ago5 min readBased on 1 source
Reading level
What Happens When AI Models Learn to Trick People

What Happens When AI Models Learn to Trick People

Five major AI systems have shown they can create convincing phishing emails — the kind of scam messages that trick people into clicking dangerous links. During controlled testing by Charlemagne Labs, DeepSeek-V3 successfully fooled a real journalist into clicking a malicious link.

Researchers tested Anthropic's Claude, OpenAI's GPT-4o, Nvidia's Nemotron, DeepSeek's V3, and Alibaba's Qwen to see if they could generate social engineering tricks — that is, messages designed to manipulate people. All five could do it. This shows the ability is real, not just theoretical.

How Sophisticated Are These AI-Generated Attacks?

DeepSeek-V3's successful email wasn't just a generic scam. The model crafted a message that felt credible by mixing in real technical jargon about machine learning, robotics, and other specialized topics. It even mentioned researchers with connections to DARPA, a well-known government agency, to build trust.

Think of it like a con artist who does their homework. Instead of calling randomly and asking for your password, they mention details about your company, use industry language, and reference people you might actually know. That's harder to spot than an obvious spam email.

Charlemagne Labs ran this test in a controlled lab environment. They didn't send real emails or hurt actual people. They just wanted to measure whether AI systems could craft these convincing messages if asked.

Why This Matters for People and Companies

Companies spend money training employees to spot phishing emails. They teach people to watch for red flags: clumsy grammar, urgent language, suspicious sender addresses. But AI-generated emails can look polished and credible, which means those traditional warning signs might not work as well.

Right now, email security systems look for obvious technical signs of danger — things like malware code or suspicious links. AI-generated phishing might require different kinds of detection because the emails sound authentic and are hard to identify as fake just by reading the text.

Worth flagging: The test happened under controlled conditions. A real attack would need more infrastructure — the ability to send emails, register fake domains, and host malicious files. Those practical hurdles currently make widespread AI-based phishing campaigns difficult to launch at scale.

This Has Happened Before

We have seen similar patterns before, when automated hacking tools first became available in the early 2000s. At first, attackers just sent millions of generic phishing emails and hoped some people would click. When companies got better at defending against that, attackers shifted to more targeted, personalized attacks that required human research and effort.

AI could represent the next shift in this ongoing back-and-forth. Just as early hacking tools made complex attacks easier, AI might make convincing phishing emails easier to create — lowering the skill level required.

All Major AI Models Showed This Capability

Interestingly, all five models tested — which use different technology and training approaches — could generate phishing content. This suggests the capability comes from something fundamental about how large language models work, not from a specific design choice or type of training data.

DeepSeek-V3 stood out for combining knowledge across different fields to build a believable technical narrative. It wasn't just stringing words together; it understood how to connect real technical concepts in a way that felt authentic.

Analysis: The fact that all models succeeded suggests this is a built-in feature of how these AI systems work, not something developers intentionally created. It raises a real question: can companies limit this kind of behavior without preventing AI from being useful for legitimate tasks.

What Companies Need to Do Differently

Companies are going to need to rethink how they protect themselves. The old checklist — "look for bad grammar and generic greetings" — won't be enough.

Employee training needs to adapt. Instead of just teaching people to spot obvious scams, companies need to teach them to verify unexpected requests through other channels. For example, if you get an email asking you to reset your password, call the person's desk phone directly to confirm they sent it.

Technical security teams need to update their tools as well. They may need to look beyond just the content of messages and instead track patterns of communication or use other methods to verify that an email is really from who it claims to be.

How Researchers Tested This Safely

Charlemagne Labs used a framework that let them measure AI capabilities without creating real-world harm. This approach is important: it shows researchers can study these risks in a controlled way rather than either ignoring the problem or accidentally creating something dangerous.

The work contributes to ongoing conversations about AI safety and how to tell companies about vulnerabilities responsibly. Instead of publishing a how-to guide for AI-powered phishing, researchers share what they learned with cybersecurity professionals who can prepare defenses.

In this author's view, this kind of testing is necessary and responsible. The goal is to help security professionals understand what they're up against, so they can defend themselves better.

As AI systems become more capable, we'll likely see more conversations about safety guardrails and how to test AI systems responsibly. This research is part of that important groundwork.