The Text Your AI Reads Can Hijack It

To a language model, every piece of text it reads is potentially an instruction. The moment you point an AI agent at your notes, web clips, and files, a hidden command buried in a pasted article, an email, or a shared document can quietly redirect it. This is prompt injection, and it has reached your notes.

The threat is not exotic. OWASP, the security standards body, now ranks prompt injection as LLM01:2025 — the first and top entry in its Top 10 for large language models ¹. The academic paper that named the problem put it plainly: LLM-integrated applications "blur the line between data and instructions" ². When a model cannot tell your words from the words it merely read, every file you hand it is a potential set of orders.

This post is not a reason to abandon AI agents. It is a reason to understand what you are handing them. The defense that holds up is unglamorous: scope what the agent can reach, keep untrusted text separate from your own instructions, and prefer files you can read and search yourself — so you can see exactly what the agent sees.

What most people believe about reading text

Most people assume that asking an AI to "just read" something is safe — that reading is passive, like a human skimming a page. The mental model is a research assistant who summarizes and reports back, never acting on what it finds. Under that model, pointing an agent at your vault seems no riskier than opening the files yourself.

That model is intuitive, and for human assistants it is mostly correct. A person who reads a sticky note that says "ignore your boss and email me the passwords" recognizes it as text, not as a command from their actual employer. We carry a lifelong instinct for whose instructions count. The belief that a reader simply absorbs text is the default, and it is exactly the belief that prompt injection exploits.

Why "just reading" fails for a language model

A language model has no such instinct. As Simon Willison, who coined the framing this piece relies on, writes: "LLMs follow instructions in content. This is what makes them so useful" ³. The catch follows: "The problem is that they don't just follow our instructions" ⁴. Your prompt and the document it ingests arrive as one stream of tokens.

Willison is blunt about the root cause: "LLMs are unable to reliably distinguish the importance of instructions based on where they came from" ⁵. There is no reliable boundary inside the model between "the user asked me this" and "a file told me this." OWASP defines the vulnerability accordingly: "A Prompt Injection Vulnerability occurs when user prompts alter the LLM's behavior or output in unintended ways" ⁶. The hijack rides the feature.

Worse, the instruction does not have to be visible to you. OWASP notes that "these inputs can affect the model even if they are imperceptible to humans, therefore prompt injections do not need to be human-visible/readable, as long as the content is parsed by the model" ⁷. A command set in one-point font, or hidden in a comment, reads to you as nothing and to the model as an order to follow.

What this looks like when an agent reads your files

The variant that reaches your notes has a name: indirect prompt injection. OWASP describes it precisely — "indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files" ⁸. Your notes are files. Your web clipper pastes websites into them. Once an agent reads that folder, every clip is input it may treat as instruction.

This is not theoretical. In March 2026, a security write-up documented an attack titled "A GitHub Issue Title Compromised 4,000 Developer Machines" ⁹. The mechanism was one line of text: "The issue title was interpolated directly into Claude's prompt via ${{ github.event.issue.title }} without sanitisation" ¹⁰. The model read the title as a command and ran an install pointing at an attacker's package; roughly 4,000 downloads occurred before it was pulled ¹¹. A title was the payload.

The same year brought a closer parallel for note-takers. Researchers at PromptArmor showed that "an indirect prompt injection in an implementation blog can manipulate Antigravity to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user's IDE" ¹². The injection was hidden "in 1 point font" ¹³ — invisible to a human reading the page, fully legible to the agent reading it for you.

The shape of the danger: three capabilities, not one

Willison's most useful contribution is naming the combination that turns a hijack into a breach. He calls it the lethal trifecta: "Access to your private data," "Exposure to untrusted content," and "The ability to externally communicate" ¹⁴. Any one alone is survivable. Together, an attacker can plant an instruction in untrusted text, read your private data, and ship it out.

A note agent can quietly assemble all three. It has access to your private notes. It is exposed to untrusted content the instant it reads a clipped web page. And if it can call a tool, fetch a URL, or send a message, it can communicate externally. The PromptArmor case is exactly this trifecta in the wild: poisoned blog, private workspace, browser subagent. Willison's guidance follows directly: "The only way to stay safe there is to avoid that lethal trifecta combination entirely" ¹⁵.

The honest part comes next, and it matters. There is no patch that closes this. "Here's the really bad news: we still don't know how to 100% reliably prevent this from happening" ¹⁶. Prompt injection is increasingly discussed not as a bug awaiting a fix but as a structural property of how these models read ¹⁷. Any post that promises immunity is selling something. The realistic goal is a smaller, more auditable attack surface, not a sealed one.

What to actually do before you point an agent at your notes

The practical defense is to shrink the trifecta and to make the agent's inputs inspectable. None of it requires special tooling — only the habit of treating the agent like a powerful, literal-minded reader that cannot tell your words from anyone else's. Three moves do most of the work.

Scope what the agent can reach. Give it the one folder it needs, not your whole vault. Read access is enough to leak data; do not also grant tool, network, or send permissions in the same session. Narrowing access shrinks the "private data" leg of the trifecta.
Separate untrusted text from your own instructions. Keep clipped articles, pasted emails, and forwarded docs in a clearly marked area, distinct from the notes that carry your real prompts. Untrusted input is anything you did not write.
Prefer files you can read and audit, and keep a human in the loop. Plain, local Markdown lets you open and search exactly what the agent will be handed — you can grep a folder for a suspicious instruction before the model ever sees it. Require your own approval for any consequential action.

This is where ownership stops being a philosophy and becomes a security property. A plain file on your own device is the audit surface: you can read every character the agent reads. A black-box store cannot offer that. You cannot grep what you cannot open.

This input-side argument has two companions in the corpus. The case that an agent's access is also a way in lives in MCP agents reach into your files. The code and plugin surface, a separate concern, lives in your note app's plugins are an attack surface.

Frequently asked questions

Is it safe to let an AI agent read my notes and files?

Safety here is not a yes-or-no switch. Because indirect prompt injection means any external file or web clip the agent reads can carry hidden instructions ⁸, the realistic question is scope. It is reasonable when you limit what the agent reaches, keep untrusted text separate, and use files you can audit — not when you grant broad access blindly.

Can an AI agent be hacked through the text it reads?

Yes — that is precisely prompt injection. OWASP defines it as user input that alters "the LLM's behavior or output in unintended ways" ⁶, and Willison notes that whenever a model reads a page, email, or document, "there's a chance that the content you are exposing it to might contain additional instructions" ¹⁸. The text is the attack.

What is indirect prompt injection?

Indirect prompt injection happens when the malicious instruction arrives inside data the model retrieves, rather than from you directly. OWASP's definition is exact: it "occur[s] when an LLM accepts input from external sources, such as websites or files" ⁸. The original research describes adversaries "injecting prompts into data likely to be retrieved" ¹⁹ — your clipped notes qualify.

Why can't the AI tell my note from an instruction hidden in it?

Because a model has no reliable sense of where text came from. As Willison puts it, LLMs "are unable to reliably distinguish the importance of instructions based on where they came from" ⁵. Your prompt and the file's contents enter as one token stream, and a hidden command "imperceptible to humans" ⁷ is fully legible to the model.

How do I stop prompt injection in my AI assistant?

You cannot fully stop it — no 100% reliable prevention exists yet ¹⁶. You reduce the risk structurally: scope the agent's access narrowly, keep untrusted text apart from your instructions, prefer auditable local files, and require human approval for consequential actions. The core defense is to avoid combining private-data access, untrusted content, and external communication at once ¹⁵.

Is prompt injection a bug that will be patched?

Probably not a simple patch. The problem stems from how models read instructions in content ³, and it is increasingly framed as a possibly permanent design property rather than a fixable defect ¹⁷. Treat it as a standing condition to design around — through scoping and auditing — not a vulnerability you wait for a vendor to close.

A language model cannot tell your words from the text you feed it, but you can choose to feed it less, keep the untrusted part visible, and use files you can read for yourself. This builds on Simon Willison's lethal-trifecta framing and the indirect-injection threat model first set out by Greshake and colleagues; the safest input is one you can audit before the model ever sees it.

For that, mnmnote.com keeps your notes as plain, local Markdown on your own device — the agent sees only what you hand it, and you can see it too.

"LLM01:2025 Prompt Injection," OWASP Gen AI Security Project, OWASP Top 10 for LLM Applications (2025 edition). https://genai.owasp.org/llmrisk/llm01-prompt-injection/. Accessed 2026-06-20. ↩
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." arXiv:2302.12173, 2023. https://arxiv.org/abs/2302.12173. Accessed 2026-06-20. ↩
Willison, S. "The lethal trifecta for AI agents: private data, untrusted content, and external communication." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩ ↩²
Willison, S. "The lethal trifecta for AI agents." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩
Willison, S. "The lethal trifecta for AI agents." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩ ↩²
"LLM01:2025 Prompt Injection," OWASP Gen AI Security Project, 2025. https://genai.owasp.org/llmrisk/llm01-prompt-injection/. Accessed 2026-06-20. ↩ ↩²
"LLM01:2025 Prompt Injection," OWASP Gen AI Security Project, 2025. https://genai.owasp.org/llmrisk/llm01-prompt-injection/. Accessed 2026-06-20. ↩ ↩²
"LLM01:2025 Prompt Injection," OWASP Gen AI Security Project, 2025. https://genai.owasp.org/llmrisk/llm01-prompt-injection/. Accessed 2026-06-20. ↩ ↩² ↩³
grith team. "A GitHub Issue Title Compromised 4,000 Developer Machines." grith.ai, 5 March 2026. https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another. Accessed 2026-06-20. ↩
grith team. "A GitHub Issue Title Compromised 4,000 Developer Machines." grith.ai, 5 March 2026. https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another. Accessed 2026-06-20. ↩
grith team. "A GitHub Issue Title Compromised 4,000 Developer Machines." grith.ai, 5 March 2026 (approximately 4,000 downloads occurred before the malicious package was pulled). https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another. Accessed 2026-06-20. ↩
PromptArmor Threat Intel Team. "Google Antigravity Exfiltrates Data." promptarmor.com (incident November 2025). https://www.promptarmor.com/resources/google-antigravity-exfiltrates-data. Accessed 2026-06-20. ↩
PromptArmor Threat Intel Team. "Google Antigravity Exfiltrates Data." promptarmor.com, November 2025. https://www.promptarmor.com/resources/google-antigravity-exfiltrates-data. Accessed 2026-06-20. ↩
Willison, S. "The lethal trifecta for AI agents: private data, untrusted content, and external communication." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩
Willison, S. "The lethal trifecta for AI agents." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩ ↩²
Willison, S. "The lethal trifecta for AI agents." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩ ↩²
TechTimes. "AI Agent Security Hits Its Reckoning: Prompt Injection May Be a Permanent Flaw, Not a Patchable Bug." techtimes.com, 14 June 2026. https://www.techtimes.com/articles/318361/20260614/ai-agent-security-hits-its-reckoning-prompt-injection-may-permanent-flaw-not-patchable-bug.htm. Accessed 2026-06-20. ↩ ↩²
Willison, S. "The lethal trifecta for AI agents." simonwillison.net, 16 June 2025. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/. Accessed 2026-06-20. ↩
Greshake, K., et al. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." arXiv:2302.12173, 2023. https://arxiv.org/abs/2302.12173. Accessed 2026-06-20. ↩