Email is untrusted input. It may include direct instructions to the model, hidden text, quoted conversations, links, attachments, or forged context. Prompt injection controls should sit before extraction and before any outbound action.
last updated 2026-05-07
4 sections
section 01
Injection surfaces
Prompt injection can appear in visible text, HTML comments, CSS-hidden spans, quoted reply history, attachment text, linked pages, and forwarded messages. The agent should never treat sender-provided instructions as system policy.
surface | risk | control
Visible body | Direct instruction to ignore policy. | Use extraction schema and policy checks.
Hidden HTML | Invisible instruction enters model context. | Strip hidden and remote content.
Quoted history | Old or forged context changes intent. | Isolate latest reply.
Attachment text | Poisoned document content. | Scan and review before use.
Links | External content changes after receipt. | Do not fetch automatically for high-risk actions.
section 02
Pre-model cleanup
Before any model call, normalize the email into a controlled representation. Remove hidden HTML, remote images, tracking pixels, overly long quoted text, and unsupported attachment types. Keep raw content for audit, not default reasoning.
- Convert HTML to safe text and preserve links as text references.
- Remove invisible text, style-hidden content, and comments.
- Separate latest reply from quoted history and forwards.
- Limit content length and route overflow to review.
- Mark attachment-derived content as untrusted.
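The cleanup steps above can be sketched with the Python standard library. This is a minimal illustration, not a complete sanitizer: the names `SafeTextExtractor`, `normalize_email`, `split_latest_reply`, the `MAX_CHARS` limit, and the hidden-style and quote-marker patterns are all assumptions made for this example. Real mail contains many more hiding tricks (off-screen positioning, matching foreground and background colors, external stylesheets) that a production pipeline would need to cover.

```python
import re
from html.parser import HTMLParser

# Assumed heuristics; real mail needs a broader set of hidden-content checks.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0", re.I
)
SKIP_TAGS = {"script", "style", "head", "title"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr", "source"}
MAX_CHARS = 8000  # hypothetical length limit

# Assumed markers for quoted history and forwards in plain-text bodies.
QUOTE_MARKERS = re.compile(
    r"^(>|On .+ wrote:$|-{2,}\s*(Forwarded|Original) message\s*-{2,})", re.I
)


class SafeTextExtractor(HTMLParser):
    """Collects visible text only; comments are dropped by default."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.stack = []  # (tag, hidden) for each open element

    def _hidden(self):
        return bool(self.stack) and self.stack[-1][1]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # drops remote images and tracking pixels
        attrs = dict(attrs)
        hidden = (
            self._hidden()
            or tag in SKIP_TAGS
            or "hidden" in attrs
            or HIDDEN_STYLE.search(attrs.get("style") or "") is not None
        )
        self.stack.append((tag, hidden))
        if tag == "a" and not hidden and attrs.get("href"):
            # Keep the link as a text reference; nothing is fetched.
            self.parts.append(f"[link: {attrs['href']}] ")

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self._hidden():
            self.parts.append(data)


def normalize_email(html, max_chars=MAX_CHARS):
    """Return a controlled text representation plus a review flag."""
    parser = SafeTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return {
        "text": text[:max_chars],
        "needs_review": len(text) > max_chars,  # overflow goes to review
        "trust": "untrusted",
    }


def split_latest_reply(plain_text):
    """Split a plain-text body into (latest reply, quoted history)."""
    lines = plain_text.splitlines()
    for i, line in enumerate(lines):
        if QUOTE_MARKERS.match(line.strip()):
            return "\n".join(lines[:i]).strip(), "\n".join(lines[i:]).strip()
    return plain_text.strip(), ""
```

In this sketch the raw HTML is never modified, so it can still be stored separately for audit while only the normalized text is passed to the model.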
section 03
Policy after extraction
Prompt injection can survive extraction, so policy checks still matter after the model proposes an action. Validate recipient, sender, account, requested action, risk, confidence, and template before sending.
- Allow only known action types.
- Reject outbound sends to unapproved domains when policy requires it.
- Require review for low confidence or high-risk extracted intent.
- Never allow email content to override sender, recipient, or approval policy.
- Log the policy version used for every decision.
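One way to picture these checks is a small gate that runs on the model's proposed action before anything is sent. The sketch below is illustrative only: `ProposedAction`, `Decision`, `check_policy`, the action names, the approved domain, the confidence threshold, and the policy version string are all hypothetical values chosen for this example. The policy constants live in code or configuration, never in the email, so message content has no path to change them.

```python
from dataclasses import dataclass

# Hypothetical policy values; none of these come from email content.
POLICY_VERSION = "example-policy-v1"
ALLOWED_ACTIONS = {"reply", "create_ticket", "forward_to_owner"}
APPROVED_DOMAINS = {"example.com"}
MIN_CONFIDENCE = 0.8


@dataclass(frozen=True)
class ProposedAction:
    action: str        # action type proposed by the model
    recipient: str     # outbound address proposed by the model
    risk: str          # "low", "medium", or "high"
    confidence: float  # extraction confidence in [0, 1]


@dataclass(frozen=True)
class Decision:
    outcome: str         # "allow", "review", or "reject"
    reasons: tuple
    policy_version: str  # recorded with every decision for audit


def check_policy(proposal: ProposedAction) -> Decision:
    """Validate a model-proposed action against fixed policy."""
    if proposal.action not in ALLOWED_ACTIONS:
        return Decision("reject", ("unknown action type",), POLICY_VERSION)

    domain = proposal.recipient.rpartition("@")[2].lower()
    if domain not in APPROVED_DOMAINS:
        return Decision("reject", ("unapproved recipient domain",), POLICY_VERSION)

    reasons = []
    if proposal.confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    if proposal.risk == "high":
        reasons.append("high-risk intent")

    outcome = "review" if reasons else "allow"
    return Decision(outcome, tuple(reasons), POLICY_VERSION)


if __name__ == "__main__":
    # The returned Decision, including policy_version, is what would be logged.
    print(check_policy(ProposedAction("reply", "alice@example.com", "low", 0.95)))
```

Every `Decision` carries the policy version, so a later audit can tell which rules produced a given outcome.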
section 04
Safe failure mode
When the system detects injection risk, the safe outcome is review or clarification, not silence or an automatic denial. That keeps the workflow useful while preventing untrusted content from steering the agent.
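A rough sketch of that routing is shown below. The `Outcome` values, the signal names, and the rule for choosing clarification over review are assumptions made for illustration; the only property taken from the text above is that detected risk never maps to silently dropping the message or to an automatic denial.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    CLARIFY = "clarify"  # ask the sender to confirm what they want
    REVIEW = "review"    # hand the message to a human reviewer


def route_on_injection_risk(signals, sender_known):
    """Map detected risk signals to a safe outcome.

    `signals` is a set of hypothetical detector labels, for example
    {"hidden_text"} or {"ambiguous_request"}.
    """
    if not signals:
        return Outcome.PROCEED
    # Assumed rule: a known sender with only an ambiguous request gets a
    # clarification question; anything else goes to human review.
    if sender_known and signals <= {"ambiguous_request"}:
        return Outcome.CLARIFY
    return Outcome.REVIEW
```

There is deliberately no "drop" or "deny" member in `Outcome`, so the function cannot return one by accident.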