Prompt Injection
- Examine all text entering LLM context windows: README, docs, comments, string literals, tool descriptions, parameter descriptions
- Identify hidden Unicode characters, zero-width spaces, or encoded payloads
- Check for instructions attempting to hide information from users or manipulate assistant behavior
Data Exfiltration
- Markdown image links with suspicious query parameters (can leak context via URL)
- External resource loading that could exfiltrate conversation history or system prompts
- Webhook configurations and logging that capture LLM interactions
- Image-based prompt injection techniques
- Outbound HTTP calls triggered by user-controlled content (SSRF via prompt)
Context Window Pollution
- Is all injected content necessary for stated functionality?
- Excessive or suspicious prose in tool/function descriptions
- Hidden instructions in metadata or configuration files
System Access Patterns
- File system access to sensitive paths (browser profiles, SSH keys, credential stores)
- Environment variable reads that could capture secrets
- System command execution and privilege escalation vectors
- Temporary file creation with sensitive content and cleanup practices
- Whether the tool requests broader filesystem or network access than its stated purpose requires
Tool and Function Analysis
- Do tools perform only their stated functions?
- Hidden instructions embedded in tool or parameter descriptions
- Return values formatted to inject instructions into downstream context
- Error messages that disclose system internals or file paths
- Review error messages and return value formatting for context manipulation
Retrieval and Tool-Output Poisoning (RAG)
- Poisoning of retrieval corpora (docs, issues, wiki) that gets injected into context
- Trusting tool output or search results as instructions rather than data
- "Data-to-instructions" boundary: ensure retrieved text is treated as data, not policy
- Cross-tool injection: output from one tool containing instructions that another tool would execute
Tool Capability Scoping
- Are tool definitions minimal and least-privilege?
- Structured schemas preferred over free-form tool calls
Prompt/Completion Retention
- Where are prompts and completions stored? (telemetry, remote logging, analytics)
- Redaction practices for secrets and PII in logs
- Multi-tenant isolation if the agent is hosted or shared