llm-security.md

Prompt Injection

  • Examine all text entering LLM context windows: README, docs, comments, string literals, tool descriptions, parameter descriptions
  • Identify hidden Unicode characters, zero-width spaces, or encoded payloads
  • Check for instructions attempting to hide information from users or manipulate assistant behavior

Data Exfiltration

  • Markdown image links with suspicious query parameters (can leak context via URL)
  • External resource loading that could exfiltrate conversation history or system prompts
  • Webhook configurations and logging that capture LLM interactions
  • Image-based prompt injection techniques
  • Outbound HTTP calls triggered by user-controlled content (SSRF via prompt)

Context Window Pollution

  • Is all injected content necessary for stated functionality?
  • Excessive or suspicious prose in tool/function descriptions
  • Hidden instructions in metadata or configuration files

System Access Patterns

  • File system access to sensitive paths (browser profiles, SSH keys, credential stores)
  • Environment variable reads that could capture secrets
  • System command execution and privilege escalation vectors
  • Temporary file creation with sensitive content and cleanup practices
  • Whether the tool requests broader filesystem or network access than its stated purpose requires

Tool and Function Analysis

  • Do tools perform only their stated functions?
  • Hidden instructions embedded in tool or parameter descriptions
  • Return values formatted to inject instructions into downstream context
  • Error messages that disclose system internals or file paths
  • Review error messages and return value formatting for context manipulation

Retrieval and Tool-Output Poisoning (RAG)

  • Poisoning of retrieval corpora (docs, issues, wiki) that gets injected into context
  • Trusting tool output or search results as instructions rather than data
  • "Data-to-instructions" boundary: ensure retrieved text is treated as data, not policy
  • Cross-tool injection: output from one tool containing instructions that another tool would execute

Tool Capability Scoping

  • Are tool definitions minimal and least-privilege?
  • Structured schemas preferred over free-form tool calls

Prompt/Completion Retention

  • Where are prompts and completions stored? (telemetry, remote logging, analytics)
  • Redaction practices for secrets and PII in logs
  • Multi-tenant isolation if the agent is hosted or shared