llm-security.md

 1## Prompt Injection
 2
 3- Examine all text entering LLM context windows: README, docs, comments, string literals, tool descriptions, parameter descriptions
 4- Identify hidden Unicode characters, zero-width spaces, or encoded payloads
 5- Check for instructions attempting to hide information from users or manipulate assistant behavior
 6
 7## Data Exfiltration
 8
 9- Markdown image links with suspicious query parameters (can leak context via URL)
10- External resource loading that could exfiltrate conversation history or system prompts
11- Webhook configurations and logging that capture LLM interactions
12- Image-based prompt injection techniques
13- Outbound HTTP calls triggered by user-controlled content (SSRF via prompt)
14
15## Context Window Pollution
16
17- Is all injected content necessary for stated functionality?
18- Excessive or suspicious prose in tool/function descriptions
19- Hidden instructions in metadata or configuration files
20
21## System Access Patterns
22
23- File system access to sensitive paths (browser profiles, SSH keys, credential stores)
24- Environment variable reads that could capture secrets
25- System command execution and privilege escalation vectors
26- Temporary file creation with sensitive content and cleanup practices
27- Whether the tool requests broader filesystem or network access than its stated purpose requires
28
29## Tool and Function Analysis
30
31- Do tools perform only their stated functions?
32- Hidden instructions embedded in tool or parameter descriptions
33- Return values formatted to inject instructions into downstream context
34- Error messages that disclose system internals or file paths
35- Review error messages and return value formatting for context manipulation
36
37## Retrieval and Tool-Output Poisoning (RAG)
38
39- Poisoning of retrieval corpora (docs, issues, wiki) that gets injected into context
40- Trusting tool output or search results as instructions rather than data
41- "Data-to-instructions" boundary: ensure retrieved text is treated as data, not policy
42- Cross-tool injection: output from one tool containing instructions that another tool would execute
43
44## Tool Capability Scoping
45
46- Are tool definitions minimal and least-privilege?
47- Structured schemas preferred over free-form tool calls
48
49## Prompt/Completion Retention
50
51- Where are prompts and completions stored? (telemetry, remote logging, analytics)
52- Redaction practices for secrets and PII in logs
53- Multi-tenant isolation if the agent is hosted or shared