llm-security.md | Agent skills

llm-security.md

Prompt Injection

Examine all text entering LLM context windows: README, docs, comments, string literals, tool descriptions, parameter descriptions
Identify hidden Unicode characters, zero-width spaces, or encoded payloads
Check for instructions attempting to hide information from users or manipulate assistant behavior

Data Exfiltration

Markdown image links with suspicious query parameters (can leak context via URL)
External resource loading that could exfiltrate conversation history or system prompts
Webhook configurations and logging that capture LLM interactions
Image-based prompt injection techniques
Outbound HTTP calls triggered by user-controlled content (SSRF via prompt)

Context Window Pollution

Is all injected content necessary for stated functionality?
Excessive or suspicious prose in tool/function descriptions
Hidden instructions in metadata or configuration files

System Access Patterns

File system access to sensitive paths (browser profiles, SSH keys, credential stores)
Environment variable reads that could capture secrets
System command execution and privilege escalation vectors
Temporary file creation with sensitive content and cleanup practices
Whether the tool requests broader filesystem or network access than its stated purpose requires

Tool and Function Analysis

Do tools perform only their stated functions?
Hidden instructions embedded in tool or parameter descriptions
Return values formatted to inject instructions into downstream context
Error messages that disclose system internals or file paths
Review error messages and return value formatting for context manipulation

Retrieval and Tool-Output Poisoning (RAG)

Poisoning of retrieval corpora (docs, issues, wiki) that gets injected into context
Trusting tool output or search results as instructions rather than data
"Data-to-instructions" boundary: ensure retrieved text is treated as data, not policy
Cross-tool injection: output from one tool containing instructions that another tool would execute

Tool Capability Scoping

Are tool definitions minimal and least-privilege?
Structured schemas preferred over free-form tool calls

Prompt/Completion Retention

Where are prompts and completions stored? (telemetry, remote logging, analytics)
Redaction practices for secrets and PII in logs
Multi-tenant isolation if the agent is hosted or shared