feat: migrate to pi-agent-core, harden sandbox

Amolith created 5 months ago

Migrate from @mariozechner/pi-agent to @mariozechner/pi-agent-core
v0.52.8, alongside pi-ai v0.52.8 which adds AWS Bedrock support.

Security improvements for workspace sandboxing:
- Remove tilde expansion from expandPath() to prevent homedir escape
- Add symlink traversal detection in ensureWorkspacePath()
- Filesystem tools (read, grep, ls, find) now enforce containment
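
A minimal sketch of the two containment properties listed above, under stated assumptions: `isInsideWorkspace` is a hypothetical helper written for illustration (it is not the project's `ensureWorkspacePath()`), and the temporary directory layout is invented. It shows that a tilde passed through `path.resolve` stays a literal path segment, and that comparing real paths catches a symlink that points outside the workspace.

```typescript
import { mkdirSync, mkdtempSync, realpathSync, symlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve, sep } from "node:path";

// Hypothetical helper: true when `target` is still inside `workspace`
// after every symlink on the way has been resolved.
function isInsideWorkspace(workspace: string, target: string): boolean {
  const realWorkspace = realpathSync(workspace);
  const realTarget = realpathSync(resolve(workspace, target));
  return realTarget === realWorkspace || realTarget.startsWith(realWorkspace + sep);
}

// Throwaway layout: <base>/workspace/src, <base>/outside,
// and <base>/workspace/escape as a symlink to <base>/outside.
const base = mkdtempSync(join(tmpdir(), "containment-sketch-"));
const workspace = join(base, "workspace");
const outside = join(base, "outside");
mkdirSync(workspace);
mkdirSync(outside);
mkdirSync(join(workspace, "src"));
symlinkSync(outside, join(workspace, "escape"));

// Without tilde expansion, "~" is just a directory name under the workspace.
const tildePath = resolve(workspace, "~/.ssh");
const tildeStaysInside = tildePath.startsWith(workspace + sep);

const srcInside = isInsideWorkspace(workspace, "src");        // ordinary directory
const symlinkInside = isInsideWorkspace(workspace, "escape"); // symlink leaving the workspace

console.log({ tildeStaysInside, srcInside, symlinkInside });
```

The textual prefix check alone would accept `escape`, since its unresolved path sits under the workspace; only the real-path comparison rejects it.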

New features for custom model configuration:
- api_key field with env var, $VAR, ${VAR}, and !shell command
  resolution
- Optional custom headers support with same value resolution
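
The resolution rules above could be sketched roughly as follows. This is an illustration only, not the contents of `src/util/env.ts`: `resolveValueSketch` is a hypothetical name, the injectable `env` parameter exists just to make the example deterministic, and returning `undefined` for an unset `$VAR` or for empty command output is an assumption.

```typescript
import { execSync } from "node:child_process";

// Hypothetical stand-in for the value resolution described above.
function resolveValueSketch(
  value: string,
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  if (value.startsWith("!")) {
    // Shell command: run it and use its trimmed stdout.
    const out = execSync(value.slice(1), { encoding: "utf8" }).trim();
    return out === "" ? undefined : out; // assumption: empty output means "no value"
  }

  const ref = value.match(/^\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))$/);
  if (ref) {
    // Explicit $VAR or ${VAR}: only the environment is consulted.
    const name = ref[1] ?? ref[2] ?? "";
    return env[name]; // assumption: unset variables resolve to undefined
  }

  // Bare name: use the env var if it is set, otherwise treat the string as a literal.
  return env[value] ?? value;
}

const demoEnv: Record<string, string | undefined> = { MY_API_KEY: "sk-demo" };
const fromBareName = resolveValueSketch("MY_API_KEY", demoEnv);
const fromReference = resolveValueSketch("${MY_API_KEY}", demoEnv);
const literal = resolveValueSketch("ollama", demoEnv);
const unsetReference = resolveValueSketch("$NOT_SET", demoEnv);

console.log({ fromBareName, fromReference, literal, unsetReference });
```

The bare-name form is convenient but ambiguous, which is why a literal such as `"ollama"` passes through unchanged while `$NOT_SET` does not fall back to a literal in this sketch.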

Other changes:
- Add comprehensive test suite (bun test)
- Improve workspace cleanup on clone/checkout failures
- Update AGENTS.md and README.md documentation

Change summary

.gitignore                         |   3 
AGENTS.md                          |   8 
README.md                          |  20 +
bun.lock                           |  17 +
package.json                       |   7 
src/agent/model-resolver.ts        |  14 +
src/agent/runner.ts                |  58 ++++
src/agent/tools/find.ts            |   5 
src/agent/tools/git/blame.ts       |  17 +
src/agent/tools/git/checkout.ts    |   6 
src/agent/tools/git/diff.ts        |  18 +
src/agent/tools/git/log.ts         |  41 ++-
src/agent/tools/git/refs.ts        |  35 ++-
src/agent/tools/git/show.ts        |  17 +
src/agent/tools/grep.ts            |   5 
src/agent/tools/index.ts           |  16 -
src/agent/tools/ls.ts              |   5 
src/agent/tools/path-utils.ts      |  61 +++++
src/agent/tools/read.ts            |   5 
src/agent/tools/web-fetch.ts       |   2 
src/agent/tools/web-search.ts      |  21 +
src/cli/commands/repo.ts           |  80 +++---
src/cli/commands/web.ts            |  98 ++++----
src/cli/index.ts                   |  61 +----
src/cli/output.ts                  |  20 +
src/cli/parse-args.ts              |  56 +++++
src/config/loader.ts               |  40 +++
src/config/schema.ts               |  23 ++
src/util/env.ts                    |  75 +++++++
src/util/errors.ts                 |   6 
src/util/path.ts                   |  16 +
src/workspace/content.ts           |   2 
test/agent-runner.test.ts          | 214 ++++++++++++++++++++
test/cli-parser.test.ts            |  50 ++++
test/config-loader.test.ts         |  55 +++++
test/config-validation.test.ts     | 206 +++++++++++++++++++
test/expand-home-path.test.ts      |  44 ++++
test/git-log-validation.test.ts    |  56 +++++
test/git-tools.test.ts             | 131 ++++++++++++
test/model-resolver.test.ts        |  70 ++++++
test/web-search.test.ts            |  23 ++
test/workspace-cleanup.test.ts     | 139 +++++++++++++
test/workspace-containment.test.ts | 330 ++++++++++++++++++++++++++++++++
tsconfig.json                      |   2 
44 files changed, 1,934 insertions(+), 244 deletions(-)

Detailed changes

.gitignore

@@ -0,0 +1,3 @@
+node_modules/
+
+dist/

AGENTS.md

@@ -10,7 +10,7 @@ bun run build     # Build to dist/
 bun run typecheck # TypeScript check (also: bun run lint)
 ```
 
-No test suite currently exists (`test/` is empty).
+Run `bun test` to execute the test suite.
 
 ## Architecture
 
@@ -48,10 +48,12 @@ Tools use `@sinclair/typebox` for parameter schemas. Execute functions return `{
 
 ### Workspace Sandboxing
 
-Tools must constrain paths to workspace:
+Filesystem tools (`read`, `grep`, `ls`, `find`) must constrain paths to workspace:
 - `ensureWorkspacePath()` in `src/agent/tools/index.ts` validates paths don't escape
 - `resolveToCwd()` / `resolveReadPath()` in `src/agent/tools/path-utils.ts` handle expansion and normalization
 
+Git tools (`git_show`, `git_blame`, `git_diff`, `git_checkout`, `git_log`, `git_refs`) do **not** apply path containment. Refs and paths are passed directly to `simple-git`, which is initialized with `workspacePath` so all commands are scoped to the cloned repository. The user explicitly chooses which repository to clone, so its git objects are trusted content. This is an accepted trust boundary: we sandbox the filesystem but trust git data within the user's chosen repo.
+
 ### Config Cascade
 
 ```
@@ -64,6 +66,8 @@ Config uses TOML, validated against TypeBox schema (`src/config/schema.ts`).
 
 Model strings use `provider:model` format. `custom:name` prefix looks up custom model definitions from config's `[custom_models]` section. Built-in providers delegate to `@mariozechner/pi-ai`.
 
+API key resolution for custom models uses `resolveConfigValue()` from `src/util/env.ts`, which supports bare env var names, `$VAR` / `${VAR}` references, and `!shell-command` execution. Built-in providers fall back to `pi-ai`'s `getEnvApiKey()` (e.g. `ANTHROPIC_API_KEY`).
+
 ### Error Handling
 
 Custom error classes in `src/util/errors.ts` extend `RumiloError` with error codes:

README.md

@@ -39,6 +39,7 @@ You can define custom OpenAI-compatible endpoints like Ollama, vLLM, or self-hos
 provider = "ollama"
 api = "openai-completions"
 base_url = "http://localhost:11434/v1"
+api_key = "ollama"
 id = "ollama/llama3"
 name = "Llama 3 (Ollama)"
 reasoning = false
@@ -60,6 +61,7 @@ rumilo repo -u <uri> "query" --model custom:ollama
 - `provider`: Provider identifier (e.g., "ollama", "custom")
 - `api`: API type - typically "openai-completions"
 - `base_url`: API endpoint URL
+- `api_key`: API key (see value resolution below)
 - `id`: Unique model identifier
 - `name`: Human-readable display name
 - `reasoning`: Whether the model supports thinking/reasoning
@@ -67,6 +69,24 @@ rumilo repo -u <uri> "query" --model custom:ollama
 - `cost`: Cost per million tokens (can use 0 for local models)
 - `context_window`: Maximum context size in tokens
 - `max_tokens`: Maximum output tokens
+- `headers`: Optional custom HTTP headers (values support same resolution as `api_key`)
+
+#### Value Resolution
+
+The `api_key` and `headers` fields support three formats, following [pi-coding-agent conventions](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/models.md):
+
+- **Environment variable name:** bare name is checked as env var, then used as literal
+  ```toml
+  api_key = "MY_API_KEY"   # resolves process.env.MY_API_KEY, or literal "MY_API_KEY"
+  ```
+- **Env var reference:** explicit `$VAR` or `${VAR}`
+  ```toml
+  api_key = "$MY_API_KEY"  # always resolves from env
+  ```
+- **Shell command:** `!command` executes and uses stdout
+  ```toml
+  api_key = "!security find-generic-password -ws 'my-api'"
+  ```
 
 #### Compatibility Flags (Optional)

bun.lock

@@ -5,8 +5,8 @@
     "": {
       "name": "rumilo",
       "dependencies": {
-        "@mariozechner/pi-agent": "^0.9.0",
-        "@mariozechner/pi-ai": "^0.6.1",
+        "@mariozechner/pi-agent-core": "^0.52.8",
+        "@mariozechner/pi-ai": "^0.52.8",
         "@sinclair/typebox": "^0.32.14",
         "@tabstack/sdk": "^2.1.0",
         "kagi-ken": "github:czottmann/kagi-ken#1.2.0",
@@ -21,7 +21,77 @@
     },
   },
   "packages": {
-    "@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.61.0", "", { "bin": { "anthropic-ai-sdk": "bin/cli" } }, "sha512-GnlOXrPxow0uoaVB3DGNh9EJBU1MyagCBCLpU+bwDVlj/oOPYIwoiasMWlykkfYcQOrDP2x/zHnRD0xN7PeZPw=="],
+    "@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.73.0", "", { "dependencies": { "json-schema-to-ts": "^3.1.1" }, "peerDependencies": { "zod": "^3.25.0 || ^4.0.0" }, "optionalPeers": ["zod"], "bin": { "anthropic-ai-sdk": "bin/cli" } }, "sha512-URURVzhxXGJDGUGFunIOtBlSl7KWvZiAAKY/ttTkZAkXT9bTPqdk2eK0b8qqSxXpikh3QKPnPYpiyX98zf5ebw=="],
+
+    "@aws-crypto/crc32": ["@aws-crypto/crc32@5.2.0", "", { "dependencies": { "@aws-crypto/util": "^5.2.0", "@aws-sdk/types": "^3.222.0", "tslib": "^2.6.2" } }, "sha512-nLbCWqQNgUiwwtFsen1AdzAtvuLRsQS8rYgMuxCrdKf9kOssamGLuPwyTY9wyYblNr9+1XM8v6zoDTPPSIeANg=="],
+
+    "@aws-crypto/sha256-browser": ["@aws-crypto/sha256-browser@5.2.0", "", { "dependencies": { "@aws-crypto/sha256-js": "^5.2.0", "@aws-crypto/supports-web-crypto": "^5.2.0", "@aws-crypto/util": "^5.2.0", "@aws-sdk/types": "^3.222.0", "@aws-sdk/util-locate-window": "^3.0.0", "@smithy/util-utf8": "^2.0.0", "tslib": "^2.6.2" } }, "sha512-AXfN/lGotSQwu6HNcEsIASo7kWXZ5HYWvfOmSNKDsEqC4OashTp8alTmaz+F7TC2L083SFv5RdB+qU3Vs1kZqw=="],
+
+    "@aws-crypto/sha256-js": ["@aws-crypto/sha256-js@5.2.0", "", { "dependencies": { "@aws-crypto/util": "^5.2.0", "@aws-sdk/types": "^3.222.0", "tslib": "^2.6.2" } }, "sha512-FFQQyu7edu4ufvIZ+OadFpHHOt+eSTBaYaki44c+akjg7qZg9oOQeLlk77F6tSYqjDAFClrHJk9tMf0HdVyOvA=="],
+
+    "@aws-crypto/supports-web-crypto": ["@aws-crypto/supports-web-crypto@5.2.0", "", { "dependencies": { "tslib": "^2.6.2" } }, "sha512-iAvUotm021kM33eCdNfwIN//F77/IADDSs58i+MDaOqFrVjZo9bAal0NK7HurRuWLLpF1iLX7gbWrjHjeo+YFg=="],
+
+    "@aws-crypto/util": ["@aws-crypto/util@5.2.0", "", { "dependencies": { "@aws-sdk/types": "^3.222.0", "@smithy/util-utf8": "^2.0.0", "tslib": "^2.6.2" } }, "sha512-4RkU9EsI6ZpBve5fseQlGNUWKMa1RLPQ1dnjnQoe07ldfIzcsGb5hC5W0Dm7u423KWzawlrpbjXBrXCEv9zazQ=="],
+

package.json

@@ -4,18 +4,19 @@
   "private": true,
   "type": "module",
   "bin": {
-    "rumilo": "./dist/cli/index.js"
+    "rumilo": "./dist/index.js"
   },
   "scripts": {
     "dev": "bun src/cli/index.ts",
     "build": "bun build src/cli/index.ts --outdir dist --target=node",
     "start": "bun dist/cli/index.js",
     "lint": "bun run --silent typecheck",
+    "test": "bun test",
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
-    "@mariozechner/pi-ai": "^0.6.1",
-    "@mariozechner/pi-agent": "^0.9.0",
+    "@mariozechner/pi-agent-core": "^0.52.8",
+    "@mariozechner/pi-ai": "^0.52.8",
     "@sinclair/typebox": "^0.32.14",
     "@tabstack/sdk": "^2.1.0",
     "kagi-ken": "github:czottmann/kagi-ken#1.2.0",

src/agent/model-resolver.ts

@@ -4,12 +4,19 @@ import {
   type RumiloConfig,
 } from "../config/schema.js";
 import { ConfigError } from "../util/errors.js";
+import { resolveHeaders } from "../util/env.js";
 
 export function resolveModel(
   modelString: string,
   config: RumiloConfig,
 ): Model<any> {
-  const [provider, modelName] = modelString.split(":");
+  const colonIndex = modelString.indexOf(":");
+  if (colonIndex === -1) {
+    throw new ConfigError("Model must be in provider:model format");
+  }
+
+  const provider = modelString.slice(0, colonIndex);
+  const modelName = modelString.slice(colonIndex + 1);
 
   if (!provider || !modelName) {
     throw new ConfigError("Model must be in provider:model format");
@@ -71,8 +78,9 @@ function buildCustomModel(config: CustomModelConfig): Model<any> {
     maxTokens: config.max_tokens,
   };
 
-  if (config.headers) {
-    model.headers = config.headers;
+  const resolvedHeaders = resolveHeaders(config.headers);
+  if (resolvedHeaders) {
+    model.headers = resolvedHeaders;
   }
 
   if (config.compat) {
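
The switch from `split(":")` to `indexOf`/`slice` in `resolveModel()` matters when the model part itself contains a colon. A small sketch of the difference, assuming a hypothetical model string `custom:llama3:8b`; `splitModelString` is an illustrative helper, not a function from this change, and it throws a plain `Error` instead of the project's `ConfigError`.

```typescript
// Hypothetical helper mirroring the first-colon split used in the diff above.
function splitModelString(modelString: string): { provider: string; modelName: string } {
  const colonIndex = modelString.indexOf(":");
  if (colonIndex === -1) {
    throw new Error("Model must be in provider:model format");
  }

  const provider = modelString.slice(0, colonIndex);
  const modelName = modelString.slice(colonIndex + 1);
  if (!provider || !modelName) {
    throw new Error("Model must be in provider:model format");
  }

  return { provider, modelName };
}

const input = "custom:llama3:8b"; // hypothetical model string with a second colon

// Old approach: destructuring the split result silently drops ":8b".
const [, naiveModelName] = input.split(":");

// New approach: everything after the first colon is kept.
const parsed = splitModelString(input);

console.log(naiveModelName, parsed.modelName);
```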

src/agent/runner.ts

@@ -1,8 +1,9 @@
-import { Agent, ProviderTransport, type AgentEvent } from "@mariozechner/pi-agent";
-import { type AgentTool } from "@mariozechner/pi-ai";
-import { ToolInputError } from "../util/errors.js";
+import { Agent, type AgentEvent, type AgentTool } from "@mariozechner/pi-agent-core";
+import { getEnvApiKey, type AssistantMessage } from "@mariozechner/pi-ai";
 import type { RumiloConfig } from "../config/schema.js";
 import { resolveModel } from "./model-resolver.js";
+import { AgentError } from "../util/errors.js";
+import { resolveConfigValue } from "../util/env.js";
 
 export interface AgentRunOptions {
   model: string;
@@ -15,6 +16,30 @@ export interface AgentRunOptions {
 export interface AgentRunResult {
   message: string;
   usage?: unknown;
+  requestCount: number;
+}
+
+/**
+ * Build a getApiKey callback for the Agent.
+ *
+ * Resolution order:
+ * 1. Custom model config — if a custom model for this provider defines an
+ *    `api_key` field, resolve it via `resolveConfigValue` (supports env var
+ *    names, `$VAR` references, and `!shell` commands).
+ * 2. pi-ai’s built-in env-var lookup (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.).
+ */
+export function buildGetApiKey(config: RumiloConfig): (provider: string) => string | undefined {
+  return (provider: string) => {
+    if (config.custom_models) {
+      for (const model of Object.values(config.custom_models)) {
+        if (model.provider === provider && model.api_key) {
+          return resolveConfigValue(model.api_key);
+        }
+      }
+    }
+
+    return getEnvApiKey(provider);
+  };
 }
 
 export async function runAgent(query: string, options: AgentRunOptions): Promise<AgentRunResult> {
@@ -24,7 +49,7 @@ export async function runAgent(query: string, options: AgentRunOptions): Promise
       model: resolveModel(options.model, options.config),
       tools: options.tools,
     },
-    transport: new ProviderTransport(),
+    getApiKey: buildGetApiKey(options.config),
   });
 
   if (options.onEvent) {
@@ -33,19 +58,36 @@ export async function runAgent(query: string, options: AgentRunOptions): Promise
 
   await agent.prompt(query);
 
+  // Check for errors in agent state
+  if (agent.state.error) {
+    throw new AgentError(agent.state.error);
+  }
+
   const last = agent.state.messages
     .slice()
     .reverse()
-    .find((msg) => msg.role === "assistant");
+    .find((msg): msg is AssistantMessage => msg.role === "assistant");
+
+  // Check if the last assistant message indicates an error
+  if (last?.stopReason === "error") {
+    throw new AgentError(last.errorMessage ?? "Agent stopped with an unknown error");
+  }
 
   const text = last?.content
-    ?.filter((content) => content.type === "text")
+    ?.filter((content): content is Extract<typeof content, { type: "text" }> => content.type === "text")
     .map((content) => content.text)
     .join("")
     .trim();
 
+  if (text === undefined || text === "") {
+    throw new AgentError("Agent returned no text response");
+  }
+
+  const requestCount = agent.state.messages.filter((msg) => msg.role === "assistant").length;
+
   return {
-    message: text ?? "",
-    usage: (last as any)?.usage,
+    message: text,
+    usage: last?.usage,
+    requestCount,
   };
 }

src/agent/tools/find.ts

@@ -1,8 +1,8 @@
 import { spawnSync } from "node:child_process";
 import { relative } from "node:path";
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
-import { resolveToCwd } from "./path-utils.js";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
+import { resolveToCwd, ensureWorkspacePath } from "./path-utils.js";
 import { DEFAULT_MAX_BYTES, formatSize, truncateHead } from "../../util/truncate.js";
 import { ToolInputError } from "../../util/errors.js";
 
@@ -28,6 +28,7 @@ export const createFindTool = (workspacePath: string): AgentTool => {
       const searchDir: string = params.path || ".";
       const effectiveLimit = params.limit ?? DEFAULT_LIMIT;
       const searchPath = resolveToCwd(searchDir, workspacePath);
+      ensureWorkspacePath(workspacePath, searchPath);
 
     const args = [
       "--glob",

src/agent/tools/git/blame.ts

@@ -1,7 +1,12 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
 import { ToolInputError } from "../../../util/errors.js";
+import { formatSize, truncateHead } from "../../../util/truncate.js";
+
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
 
 const BlameSchema = Type.Object({
   path: Type.String({ description: "File path relative to repo root" }),
@@ -17,11 +22,17 @@ export const createGitBlameTool = (workspacePath: string): AgentTool => ({
       throw new ToolInputError("path must be a non-empty string");
     }
     const git = simpleGit(workspacePath);
-    const text = await git.raw(["blame", "--", params.path]);
+    const raw = await git.raw(["blame", "--", params.path]);
+    const truncation = truncateHead(raw);
+
+    let text = truncation.content;
+    if (truncation.truncated) {
+      text += `\n\n[truncated: showing ${truncation.outputLines} of ${truncation.totalLines} lines (${formatSize(truncation.outputBytes)} of ${formatSize(truncation.totalBytes)})]`;
+    }
 
     return {
       content: [{ type: "text", text }],
-      details: { path: params.path },
+      details: { path: params.path, ...(truncation.truncated ? { truncation } : {}) },
     };
   },
 });

src/agent/tools/git/checkout.ts

@@ -1,8 +1,12 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
 import { ToolInputError } from "../../../util/errors.js";
 
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
+
 const CheckoutSchema = Type.Object({
   ref: Type.String({ description: "Ref to checkout" }),
 });

src/agent/tools/git/diff.ts

@@ -1,7 +1,12 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
 import { ToolInputError } from "../../../util/errors.js";
+import { formatSize, truncateHead } from "../../../util/truncate.js";
+
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
 
 const DiffSchema = Type.Object({
   ref: Type.Optional(Type.String({ description: "Base ref (optional)" })),
@@ -31,10 +36,17 @@ export const createGitDiffTool = (workspacePath: string): AgentTool => ({
     if (params.ref2) args.push(params.ref2);
     if (params.path) args.push("--", params.path);
 
-    const text = await git.diff(args);
+    const raw = await git.diff(args);
+    const truncation = truncateHead(raw);
+
+    let text = truncation.content;
+    if (truncation.truncated) {
+      text += `\n\n[truncated: showing ${truncation.outputLines} of ${truncation.totalLines} lines (${formatSize(truncation.outputBytes)} of ${formatSize(truncation.totalBytes)})]`;
+    }
+
     return {
       content: [{ type: "text", text }],
-      details: { path: params.path ?? null },
+      details: { path: params.path ?? null, ...(truncation.truncated ? { truncation } : {}) },
     };
   },
 });

src/agent/tools/git/log.ts

@@ -1,8 +1,14 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
 import { ToolInputError } from "../../../util/errors.js";
 
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
+
+const DEFAULT_LOG_LIMIT = 20;
+
 const LogSchema = Type.Object({
   path: Type.Optional(Type.String({ description: "Filter to commits touching this path" })),
   author: Type.Optional(Type.String({ description: "Filter by author name/email" })),
@@ -21,25 +27,30 @@ export const createGitLogTool = (workspacePath: string): AgentTool => ({
     const git = simpleGit(workspacePath);
     const options: string[] = [];
 
-    if (params.n !== undefined) {
-      if (typeof params.n !== "number" || Number.isNaN(params.n) || params.n <= 0) {
-        throw new ToolInputError("n must be a positive number");
-      }
-      options.push("-n", String(Math.floor(params.n)));
+    const limit = params.n !== undefined ? params.n : DEFAULT_LOG_LIMIT;
+    if (typeof limit !== "number" || Number.isNaN(limit) || limit <= 0) {
+      throw new ToolInputError("n must be a positive number");
     }
+    options.push("-n", String(Math.floor(limit)));
     if (params.oneline) options.push("--oneline");
-    if (params.author && !String(params.author).trim()) {
-      throw new ToolInputError("author must be a non-empty string");
+    if (params.author !== undefined) {
+      if (!String(params.author).trim()) {
+        throw new ToolInputError("author must be a non-empty string");
+      }
+      options.push(`--author=${params.author}`);
     }
-    if (params.author) options.push(`--author=${params.author}`);
-    if (params.since && !String(params.since).trim()) {
-      throw new ToolInputError("since must be a non-empty string");
+    if (params.since !== undefined) {
+      if (!String(params.since).trim()) {
+        throw new ToolInputError("since must be a non-empty string");
+      }
+      options.push(`--since=${params.since}`);
     }
-    if (params.since) options.push(`--since=${params.since}`);
-    if (params.until && !String(params.until).trim()) {
-      throw new ToolInputError("until must be a non-empty string");
+    if (params.until !== undefined) {
+      if (!String(params.until).trim()) {
+        throw new ToolInputError("until must be a non-empty string");
+      }
+      options.push(`--until=${params.until}`);
     }
-    if (params.until) options.push(`--until=${params.until}`);
 
     const result = await git.log(options.concat(params.path ? ["--", params.path] : []));

src/agent/tools/git/refs.ts

@@ -1,6 +1,11 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
+import { formatSize, truncateHead } from "../../../util/truncate.js";
+
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
 
 const RefsSchema = Type.Object({
   type: Type.Union([
@@ -18,26 +23,28 @@ export const createGitRefsTool = (workspacePath: string): AgentTool => ({
   execute: async (_toolCallId: string, params: any) => {
     const git = simpleGit(workspacePath);
 
+    let raw: string;
+    let baseDetails: Record<string, any> = {};
+
     if (params.type === "tags") {
       const tags = await git.tags();
-      return {
-        content: [{ type: "text", text: tags.all.join("\n") }],
-        details: { count: tags.all.length },
-      };
+      raw = tags.all.join("\n");
+      baseDetails = { count: tags.all.length };
+    } else if (params.type === "remotes") {
+      raw = await git.raw(["branch", "-r"]);
+    } else {
+      raw = await git.raw(["branch", "-a"]);
     }
 
-    if (params.type === "remotes") {
-      const raw = await git.raw(["branch", "-r"]);
-      return {
-        content: [{ type: "text", text: raw }],
-        details: {},
-      };
+    const truncation = truncateHead(raw);
+    let text = truncation.content;
+    if (truncation.truncated) {
+      text += `\n\n[truncated: showing ${truncation.outputLines} of ${truncation.totalLines} lines (${formatSize(truncation.outputBytes)} of ${formatSize(truncation.totalBytes)})]`;
     }
 
-    const raw = await git.raw(["branch", "-a"]);
     return {
-      content: [{ type: "text", text: raw }],
-      details: {},
+      content: [{ type: "text", text }],
+      details: { ...baseDetails, ...(truncation.truncated ? { truncation } : {}) },
     };
   },
 });

src/agent/tools/git/show.ts

@@ -1,7 +1,12 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import simpleGit from "simple-git";
 import { ToolInputError } from "../../../util/errors.js";
+import { formatSize, truncateHead } from "../../../util/truncate.js";
+
+// Trust boundary: refs and paths are passed directly to simple-git, which is
+// scoped to the workspace. The user chose to clone this repo, so its contents
+// are trusted. See AGENTS.md § Workspace Sandboxing.
 
 const ShowSchema = Type.Object({
   ref: Type.String({ description: "Commit hash or ref" }),
@@ -17,11 +22,17 @@ export const createGitShowTool = (workspacePath: string): AgentTool => ({
       throw new ToolInputError("ref must be a non-empty string");
     }
     const git = simpleGit(workspacePath);
-    const text = await git.show([params.ref]);
+    const raw = await git.show([params.ref]);
+    const truncation = truncateHead(raw);
+
+    let text = truncation.content;
+    if (truncation.truncated) {
+      text += `\n\n[truncated: showing ${truncation.outputLines} of ${truncation.totalLines} lines (${formatSize(truncation.outputBytes)} of ${formatSize(truncation.totalBytes)})]`;
+    }
 
     return {
       content: [{ type: "text", text }],
-      details: { ref: params.ref },
+      details: { ref: params.ref, ...(truncation.truncated ? { truncation } : {}) },
     };
   },
 });

src/agent/tools/grep.ts

@@ -3,8 +3,8 @@ import { createInterface } from "node:readline";
 import { readFileSync, statSync } from "node:fs";
 import { relative, basename } from "node:path";
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
-import { resolveToCwd } from "./path-utils.js";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
+import { resolveToCwd, ensureWorkspacePath } from "./path-utils.js";
 import {
   DEFAULT_MAX_BYTES,
   formatSize,
@@ -43,6 +43,7 @@ export const createGrepTool = (workspacePath: string): AgentTool => {
     execute: async (_toolCallId: string, params: any) => {
       const searchDir: string | undefined = params.path;
       const searchPath = resolveToCwd(searchDir || ".", workspacePath);
+      ensureWorkspacePath(workspacePath, searchPath);
       let isDirectory = false;
       try {
         isDirectory = statSync(searchPath).isDirectory();

src/agent/tools/index.ts

@@ -1,6 +1,4 @@
-import { resolve, sep } from "node:path";
-import type { AgentTool } from "@mariozechner/pi-ai";
-import { ToolInputError } from "../../util/errors.js";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 
 export type ToolFactory = (workspacePath: string) => AgentTool<any>;
 
@@ -9,17 +7,7 @@ export interface ToolBundle {
   tools: AgentTool<any>[];
 }
 
-export function ensureWorkspacePath(workspacePath: string, targetPath: string): string {
-  const resolved = resolve(workspacePath, targetPath);
-  const root = workspacePath.endsWith(sep) ? workspacePath : `${workspacePath}${sep}`;
-
-  if (resolved === workspacePath || resolved.startsWith(root)) {
-    return resolved;
-  }
-
-  throw new ToolInputError(`Path escapes workspace: ${targetPath}`);
-}
-
+export { ensureWorkspacePath } from "./path-utils.js";
 export { createReadTool } from "./read.js";
 export { createGrepTool } from "./grep.js";
 export { createLsTool } from "./ls.js";

src/agent/tools/ls.ts

@@ -1,8 +1,8 @@
 import { existsSync, readdirSync, statSync } from "node:fs";
 import { join } from "node:path";
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
-import { resolveToCwd } from "./path-utils.js";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
+import { resolveToCwd, ensureWorkspacePath } from "./path-utils.js";
 import { DEFAULT_MAX_BYTES, formatSize, truncateHead } from "../../util/truncate.js";
 
 const DEFAULT_LIMIT = 500;
@@ -19,6 +19,7 @@ export const createLsTool = (workspacePath: string): AgentTool => ({
   parameters: LsSchema as any,
   execute: async (_toolCallId: string, params: any) => {
     const resolved = resolveToCwd(params.path || ".", workspacePath);
+    ensureWorkspacePath(workspacePath, resolved);
 
     if (!existsSync(resolved)) {
       throw new Error(`Path does not exist: ${params.path || "."}`);

src/agent/tools/path-utils.ts

@@ -1,6 +1,6 @@
-import { accessSync, constants } from "node:fs";
-import * as os from "node:os";
-import { isAbsolute, resolve as resolvePath } from "node:path";
+import { accessSync, constants, realpathSync } from "node:fs";
+import { dirname, isAbsolute, resolve as resolvePath, sep } from "node:path";
+import { ToolInputError } from "../../util/errors.js";
 
 const UNICODE_SPACES = /[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g;
 
@@ -33,12 +33,10 @@ export function expandPath(filePath: string): string {
 	let result = filePath.replace(UNICODE_SPACES, " ");
 	result = normalizeAtPrefix(result);
 
-	if (result === "~") {
-		return os.homedir();
-	}
-	if (result.startsWith("~/")) {
-		return resolvePath(os.homedir(), result.slice(2));
-	}
+	// NOTE: tilde expansion is intentionally omitted.
+	// In a workspace-sandboxed context, expanding ~ to the user's home
+	// directory would bypass workspace containment. Tildes are treated
+	// as literal path characters.
 
 	return result;
 }
@@ -68,3 +66,48 @@ export function resolveReadPath(filePath: string, cwd: string): string {
 
 	return resolved;
 }
+
+/**
+ * Resolve the real path of `p`, following symlinks. If `p` does not exist,
+ * walk up to the nearest existing ancestor, resolve *that*, and re-append
+ * the remaining segments. This lets us validate write targets that don't
+ * exist yet while still catching symlink escapes in any ancestor directory.
+ */
+function safeRealpath(p: string): string {
+	try {
+		return realpathSync(p);
+	} catch (err: any) {
+		if (err?.code === "ENOENT") {
+			const parent = dirname(p);
+			if (parent === p) {
+				// filesystem root — nothing more to resolve
+				return p;
+			}
+			const realParent = safeRealpath(parent);
+			const tail = p.slice(parent.length);
+			return realParent + tail;
+		}
+		throw err;
+	}
+}
+
+export function ensureWorkspacePath(workspacePath: string, targetPath: string): string {
+	const resolved = resolvePath(workspacePath, targetPath);
+
+	// Quick textual check first (catches the common case cheaply)
+	const root = workspacePath.endsWith(sep) ? workspacePath : `${workspacePath}${sep}`;
+	if (resolved !== workspacePath && !resolved.startsWith(root)) {
+		throw new ToolInputError(`Path escapes workspace: ${targetPath}`);
+	}
+
+	// Resolve symlinks to catch symlink-based escapes
+	const realWorkspace = safeRealpath(workspacePath);
+	const realTarget = safeRealpath(resolved);
+	const realRoot = realWorkspace.endsWith(sep) ? realWorkspace : `${realWorkspace}${sep}`;
+
+	if (realTarget !== realWorkspace && !realTarget.startsWith(realRoot)) {
+		throw new ToolInputError(`Path escapes workspace via symlink: ${targetPath}`);
+	}
+
+	return resolved;
+}

src/agent/tools/read.ts

@@ -1,7 +1,7 @@
 import { readFile, stat } from "node:fs/promises";
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
-import { resolveReadPath } from "./path-utils.js";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
+import { resolveReadPath, ensureWorkspacePath } from "./path-utils.js";
 import { DEFAULT_MAX_BYTES, DEFAULT_MAX_LINES, formatSize, truncateHead } from "../../util/truncate.js";
 import { ToolInputError } from "../../util/errors.js";
 
@@ -20,6 +20,7 @@ export const createReadTool = (workspacePath: string): AgentTool => ({
 	parameters: ReadSchema as any,
 	execute: async (_toolCallId: string, params: any) => {
 		const absolutePath = resolveReadPath(params.path, workspacePath);
+		ensureWorkspacePath(workspacePath, absolutePath);
 		const fileStats = await stat(absolutePath);
 
 		if (fileStats.size > MAX_READ_BYTES) {

src/agent/tools/web-fetch.ts

@@ -1,5 +1,5 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import Tabstack from "@tabstack/sdk";
 import { FetchError, ToolInputError } from "../../util/errors.js";

src/agent/tools/web-search.ts

@@ -1,7 +1,7 @@
 import { Type } from "@sinclair/typebox";
-import type { AgentTool } from "@mariozechner/pi-ai";
+import type { AgentTool } from "@mariozechner/pi-agent-core";
 import { search } from "kagi-ken";
-import { ToolInputError } from "../../util/errors.js";
+import { FetchError, ToolInputError } from "../../util/errors.js";
 
 const SearchSchema = Type.Object({
   query: Type.String({ description: "Search query" }),
@@ -17,10 +17,17 @@ export const createWebSearchTool = (sessionToken: string): AgentTool => ({
       throw new ToolInputError("Missing Kagi session token");
     }
 
-    const result = await search(params.query, sessionToken);
-    return {
-      content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
-      details: { query: params.query, resultCount: result?.data?.length ?? 0 },
-    };
+    try {
+      const result = await search(params.query, sessionToken);
+      return {
+        content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
+        details: { query: params.query, resultCount: result?.data?.length ?? 0 },
+      };
+    } catch (error: any) {
+      throw new FetchError(
+        `kagi:search?q=${encodeURIComponent(params.query)}`,
+        error?.message ?? String(error),
+      );
+    }
   },
 });

src/cli/commands/repo.ts

@@ -1,4 +1,5 @@
 import { readFile } from "node:fs/promises";
+import { expandHomePath } from "../../util/path.js";
 import { applyConfigOverrides, loadConfig } from "../../config/loader.js";
 import { createWorkspace } from "../../workspace/manager.js";
 import { createGrepTool } from "../../agent/tools/grep.js";
@@ -35,52 +36,51 @@ export async function runRepoCommand(options: RepoCommandOptions): Promise<void>
   });
 
   const workspace = await createWorkspace({ cleanup: overrides.defaults.cleanup });
-  const logger = createEventLogger({ verbose: options.verbose });
 
-  let systemPrompt = REPO_SYSTEM_PROMPT;
-  const promptPath = overrides.repo.system_prompt_path;
-  if (promptPath) {
-    const home = process.env["HOME"] ?? "";
-    systemPrompt = await readFile(promptPath.replace(/^~\//, `${home}/`), "utf8");
-  }
+  try {
+    const logger = createEventLogger({ verbose: options.verbose });
 
-  const git = simpleGit();
-  const cloneArgs: string[] = [];
-  if (!options.full) {
-    const depth = overrides.repo.default_depth ?? 1;
-    const blobLimit = overrides.repo.blob_limit ?? "5m";
-    cloneArgs.push("--depth", String(depth), `--filter=blob:limit=${blobLimit}`);
-  }
+    let systemPrompt = REPO_SYSTEM_PROMPT;
+    const promptPath = overrides.repo.system_prompt_path;
+    if (promptPath) {
+      systemPrompt = await readFile(expandHomePath(promptPath), "utf8");
+    }
 
-  try {
-    await git.clone(options.uri, workspace.path, cloneArgs);
-  } catch (error: any) {
-    await workspace.cleanup();
-    if (!overrides.defaults.cleanup) {
-      console.error(`Workspace preserved at ${workspace.path}`);
+    const git = simpleGit();
+    const cloneArgs: string[] = [];
+    if (!options.full) {
+      const depth = overrides.repo.default_depth ?? 1;
+      const blobLimit = overrides.repo.blob_limit ?? "5m";
+      cloneArgs.push("--depth", String(depth), `--filter=blob:limit=${blobLimit}`);
     }
-    throw new CloneError(options.uri, error?.message ?? String(error));
-  }
 
-  const repoGit = simpleGit(workspace.path);
-  if (options.ref) {
-    await repoGit.checkout(options.ref);
-  }
+    try {
+      await git.clone(options.uri, workspace.path, cloneArgs);
+    } catch (error: any) {
+      if (!overrides.defaults.cleanup) {
+        console.error(`Workspace preserved at ${workspace.path}`);
+      }
+      throw new CloneError(options.uri, error?.message ?? String(error));
+    }
 
-  const tools = [
-    createReadTool(workspace.path),
-    createGrepTool(workspace.path),
-    createLsTool(workspace.path),
-    createFindTool(workspace.path),
-    createGitLogTool(workspace.path),
-    createGitShowTool(workspace.path),
-    createGitBlameTool(workspace.path),
-    createGitDiffTool(workspace.path),
-    createGitRefsTool(workspace.path),
-    createGitCheckoutTool(workspace.path),
-  ];
+    const repoGit = simpleGit(workspace.path);
+    if (options.ref) {
+      await repoGit.checkout(options.ref);
+    }
+
+    const tools = [
+      createReadTool(workspace.path),
+      createGrepTool(workspace.path),
+      createLsTool(workspace.path),
+      createFindTool(workspace.path),
+      createGitLogTool(workspace.path),
+      createGitShowTool(workspace.path),
+      createGitBlameTool(workspace.path),
+      createGitDiffTool(workspace.path),
+      createGitRefsTool(workspace.path),
+      createGitCheckoutTool(workspace.path),
+    ];
 
-  try {
     const result = await runAgent(options.query, {
       model: overrides.repo.model ?? overrides.defaults.model,
       systemPrompt,
@@ -90,7 +90,7 @@ export async function runRepoCommand(options: RepoCommandOptions): Promise<void>
     });
 
     process.stdout.write(result.message + "\n");
-    printUsageSummary(result.usage as any);
+    printUsageSummary(result.usage as any, result.requestCount);
   } finally {
     await workspace.cleanup();
   }

src/cli/commands/web.ts

@@ -1,5 +1,6 @@
 import { readFile } from "node:fs/promises";
 import { basename } from "node:path";
+import { expandHomePath } from "../../util/path.js";
 import { applyConfigOverrides, loadConfig } from "../../config/loader.js";
 import { createWorkspace } from "../../workspace/manager.js";
 import { writeWorkspaceFile } from "../../workspace/content.js";
@@ -12,7 +13,7 @@ import { createWebSearchTool } from "../../agent/tools/web-search.js";
 import { runAgent } from "../../agent/runner.js";
 import { WEB_SYSTEM_PROMPT } from "../../agent/prompts/web.js";
 import { createEventLogger, printUsageSummary } from "../output.js";
-import { FetchError, ToolInputError } from "../../util/errors.js";
+import { ConfigError, FetchError } from "../../util/errors.js";
 
 const INJECT_THRESHOLD = 50 * 1024;
 
@@ -32,62 +33,61 @@ export async function runWebCommand(options: WebCommandOptions): Promise<void> {
   });
 
   const workspace = await createWorkspace({ cleanup: overrides.defaults.cleanup });
-  const logger = createEventLogger({ verbose: options.verbose });
 
-  const kagiSession =
-    overrides.web.kagi_session_token ?? overrides.defaults.kagi_session_token ?? process.env["KAGI_SESSION_TOKEN"];
-  const tabstackKey =
-    overrides.web.tabstack_api_key ??
-    overrides.defaults.tabstack_api_key ??
-    process.env["TABSTACK_API_KEY"];
+  try {
+    const logger = createEventLogger({ verbose: options.verbose });
 
-  if (!kagiSession) {
-    throw new ToolInputError("Missing Kagi session token (set KAGI_SESSION_TOKEN or config)");
-  }
-  if (!tabstackKey) {
-    throw new ToolInputError("Missing Tabstack API key (set TABSTACK_API_KEY or config)");
-  }
+    const kagiSession =
+      overrides.web.kagi_session_token ?? overrides.defaults.kagi_session_token ?? process.env["KAGI_SESSION_TOKEN"];
+    const tabstackKey =
+      overrides.web.tabstack_api_key ??
+      overrides.defaults.tabstack_api_key ??
+      process.env["TABSTACK_API_KEY"];
 
-  let systemPrompt = WEB_SYSTEM_PROMPT;
-  const promptPath = overrides.web.system_prompt_path;
-  if (promptPath) {
-    const home = process.env["HOME"] ?? "";
-    systemPrompt = await readFile(promptPath.replace(/^~\//, `${home}/`), "utf8");
-  }
+    if (!kagiSession) {
+      throw new ConfigError("Missing Kagi session token (set KAGI_SESSION_TOKEN or config)");
+    }
+    if (!tabstackKey) {
+      throw new ConfigError("Missing Tabstack API key (set TABSTACK_API_KEY or config)");
+    }
+
+    let systemPrompt = WEB_SYSTEM_PROMPT;
+    const promptPath = overrides.web.system_prompt_path;
+    if (promptPath) {
+      systemPrompt = await readFile(expandHomePath(promptPath), "utf8");
+    }
 
-  const tools = [
-    createWebSearchTool(kagiSession),
-    createWebFetchTool(tabstackKey),
-    createReadTool(workspace.path),
-    createGrepTool(workspace.path),
-    createLsTool(workspace.path),
-    createFindTool(workspace.path),
-  ];
+    const tools = [
+      createWebSearchTool(kagiSession),
+      createWebFetchTool(tabstackKey),
+      createReadTool(workspace.path),
+      createGrepTool(workspace.path),
+      createLsTool(workspace.path),
+      createFindTool(workspace.path),
+    ];
 
-  let seededContext = "";
-  if (options.url) {
-    const fetchTool = createWebFetchTool(tabstackKey);
-    try {
-      const result = await fetchTool.execute("prefetch", { url: options.url, nocache: false });
-      const text = result.content
-        .map((block) => (block.type === "text" ? block.text ?? "" : ""))
-        .join("");
-      if (text.length <= INJECT_THRESHOLD) {
-        seededContext = text;
-      } else {
-        const filename = `web/${basename(new URL(options.url).pathname) || "index"}.md`;
-        await writeWorkspaceFile(workspace.path, filename, text);
-        seededContext = `Fetched content stored at ${filename}`;
+    let seededContext = "";
+    if (options.url) {
+      const fetchTool = createWebFetchTool(tabstackKey);
+      try {
+        const result = await fetchTool.execute("prefetch", { url: options.url, nocache: false });
+        const text = result.content
+          .map((block) => (block.type === "text" ? block.text ?? "" : ""))
+          .join("");
+        if (text.length <= INJECT_THRESHOLD) {
+          seededContext = text;
+        } else {
+          const filename = `web/${basename(new URL(options.url).pathname) || "index"}.md`;
+          await writeWorkspaceFile(workspace.path, filename, text);
+          seededContext = `Fetched content stored at ${filename}`;
+        }
+      } catch (error: any) {
+        throw new FetchError(options.url, error?.message ?? String(error));
       }
-    } catch (error: any) {
-      await workspace.cleanup();
-      throw new FetchError(options.url, error?.message ?? String(error));
     }
-  }
 
-  const query = seededContext ? `${options.query}\n\n${seededContext}` : options.query;
+    const query = seededContext ? `${options.query}\n\n${seededContext}` : options.query;
 
-  try {
     const result = await runAgent(query, {
       model: overrides.web.model ?? overrides.defaults.model,
       systemPrompt,
@@ -97,7 +97,7 @@ export async function runWebCommand(options: WebCommandOptions): Promise<void> {
     });
 
     process.stdout.write(result.message + "\n");
-    printUsageSummary(result.usage as any);
+    printUsageSummary(result.usage as any, result.requestCount);
   } finally {
     await workspace.cleanup();
   }

src/cli/index.ts

@@ -2,55 +2,32 @@
 import { runWebCommand } from "./commands/web.js";
 import { runRepoCommand } from "./commands/repo.js";
 import { RumiloError } from "../util/errors.js";
+import { parseArgs } from "./parse-args.js";
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+import { fileURLToPath } from "node:url";
 
-interface ParsedArgs {
-  command?: string;
-  options: Record<string, string | boolean>;
-  positional: string[];
-}
-
-function parseArgs(args: string[]): ParsedArgs {
-  const [, , command, ...rest] = args;
-  const options: Record<string, string | boolean> = {};
-  const positional: string[] = [];
+const VERSION = "0.1.0";
 
-  for (let i = 0; i < rest.length; i += 1) {
-    const arg = rest[i];
-    if (!arg) continue;
+async function main() {
+  const { command, options, positional } = parseArgs(process.argv);
 
-    if (arg.startsWith("--")) {
-      const [key, value] = arg.slice(2).split("=");
-      if (!key) continue;
-      if (value !== undefined) {
-        options[key] = value;
-      } else if (rest[i + 1] && !rest[i + 1]?.startsWith("-")) {
-        options[key] = rest[i + 1] as string;
-        i += 1;
-      } else {
-        options[key] = true;
-      }
-    } else if (arg.startsWith("-")) {
-      const short = arg.slice(1);
-      if (short === "u" && rest[i + 1]) {
-        options["uri"] = rest[i + 1] as string;
-        i += 1;
-      } else if (short === "f") {
-        options["full"] = true;
-      } else {
-        options[short] = true;
-      }
-    } else {
-      positional.push(arg);
-    }
+  // Handle a version flag given before any command (e.g., rumilo -v).
+  // parseArgs treats the first argument as the command, so a leading flag ends up in 'command'.
+  if (command && (command === "-v" || command === "--version") && Object.keys(options).length === 0 && positional.length === 0) {
+    console.log(`rumilo v${VERSION}`);
+    process.exit(0);
   }
 
-  return { command, options, positional };
-}
+  const actualCommand = command?.startsWith("-") ? undefined : command;
 
-async function main() {
-  const { command, options, positional } = parseArgs(process.argv);
+  // Handle --version / -v given after a command, or 'version' / 'v' given as the command
+  if (options["version"] || actualCommand === "version" || actualCommand === "v") {
+    console.log(`rumilo v${VERSION}`);
+    process.exit(0);
+  }
 
-  if (!command || command === "help") {
+  if (!actualCommand || actualCommand === "help" || actualCommand === "--help" || actualCommand === "-h" || options["help"]) {
     console.log("rumilo web <query> [-u URL] [--model <provider:model>] [--verbose] [--no-cleanup]");
     console.log("rumilo repo -u <uri> <query> [--ref <ref>] [--full] [--model <provider:model>] [--verbose] [--no-cleanup]");
     process.exit(0);

src/cli/output.ts

@@ -1,4 +1,4 @@
-import type { AgentEvent } from "@mariozechner/pi-agent";
+import type { AgentEvent } from "@mariozechner/pi-agent-core";
 
 const MAX_OUTPUT_LINES = 20;
 
@@ -44,10 +44,22 @@ export function createEventLogger(options: OutputOptions) {
   };
 }
 
-export function printUsageSummary(usage: { cost?: { total?: number }; totalTokens?: number; output?: number; input?: number } | undefined) {
+export function printUsageSummary(
+  usage: { cost?: { total?: number }; totalTokens?: number; output?: number; input?: number } | undefined,
+  requestCount?: number,
+) {
   if (!usage) return;
 
-  const cost = usage.cost?.total ?? 0;
   const tokens = usage.totalTokens ?? (usage.output ?? 0) + (usage.input ?? 0);
-  console.error(`\nusage: ${tokens} tokens, cost $${cost.toFixed(4)}`);
+  const rawCost = usage.cost?.total;
+  const cost = typeof rawCost === "number" && !isNaN(rawCost) && rawCost > 0 ? rawCost : undefined;
+
+  let line = `\nusage: ${tokens} tokens`;
+  if (requestCount !== undefined && requestCount > 0) {
+    line += ` across ${requestCount} ${requestCount === 1 ? "request" : "requests"}`;
+  }
+  if (cost !== undefined) {
+    line += `, cost $${cost.toFixed(4)}`;
+  }
+  console.error(line);
 }
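
The effect of the new formatting rules is easier to see with the line returned rather than printed. The following is a sketch only: `formatUsageLine` is a hypothetical pure variant of the logic above, not a function in this change, and it omits the leading newline that `printUsageSummary` writes.

```typescript
// Hypothetical pure variant of the formatting logic: returns the line instead of printing it.
function formatUsageLine(tokens: number, requestCount?: number, cost?: number): string {
  let line = `usage: ${tokens} tokens`;
  // Mention the request count only when at least one request was made.
  if (requestCount !== undefined && requestCount > 0) {
    line += ` across ${requestCount} ${requestCount === 1 ? "request" : "requests"}`;
  }
  // Omit the cost segment when no positive, non-NaN cost was reported.
  if (typeof cost === "number" && !Number.isNaN(cost) && cost > 0) {
    line += `, cost $${cost.toFixed(4)}`;
  }
  return line;
}
```

Under these assumptions, a run with no reported cost prints only tokens and requests instead of the previous `cost $0.0000`.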

src/cli/parse-args.ts

@@ -0,0 +1,56 @@
+export interface ParsedArgs {
+  command?: string;
+  options: Record<string, string | boolean>;
+  positional: string[];
+}
+
+export function parseArgs(args: string[]): ParsedArgs {
+  const [, , command, ...rest] = args;
+  const options: Record<string, string | boolean> = {};
+  const positional: string[] = [];
+
+  for (let i = 0; i < rest.length; i += 1) {
+    const arg = rest[i];
+    if (!arg) continue;
+
+    if (arg.startsWith("--")) {
+      const eqIndex = arg.indexOf("=", 2);
+      let key: string;
+      let value: string | undefined;
+      if (eqIndex !== -1) {
+        key = arg.slice(2, eqIndex);
+        value = arg.slice(eqIndex + 1);
+      } else {
+        key = arg.slice(2);
+      }
+      if (!key) continue;
+      if (value !== undefined) {
+        options[key] = value;
+      } else if (rest[i + 1] && !rest[i + 1]?.startsWith("-")) {
+        options[key] = rest[i + 1] as string;
+        i += 1;
+      } else {
+        options[key] = true;
+      }
+    } else if (arg.startsWith("-")) {
+      const short = arg.slice(1);
+      if (short === "u") {
+        if (rest[i + 1] && !rest[i + 1]!.startsWith("-")) {
+          options["uri"] = rest[i + 1] as string;
+          i += 1;
+        }
+        // else: -u with no value — uri stays unset, command handler validates
+      } else if (short === "f") {
+        options["full"] = true;
+      } else if (short === "v") {
+        options["version"] = true;
+      } else {
+        options[short] = true;
+      }
+    } else {
+      positional.push(arg);
+    }
+  }
+
+  return { command, options, positional };
+}
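
The `--key=value` handling above splits at the first `=` only, whereas the removed `split("=")` destructuring kept just the segment up to the second `=`. A minimal sketch of that one step, using a hypothetical `splitLongFlag` helper that is not part of this change:

```typescript
// Hypothetical helper: split "--key=value" at the first "=" only.
function splitLongFlag(arg: string): { key: string; value?: string } {
  // Start searching after the leading "--".
  const eqIndex = arg.indexOf("=", 2);
  if (eqIndex === -1) return { key: arg.slice(2) };
  // Everything after the first "=" belongs to the value, including later "=" characters.
  return { key: arg.slice(2, eqIndex), value: arg.slice(eqIndex + 1) };
}
```

For `--key=a=b=c` this yields the key `key` and the value `a=b=c`; a bare `--verbose` yields a key with no value.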

src/config/loader.ts

@@ -1,6 +1,8 @@
 import { readFile } from "node:fs/promises";
 import { resolve } from "node:path";
+import { Value } from "@sinclair/typebox/value";
 import { defaultConfig } from "./defaults.js";
+import { ConfigSchema, PartialConfigSchema } from "./schema.js";
 import type { RumiloConfig } from "./schema.js";
 import { ConfigError } from "../util/errors.js";
 import toml from "toml";
@@ -31,12 +33,27 @@ function mergeConfig(base: RumiloConfig, override: Partial<RumiloConfig>): Rumil
   };
 }
 
-function validateConfig(config: RumiloConfig): void {
-  if (!config.defaults.model) {
-    throw new ConfigError("defaults.model is required");
+function validatePartialConfig(parsed: unknown): asserts parsed is Partial<RumiloConfig> {
+  if (!Value.Check(PartialConfigSchema, parsed)) {
+    const errors = [...Value.Errors(PartialConfigSchema, parsed)];
+    const details = errors
+      .map((e) => `  ${e.path}: ${e.message} (got ${JSON.stringify(e.value)})`)
+      .join("\n");
+    throw new ConfigError(
+      `Invalid config:\n${details}`,
+    );
   }
-  if (typeof config.defaults.cleanup !== "boolean") {
-    throw new ConfigError("defaults.cleanup must be a boolean");
+}
+
+function validateFullConfig(config: unknown): asserts config is RumiloConfig {
+  if (!Value.Check(ConfigSchema, config)) {
+    const errors = [...Value.Errors(ConfigSchema, config)];
+    const details = errors
+      .map((e) => `  ${e.path}: ${e.message} (got ${JSON.stringify(e.value)})`)
+      .join("\n");
+    throw new ConfigError(
+      `Invalid merged config:\n${details}`,
+    );
   }
 }
 
@@ -47,16 +64,21 @@ export async function loadConfig(): Promise<LoadedConfig> {
 
   try {
     const raw = await readFile(configPath, "utf8");
-    const parsed = toml.parse(raw) as Partial<RumiloConfig>;
+    const parsed: unknown = toml.parse(raw);
+    validatePartialConfig(parsed);
     const merged = mergeConfig(base, parsed);
-    validateConfig(merged);
+    validateFullConfig(merged);
     return { config: merged, path: configPath };
   } catch (error: any) {
     if (error?.code === "ENOENT") {
-      validateConfig(base);
+      validateFullConfig(base);
       return { config: base };
     }
 
+    if (error instanceof ConfigError) {
+      throw error;
+    }
+
     if (error instanceof Error) {
       throw new ConfigError(error.message);
     }
@@ -70,6 +92,6 @@ export function applyConfigOverrides(
   overrides: Partial<RumiloConfig>,
 ): RumiloConfig {
   const merged = mergeConfig(config, overrides);
-  validateConfig(merged);
+  validateFullConfig(merged);
   return merged;
 }

src/config/schema.ts

@@ -1,4 +1,4 @@
-import { Type, type Static } from "@sinclair/typebox";
+import { Type, Kind, type Static, type TObject, type TProperties } from "@sinclair/typebox";
 
 const CustomModelSchema = Type.Object({
   provider: Type.String(),
@@ -16,6 +16,7 @@ const CustomModelSchema = Type.Object({
   }),
   context_window: Type.Number(),
   max_tokens: Type.Number(),
+  api_key: Type.Optional(Type.String()),
   headers: Type.Optional(Type.Record(Type.String(), Type.String())),
   compat: Type.Optional(
     Type.Object({
@@ -33,7 +34,7 @@ const CustomModelSchema = Type.Object({
   ),
 });
 
-const ConfigSchema = Type.Object({
+export const ConfigSchema = Type.Object({
   defaults: Type.Object({
     model: Type.String(),
     cleanup: Type.Boolean(),
@@ -55,5 +56,23 @@ const ConfigSchema = Type.Object({
   custom_models: Type.Optional(Type.Record(Type.String(), CustomModelSchema)),
 });
 
+/** Deep-partial version of ConfigSchema for validating TOML override files. */
+export function partialObject<T extends TProperties>(schema: TObject<T>) {
+  const partial: Record<string, unknown> = {};
+  for (const [key, value] of Object.entries(schema.properties)) {
+    const v = value as any;
+    const inner = v[Kind] === 'Object' && v.properties ? partialObject(v) : v;
+    partial[key] = Type.Optional(inner as any);
+  }
+  return Type.Object(partial as any);
+}
+
+export const PartialConfigSchema = Type.Object({
+  defaults: Type.Optional(partialObject(ConfigSchema.properties.defaults)),
+  web: Type.Optional(partialObject(ConfigSchema.properties.web)),
+  repo: Type.Optional(partialObject(ConfigSchema.properties.repo)),
+  custom_models: Type.Optional(Type.Record(Type.String(), CustomModelSchema)),
+});
+
 export type RumiloConfig = Static<typeof ConfigSchema>;
 export type CustomModelConfig = Static<typeof CustomModelSchema>;

src/util/env.ts

@@ -0,0 +1,75 @@
+import { execSync } from "node:child_process";
+
+/**
+ * Resolve a configuration value (API key, header value, etc.) to a concrete string.
+ *
+ * Resolution order, following pi-coding-agent's convention:
+ * - `"!command"` — executes the rest as a shell command, uses trimmed stdout
+ * - `"$VAR"` or `"${VAR}"` — treats as an env var reference (with sigil)
+ * - Otherwise checks `process.env[value]` — bare name is tried as env var
+ * - If no env var matches, the string is used as a literal value
+ *
+ * Returns `undefined` when a shell command fails or produces empty output, or when an explicit `$VAR` / `${VAR}` reference is unset.
+ */
+export function resolveConfigValue(value: string): string | undefined {
+  if (value.startsWith("!")) {
+    return executeShellCommand(value.slice(1));
+  }
+
+  // Explicit $VAR or ${VAR} reference
+  const envRef = value.match(/^\$\{(.+)\}$|^\$([A-Za-z_][A-Za-z0-9_]*)$/);
+  if (envRef) {
+    const name = envRef[1] ?? envRef[2]!;
+    return process.env[name] ?? undefined;
+  }
+
+  // Bare name — check as env var first, then use as literal
+  const envValue = process.env[value];
+  return envValue || value;
+}
+
+/**
+ * Expand `$VAR` and `${VAR}` references embedded within a larger string.
+ * Unlike `resolveConfigValue`, this handles mixed literal + env-var strings
+ * like `"Bearer $API_KEY"`.
+ */
+export function expandEnvVars(value: string): string {
+  return value.replace(
+    /\$\{([^}]+)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
+    (_, braced, bare) => {
+      const name = braced ?? bare;
+      return process.env[name] ?? "";
+    },
+  );
+}
+
+/**
+ * Resolve all values in a headers record using `resolveConfigValue`.
+ * Drops entries whose values resolve to `undefined`.
+ */
+export function resolveHeaders(
+  headers: Record<string, string> | undefined,
+): Record<string, string> | undefined {
+  if (!headers) return undefined;
+  const resolved: Record<string, string> = {};
+  for (const [key, value] of Object.entries(headers)) {
+    const resolvedValue = resolveConfigValue(value);
+    if (resolvedValue) {
+      resolved[key] = resolvedValue;
+    }
+  }
+  return Object.keys(resolved).length > 0 ? resolved : undefined;
+}
+
+function executeShellCommand(command: string): string | undefined {
+  try {
+    const output = execSync(command, {
+      encoding: "utf-8",
+      timeout: 10_000,
+      stdio: ["ignore", "pipe", "ignore"],
+    });
+    return output.trim() || undefined;
+  } catch {
+    return undefined;
+  }
+}
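
The two resolvers differ in how they match references: `resolveConfigValue` treats a value as an env var reference only when the whole string is `$VAR` or `${VAR}`, while `expandEnvVars` replaces references embedded in a longer string. The sketch below isolates the whole-string match; `envRefName` is a hypothetical helper that reuses the anchored pattern shown above and is not part of this change.

```typescript
// Same anchored pattern as in resolveConfigValue: the entire string must be a reference.
const ENV_REF = /^\$\{(.+)\}$|^\$([A-Za-z_][A-Za-z0-9_]*)$/;

// Hypothetical helper: return the referenced variable name, or undefined if the
// value is not a whole-string $VAR / ${VAR} reference.
function envRefName(value: string): string | undefined {
  const match = value.match(ENV_REF);
  return match ? (match[1] ?? match[2]) : undefined;
}
```

Under this pattern, `$API_KEY` and `${API_KEY}` both name `API_KEY`, while `Bearer $API_KEY` is not a reference at all and would fall through to the bare-name and literal handling.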

src/util/errors.ts

@@ -37,3 +37,9 @@ export class ToolInputError extends RumiloError {
     super(message, "TOOL_INPUT_ERROR");
   }
 }
+
+export class AgentError extends RumiloError {
+  constructor(message: string) {
+    super(message, "AGENT_ERROR");
+  }
+}

src/util/path.ts

@@ -0,0 +1,16 @@
+import { resolve } from "node:path";
+
+/**
+ * Expand a leading ~ in a file path to the user's home directory.
+ * Use for paths outside the workspace (e.g. system_prompt_path).
+ * Workspace-sandboxed paths should NOT use this.
+ */
+export function expandHomePath(filePath: string): string {
+  const home = process.env["HOME"];
+  if (!home) return filePath;
+
+  if (filePath === "~") return home;
+  if (filePath.startsWith("~/")) return resolve(home, filePath.slice(2));
+
+  return filePath;
+}

src/workspace/content.ts

@@ -1,5 +1,6 @@
 import { mkdir, writeFile } from "node:fs/promises";
 import { dirname, join } from "node:path";
+import { ensureWorkspacePath } from "../agent/tools/path-utils.js";
 
 export interface WorkspaceContent {
   filePath: string;
@@ -13,6 +14,7 @@ export async function writeWorkspaceFile(
   content: string,
 ): Promise<WorkspaceContent> {
   const filePath = join(workspacePath, relativePath);
+  ensureWorkspacePath(workspacePath, filePath);
   await mkdir(dirname(filePath), { recursive: true });
   await writeFile(filePath, content, "utf8");

test/agent-runner.test.ts

@@ -0,0 +1,214 @@
+import { describe, test, expect, beforeAll, afterAll } from "bun:test";
+import { AgentError } from "../src/util/errors.js";
+import { expandEnvVars, resolveConfigValue, resolveHeaders } from "../src/util/env.js";
+import { buildGetApiKey } from "../src/agent/runner.js";
+import type { RumiloConfig } from "../src/config/schema.js";
+
+const stubConfig: RumiloConfig = {
+  defaults: { model: "anthropic:test", cleanup: true },
+  web: { model: "anthropic:test" },
+  repo: { model: "anthropic:test", default_depth: 1, blob_limit: "5m" },
+};
+
+function customModel(provider: string, apiKey?: string): RumiloConfig {
+  return {
+    ...stubConfig,
+    custom_models: {
+      mymodel: {
+        id: "m1",
+        name: "M1",
+        api: "openai-completions" as any,
+        provider,
+        base_url: "http://localhost:8000/v1",
+        reasoning: false,
+        input: ["text"],
+        cost: { input: 0, output: 0 },
+        context_window: 8192,
+        max_tokens: 4096,
+        ...(apiKey ? { api_key: apiKey } : {}),
+      },
+    },
+  };
+}
+
+describe("AgentError", () => {
+  test("has correct name, code, and inherits from Error", () => {
+    const err = new AgentError("boom");
+    expect(err).toBeInstanceOf(Error);
+    expect(err.name).toBe("AgentError");
+    expect(err.code).toBe("AGENT_ERROR");
+    expect(err.message).toBe("boom");
+  });
+});
+
+describe("resolveConfigValue", () => {
+  const saved: Record<string, string | undefined> = {};
+
+  beforeAll(() => {
+    saved["RUMILO_TEST_KEY"] = process.env["RUMILO_TEST_KEY"];
+    process.env["RUMILO_TEST_KEY"] = "resolved-value";
+  });
+
+  afterAll(() => {
+    if (saved["RUMILO_TEST_KEY"] === undefined) delete process.env["RUMILO_TEST_KEY"];
+    else process.env["RUMILO_TEST_KEY"] = saved["RUMILO_TEST_KEY"];
+  });
+
+  test("resolves bare env var name", () => {
+    expect(resolveConfigValue("RUMILO_TEST_KEY")).toBe("resolved-value");
+  });
+
+  test("resolves $VAR reference", () => {
+    expect(resolveConfigValue("$RUMILO_TEST_KEY")).toBe("resolved-value");
+  });
+
+  test("resolves ${VAR} reference", () => {
+    expect(resolveConfigValue("${RUMILO_TEST_KEY}")).toBe("resolved-value");
+  });
+
+  test("treats unknown name as literal", () => {
+    expect(resolveConfigValue("sk-literal-key-12345")).toBe("sk-literal-key-12345");
+  });
+
+  test("returns undefined for unknown $VAR", () => {
+    expect(resolveConfigValue("$RUMILO_NONEXISTENT_XYZ")).toBeUndefined();
+  });
+
+  test("executes shell commands with ! prefix", () => {
+    expect(resolveConfigValue("!echo hello")).toBe("hello");
+  });
+
+  test("returns undefined for failing shell command", () => {
+    expect(resolveConfigValue("!false")).toBeUndefined();
+  });
+});
+
+describe("expandEnvVars", () => {
+  const saved: Record<string, string | undefined> = {};
+
+  beforeAll(() => {
+    saved["FOO"] = process.env["FOO"];
+    saved["BAR"] = process.env["BAR"];
+    process.env["FOO"] = "hello";
+    process.env["BAR"] = "world";
+  });
+
+  afterAll(() => {
+    if (saved["FOO"] === undefined) delete process.env["FOO"];
+    else process.env["FOO"] = saved["FOO"];
+    if (saved["BAR"] === undefined) delete process.env["BAR"];
+    else process.env["BAR"] = saved["BAR"];
+  });
+
+  test("expands $VAR", () => {
+    expect(expandEnvVars("Bearer $FOO")).toBe("Bearer hello");
+  });
+
+  test("expands ${VAR}", () => {
+    expect(expandEnvVars("Bearer ${FOO}")).toBe("Bearer hello");
+  });
+
+  test("expands multiple vars", () => {
+    expect(expandEnvVars("$FOO-$BAR")).toBe("hello-world");
+  });
+
+  test("missing var becomes empty string", () => {
+    expect(expandEnvVars("key=$NONEXISTENT_RUMILO_VAR_XYZ")).toBe("key=");
+  });
+
+  test("string without vars is unchanged", () => {
+    expect(expandEnvVars("plain text")).toBe("plain text");
+  });
+});
+
+describe("resolveHeaders", () => {
+  const saved: Record<string, string | undefined> = {};
+
+  beforeAll(() => {
+    saved["RUMILO_HDR_KEY"] = process.env["RUMILO_HDR_KEY"];
+    process.env["RUMILO_HDR_KEY"] = "hdr-value";
+  });
+
+  afterAll(() => {
+    if (saved["RUMILO_HDR_KEY"] === undefined) delete process.env["RUMILO_HDR_KEY"];
+    else process.env["RUMILO_HDR_KEY"] = saved["RUMILO_HDR_KEY"];
+  });
+
+  test("returns undefined for undefined input", () => {
+    expect(resolveHeaders(undefined)).toBeUndefined();
+  });
+
+  test("resolves header values via resolveConfigValue", () => {
+    const result = resolveHeaders({ "X-Key": "RUMILO_HDR_KEY" });
+    expect(result).toEqual({ "X-Key": "hdr-value" });
+  });
+
+  test("drops entries that resolve to undefined", () => {
+    const result = resolveHeaders({ "X-Key": "$RUMILO_NONEXISTENT_XYZ" });
+    expect(result).toBeUndefined();
+  });
+});
+
+describe("buildGetApiKey", () => {
+  const saved: Record<string, string | undefined> = {};
+
+  beforeAll(() => {
+    saved["ANTHROPIC_API_KEY"] = process.env["ANTHROPIC_API_KEY"];
+    saved["CUSTOM_KEY"] = process.env["CUSTOM_KEY"];
+    process.env["ANTHROPIC_API_KEY"] = "sk-ant-test";
+    process.env["CUSTOM_KEY"] = "sk-custom-test";
+  });
+
+  afterAll(() => {
+    for (const [k, v] of Object.entries(saved)) {
+      if (v === undefined) delete process.env[k];
+      else process.env[k] = v;
+    }
+  });
+
+  test("falls back to pi-ai env var lookup for built-in providers", () => {
+    const getKey = buildGetApiKey(stubConfig);
+    expect(getKey("anthropic")).toBe("sk-ant-test");
+  });
+
+  test("returns undefined for unknown provider with no config", () => {
+    const getKey = buildGetApiKey(stubConfig);
+    expect(getKey("unknown-provider")).toBeUndefined();
+  });
+
+  test("resolves literal api_key from custom model", () => {
+    const config = customModel("myprovider", "sk-literal-key");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("myprovider")).toBe("sk-literal-key");
+  });
+
+  test("resolves api_key via env var name", () => {
+    const config = customModel("myprovider", "CUSTOM_KEY");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("myprovider")).toBe("sk-custom-test");
+  });
+
+  test("resolves api_key via $VAR reference", () => {
+    const config = customModel("myprovider", "$CUSTOM_KEY");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("myprovider")).toBe("sk-custom-test");
+  });
+
+  test("resolves api_key via shell command", () => {
+    const config = customModel("myprovider", "!echo shell-key");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("myprovider")).toBe("shell-key");
+  });
+
+  test("custom model provider doesn't shadow built-in provider lookup", () => {
+    const config = customModel("other-provider", "sk-other");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("anthropic")).toBe("sk-ant-test");
+  });
+
+  test("falls back to env var lookup when custom model has no api_key", () => {
+    const config = customModel("anthropic");
+    const getKey = buildGetApiKey(config);
+    expect(getKey("anthropic")).toBe("sk-ant-test");
+  });
+});
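
Reviewer note: the resolution order these `buildGetApiKey` tests exercise can be sketched roughly as below. This is a minimal sketch, not the project's implementation. The helper name `resolveConfigValueSketch`, the injectable `env` parameter, and the precedence (shell command, then `$VAR` / `${VAR}`, then bare env var name, then literal) are assumptions inferred from the test expectations above.

```typescript
import { execSync } from "node:child_process";

type Env = Record<string, string | undefined>;

// Hypothetical helper: turns a configured api_key value into a concrete string.
function resolveConfigValueSketch(raw: string, env: Env = process.env): string | undefined {
  // "!cmd": run the command through the shell and use its trimmed stdout.
  if (raw.startsWith("!")) {
    const out = execSync(raw.slice(1), { encoding: "utf8" }).trim();
    return out.length > 0 ? out : undefined;
  }

  // "$VAR" or "${VAR}": explicit environment variable reference.
  const ref = /^\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))$/.exec(raw);
  if (ref) {
    const name = ref[1] ?? ref[2] ?? "";
    return env[name];
  }

  // Bare string: use it as an env var name if one is set, otherwise as a literal.
  return env[raw] ?? raw;
}
```

One consequence of the bare-string rule in this sketch: a literal key that happens to match the name of a set environment variable would be replaced by that variable's value. Values starting with `!` run with the invoking user's privileges, so the config file has to be trusted.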

test/cli-parser.test.ts

@@ -0,0 +1,50 @@
+import { describe, test, expect } from "bun:test";
+import { parseArgs } from "../src/cli/parse-args.js";
+
+describe("CLI --key=value parsing (issue #6)", () => {
+  test("--key=value with '=' in value preserves full value", () => {
+    const result = parseArgs(["node", "script", "web", "--key=a=b=c"]);
+    expect(result.options["key"]).toBe("a=b=c");
+  });
+
+  test("--key=value without extra '=' still works", () => {
+    const result = parseArgs(["node", "script", "web", "--model=openai:gpt-4"]);
+    expect(result.options["model"]).toBe("openai:gpt-4");
+  });
+
+  test("--flag without value is boolean true", () => {
+    const result = parseArgs(["node", "script", "web", "--verbose"]);
+    expect(result.options["verbose"]).toBe(true);
+  });
+
+  test("--key value (space-separated) works", () => {
+    const result = parseArgs(["node", "script", "web", "--model", "openai:gpt-4"]);
+    expect(result.options["model"]).toBe("openai:gpt-4");
+  });
+});
+
+describe("CLI -u short flag (issue #7)", () => {
+  test("-u does not swallow a following flag as its value", () => {
+    const result = parseArgs(["node", "script", "web", "-u", "--verbose"]);
+    expect(result.options["uri"]).toBeUndefined();
+    expect(result.options["verbose"]).toBe(true);
+  });
+
+  test("-u with valid URL works normally", () => {
+    const result = parseArgs(["node", "script", "web", "-u", "https://example.com"]);
+    expect(result.options["uri"]).toBe("https://example.com");
+  });
+
+  test("-u swallowing -f as value is prevented", () => {
+    const result = parseArgs(["node", "script", "repo", "-u", "-f"]);
+    expect(result.options["uri"]).toBeUndefined();
+    // -u with no valid value is a no-op, -f should be parsed as full flag
+    expect(result.options["full"]).toBe(true);
+  });
+
+  test("-u at end of args leaves uri unset (no stray option)", () => {
+    const result = parseArgs(["node", "script", "web", "-u"]);
+    expect(result.options["uri"]).toBeUndefined();
+    expect(result.options["u"]).toBeUndefined();
+  });
+});
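
Reviewer note: the two behaviours pinned down above (split `--key=value` on the first `=` only, and never let a value-taking short flag consume a following flag) can be sketched as follows. This is an illustrative sketch, not the contents of `src/cli/parse-args.ts`. The name `parseArgsSketch`, the `SHORT_FLAGS` table, and the `positional` field are assumptions; only `options` and the `-u`/`-f` mappings are taken from the tests.

```typescript
type Options = Record<string, string | boolean>;

interface ParsedArgsSketch {
  command: string | undefined;
  options: Options;
  positional: string[];
}

interface ShortFlag {
  name: string;
  takesValue: boolean;
}

// Hypothetical short-flag table; -u -> uri and -f -> full mirror the tests.
const SHORT_FLAGS: Record<string, ShortFlag | undefined> = {
  u: { name: "uri", takesValue: true },
  f: { name: "full", takesValue: false },
};

function parseArgsSketch(argv: string[]): ParsedArgsSketch {
  const [command, ...rest] = argv.slice(2); // skip "node" and the script path
  const options: Options = {};
  const positional: string[] = [];

  for (let i = 0; i < rest.length; i++) {
    const arg = rest[i] ?? "";
    const next = rest[i + 1];
    // A following token only counts as a value if it is not itself a flag.
    const value = next !== undefined && !next.startsWith("-") ? next : undefined;

    if (arg.startsWith("--")) {
      const eq = arg.indexOf("=");
      if (eq !== -1) {
        // Split on the FIRST "=" so "a=b=c" survives intact.
        options[arg.slice(2, eq)] = arg.slice(eq + 1);
      } else if (value !== undefined) {
        options[arg.slice(2)] = value;
        i++;
      } else {
        options[arg.slice(2)] = true;
      }
    } else if (arg.length === 2 && arg.startsWith("-")) {
      const spec = SHORT_FLAGS[arg.charAt(1)];
      if (spec === undefined) continue; // unknown short flags are ignored here
      if (!spec.takesValue) {
        options[spec.name] = true;
      } else if (value !== undefined) {
        options[spec.name] = value;
        i++;
      }
      // A value-taking flag with no usable value is a no-op.
    } else {
      positional.push(arg);
    }
  }
  return { command, options, positional };
}
```

A limitation of this sketch: a long flag followed by a positional argument (for example `--verbose query`) treats the positional as the flag's value. A real parser would need a list of known boolean flags to avoid that.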

test/config-loader.test.ts

@@ -0,0 +1,55 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { mkdtempSync, writeFileSync, rmSync, mkdirSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { ConfigError } from "../src/util/errors.js";
+import { loadConfig } from "../src/config/loader.js";
+
+describe("loadConfig - ConfigError rethrown directly (issue #10)", () => {
+  let configDir: string;
+  let configPath: string;
+  const originalEnv = { ...process.env };
+
+  beforeEach(() => {
+    configDir = mkdtempSync(join(tmpdir(), "rumilo-cfg-test10-"));
+    const xdgBase = join(configDir, "xdg");
+    const rumiloDir = join(xdgBase, "rumilo");
+    mkdirSync(rumiloDir, { recursive: true });
+    configPath = join(rumiloDir, "config.toml");
+    process.env["XDG_CONFIG_HOME"] = xdgBase;
+  });
+
+  afterEach(() => {
+    process.env = { ...originalEnv };
+    try {
+      rmSync(configDir, { recursive: true, force: true });
+    } catch {}
+  });
+
+  test("ConfigError from validation is rethrown with original message and stack", async () => {
+    // Write invalid config that triggers ConfigError from validatePartialConfig
+    writeFileSync(configPath, `[defaults]\nmodel = 42\n`);
+    try {
+      await loadConfig();
+      throw new Error("should have thrown");
+    } catch (e: any) {
+      expect(e).toBeInstanceOf(ConfigError);
+      // The original message should include the validation details, not be re-wrapped
+      expect(e.message).toContain("/defaults/model");
+      // A stack trace should still be attached (its contents are not inspected here)
+      expect(e.stack).toBeDefined();
+    }
+  });
+
+  test("TOML parse error is wrapped as ConfigError with original message", async () => {
+    writeFileSync(configPath, `[invalid toml !!!`);
+    try {
+      await loadConfig();
+      throw new Error("should have thrown");
+    } catch (e: any) {
+      expect(e).toBeInstanceOf(ConfigError);
+      // The wrapped error should carry a message; only non-emptiness is checked here
+      expect(e.message.length).toBeGreaterThan(0);
+    }
+  });
+});
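
Reviewer note: the rethrow-versus-wrap behaviour these tests describe can be sketched as below. This is a minimal sketch under assumptions, not the code in `src/config/loader.ts`. `ConfigErrorSketch`, `loadConfigSketch`, and the `parse`/`validate` callbacks are hypothetical stand-ins for the project's `ConfigError`, its TOML parser, and its validator.

```typescript
// Hypothetical stand-in for the project's ConfigError class.
class ConfigErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfigErrorSketch";
  }
}

// Runs parse + validate. Errors that are already ConfigErrorSketch pass through
// untouched (same object, same message, same stack); anything else is wrapped.
function loadConfigSketch<T>(parse: () => unknown, validate: (raw: unknown) => T): T {
  try {
    return validate(parse());
  } catch (e) {
    if (e instanceof ConfigErrorSketch) throw e;
    const message = e instanceof Error ? e.message : String(e);
    throw new ConfigErrorSketch(`Failed to load config: ${message}`);
  }
}
```

Rethrowing the original object rather than constructing a new error is what keeps validation details such as `/defaults/model` visible to the caller.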

test/config-validation.test.ts

@@ -0,0 +1,206 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { Type } from "@sinclair/typebox";
+import { Value } from "@sinclair/typebox/value";
+import { ConfigError } from "../src/util/errors.js";
+import { loadConfig } from "../src/config/loader.js";
+import { partialObject } from "../src/config/schema.js";
+
+describe("config validation", () => {
+  let configDir: string;
+  let configPath: string;
+  const originalEnv = { ...process.env };
+
+  beforeEach(() => {
+    configDir = mkdtempSync(join(tmpdir(), "rumilo-cfg-test-"));
+    // loadConfig looks for <XDG_CONFIG_HOME>/rumilo/config.toml, so point
+    // XDG_CONFIG_HOME at the parent of configDir and create a rumilo/
+    // directory there to match that layout.
+    process.env["XDG_CONFIG_HOME"] = join(configDir, "..");
+    const rumiloDir = join(configDir, "..", "rumilo");
+    mkdirSync(rumiloDir, { recursive: true });
+    configPath = join(rumiloDir, "config.toml");
+  });
+
+  afterEach(() => {
+    process.env = { ...originalEnv };
+    try {
+      rmSync(configDir, { recursive: true, force: true });
+      // Also clean up the rumilo dir we created
+      const rumiloDir = join(configDir, "..", "rumilo");
+      rmSync(rumiloDir, { recursive: true, force: true });
+    } catch {}
+  });
+
+  test("rejects defaults.model with wrong type (number instead of string)", async () => {
+    writeFileSync(
+      configPath,
+      `[defaults]\nmodel = 42\ncleanup = true\n`,
+    );
+    await expect(loadConfig()).rejects.toThrow(ConfigError);
+    await expect(loadConfig()).rejects.toThrow(/defaults\/model/);
+  });
+
+  test("rejects defaults.cleanup with wrong type (string instead of boolean)", async () => {
+    writeFileSync(
+      configPath,
+      `[defaults]\nmodel = "anthropic:claude-sonnet-4-20250514"\ncleanup = "yes"\n`,
+    );
+    await expect(loadConfig()).rejects.toThrow(ConfigError);
+    await expect(loadConfig()).rejects.toThrow(/defaults\/cleanup/);
+  });
+
+  test("rejects repo.default_depth with wrong type (string instead of number)", async () => {
+    writeFileSync(
+      configPath,
+      `[repo]\ndefault_depth = "deep"\n`,
+    );
+    await expect(loadConfig()).rejects.toThrow(ConfigError);
+    await expect(loadConfig()).rejects.toThrow(/repo\/default_depth/);
+  });
+
+  test("rejects repo.default_depth below minimum (0)", async () => {
+    writeFileSync(
+      configPath,
+      `[repo]\ndefault_depth = 0\n`,
+    );
+    await expect(loadConfig()).rejects.toThrow(ConfigError);
+    await expect(loadConfig()).rejects.toThrow(/default_depth/);
+  });
+
+  test("rejects web.model with wrong type (number instead of string)", async () => {
+    // web.model should be a string, but we pass a number
+    writeFileSync(
+      configPath,
+      `[defaults]\nmodel = "x"\ncleanup = true\n[web]\nmodel = 123\n`,
+    );
+    await expect(loadConfig()).rejects.toThrow(ConfigError);
+  });
+
+  test("accepts valid partial config (only [repo] section)", async () => {
+    writeFileSync(
+      configPath,
+      `[repo]\nmodel = "anthropic:claude-sonnet-4-20250514"\ndefault_depth = 5\n`,
+    );
+    const { config } = await loadConfig();
+    expect(config.repo.model).toBe("anthropic:claude-sonnet-4-20250514");
+    expect(config.repo.default_depth).toBe(5);
+    // defaults should come from defaultConfig
+    expect(config.defaults.model).toBe("anthropic:claude-sonnet-4-20250514");
+  });
+
+  test("accepts valid complete config", async () => {
+    writeFileSync(
+      configPath,
+      [
+        `[defaults]`,
+        `model = "openai:gpt-4"`,
+        `cleanup = false`,
+        ``,
+        `[web]`,
+        `model = "openai:gpt-4"`,
+        ``,
+        `[repo]`,
+        `model = "openai:gpt-4"`,
+        `default_depth = 3`,
+        `blob_limit = "10m"`,
+      ].join("\n"),
+    );
+    const { config } = await loadConfig();
+    expect(config.defaults.model).toBe("openai:gpt-4");
+    expect(config.defaults.cleanup).toBe(false);
+    expect(config.repo.default_depth).toBe(3);
+  });
+
+  test("error message includes path and expected type for diagnostics", async () => {
+    writeFileSync(
+      configPath,
+      `[defaults]\nmodel = 42\ncleanup = true\n`,
+    );
+    try {
+      await loadConfig();
+      throw new Error("should have thrown");
+    } catch (e: any) {
+      expect(e).toBeInstanceOf(ConfigError);
+      expect(e.message).toContain("/defaults/model");
+      expect(e.message).toMatch(/string/i);
+    }
+  });
+});
+
+describe("partialObject deep-partial behavior", () => {
+  const NestedSchema = Type.Object({
+    name: Type.String(),
+    inner: Type.Object({
+      host: Type.String(),
+      port: Type.Number(),
+    }),
+  });
+
+  const PartialNested = partialObject(NestedSchema);
+
+  test("accepts empty object (all fields optional at every level)", () => {
+    const result = Value.Check(PartialNested, {});
+    expect(result).toBe(true);
+  });
+
+  test("accepts object with nested section present but inner fields omitted", () => {
+    const result = Value.Check(PartialNested, { inner: {} });
+    expect(result).toBe(true);
+  });
+
+  test("accepts object with partial inner fields of a nested object", () => {
+    const result = Value.Check(PartialNested, { inner: { host: "localhost" } });
+    expect(result).toBe(true);
+  });
+
+  test("accepts fully specified object", () => {
+    const result = Value.Check(PartialNested, {
+      name: "test",
+      inner: { host: "localhost", port: 8080 },
+    });
+    expect(result).toBe(true);
+  });
+
+  test("rejects wrong type inside nested object", () => {
+    const result = Value.Check(PartialNested, { inner: { port: "not-a-number" } });
+    expect(result).toBe(false);
+  });
+
+  test("rejects wrong type at top level", () => {
+    const result = Value.Check(PartialNested, { name: 123 });
+    expect(result).toBe(false);
+  });
+
+  test("does not recurse into Type.Record", () => {
+    const SchemaWithRecord = Type.Object({
+      headers: Type.Record(Type.String(), Type.String()),
+    });
+    const Partial = partialObject(SchemaWithRecord);
+    // Record should remain as-is (not turned into a partial object)
+    // Valid: omitted entirely
+    expect(Value.Check(Partial, {})).toBe(true);
+    // Valid: proper record
+    expect(Value.Check(Partial, { headers: { "x-key": "val" } })).toBe(true);
+    // Invalid: wrong value type in record
+    expect(Value.Check(Partial, { headers: { "x-key": 42 } })).toBe(false);
+  });
+
+  test("handles deeply nested objects (3 levels)", () => {
+    const DeepSchema = Type.Object({
+      level1: Type.Object({
+        level2: Type.Object({
+          value: Type.Number(),
+        }),
+      }),
+    });
+    const PartialDeep = partialObject(DeepSchema);
+    expect(Value.Check(PartialDeep, {})).toBe(true);
+    expect(Value.Check(PartialDeep, { level1: {} })).toBe(true);
+    expect(Value.Check(PartialDeep, { level1: { level2: {} } })).toBe(true);
+    expect(Value.Check(PartialDeep, { level1: { level2: { value: 42 } } })).toBe(true);
+    expect(Value.Check(PartialDeep, { level1: { level2: { value: "nope" } } })).toBe(false);
+  });
+});

test/expand-home-path.test.ts

@@ -0,0 +1,44 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { expandHomePath } from "../src/util/path.js";
+
+describe("expandHomePath", () => {
+  let savedHome: string | undefined;
+
+  beforeEach(() => {
+    savedHome = process.env["HOME"];
+  });
+
+  afterEach(() => {
+    if (savedHome === undefined) {
+      delete process.env["HOME"];
+    } else {
+      process.env["HOME"] = savedHome;
+    }
+  });
+
+  test("returns path unchanged when HOME is unset", () => {
+    delete process.env["HOME"];
+    expect(expandHomePath("~/foo/bar")).toBe("~/foo/bar");
+  });
+
+  test("bare ~ returns HOME", () => {
+    process.env["HOME"] = "/Users/alice";
+    expect(expandHomePath("~")).toBe("/Users/alice");
+  });
+
+  test("~/foo/bar expands to $HOME/foo/bar", () => {
+    process.env["HOME"] = "/Users/alice";
+    expect(expandHomePath("~/foo/bar")).toBe("/Users/alice/foo/bar");
+  });
+
+  test("paths without tilde are returned unchanged", () => {
+    process.env["HOME"] = "/Users/alice";
+    expect(expandHomePath("/absolute/path")).toBe("/absolute/path");
+    expect(expandHomePath("relative/path")).toBe("relative/path");
+  });
+
+  test("~user/foo is returned unchanged (not our expansion)", () => {
+    process.env["HOME"] = "/Users/alice";
+    expect(expandHomePath("~user/foo")).toBe("~user/foo");
+  });
+});
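
Reviewer note: the cases above fully determine a small function, sketched below. This is not the code in `src/util/path.ts`; the name `expandHomePathSketch` and the injectable `env` parameter (added so the sketch can be checked without mutating `process.env`) are assumptions.

```typescript
// Hypothetical re-implementation matching the expectations in the tests above.
function expandHomePathSketch(
  path: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const home = env["HOME"];
  if (!home) return path;                                  // HOME unset: leave untouched
  if (path === "~") return home;                           // bare tilde
  if (path.startsWith("~/")) return home + path.slice(1);  // "~/x" -> "$HOME/x"
  return path;                                             // "~user/x", absolute, relative
}
```

Note that this helper is separate from the sandboxed `expandPath` in the agent tools, which per the commit message no longer expands tildes at all.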

test/git-log-validation.test.ts

@@ -0,0 +1,56 @@
+import { describe, test, expect, beforeAll, afterAll } from "bun:test";
+import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import simpleGit from "simple-git";
+import { createGitLogTool } from "../src/agent/tools/git/log.js";
+import { ToolInputError } from "../src/util/errors.js";
+
+let workDir: string;
+let git: ReturnType<typeof simpleGit>;
+
+beforeAll(async () => {
+  workDir = mkdtempSync(join(tmpdir(), "rumilo-gitlog-test-"));
+  git = simpleGit(workDir);
+  await git.init();
+  await git.addConfig("user.name", "Test");
+  await git.addConfig("user.email", "test@test.com");
+  writeFileSync(join(workDir, "file.txt"), "hello");
+  await git.add("file.txt");
+  await git.commit("initial commit");
+});
+
+afterAll(() => {
+  try {
+    rmSync(workDir, { recursive: true, force: true });
+  } catch {}
+});
+
+describe("git_log validation - dead code fix (issue #12)", () => {
+  test("whitespace-only author throws ToolInputError", async () => {
+    const tool = createGitLogTool(workDir);
+    await expect(tool.execute("id", { author: "   " })).rejects.toThrow(ToolInputError);
+  });
+
+  test("empty-string author throws ToolInputError", async () => {
+    const tool = createGitLogTool(workDir);
+    await expect(tool.execute("id", { author: "" })).rejects.toThrow(ToolInputError);
+  });
+
+  test("whitespace-only since throws ToolInputError", async () => {
+    const tool = createGitLogTool(workDir);
+    await expect(tool.execute("id", { since: "  " })).rejects.toThrow(ToolInputError);
+  });
+
+  test("whitespace-only until throws ToolInputError", async () => {
+    const tool = createGitLogTool(workDir);
+    await expect(tool.execute("id", { until: "  " })).rejects.toThrow(ToolInputError);
+  });
+
+  test("valid author is accepted", async () => {
+    const tool = createGitLogTool(workDir);
+    // Should not throw
+    const result: any = await tool.execute("id", { author: "Test" });
+    expect(result.details.count).toBeGreaterThanOrEqual(1);
+  });
+});

test/git-tools.test.ts

@@ -0,0 +1,131 @@
+import { describe, test, expect, beforeAll, afterAll } from "bun:test";
+import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import simpleGit from "simple-git";
+import { createGitShowTool } from "../src/agent/tools/git/show.js";
+import { createGitDiffTool } from "../src/agent/tools/git/diff.js";
+import { createGitBlameTool } from "../src/agent/tools/git/blame.js";
+import { createGitLogTool } from "../src/agent/tools/git/log.js";
+import { DEFAULT_MAX_LINES, DEFAULT_MAX_BYTES } from "../src/util/truncate.js";
+
+function textOf(result: any): string {
+  return result.content[0].text;
+}
+
+let workDir: string;
+let git: ReturnType<typeof simpleGit>;
+
+beforeAll(async () => {
+  workDir = mkdtempSync(join(tmpdir(), "rumilo-git-test-"));
+  git = simpleGit(workDir);
+  await git.init();
+  await git.addConfig("user.name", "Test");
+  await git.addConfig("user.email", "test@test.com");
+
+  // Create a large file for truncation tests
+  const largeLine = "x".repeat(100);
+  const largeContent = Array.from({ length: 3000 }, (_, i) => `${i}: ${largeLine}`).join("\n");
+  writeFileSync(join(workDir, "large.txt"), largeContent);
+  await git.add("large.txt");
+  await git.commit("add large file");
+
+  // Create many commits for log default-limit test
+  for (let i = 0; i < 30; i++) {
+    writeFileSync(join(workDir, "counter.txt"), String(i));
+    await git.add("counter.txt");
+    await git.commit(`commit number ${i}`);
+  }
+});
+
+afterAll(() => {
+  try {
+    rmSync(workDir, { recursive: true, force: true });
+  } catch {}
+});
+
+describe("git_show truncation (issue #8)", () => {
+  test("truncates large output and appends notice", async () => {
+    const tool = createGitShowTool(workDir);
+    // The first commit has the large file diff, which should exceed truncation limits
+    const logs = await git.log();
+    const firstCommitHash = logs.all[logs.all.length - 1]!.hash;
+    const result = await tool.execute("call-1", { ref: firstCommitHash });
+    const text = textOf(result);
+
+    // Output should be bounded - not return all 3000+ lines raw
+    const lines = text.split("\n");
+    expect(lines.length).toBeLessThanOrEqual(DEFAULT_MAX_LINES + 5); // small margin for notice
+    expect(Buffer.byteLength(text, "utf-8")).toBeLessThanOrEqual(DEFAULT_MAX_BYTES + 500); // margin for notice
+    expect(text).toContain("[truncated");
+  });
+
+  test("small output is not truncated", async () => {
+    const tool = createGitShowTool(workDir);
+    const result = await tool.execute("call-2", { ref: "HEAD" });
+    const text = textOf(result);
+    // HEAD commit is small (counter.txt change), should NOT be truncated
+    expect(text).not.toContain("[truncated");
+  });
+});
+
+describe("git_diff truncation (issue #8)", () => {
+  test("truncates large diff output", async () => {
+    const tool = createGitDiffTool(workDir);
+    // Rewrite every line of large.txt in the working tree so the diff against
+    // HEAD is large enough to exceed the truncation limits.
+    const largeLine2 = "y".repeat(100);
+    const largeContent2 = Array.from({ length: 3000 }, (_, i) => `${i}: ${largeLine2}`).join("\n");
+    writeFileSync(join(workDir, "large.txt"), largeContent2);
+    try {
+      const result = await tool.execute("call-3", { ref: "HEAD" });
+      const text = textOf(result);
+      const lines = text.split("\n");
+      expect(lines.length).toBeLessThanOrEqual(DEFAULT_MAX_LINES + 5);
+      expect(text).toContain("[truncated");
+    } finally {
+      // Restore the file even if an assertion fails, so later tests see a clean tree
+      await git.checkout(["--", "large.txt"]);
+    }
+  });
+});
+
+describe("git_blame truncation (issue #8)", () => {
+  test("truncates large blame output", async () => {
+    const tool = createGitBlameTool(workDir);
+    const result = await tool.execute("call-4", { path: "large.txt" });
+    const text = textOf(result);
+    const lines = text.split("\n");
+    expect(lines.length).toBeLessThanOrEqual(DEFAULT_MAX_LINES + 5);
+    expect(Buffer.byteLength(text, "utf-8")).toBeLessThanOrEqual(DEFAULT_MAX_BYTES + 500);
+    expect(text).toContain("[truncated");
+  });
+});
+
+describe("git_log default limit (issue #9)", () => {
+  test("returns at most 20 commits when n is not specified", async () => {
+    const tool = createGitLogTool(workDir);
+    const result: any = await tool.execute("call-5", {});
+    // We have 31 commits total (1 large file + 30 counter), default should limit to 20
+    expect(result.details.count).toBeLessThanOrEqual(20);
+    expect(result.details.count).toBe(20);
+  });
+
+  test("explicit n overrides default limit", async () => {
+    const tool = createGitLogTool(workDir);
+    const result: any = await tool.execute("call-6", { n: 5 });
+    expect(result.details.count).toBe(5);
+  });
+
+  test("explicit n larger than 20 works", async () => {
+    const tool = createGitLogTool(workDir);
+    const result: any = await tool.execute("call-7", { n: 25 });
+    expect(result.details.count).toBe(25);
+  });
+});
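
Reviewer note: the truncation behaviour asserted above (bounded line count, bounded byte size, and a `[truncated` notice) could look roughly like the sketch below. This is not the code in `src/util/truncate.ts`. The function name, its signature, the exact notice wording, and the limits used in any call are assumptions; the real `DEFAULT_MAX_LINES` and `DEFAULT_MAX_BYTES` values are not shown in this diff.

```typescript
// Hypothetical truncation helper: keeps whole lines until either limit is hit.
function truncateSketch(text: string, maxLines: number, maxBytes: number): string {
  const lines = text.split("\n");
  const kept: string[] = [];
  let bytes = 0;

  for (const line of lines) {
    const size = Buffer.byteLength(line, "utf-8") + 1; // +1 for the newline
    if (kept.length >= maxLines || bytes + size > maxBytes) break;
    kept.push(line);
    bytes += size;
  }

  // Nothing was dropped: return the input unchanged, without a notice.
  if (kept.length === lines.length) return text;

  return `${kept.join("\n")}\n[truncated: showing ${kept.length} of ${lines.length} lines]`;
}
```

Cutting on line boundaries means the output never ends mid-line, at the cost of returning slightly less than `maxBytes` when the next line would not fit.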

test/model-resolver.test.ts

@@ -0,0 +1,70 @@
+import { describe, test, expect } from "bun:test";
+import { resolveModel } from "../src/agent/model-resolver.js";
+import type { RumiloConfig } from "../src/config/schema.js";
+import { ConfigError } from "../src/util/errors.js";
+
+// Minimal config stub for tests
+const stubConfig: RumiloConfig = {
+  defaults: { model: "test:m", cleanup: true },
+  web: { model: "test:m" },
+  repo: { model: "test:m", default_depth: 1, blob_limit: "5m" },
+  custom_models: {},
+};
+
+describe("resolveModel - colon handling (issue #5)", () => {
+  test("model string with multiple colons preserves segments after second colon", () => {
+    // e.g. "openrouter:google/gemini-2.5-pro:free" should parse as
+    // provider = "openrouter", modelName = "google/gemini-2.5-pro:free".
+    // Resolving that through getModel would throw (unknown provider), so the
+    // split is exercised via the custom: prefix, where resolution is controlled.
+    const config: RumiloConfig = {
+      ...stubConfig,
+      custom_models: {
+        "name:with:colons": {
+          id: "test-id",
+          name: "test",
+          api: "openai",
+          provider: "test",
+          base_url: "http://localhost",
+          reasoning: false,
+          input: ["text"],
+          cost: { input: 0, output: 0 },
+          context_window: 1000,
+          max_tokens: 500,
+        },
+      },
+    };
+    // "custom:name:with:colons" should split as provider="custom", modelName="name:with:colons"
+    const model = resolveModel("custom:name:with:colons", config);
+    expect(model.id).toBe("test-id");
+  });
+
+  test("simple provider:model still works", () => {
+    // This will call getModel which may throw for unknown providers,
+    // but at minimum the split should be correct. Test with custom.
+    const config: RumiloConfig = {
+      ...stubConfig,
+      custom_models: {
+        "simple": {
+          id: "simple-id",
+          name: "simple",
+          api: "openai",
+          provider: "test",
+          base_url: "http://localhost",
+          reasoning: false,
+          input: ["text"],
+          cost: { input: 0, output: 0 },
+          context_window: 1000,
+          max_tokens: 500,
+        },
+      },
+    };
+    const model = resolveModel("custom:simple", config);
+    expect(model.id).toBe("simple-id");
+  });
+
+  test("rejects model string without colon", () => {
+    expect(() => resolveModel("nocodelimiter", stubConfig)).toThrow(ConfigError);
+  });
+});
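
Reviewer note: the colon handling under test comes down to splitting on the first colon only. A minimal sketch, assuming a hypothetical helper name `splitModelStringSketch` and a plain `Error` in place of the project's `ConfigError`:

```typescript
interface ModelRef {
  provider: string;
  modelName: string;
}

// Splits "provider:model" on the FIRST colon so later colons stay in modelName.
function splitModelStringSketch(modelString: string): ModelRef {
  const idx = modelString.indexOf(":");
  if (idx <= 0 || idx === modelString.length - 1) {
    throw new Error(`Invalid model string "${modelString}": expected provider:model`);
  }
  return {
    provider: modelString.slice(0, idx),
    modelName: modelString.slice(idx + 1),
  };
}
```

Using `split(":")` and taking the first two elements would silently drop `:free` from a string like `openrouter:google/gemini-2.5-pro:free`, which is the failure mode the first test guards against.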

test/web-search.test.ts

@@ -0,0 +1,23 @@
+import { describe, test, expect, mock } from "bun:test";
+import { FetchError, ToolInputError } from "../src/util/errors.js";
+
+// Mock kagi-ken so the search function throws, exercising the FetchError wrapping
+mock.module("kagi-ken", () => ({
+  search: async () => {
+    throw new Error("Unauthorized");
+  },
+}));
+
+import { createWebSearchTool } from "../src/agent/tools/web-search.js";
+
+describe("web_search error handling (issue #11)", () => {
+  test("missing session token throws ToolInputError", async () => {
+    const tool = createWebSearchTool("");
+    await expect(tool.execute("id", { query: "test" })).rejects.toThrow(ToolInputError);
+  });
+
+  test("search API failure is wrapped as FetchError", async () => {
+    const tool = createWebSearchTool("invalid-token-xxx");
+    await expect(tool.execute("id", { query: "test query" })).rejects.toThrow(FetchError);
+  });
+});

test/workspace-cleanup.test.ts

@@ -0,0 +1,139 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { readdirSync, mkdtempSync } from "node:fs";
+import { rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { execSync } from "node:child_process";
+import { ConfigError } from "../src/util/errors.js";
+
+/**
+ * Snapshot rumilo-* dirs in tmpdir so we can detect leaks.
+ */
+function rumiloTmpDirs(): Set<string> {
+  return new Set(readdirSync(tmpdir()).filter((n) => n.startsWith("rumilo-")));
+}
+
+function leakedDirs(before: Set<string>, after: Set<string>): string[] {
+  return [...after].filter((d) => !before.has(d));
+}
+
+async function cleanupLeaked(leaked: string[]): Promise<void> {
+  for (const d of leaked) {
+    await rm(join(tmpdir(), d), { recursive: true, force: true });
+  }
+}
+
+// ─── web command: workspace leaked on missing credentials ───────────
+
+describe("web command – workspace cleanup on early failure", () => {
+  const origEnv = { ...process.env };
+
+  beforeEach(() => {
+    // Ensure credential env vars are absent so validation throws.
+    delete process.env["KAGI_SESSION_TOKEN"];
+    delete process.env["TABSTACK_API_KEY"];
+  });
+
+  afterEach(() => {
+    process.env = { ...origEnv };
+  });
+
+  test("workspace dir is removed when credential validation throws", async () => {
+    const before = rumiloTmpDirs();
+
+    const { runWebCommand } = await import("../src/cli/commands/web.js");
+
+    try {
+      await runWebCommand({
+        query: "test",
+        verbose: false,
+        cleanup: true,
+      });
+    } catch (e: any) {
+      expect(e).toBeInstanceOf(ConfigError);
+    }
+
+    const after = rumiloTmpDirs();
+    const leaked = leakedDirs(before, after);
+
+    // Safety: clean up any leaked dirs so the test doesn't pollute.
+    await cleanupLeaked(leaked);
+
+    // If this fails, the workspace was created but not cleaned up – a leak.
+    expect(leaked).toEqual([]);
+  });
+});
+
+// ─── repo command: workspace leaked on checkout failure ─────────────
+
+describe("repo command – workspace cleanup on early failure", () => {
+  const origEnv = { ...process.env };
+  let localRepo: string;
+
+  beforeEach(() => {
+    // Create a small local bare git repo so clone succeeds without network.
+    localRepo = mkdtempSync(join(tmpdir(), "rumilo-test-bare-"));
+    execSync("git init --bare", { cwd: localRepo, stdio: "ignore" });
+    // Create a temporary work clone to add a commit (bare repos need content)
+    const workClone = mkdtempSync(join(tmpdir(), "rumilo-test-work-"));
+    execSync(`git clone ${localRepo} work`, { cwd: workClone, stdio: "ignore" });
+    const workDir = join(workClone, "work");
+    execSync("git config user.email test@test.com && git config user.name Test", { cwd: workDir, stdio: "ignore" });
+    execSync("echo hello > README.md && git add . && git commit -m init", { cwd: workDir, stdio: "ignore" });
+    execSync("git push", { cwd: workDir, stdio: "ignore" });
+    // Clean up work clone
+    execSync(`rm -rf ${workClone}`, { stdio: "ignore" });
+  });
+
+  afterEach(async () => {
+    process.env = { ...origEnv };
+    await rm(localRepo, { recursive: true, force: true });
+  });
+
+  test("workspace dir is removed when clone fails", async () => {
+    const before = rumiloTmpDirs();
+
+    const { runRepoCommand } = await import("../src/cli/commands/repo.js");
+
+    try {
+      await runRepoCommand({
+        query: "test",
+        uri: "file:///nonexistent-path/repo.git",
+        full: false,
+        verbose: false,
+        cleanup: true,
+      });
+    } catch {
+      // expected – clone will fail
+    }
+
+    const after = rumiloTmpDirs();
+    const leaked = leakedDirs(before, after);
+    await cleanupLeaked(leaked);
+    expect(leaked).toEqual([]);
+  });
+
+  test("workspace dir is removed when ref checkout fails after clone", async () => {
+    const before = rumiloTmpDirs();
+
+    const { runRepoCommand } = await import("../src/cli/commands/repo.js");
+
+    try {
+      await runRepoCommand({
+        query: "test",
+        uri: localRepo,
+        ref: "nonexistent-ref-abc123",
+        full: true,  // full clone for local bare repo compatibility
+        verbose: false,
+        cleanup: true,
+      });
+    } catch {
+      // expected – checkout of bad ref will fail
+    }
+
+    const after = rumiloTmpDirs();
+    const leaked = leakedDirs(before, after);
+    await cleanupLeaked(leaked);
+    expect(leaked).toEqual([]);
+  });
+});
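
Reviewer note: the leak these tests guard against is usually avoided by pairing workspace creation with a `finally` block. The sketch below illustrates that pattern only; it is not the code in `src/cli/commands/web.ts` or `repo.ts`. The helper name `withWorkspaceSketch` and the `sketch-ws-` directory prefix are assumptions (a different prefix is used on purpose so it cannot trip the `rumilo-*` leak detection above).

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Creates a temp workspace, runs fn inside it, and removes the directory on
// both success and failure when cleanup is enabled.
async function withWorkspaceSketch<T>(
  cleanup: boolean,
  fn: (dir: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "sketch-ws-"));
  try {
    return await fn(dir);
  } finally {
    if (cleanup) {
      await rm(dir, { recursive: true, force: true });
    }
  }
}
```

With this shape, credential validation, clone, and checkout all run inside `fn`, so an early throw from any of them still reaches the `finally` block.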

test/workspace-containment.test.ts

@@ -0,0 +1,330 @@
+import { describe, test, expect, beforeAll, afterAll } from "bun:test";
+import { mkdtemp, rm, writeFile, mkdir } from "node:fs/promises";
+import { symlinkSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, resolve } from "node:path";
+import { ensureWorkspacePath } from "../src/agent/tools/index.js";
+import { expandPath, resolveToCwd, resolveReadPath } from "../src/agent/tools/path-utils.js";
+import { writeWorkspaceFile } from "../src/workspace/content.js";
+
+let workspace: string;
+
+beforeAll(async () => {
+	workspace = await mkdtemp(join(tmpdir(), "rumilo-test-"));
+	await mkdir(join(workspace, "subdir"), { recursive: true });
+	await writeFile(join(workspace, "hello.txt"), "hello");
+	await writeFile(join(workspace, "subdir", "nested.txt"), "nested");
+});
+
+afterAll(async () => {
+	await rm(workspace, { recursive: true, force: true });
+});
+
+// ─── ensureWorkspacePath ────────────────────────────────────────────
+
+describe("ensureWorkspacePath", () => {
+	test("allows workspace root itself", () => {
+		const result = ensureWorkspacePath(workspace, ".");
+		expect(result).toBe(workspace);
+	});
+
+	test("allows a relative child path", () => {
+		const result = ensureWorkspacePath(workspace, "hello.txt");
+		expect(result).toBe(join(workspace, "hello.txt"));
+	});
+
+	test("allows nested relative path", () => {
+		const result = ensureWorkspacePath(workspace, "subdir/nested.txt");
+		expect(result).toBe(join(workspace, "subdir", "nested.txt"));
+	});
+
+	test("rejects .. traversal escaping workspace", () => {
+		expect(() => ensureWorkspacePath(workspace, "../../../etc/passwd")).toThrow("Path escapes workspace");
+	});
+
+	test("rejects absolute path outside workspace", () => {
+		expect(() => ensureWorkspacePath(workspace, "/etc/passwd")).toThrow("Path escapes workspace");
+	});
+
+	test("allows absolute path inside workspace", () => {
+		const absInside = join(workspace, "hello.txt");
+		const result = ensureWorkspacePath(workspace, absInside);
+		expect(result).toBe(absInside);
+	});
+});
+
+// ─── expandPath: tilde must NOT escape workspace ────────────────────
+
+describe("expandPath - tilde handling for workspace sandboxing", () => {
+	test("tilde alone must not expand to homedir", () => {
+		const result = expandPath("~");
+		// After fix, ~ should remain literal (not expand to homedir)
+		expect(result).toBe("~");
+	});
+
+	test("tilde-prefixed path must not expand to homedir", () => {
+		const result = expandPath("~/secret");
+		expect(result).not.toContain("/home");
+		expect(result).not.toContain("/Users");
+		// Should stay as literal path
+		expect(result).toBe("~/secret");
+	});
+});
+
+// ─── resolveToCwd: must stay within workspace ───────────────────────
+
+describe("resolveToCwd - workspace containment", () => {
+	test("resolves relative path within workspace", () => {
+		const result = resolveToCwd("hello.txt", workspace);
+		expect(result).toBe(join(workspace, "hello.txt"));
+	});
+
+	test("resolves '.' to workspace root", () => {
+		const result = resolveToCwd(".", workspace);
+		expect(result).toBe(workspace);
+	});
+});
+
+// ─── Tool-level containment (read tool) ─────────────────────────────
+
+describe("read tool - workspace containment", () => {
+	let readTool: any;
+
+	beforeAll(async () => {
+		const { createReadTool } = await import("../src/agent/tools/read.js");
+		readTool = createReadTool(workspace);
+	});
+
+	test("reads file inside workspace", async () => {
+		const result = await readTool.execute("id", { path: "hello.txt" });
+		expect(result.content[0].text).toBe("hello");
+	});
+
+	test("rejects traversal via ..", async () => {
+		await expect(readTool.execute("id", { path: "../../etc/passwd" })).rejects.toThrow(
+			/escapes workspace/i,
+		);
+	});
+
+	test("rejects absolute path outside workspace", async () => {
+		await expect(readTool.execute("id", { path: "/etc/passwd" })).rejects.toThrow(
+			/escapes workspace/i,
+		);
+	});
+
+	test("tilde path stays within workspace (no homedir expansion)", async () => {
+		// With tilde expansion removed, ~/foo resolves to <workspace>/~/foo
+		// which is safely inside the workspace. It will fail with ENOENT,
+		// NOT succeed in reading the user's homedir file.
+		await expect(readTool.execute("id", { path: "~/.bashrc" })).rejects.toThrow(/ENOENT/);
+	});
+});
+
+// ─── Tool-level containment (ls tool) ───────────────────────────────
+
+describe("ls tool - workspace containment", () => {
+	let lsTool: any;
+
+	beforeAll(async () => {
+		const { createLsTool } = await import("../src/agent/tools/ls.js");
+		lsTool = createLsTool(workspace);
+	});
+
+	test("lists workspace root", async () => {
+		const result = await lsTool.execute("id", {});
+		expect(result.content[0].text).toContain("hello.txt");
+	});
+
+	test("rejects traversal via ..", async () => {
+		await expect(lsTool.execute("id", { path: "../../" })).rejects.toThrow(
+			/escapes workspace/i,
+		);
+	});
+
+	test("rejects absolute path outside workspace", async () => {
+		await expect(lsTool.execute("id", { path: "/tmp" })).rejects.toThrow(
+			/escapes workspace/i,
+		);
+	});
+});
+
+// ─── Tool-level containment (grep tool) ─────────────────────────────
+
+describe("grep tool - workspace containment", () => {
+	let grepTool: any;
+
+	beforeAll(async () => {
+		const { createGrepTool } = await import("../src/agent/tools/grep.js");
+		grepTool = createGrepTool(workspace);
+	});
+
+	test("searches within workspace", async () => {
+		const result = await grepTool.execute("id", { pattern: "hello", literal: true });
+		expect(result.content[0].text).toContain("hello");
+	});
+
+	test("rejects traversal via ..", async () => {
+		await expect(
+			grepTool.execute("id", { pattern: "root", path: "../../etc" }),
+		).rejects.toThrow(/escapes workspace/i);
+	});
+
+	test("rejects absolute path outside workspace", async () => {
+		await expect(
+			grepTool.execute("id", { pattern: "root", path: "/etc" }),
+		).rejects.toThrow(/escapes workspace/i);
+	});
+});
+
+// ─── Tool-level containment (find tool) ─────────────────────────────
+
+describe("find tool - workspace containment", () => {
+	let findTool: any;
+
+	beforeAll(async () => {
+		const { createFindTool } = await import("../src/agent/tools/find.js");
+		findTool = createFindTool(workspace);
+	});
+
+	test("finds files in workspace", async () => {
+		const result = await findTool.execute("id", { pattern: "*.txt" });
+		expect(result.content[0].text).toContain("hello.txt");
+	});
+
+	test("rejects traversal via ..", async () => {
+		await expect(
+			findTool.execute("id", { pattern: "*", path: "../../" }),
+		).rejects.toThrow(/escapes workspace/i);
+	});
+
+	test("rejects absolute path outside workspace", async () => {
+		await expect(
+			findTool.execute("id", { pattern: "*", path: "/tmp" }),
+		).rejects.toThrow(/escapes workspace/i);
+	});
+});
+
+// ─── writeWorkspaceFile containment (Issue #4) ──────────────────────
+
+describe("writeWorkspaceFile - workspace containment", () => {
+	test("writes file inside workspace", async () => {
+		const result = await writeWorkspaceFile(workspace, "output.txt", "data");
+		expect(result.filePath).toBe(join(workspace, "output.txt"));
+	});
+
+	test("writes nested file inside workspace", async () => {
+		const result = await writeWorkspaceFile(workspace, "a/b/c.txt", "deep");
+		expect(result.filePath).toBe(join(workspace, "a", "b", "c.txt"));
+	});
+
+	test("rejects traversal via ..", async () => {
+		await expect(
+			writeWorkspaceFile(workspace, "../../../tmp/evil.txt", "pwned"),
+		).rejects.toThrow(/escapes workspace/i);
+	});
+
+	test("absolute path via join stays inside workspace", async () => {
+		// path.join(workspace, "/tmp/evil.txt") => "<workspace>/tmp/evil.txt"
+		// This is actually inside the workspace — join concatenates, doesn't replace.
+		const result = await writeWorkspaceFile(workspace, "/tmp/evil.txt", "safe");
+		expect(result.filePath).toBe(join(workspace, "tmp", "evil.txt"));
+	});
+
+	test("tilde path via join stays inside workspace", async () => {
+		// With no tilde expansion, ~/evil.txt joins as <workspace>/~/evil.txt
+		const result = await writeWorkspaceFile(workspace, "~/evil.txt", "safe");
+		expect(result.filePath).toBe(join(workspace, "~", "evil.txt"));
+	});
+});
+
+// ─── Symlink containment ────────────────────────────────────────────
+
+describe("symlink containment", () => {
+	let symlinkWorkspace: string;
+	let outsideDir: string;
+
+	beforeAll(async () => {
+		symlinkWorkspace = await mkdtemp(join(tmpdir(), "rumilo-symlink-test-"));
+		outsideDir = await mkdtemp(join(tmpdir(), "rumilo-outside-"));
+
+		// Create a regular file inside workspace
+		await writeFile(join(symlinkWorkspace, "legit.txt"), "safe content");
+
+		// Create a subdirectory inside workspace
+		await mkdir(join(symlinkWorkspace, "subdir"), { recursive: true });
+		await writeFile(join(symlinkWorkspace, "subdir", "inner.txt"), "inner content");
+
+		// Create a file outside workspace
+		await writeFile(join(outsideDir, "secret.txt"), "secret content");
+
+		// Symlink inside workspace pointing outside
+		symlinkSync(outsideDir, join(symlinkWorkspace, "escape-link"));
+
+		// Symlink inside workspace pointing to file outside
+		symlinkSync(join(outsideDir, "secret.txt"), join(symlinkWorkspace, "secret-link.txt"));
+
+		// Symlink inside workspace pointing to a file inside workspace (benign)
+		symlinkSync(join(symlinkWorkspace, "legit.txt"), join(symlinkWorkspace, "good-link.txt"));
+
+		// Nested symlink escape: subdir/nested-escape -> outsideDir
+		symlinkSync(outsideDir, join(symlinkWorkspace, "subdir", "nested-escape"));
+	});
+
+	afterAll(async () => {
+		await rm(symlinkWorkspace, { recursive: true, force: true });
+		await rm(outsideDir, { recursive: true, force: true });
+	});
+
+	test("rejects symlink directory pointing outside workspace", () => {
+		expect(() =>
+			ensureWorkspacePath(symlinkWorkspace, "escape-link/secret.txt"),
+		).toThrow(/escapes workspace via symlink/);
+	});
+
+	test("rejects symlink file pointing outside workspace", () => {
+		expect(() =>
+			ensureWorkspacePath(symlinkWorkspace, "secret-link.txt"),
+		).toThrow(/escapes workspace via symlink/);
+	});
+
+	test("allows symlink pointing within workspace", () => {
+		const result = ensureWorkspacePath(symlinkWorkspace, "good-link.txt");
+		expect(result).toBe(join(symlinkWorkspace, "good-link.txt"));
+	});
+
+	test("rejects nested symlink escape (subdir/nested-escape)", () => {
+		expect(() =>
+			ensureWorkspacePath(symlinkWorkspace, "subdir/nested-escape/secret.txt"),
+		).toThrow(/escapes workspace via symlink/);
+	});
+
+	test("rejects symlink escape via directory symlink alone", () => {
+		expect(() =>
+			ensureWorkspacePath(symlinkWorkspace, "escape-link"),
+		).toThrow(/escapes workspace via symlink/);
+	});
+
+	test("handles non-existent file in real directory (write target)", () => {
+		// File doesn't exist but parent is a real dir inside workspace — should pass
+		const result = ensureWorkspacePath(symlinkWorkspace, "subdir/new-file.txt");
+		expect(result).toBe(join(symlinkWorkspace, "subdir", "new-file.txt"));
+	});
+
+	test("rejects non-existent file under symlink-escaped parent", () => {
+		// Parent is a symlink pointing outside — even though target file doesn't exist
+		expect(() =>
+			ensureWorkspacePath(symlinkWorkspace, "escape-link/new-file.txt"),
+		).toThrow(/escapes workspace via symlink/);
+	});
+
+	test("writeWorkspaceFile rejects path through symlink escape", async () => {
+		await expect(
+			writeWorkspaceFile(symlinkWorkspace, "escape-link/evil.txt", "pwned"),
+		).rejects.toThrow(/escapes workspace via symlink/);
+	});
+
+	test("writeWorkspaceFile allows normal nested write", async () => {
+		const result = await writeWorkspaceFile(symlinkWorkspace, "new-dir/file.txt", "ok");
+		expect(result.filePath).toBe(join(symlinkWorkspace, "new-dir", "file.txt"));
+	});
+});
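
The tests above pin down the containment behaviour without showing the check itself. As a rough illustration only, here is a minimal sketch of one way such a check could work, assuming a lexical containment test followed by a realpath comparison of the nearest existing ancestor. The name `ensureInsideWorkspace` and both helpers are hypothetical and are not the `ensureWorkspacePath` implementation from this change; only the error wording is borrowed from the test expectations.

```typescript
import { lstatSync, mkdtempSync, realpathSync, rmSync, symlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, isAbsolute, join, relative, resolve, sep } from "node:path";

// True when `candidate` is `root` itself or lies lexically beneath it.
function isInside(root: string, candidate: string): boolean {
	const rel = relative(root, candidate);
	return rel === "" || (rel !== ".." && !rel.startsWith(".." + sep) && !isAbsolute(rel));
}

// lstat does not follow symlinks, so a dangling link still counts as an entry.
function entryExists(p: string): boolean {
	try {
		lstatSync(p);
		return true;
	} catch {
		return false;
	}
}

// Hypothetical stand-in for the containment check exercised by the tests.
function ensureInsideWorkspace(workspace: string, target: string): string {
	const root = resolve(workspace);
	// No tilde expansion: "~/x" resolves to "<root>/~/x" like any other relative path.
	const resolved = resolve(root, target);
	if (!isInside(root, resolved)) {
		throw new Error(`Path escapes workspace: ${target}`);
	}

	// Walk up to the nearest entry that exists, so write targets that have
	// not been created yet can still be checked through their parent.
	let existing = resolved;
	while (!entryExists(existing)) {
		const parent = dirname(existing);
		if (parent === existing) break;
		existing = parent;
	}

	// Compare real paths on both sides. A dangling symlink makes realpathSync
	// throw, which also rejects the path.
	if (!isInside(realpathSync(root), realpathSync(existing))) {
		throw new Error(`Path escapes workspace via symlink: ${target}`);
	}
	return resolved;
}

// Usage: a throwaway workspace containing one symlink that points outside it.
const ws = mkdtempSync(join(tmpdir(), "sketch-ws-"));
const outside = mkdtempSync(join(tmpdir(), "sketch-outside-"));
symlinkSync(outside, join(ws, "escape-link"));
try {
	console.log(ensureInsideWorkspace(ws, "notes/new.txt")); // accepted
	ensureInsideWorkspace(ws, "escape-link/new.txt"); // throws
} catch (err) {
	console.log((err as Error).message);
} finally {
	rmSync(ws, { recursive: true, force: true });
	rmSync(outside, { recursive: true, force: true });
}
```

Like the function under test, the sketch returns the lexically resolved path and not the real path, so a benign in-workspace symlink keeps its own name in the result. It checks at call time only, so a symlink swapped in between the check and the later file operation would not be caught.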

tsconfig.json

@@ -11,7 +11,7 @@
     "allowJs": false,
     "types": ["bun-types"],
     "outDir": "dist",
-    "rootDir": "src",
+    "rootDir": ".",
     "skipLibCheck": true
   },
   "include": ["src", "test"],