# Browser Tools for Claude

This package provides a set of tools that allow Claude to control a headless
Chrome browser from Go. The tools are built using the
[chromedp](https://github.com/chromedp/chromedp) library.

## Available Tools

1. `browser_navigate` - Navigate to a URL and wait for the page to load
2. `browser_eval` - Evaluate JavaScript in the browser context
3. `browser_screenshot` - Take a screenshot of the page or a specific element

## Usage

```go
// Create a context
ctx := context.Background()

// Register browser tools and get a cleanup function
tools, cleanup := browse.RegisterBrowserTools(ctx)
defer cleanup() // Important: always call cleanup to release browser resources

// Add tools to your agent
for _, tool := range tools {
    agent.AddTool(tool)
}
```

## Requirements

- Chrome or Chromium must be installed on the system
- In Docker environments, the multi-stage build automatically provides `headless-shell` from the `chromedp/headless-shell` image
- For local development, install Chrome or Chromium manually
- The `chromedp` package handles launching and controlling the browser

## Tool Input/Output

All tools follow a standard JSON input/output format. For example:

**Navigate Tool Input:**
```json
{
  "url": "https://example.com"
}
```

**Navigate Tool Output (success):**
```json
{
  "status": "success"
}
```

**Tool Output (error):**
```json
{
  "status": "error",
  "error": "Error message"
}
```

## Example Tool Usage

```go
// Example of using the navigate tool directly
navTool := tools[0] // Get browser_navigate tool
input := map[string]string{"url": "https://example.com"}
inputJSON, err := json.Marshal(input)
if err != nil {
    log.Fatalf("Error encoding input: %v", err)
}

// Call the tool
result, err := navTool.Run(ctx, json.RawMessage(inputJSON))
if err != nil {
    log.Fatalf("Error: %v", err)
}
fmt.Println(result)
```

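Indexing with `tools[0]` ties the caller to registration order. If the tool values expose a name, looking a tool up by name is sturdier. In this sketch `Tool` is a hypothetical stand-in type with a `Name` field and a `Run` function shaped like the call above; the package's real tool type may differ, and the fake `Run` here only returns a canned success result.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Tool is a hypothetical stand-in for the package's tool type.
type Tool struct {
	Name string
	Run  func(ctx context.Context, input json.RawMessage) (string, error)
}

// findTool returns the tool with the given name, so callers do not depend
// on the order in which tools were registered.
func findTool(tools []Tool, name string) (Tool, bool) {
	for _, t := range tools {
		if t.Name == name {
			return t, true
		}
	}
	return Tool{}, false
}

func main() {
	tools := []Tool{
		{Name: "browser_eval"},
		{Name: "browser_navigate", Run: func(ctx context.Context, in json.RawMessage) (string, error) {
			// Fake implementation for the sketch: always reports success.
			return `{"status":"success"}`, nil
		}},
	}

	nav, ok := findTool(tools, "browser_navigate")
	if !ok {
		panic("browser_navigate not registered")
	}
	input, err := json.Marshal(map[string]string{"url": "https://example.com"})
	if err != nil {
		panic(err)
	}
	result, err := nav.Run(context.Background(), input)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // {"status":"success"}
}
```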
## Screenshot Storage

The `browser_screenshot` tool saves screenshots to a temporary directory and identifies them by ID, rather than returning base64-encoded image data directly. This improves efficiency by:

1. Reducing token usage in LLM responses
2. Avoiding encoding/decoding overhead
3. Allowing larger screenshots without hitting message size limits

### How It Works

1. When a screenshot is taken, it is saved to `/tmp/shelley-screenshots/` with a unique UUID filename
2. The tool returns the screenshot ID in its response
3. The web UI can fetch the screenshot from the `/api/read?path=...` endpoint, with `path` set to the screenshot's file path

### Example Usage

Agent calls the screenshot tool:
```json
{
  "id": "tool_call_123",
  "name": "browser_screenshot",
  "params": {}
}
```

Tool response:
```json
{
  "id": "tool_call_123",
  "result": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

The screenshot is then accessible at: `/api/read?path=/tmp/shelley-screenshots/550e8400-e29b-41d4-a716-446655440000.png`