# Browser Tools for Claude

This package provides a set of tools that allow Claude to control a headless
Chrome browser from Go. The tools are built using the
[chromedp](https://github.com/chromedp/chromedp) library.

## Available Tools

1. `browser_navigate` - Navigate to a URL and wait for the page to load
2. `browser_eval` - Evaluate JavaScript in the browser context
3. `browser_screenshot` - Take a screenshot of the page or a specific element

## Usage

```go
// Create a context
ctx := context.Background()

// Register browser tools and get a cleanup function
tools, cleanup := browse.RegisterBrowserTools(ctx)
defer cleanup() // Important: always call cleanup to release browser resources

// Add tools to your agent
for _, tool := range tools {
	agent.AddTool(tool)
}
```

## Requirements

- Chrome or Chromium must be installed on the system
- In Docker environments, the multi-stage build automatically provides headless-shell from `chromedp/headless-shell`
- For local development, install Chrome/Chromium manually
- The `chromedp` package handles launching and controlling the browser

## Tool Input/Output

All tools follow a standard JSON input/output format. For example:

**Navigate Tool Input:**
```json
{
  "url": "https://example.com"
}
```

**Navigate Tool Output (success):**
```json
{
  "status": "success"
}
```

**Tool Output (error):**
```json
{
  "status": "error",
  "error": "Error message"
}
```
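
A caller can decode this output with a small struct. The sketch below is illustrative only: the `toolResult` type and `parseToolResult` helper are hypothetical names, not part of this package, and it assumes the output is exactly the `status`/`error` JSON shown above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// toolResult mirrors the status/error JSON shown above.
// Hypothetical type for illustration; not part of this package.
type toolResult struct {
	Status string `json:"status"`
	Error  string `json:"error,omitempty"`
}

// parseToolResult decodes raw tool output and converts an
// "error" status into a Go error carrying the tool's message.
func parseToolResult(raw []byte) (toolResult, error) {
	var res toolResult
	if err := json.Unmarshal(raw, &res); err != nil {
		return res, fmt.Errorf("decode tool output: %w", err)
	}
	if res.Status == "error" {
		return res, errors.New(res.Error)
	}
	return res, nil
}

func main() {
	outputs := []string{
		`{"status": "success"}`,
		`{"status": "error", "error": "Error message"}`,
	}
	for _, raw := range outputs {
		res, err := parseToolResult([]byte(raw))
		fmt.Printf("status=%s err=%v\n", res.Status, err)
	}
}
```

Turning the `error` status into a Go error lets callers use the usual `if err != nil` check instead of inspecting the status string at every call site.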

## Example Tool Usage

```go
// Example of using the navigate tool directly
navTool := tools[0] // Get browser_navigate tool
input := map[string]string{"url": "https://example.com"}
inputJSON, err := json.Marshal(input)
if err != nil {
	log.Fatalf("Error marshaling input: %v", err)
}

// Call the tool
result, err := navTool.Run(ctx, json.RawMessage(inputJSON))
if err != nil {
	log.Fatalf("Error: %v", err)
}
fmt.Println(result)
```

## Screenshot Storage

The browser screenshot tool saves screenshots to a temporary directory and identifies them by ID, rather than returning base64-encoded data directly. This improves efficiency by:

1. Reducing token usage in LLM responses
2. Avoiding encoding/decoding overhead
3. Allowing for larger screenshots without message size limitations

### How It Works

1. When a screenshot is taken, it's saved to `/tmp/shelley-screenshots/` with a UUID as its filename
2. The tool returns the screenshot ID in its response
3. The web UI can fetch the screenshot using the `/api/read?path=...` endpoint (with path set to the screenshot file)

### Example Usage

Agent calls the screenshot tool:
```json
{
  "id": "tool_call_123",
  "name": "browser_screenshot",
  "params": {}
}
```

Tool response:
```json
{
  "id": "tool_call_123",
  "result": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

The screenshot is then accessible at: `/api/read?path=/tmp/shelley-screenshots/550e8400-e29b-41d4-a716-446655440000.png`