Browser Tools for Claude
This package provides a set of tools that allow Claude to control a headless Chrome browser from Go. The tools are built using the chromedp library.
Available Tools
- browser_navigate: Navigate to a URL and wait for the page to load
- browser_eval: Evaluate JavaScript in the browser context
- browser_screenshot: Take a screenshot of the page or a specific element
Usage
// Create a context
ctx := context.Background()
// Register browser tools and get a cleanup function
tools, cleanup := browse.RegisterBrowserTools(ctx)
defer cleanup() // Important: always call cleanup to release browser resources
// Add tools to your agent
for _, tool := range tools {
	agent.AddTool(tool)
}
Requirements
- Chrome or Chromium must be installed on the system
- In Docker environments, the multi-stage build automatically provides headless-shell from chromedp/headless-shell
- For local development, install Chrome/Chromium manually
- The chromedp package handles launching and controlling the browser
Tool Input/Output
All tools follow a standard JSON input/output format. For example:
Navigate Tool Input:
{
"url": "https://example.com"
}
Navigate Tool Output (success):
{
"status": "success"
}
Tool Output (error):
{
"status": "error",
"error": "Error message"
}
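A caller can decode this result shape into a small Go struct and turn the error case into a Go error. The following is a minimal sketch, assuming the tool output is available as raw JSON bytes. The toolResult type and parseToolResult helper are hypothetical names used for illustration and are not part of this package.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// toolResult mirrors the JSON output shape shown above.
// Hypothetical type for illustration; not part of the browse package.
type toolResult struct {
	Status string `json:"status"`
	Error  string `json:"error,omitempty"`
}

// parseToolResult decodes raw tool output and returns a Go error when
// the tool reported "status": "error".
func parseToolResult(raw []byte) (toolResult, error) {
	var res toolResult
	if err := json.Unmarshal(raw, &res); err != nil {
		return toolResult{}, fmt.Errorf("decode tool result: %w", err)
	}
	if res.Status == "error" {
		return res, errors.New(res.Error)
	}
	return res, nil
}

func main() {
	ok, err := parseToolResult([]byte(`{"status": "success"}`))
	fmt.Println(ok.Status, err)

	_, err = parseToolResult([]byte(`{"status": "error", "error": "Error message"}`))
	fmt.Println(err)
}
```

Mapping the error status onto a Go error lets callers use the usual `if err != nil` check instead of comparing status strings at every call site.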
Example Tool Usage
// Example of using the navigate tool directly
navTool := tools[0] // Get browser_navigate tool
input := map[string]string{"url": "https://example.com"}
inputJSON, err := json.Marshal(input)
if err != nil {
	log.Fatalf("Error encoding input: %v", err)
}
// Call the tool
result, err := navTool.Run(ctx, json.RawMessage(inputJSON))
if err != nil {
	log.Fatalf("Error: %v", err)
}
fmt.Println(result)
Screenshot Storage
The browser screenshot tool saves screenshots to a temporary directory and identifies them by ID, rather than returning base64-encoded data directly. This improves efficiency by:
- Reducing token usage in LLM responses
- Avoiding encoding/decoding overhead
- Allowing for larger screenshots without message size limitations
How It Works
- When a screenshot is taken, it's saved to /tmp/shelley-screenshots/ with a unique UUID filename
- The tool returns the screenshot ID in its response
- The web UI can fetch the screenshot using the /api/read?path=... endpoint (with path set to the screenshot file)
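The mapping from a screenshot ID to a fetchable URL can be sketched as below. This is an illustrative sketch, not the package's own code. It assumes files are named with the ID plus a .png extension, as in the example later in this section, and the screenshotPath and screenshotURL helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
)

// screenshotDir is the temporary directory named in this README.
const screenshotDir = "/tmp/shelley-screenshots"

// screenshotPath returns the file path for a screenshot ID.
// Assumption: files are stored as <id>.png.
func screenshotPath(id string) string {
	return path.Join(screenshotDir, id+".png")
}

// screenshotURL returns the /api/read URL for a screenshot ID.
// The ID is assumed to be a UUID, so the path contains no characters
// that need query escaping; use url.QueryEscape for arbitrary paths.
func screenshotURL(id string) string {
	return "/api/read?path=" + screenshotPath(id)
}

func main() {
	fmt.Println(screenshotURL("550e8400-e29b-41d4-a716-446655440000"))
}
```

The sketch does not validate the ID. Code that accepts IDs from untrusted input would want to check that the value is a well-formed UUID before building a path from it.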
Example Usage
Agent calls the screenshot tool:
{
"id": "tool_call_123",
"name": "browser_screenshot",
"params": {}
}
Tool response:
{
"id": "tool_call_123",
"result": {
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
The screenshot is then accessible at: /api/read?path=/tmp/shelley-screenshots/550e8400-e29b-41d4-a716-446655440000.png
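A client that receives the tool response above needs to pull the screenshot ID out of the nested result object. The following is a minimal sketch of that step, assuming the response has exactly the shape shown in the example. The screenshotResponse type and screenshotID helper are hypothetical names used for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

// screenshotResponse mirrors the example tool response above.
// Hypothetical type for illustration; not part of the browse package.
type screenshotResponse struct {
	ID     string `json:"id"` // tool call ID
	Result struct {
		ID string `json:"id"` // screenshot ID
	} `json:"result"`
}

// screenshotID extracts the screenshot ID from a raw tool response and
// reports an error if the response is malformed or carries no ID.
func screenshotID(raw []byte) (string, error) {
	var resp screenshotResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", fmt.Errorf("decode tool response: %w", err)
	}
	if resp.Result.ID == "" {
		return "", errors.New("tool response has no screenshot id")
	}
	return resp.Result.ID, nil
}

func main() {
	raw := []byte(`{"id": "tool_call_123", "result": {"id": "550e8400-e29b-41d4-a716-446655440000"}}`)
	id, err := screenshotID(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(id)
}
```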