Vibe Engineering: Testing AI Browser Automation Tools with a Mock LLM Server
Author: Dzianis Vashchuk | Site: Medium | Published: 2025-12-20T21:28:01Z
Building reliable browser automation for AI agents at vibebrowser.app requires rigorous testing. At Vibe Browser, we developed a comprehensive testing infrastructure that validates that our tools work correctly before they ever touch a real LLM API.

The Tools

Our browser agent has access to these interaction tools:

| Tool | Purpose |
| --- | --- |
| navigate_to_url | Navigate to any URL |
| click_by_index | Click elements using indexed DOM references |
| fill_by_index | Fill input fields with text |
| select_by_index | Select dropdown options |
| keypress | Send keyboard events (Enter, Escape, arrows) |
| hover_element | Trigger hover states on elements |

Each tool uses indexed elements rather than CSS selectors. When the agent sees page content, elements appear as [6:27] <input id="searchBox"/>, where 6 is the index and 27 is a relevance score.

The Mock LLM Server

Instead of calling OpenAI or other providers during tests, we spin up a local Express server that returns deterministic responses:

```javascript
app.post('/v1/chat/completions', (req, res) => {
  // Phase 1: Navigate to the local test page
  if (testState.phase === 'initial') {
    return respondWithToolCall('navigate_to_url', {
      url: 'http://localhost:3456/test-page'
    });
  }

  // Phase 2: Execute all interaction tools
  if (testState.phase === 'navigated') {
    return respondWithToolCalls([
      { name: 'click_by_index', args: { index: 0 } },
      { name: 'fill_by_index', args: { index: 10, value: 'Test Value' } },
      { name: 'keypress', args: { keys: 'Enter', index: 6 } },
      { name: 'hover_element', args: { index: 7, duration: 1000 } }
    ]);
  }
});
```

The server maintains a state machine that progresses through test phases, ensuring tools are called in the right order.

The Test Page

We serve a custom HTML page with interactive elements designed to verify each tool:

```html
<!-- Keypress verification -->
<input type="text" id="keypressInput" placeholder="Press Enter here...">
<div id="enterPressedDisplay">Waiting for Enter key</div>

<!-- Hover verification -->
<button id="hoverTestButton">Hello</button>
```

JavaScript handlers update the DOM when events fire:

```javascript
// Look up the elements explicitly instead of relying on implicit id globals
const keypressInput = document.getElementById('keypressInput');
const enterDisplay = document.getElementById('enterPressedDisplay');
const hoverTestButton = document.getElementById('hoverTestButton');

// Keypress check: pressing Enter in the input updates the status text
keypressInput.addEventListener('keydown', (e) => {
  if (e.key === 'Enter') {
    enterDisplay.textContent = 'Enter is pressed';
  }
});

// Hover check: hovering the button changes its label from "Hello" to "World"
hoverTestButton.addEventListener('mouseenter', () => {
  hoverTestButton.textContent = 'World';
});
```

Verification: DOM + OCR

We verify tool execution in two ways.

1. DOM Verification: query the page directly.

```javascript
const values = await testPage.evaluate(() => ({
  enterText: document.getElementById('enterPressedDisplay').textContent,
  buttonText: document.getElementById('hoverTestButton').textContent
}));

if (!values.enterText.includes('Enter is pressed')) {
  throw new Error('Keypress tool failed');
}
if (values.buttonText !== 'World') {
  throw new Error('Hover tool failed');
}
```

2. OCR Verification: treat the browser as a black box.

```javascript
await verifyScreenshotContainsText(
  screenshot,
  ['Enter is pressed'],
  'Keypress Tool (OCR)'
);
await verifyScreenshotContainsText(
  screenshot,
  ['World'],
  'Hover Tool (OCR)'
);
```

OCR verification catches rendering issues that DOM checks miss, because it checks what is actually drawn on screen rather than what the DOM reports.

Running Tests

```shell
npm run test:extension

# Output shows tool execution
Tool: keypress - Result: Sent keypress "Enter" to element [6]
Tool: hover_element - Result: Hovered over element [7] for 1000ms

# Verification passes
✓ Keypress Tool (DOM): "Enter is pressed" verified
✓ Hover Tool (DOM): Button text changed to "World"
```

Why This Matters

Testing with mock servers provides:

- Speed: No API latency, tests run in seconds
- Determinism: Same inputs produce same outputs
- Cost: Zero API charges during development
- Isolation: Test tool logic independent of LLM behavior

When all mock tests pass, we run the same scenarios against real LLMs in our eval suite. This two-tier approach catches bugs early while ensuring production readiness.

Building browser automation tools requires treating testing as a first-class citizen. Mock servers and purpose-built test pages let us iterate quickly with confidence.
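
The mock server section calls helpers named respondWithToolCall and respondWithToolCalls without showing their bodies. As a rough sketch of what such a helper might build, the function below produces a response in the OpenAI chat completions shape with tool calls attached. The name buildToolCallResponse, the 'mock-model' default, and the call_N ids are assumptions made for illustration, not the article's actual code.

```javascript
// Hypothetical helper: builds an OpenAI-style chat completion body that
// contains one or more tool calls. Not taken from the Vibe Browser codebase.
function buildToolCallResponse(calls, model = 'mock-model') {
  return {
    id: `chatcmpl-mock-${Date.now()}`,
    object: 'chat.completion',
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      {
        index: 0,
        message: {
          role: 'assistant',
          content: null,
          tool_calls: calls.map((call, i) => ({
            id: `call_${i}`, // assumed id scheme, stable across runs
            type: 'function',
            function: {
              name: call.name,
              // The API carries arguments as a JSON-encoded string
              arguments: JSON.stringify(call.args)
            }
          }))
        },
        finish_reason: 'tool_calls'
      }
    ],
    usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
  };
}
```

Inside an Express handler, a body like this could be sent with res.json(buildToolCallResponse([...])). Keeping the builder free of Express objects makes it easy to unit test on its own.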
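
The state machine behind the mock server is described only briefly, so here is one possible way to express it as data. The 'initial' and 'navigated' phases come from the article. The 'done' phase, the PHASES table, and the advance function are hypothetical names introduced for this sketch.

```javascript
const TEST_PAGE_URL = 'http://localhost:3456/test-page';

// Each phase lists the tool calls to return and the phase to move to next.
const PHASES = {
  initial: {
    calls: [{ name: 'navigate_to_url', args: { url: TEST_PAGE_URL } }],
    next: 'navigated'
  },
  navigated: {
    calls: [
      { name: 'keypress', args: { keys: 'Enter', index: 6 } },
      { name: 'hover_element', args: { index: 7, duration: 1000 } }
    ],
    next: 'done' // assumed terminal phase, not named in the article
  },
  done: { calls: [], next: 'done' }
};

// Returns the tool calls for the current phase and moves the state forward.
function advance(testState) {
  const phase = PHASES[testState.phase];
  if (!phase) {
    throw new Error(`Unknown phase: ${testState.phase}`);
  }
  testState.phase = phase.next;
  return phase.calls;
}
```

A request handler could call advance(testState) once per incoming completion request. Because every transition lives in one table, the order in which tools are exercised is easy to read and to change.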
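
The tools section describes page elements rendered as [index:score] followed by the element markup, for example [6:27] <input id="searchBox"/>. A test harness may want to read those lines back, for instance to find the index to pass to keypress. The parser below is a minimal sketch that assumes one element per line in exactly that format. The function name parseIndexedElement and the returned field names are made up for illustration.

```javascript
// Hypothetical parser for one indexed-element line such as:
//   [6:27] <input id="searchBox"/>
// Returns null when the line does not match the assumed format.
function parseIndexedElement(line) {
  const match = /^\s*\[(\d+):(\d+)\]\s*(<.+>)\s*$/.exec(line);
  if (!match) {
    return null;
  }
  return {
    index: Number(match[1]), // position used by the *_by_index tools
    score: Number(match[2]), // relevance score
    html: match[3]           // element markup as shown to the agent
  };
}

// Finds the index of the first element whose markup contains the given id.
function findIndexById(lines, id) {
  for (const line of lines) {
    const parsed = parseIndexedElement(line);
    if (parsed && parsed.html.includes(`id="${id}"`)) {
      return parsed.index;
    }
  }
  return -1;
}
```

With helpers like these, a test could derive the index for a tool call from the page content instead of hard-coding it, at the cost of coupling the test to the text format.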