All notable changes to the superpowers-chrome MCP project.
- Viewport emulation (
set_viewport): device emulation with custom width/height/deviceScaleFactor and mobile mode (touch events + mobile UA string). Useful for responsive design testing and mobile screenshots clear_viewport: reset device emulation to browser defaultget_viewport: query current CSS viewport dimensions and devicePixelRatioclear_cookies: clear all browser cookies via CDPNetwork.clearBrowserCookies, without switching profiles- Full-page screenshots (
fullpage: true): capture entire scrollable page content beyond the visible viewport usingcaptureBeyondViewportandPage.getLayoutMetrics. Available in MCP ({"action": "screenshot", "payload": "full.png", "fullpage": true}), CLI (chrome-ws screenshot 0 full.png --fullpage), and lib chrome-ws pidcommand: prints Chrome PID for X11 window management (e.g.xdotool search --pid)chrome-ws infocommand: prints JSON with pid, port, mode, profile, profileDir, and running status; reads meta.json so it works across processesgetChromePid()exported from chrome-ws-libbrowser_modeaction now includespidfield
- HiDPI screenshot sizing on Linux: Viewport screenshots (no selector) now pass an explicit clip using
window.innerWidth/window.innerHeightwithscale: 1instead of relying on Chrome's internal DPI-scaled dimensions. All screenshots also passfromSurface: true. Fixes oversized screenshots on Linux displays with system DPI scaling (e.g. 1.5x/Xft.dpi:144)
- profileName path traversal:
setProfileName()now validates the profile name against[a-zA-Z0-9_-]+before writing meta.json, preventing path traversal via user-supplied profile names
- Dynamic CDP port allocation: Chrome no longer hardcodes port 9222.
startChrome()finds an available port in range 9222-12111, enabling multiple parallel Chrome instances without conflicts - Per-profile meta.json: Port assignment, PID, and headless mode are persisted at
~/.cache/superpowers/browser-profiles/{name}.meta.json. This enables:- Reconnection to Chrome instances that survive MCP restarts
- Collision detection when port is already in use
- Other sessions discovering which port a profile's Chrome is on
--port=Nflag: Both MCP server and CLI accept explicit port override- MCP:
node mcp/dist/index.js --port=9444 - CLI:
./chrome-ws start --port=9555
- MCP:
isPortAlive()/findAvailablePort()utilities: Exported from chrome-ws-lib for advanced usebrowser_modeaction: Now returnsportfield alongside headless/headed status
startChrome()signature: New optional third parameterportfor explicit port selection- Port priority: explicit
--portparam >CHROME_WS_PORTenv var > dynamic allocation showBrowser/hideBrowser: Now preserve the active port across Chrome restartskillChrome(): Clears meta.json so other sessions know the port is free
- Added
chromeHttpAt(host, port, path, method)for probing ports before settingactivePort - Module-level
activePortvariable replaces staticCHROME_DEBUG_PORTinchromeHttp() rewriteWsUrlcalls passactivePortfor correct WebSocket URL rewriting- Stale meta.json detected via PID liveness check (
process.kill(pid, 0)) and Chrome/json/versionprobe
- MCP tool description: Rewritten to clearly describe what each auto-captured file contains (viewport screenshot, structured markdown, full DOM, console messages) and guide Claude to prefer reading them over using extract or screenshot actions
- MCP help text: Aligned auto-capture section with new tool description language
- Browsing skill: Added Auto-Capture section explaining the capture system; updated screenshot action to indicate viewport screenshots are already auto-captured
- Enter key form submission: Added
text: '\r'to Enter keyDown events, enabling native form submission via\nin type payloads - XPath text() selector on mixed content: XPath selectors like
//a[text()='Settings']now fallback tonormalize-space()for elements with mixed content (e.g.,<a><svg/>Settings</a>) - README Quick Start path: Updated plugin installation path to include marketplace/version structure
- Auto-downscale screenshots: Screenshots exceeding 1800px are automatically downscaled using native tools (sips on macOS, ImageMagick on Linux) to prevent Claude API "2000px limit" errors in many-image mode
- Eval patterns in skill docs: Added documented patterns for viewport resize, cookie clearing, and scrolling via
evalaction - Test harness: Added
test-harness.jsfor automated React input testing (50+ iterations)
- Tab and Space keys: Added
textproperty to Tab (\t) and Space () key definitions for consistency
- Tab/Enter not working in type payloads: Fixed
\tand\nescape sequences in MCP payloads- MCP payloads contain literal backslash-t/n strings, not actual tab/newline characters
- Added preprocessing to convert
\t→ tab and\n→ newline before parsing - Now
type(selector, "field1\tfield2\n")correctly tabs between fields and submits
- Focus lost during screenshots: Screenshots were stealing focus, breaking Tab navigation
- Added
saveFocus()andrestoreFocus()helpers tocaptureActionWithDiff() - Saves focused element (by id, name, or DOM path) before screenshot
- Restores focus after screenshot so subsequent actions work correctly
- Added
- Type with selector losing focus: Changed
fill()to useel.focus()instead ofclick()- Click triggers
capturePageArtifacts()which takes a screenshot, losing focus - Using JS focus avoids the capture side effect
- Click triggers
fill()now preprocesses value with.replace(/\\t/g, '\t').replace(/\\n/g, '\n')captureActionWithDiff()wraps before-screenshot with focus save/restore- Focus identification uses id > name > DOM path fallback strategy
- Chrome crashes in containers: MCP server now auto-detects display availability
- Linux: Checks
DISPLAYorWAYLAND_DISPLAYenvironment variables - macOS: Checks
TERM_PROGRAMorDISPLAY - Windows: Assumes display available (headless servers rare)
- Falls back to headless mode when no display detected
- Linux: Checks
- Command-line overrides: Added
--headedflag to complement existing--headless--headless: Force headless mode--headed: Force headed mode (will fail if no display)- No flag: Auto-detect based on environment
- Improved logging: Startup message now shows mode and why it was chosen
- Example:
(headless mode, auto-detected no display)
- Example:
- Added
hasDisplay()function for cross-platform display detection - Previously Chrome defaulted to headed mode unless
--headlesswas passed, causing crashes in containers/CI
- Persistent Chrome profiles with "superpowers-chrome" default: Browser data now persists across sessions
- Default profile:
superpowers-chrome - Profile storage:
~/.cache/superpowers/browser-profiles/{profile-name}/ - Persists cookies, localStorage, extensions, auth sessions
- Profile management actions:
set_profile,get_profile - Optional profile parameter to
startChrome(headless, profileName) - Agent-specific profiles enable isolated browser states
- Default profile:
- Headless mode by default: Chrome now starts in headless mode for better performance and less desktop clutter
- Screenshots work perfectly in headless mode
- Faster startup and lower resource usage
- No browser windows cluttering the desktop
- Browser mode toggle: New actions to control headless/headed mode
show_browser: Switch to headed mode (visible browser window)hide_browser: Switch to headless mode (invisible browser)browser_mode: Check current mode status and active profile⚠️ WARNING: Toggling modes restarts Chrome and reloads pages via GET (loses POST state)
- XDG cache directory: Session files now stored in platform-appropriate cache locations
- macOS:
~/Library/Caches/superpowers/browser/YYYY-MM-DD/session-{timestamp}/ - Linux:
~/.cache/superpowers/browser/YYYY-MM-DD/session-{timestamp}/ - Windows:
%LOCALAPPDATA%/superpowers/browser/YYYY-MM-DD/session-{timestamp}/ - Respects
XDG_CACHE_HOMEenvironment variable on Linux - Date-based organization for easier cleanup
- macOS:
- browser-user agent: New read-only agent for browser automation tasks
- Pre-loaded with browsing skill
- Restricted to read-only tools (Read, Grep, Glob, Skill, use_browser)
- Cannot modify files or execute shell commands
- Has access to browser cache directory for viewing captured pages
- Chrome now defaults to headless mode instead of headed mode
- Chrome profiles now persist in XDG cache directory instead of temp directory
- Session directory structure uses XDG cache conventions
- Browser process management improved with proper PID tracking and graceful shutdown
browser_modeaction now returns profile information
- set_profile action: Fixed bug where
ensureChromeRunning()prevented profile changes by exempting profile/info actions from auto-start
- Added
chromeProcess,chromeHeadless,chromeUserDataDir,chromeProfileNamestate tracking - Implemented
killChrome(),showBrowser(),hideBrowser(),getBrowserMode()functions - Implemented
getChromeProfileDir(),getProfileName(),setProfileName()for profile management startChrome()now accepts optionalprofileNameparameter- Export
getXdgCacheHome()andgetChromeProfileDir()for external use - MCP server updated with five new actions: browser mode control + profile management
- Help text updated with browser mode and profile management documentation
- Comprehensive test suites:
test-headless-toggle.cjs- Validates headless mode switchingtest-profiles.cjs- Validates profile isolation and persistence
- Screenshot path confusion:
screenshotaction now returns absolute path instead of relative filename- Before:
Screenshot saved to solar_optimum.png(Claude can't find it) - After:
Screenshot saved to /Users/jesse/project/solar_optimum.png(Claude reads it directly)
- Before:
- Image content visibility: Markdown extraction now includes images with alt text and dimensions
- Adds prominent notice: "📷 This page contains N significant image(s). Check screenshot.png for visual content."
- Lists each image with description and size info
- Handles
<figure>elements with captions
- Directory auth spam: All capture files now go in single session directory
- Changed from subdirectories (
001-navigate-timestamp/page.md) to flat structure (001-navigate.md) - Only one directory permission prompt per session instead of per-page
- Files use prefixes:
001-navigate.html,001-navigate.md,001-navigate.png,001-navigate-console.txt
- Changed from subdirectories (
createCaptureDir()renamed tocreateCapturePrefix()- returns prefix string instead of creating subdirectory- Response format updated to show flat file structure with prefixes
- Help text updated to reflect new file naming convention
- CRITICAL: Restored all auto-capture and session management functionality that was accidentally removed
initializeSession(),cleanupSession(),createCaptureDir()- Session lifecycle managementclickWithCapture(),fillWithCapture(),selectOptionWithCapture(),evaluateWithCapture()- Auto-capture DOM actionsenableConsoleLogging(),getConsoleMessages(),clearConsoleMessages()- Console logging utilitiesgenerateDomSummary(),getPageSize(),generateMarkdown(),capturePageArtifacts()- Capture utilities- Session-based directory structure and time-ordered capture subdirectories
- 4-file capture format (page.html, page.md, screenshot.png, console-log.txt)
- Smart DOM summary system
- MCP server: Now starts correctly without
initializeSession is not a functionerror - Build system: Rebuilt bundle with all restored functionality
- Windows compatibility: Maintained host-override improvements from v1.5.0-1.5.1
CHROME_DEBUG_HOST,CHROME_DEBUG_PORT,rewriteWsUrl()integration preserved- Enhanced
getTabs()andnewTab()with WebSocket URL rewriting - Improved error handling for array responses
The v1.5.0-1.5.1 Windows support work accidentally removed ~466 lines of auto-capture code from chrome-ws-lib.js that was added in v1.4.0. This release restores all v1.4.0 functionality while preserving the Windows host-override improvements.
- Build system: Fixed outdated
mcp/dist/index.jsbundle causinginitializeSession is not a functionerror - Version sync: Aligned all package.json versions to 1.5.1
- CLAUDE.md: Comprehensive release engineering documentation
- Complete build system architecture
- Step-by-step release process
- Version management guidelines
- Marketplace distribution workflow
- Troubleshooting guide
- Development workflow best practices
- Build verification: Added clean build process to ensure fresh bundled output
- Documentation: Improved clarity on build dependencies and bundling process
-
MCP tool description: Added clear auto-capture messaging in tool description
- "DOM actions save page content to disk automatically - no extract needed"
- "AUTO-SAVE: Each DOM action saves page.html, page.md, screenshot.png to temp directory"
- Updated workflow examples to show auto-saved files instead of manual extract
-
Response format improvements: Made file availability crystal clear
- "Current URL:" shows exact page location
- "Output dir:" clearly indicates capture directory
- "Full webpage content: page.html, page.md" explicitly states complete page capture
- "Screenshot: screenshot.png" and "JS console: console-log.txt" clearly labeled
-
Enhanced action functions: All capture-enabled actions now include current URL in response
- Eliminates confusion: Claude clearly understands files are automatically available
- Reduces redundant calls: Prevents unnecessary extract actions after navigation
- Improves UX: Clear file categorization and location information
- Better workflows: Updated examples show proper auto-capture usage patterns
- NPX GitHub installation: Direct installation via
npx github:obra/superpowers-chrome - Headless mode support:
--headlessCLI flag for server environments - Root package.json: Enables proper NPX distribution from GitHub repository
- GitHub Issue #1: Installation problems resolved with NPX alternative
- GitHub Issue #4: Added progressive disclosure test automation guidance to skill
- GitHub PR #5: Merged bash shebang portability improvements
- Documentation clarity: Enhanced auto-capture guidance in help action
- Skill enhancements: Added collapsible test automation section with troubleshooting
- Session-based directory structure:
/tmp/chrome-session-{timestamp}/- Time-ordered capture subdirectories:
001-navigate-{timestamp}/,002-click-{timestamp}/, etc. - Automatic cleanup on MCP exit (SIGINT, SIGTERM, normal exit)
- Session initialization on first MCP use with
initializeSession() - Global session tracking with
sessionDirandcaptureCountervariables
- Time-ordered capture subdirectories:
- Navigate action enhancement: Added
autoCaptureparameter (enabled by default in MCP) - New capture-enabled action functions:
clickWithCapture(tabIndex, selector)- Click + immediate page capturefillWithCapture(tabIndex, selector, value)- Type + post-type state captureselectOptionWithCapture(tabIndex, selector, value)- Select + result captureevaluateWithCapture(tabIndex, expression)- JavaScript eval + state capture
- 4-file capture format per action:
page.html- Full rendered DOM usingdocument.documentElement.outerHTMLpage.md- Structured markdown extraction from page elementsscreenshot.png- Visual page state (renamed frompage.png)console-log.txt- Console message placeholder file
- Token-efficient DOM analysis (replaces verbose hierarchical approach)
- Interactive element counting: Buttons, inputs, links with readable formatting
- Structural analysis: Navigation, main content areas, forms detection
- Heading extraction: First 3 H1 elements with truncation indicators
- Bounded output: <25 tokens regardless of page complexity
- Console message storage: Per-tab message tracking with
consoleMessagesMap - Runtime domain integration: Console API event capture during navigation
- Utility functions:
enableConsoleLogging(),getConsoleMessages(),clearConsoleMessages() - Placeholder implementation: Framework ready for full console capture
- Help action: New
{"action": "help"}returns complete MCP documentation - Skill independence: MCP functions on systems without Claude Code skills
- Embedded guidance: All actions, parameters, examples, and troubleshooting included
- Auto-capture explanation: Documents the new capture system within the MCP
- Root package.json: Enables
npx github:obra/superpowers-chromeinstallation - Prepare script: Automatically builds MCP during NPX installation
- File distribution: Proper files array for NPX packaging
- Binary configuration: Correct bin path for direct execution
- CLI flag support:
--headlessflag for NPX MCP server - Enhanced startChrome(): Accepts headless parameter for server environments
- Auto-detection: Headless mode logged in server startup message
- CI/CD ready: Perfect for automated testing and server deployments
-
Navigate responses: Enhanced object return vs simple string
→ https://example.com (capture #001) Size: 1200×765 Snapshot: /tmp/chrome-session-123/001-navigate-456/ Resources: page.html, page.md, screenshot.png, console-log.txt DOM: Example Domain Interactive: 0 buttons, 0 inputs, 1 links Headings: "Example Domain" Layout: body -
DOM action responses: All now return detailed capture information
- Click:
"Clicked: selector"→ Rich capture response - Type:
"Typed into: selector"→ Rich capture response with typed value - Select:
"Selected: value"→ Rich capture response with selection details - Eval:
"[result]"→ Rich capture response with expression and result
- Click:
- navigate() function: Added
autoCaptureparameter and enhanced return object - Action routing in MCP: All DOM actions now use
*WithCapturevariants - Response formatting: New
formatActionResponse()function for consistent output - File naming: Standardized resource names across all captures
- Replaced hierarchical DOM tree with smart statistical summary
- Element counting approach:
document.querySelectorAll()for precise counts - Layout detection: Semantic element identification (nav, main, forms)
- Text formatting improvements: Quoted headings, readable spacing, truncation indicators
- Auto-capture clarity: Updated help to emphasize files are automatically saved
- Extract usage guidance: Clarified when extract is needed vs when files are already available
- Progressive disclosure: Added collapsible test automation section to skill
- Troubleshooting: Enhanced with JSON.stringify patterns and chrome-ws reference guidance
skills/browsing/chrome-ws-lib.js:
+ Session management functions (initializeSession, cleanupSession, createCaptureDir)
+ Console logging utilities (enableConsoleLogging, getConsoleMessages, clearConsoleMessages)
+ Enhanced DOM functions (generateDomSummary, getPageSize, generateMarkdown, capturePageArtifacts)
+ Capture-enabled action wrappers (clickWithCapture, fillWithCapture, selectOptionWithCapture, evaluateWithCapture)
* Modified navigate() function with autoCapture parameter
mcp/src/index.ts:
+ formatActionResponse() function for consistent response formatting
+ Enhanced navigate action with rich response handling
* Modified click, type, select, eval actions to use capture variants
+ Session initialization in main() function
- Cleanup handlers: Registered for
exit,SIGINT,SIGTERMevents - Session persistence: Directory maintained throughout MCP lifetime
- Capture sequencing: Incremental numbering for temporal ordering
- Error recovery: Auto-capture failures don't prevent action success
- Dual domain enablement: Page + Runtime domains for navigation with auto-capture
- Message handling: Enhanced WebSocket message processing for console events
- Timing coordination: 1-second delay after page load for console message capture
- Parallel processing: Simultaneous HTML, markdown, screenshot, and DOM summary generation
- Original functions preserved:
click(),fill(),selectOption(),evaluate()unchanged - MCP tool interface: No changes to external tool parameters or descriptions
- Graceful degradation: Auto-capture failures return basic success responses
- Module exports: All existing exports maintained, new functions added
- Token efficiency: 95% reduction in DOM summary token usage
- Parallel capture: Simultaneous file generation for faster response times
- Memory management: Session cleanup prevents directory accumulation
- Bounded operations: DOM summary algorithm has fixed computational complexity
- Rich context: Comprehensive page state after every DOM-changing action
- Visual debugging: Screenshots show immediate action results
- Structured analysis: Markdown format enables content analysis
- Temporal tracking: Numbered captures show interaction progression
- Token preservation: Smart DOM summary prevents large page token explosion
- Organized workflow: Session-based storage for complex automation sequences
- XPath selector support alongside CSS selectors
- Improved tool clarity with examples
- Auto-tab creation when none exist
- Enhanced payload parameter documentation with action-specific details
- Improved error handling for out-of-range tab indices
- Tab index validation and error messages