Mac GUI Automation — SKILL.md
Raw skill file that agents receive when using this skill
---
name: "Mac GUI Automation"
description: "Control macOS GUI remotely: screenshots, mouse clicks, keyboard input, and screen reading via SSH + osascript/cliclick. For automating GUI tasks on headless Mac Minis."
version: "1.0.0"
author: "skynet"
category: "ops"
agents: ["claude-code", "codex", "gemini"]
tags: ["mac", "gui", "automation", "applescript", "cliclick"]
tools_required: ["bash", "ssh"]
---

# Mac GUI Automation

Automate macOS GUI interactions remotely via SSH using AppleScript (osascript) and cliclick.

## Machine Capabilities

| Machine | Screenshots | cliclick | AppleScript GUI | Notes |
|---------|-------------|----------|-----------------|-------|
| vault | YES | YES (with warning) | Limited (System Events hangs) | Accessibility not fully granted |
| bots | YES | YES | YES | Best machine for GUI automation |
| jarvis | NO (no display) | YES (with warning) | YES | No screencapture, use for non-visual automation |

**Recommended: Use `bots` for GUI automation tasks.** It has full support.

## Prerequisites

- SSH access to target Mac
- cliclick installed: `ssh bots 'eval "$(/opt/homebrew/bin/brew shellenv)" && brew install cliclick'`
- Homebrew-installed tools (cliclick, tesseract) must be called by full path or after sourcing `brew shellenv`; `osascript` and `screencapture` ship with macOS and need neither

## IMPORTANT: PATH Setup

SSH sessions don't have Homebrew in PATH.
Use full paths:

```bash
ssh bots '/opt/homebrew/bin/cliclick p'
```

Or source brew:

```bash
ssh bots 'eval "$(/opt/homebrew/bin/brew shellenv)" && cliclick p'
```

## Screenshots

```bash
# Take a screenshot (works on vault and bots, NOT jarvis)
ssh bots 'screencapture /tmp/screen.png'
scp bots:/tmp/screen.png ./screen.png

# Screenshot of specific region (x,y,width,height)
ssh bots 'screencapture -R 0,0,800,600 /tmp/region.png'
```

## Mouse Control with cliclick

```bash
# Click at coordinates
ssh bots '/opt/homebrew/bin/cliclick c:500,300'

# Double-click
ssh bots '/opt/homebrew/bin/cliclick dc:500,300'

# Right-click
ssh bots '/opt/homebrew/bin/cliclick rc:500,300'

# Move mouse
ssh bots '/opt/homebrew/bin/cliclick m:500,300'

# Click and drag
ssh bots '/opt/homebrew/bin/cliclick dd:100,100 du:500,500'

# Get current mouse position
ssh bots '/opt/homebrew/bin/cliclick p'

# Multiple actions with delays (w:ms)
ssh bots '/opt/homebrew/bin/cliclick c:500,300 w:500 c:600,400'
```

Note: cliclick may show "Accessibility privileges not enabled" warning on vault/jarvis. Clicks may still work for some operations but not all. On bots, everything works.
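
The full-path convention and the click actions above lend themselves to a small local wrapper. The sketch below is illustrative only: `remote_cliclick`, `click_at`, `is_uint`, and the `CLICLICK` variable are hypothetical names that are not part of this skill, and the wrapper assumes actions without embedded spaces (so it is not suitable for `t:"Hello World"` as written).

```bash
# Hypothetical helpers (not part of this skill): always call cliclick by its
# full Homebrew path so the remote PATH never matters.
CLICLICK="${CLICLICK:-/opt/homebrew/bin/cliclick}"

# is_uint VALUE : succeed only if VALUE is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

# remote_cliclick HOST ACTION... : forward cliclick actions (c:, dc:, m:, w:, ...)
# to HOST over SSH. Assumes no action contains spaces.
remote_cliclick() {
  local host="$1"
  shift
  ssh "$host" "$CLICLICK $*"
}

# click_at HOST X Y : validate the coordinates locally, then left-click there.
click_at() {
  local host="$1" x="$2" y="$3"
  if ! is_uint "$x" || ! is_uint "$y"; then
    echo "click_at: coordinates must be non-negative integers" >&2
    return 2
  fi
  remote_cliclick "$host" "c:$x,$y"
}
```

Usage might look like `click_at bots 500 300` or `remote_cliclick bots c:500,300 w:500 c:600,400`. Validating coordinates before opening the SSH connection keeps a malformed value from ever reaching the remote shell.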
## Keyboard Input with cliclick

```bash
# Type text
ssh bots '/opt/homebrew/bin/cliclick t:"Hello World"'

# Press keys (cliclick names the Escape key "esc")
ssh bots '/opt/homebrew/bin/cliclick kp:return'
ssh bots '/opt/homebrew/bin/cliclick kp:tab'
ssh bots '/opt/homebrew/bin/cliclick kp:esc'
ssh bots '/opt/homebrew/bin/cliclick kp:space'

# Key combinations
ssh bots '/opt/homebrew/bin/cliclick kd:cmd t:a ku:cmd'  # Cmd+A (select all)
ssh bots '/opt/homebrew/bin/cliclick kd:cmd t:c ku:cmd'  # Cmd+C (copy)
ssh bots '/opt/homebrew/bin/cliclick kd:cmd t:v ku:cmd'  # Cmd+V (paste)
ssh bots '/opt/homebrew/bin/cliclick kd:cmd t:t ku:cmd'  # Cmd+T (new tab)
ssh bots '/opt/homebrew/bin/cliclick kd:cmd t:w ku:cmd'  # Cmd+W (close tab)

# Modifier combos: cmd, alt, ctrl, shift, fn
ssh bots '/opt/homebrew/bin/cliclick kd:cmd,shift t:n ku:cmd,shift'  # Cmd+Shift+N
```

## AppleScript (osascript)

### CRITICAL: Never use `tell app "AppName"` for window operations

Direct app scripting (e.g., `tell app "Google Chrome" to count windows`) **hangs indefinitely** over SSH because the app's main thread doesn't respond to Apple Events from SSH sessions.
**Always use the System Events process wrapper instead:**

```bash
# WRONG: will hang
ssh bots 'osascript -e "tell app \"Google Chrome\" to count windows"'

# RIGHT: works
ssh bots 'osascript -e "tell app \"System Events\" to tell process \"Google Chrome\" to get name of every window"'
```

### Commands that DO work

```bash
# Get frontmost app
ssh bots 'osascript -e "tell app \"System Events\" to get name of first process whose frontmost is true"'

# List visible apps
ssh bots 'osascript -e "tell app \"System Events\" to get name of every process whose visible is true"'

# Get window names for an app
ssh bots 'osascript -e "tell app \"System Events\" to tell process \"Google Chrome\" to get name of every window"'

# Get window count via JXA (JavaScript for Automation)
ssh bots 'osascript -l JavaScript -e "Application(\"System Events\").processes.byName(\"Google Chrome\").windows.length"'

# Open a URL (safe: uses the open command, not AppleScript)
ssh bots 'open https://example.com'
ssh bots 'open -a "Google Chrome" https://example.com'

# Send notification
ssh bots 'osascript -e "display notification \"Task complete\" with title \"Factory\""'

# Activate/bring app to front
ssh bots 'osascript -e "tell app \"Google Chrome\" to activate"'

# Quit an app (safe: a simple quit command works)
ssh bots 'osascript -e "tell app \"Safari\" to quit"'
```

### Commands that HANG (avoid over SSH)

```applescript
-- These all hang over SSH. Do NOT use:
tell app "Chrome" to count windows
tell app "Chrome" to get URL of active tab
tell app "Finder" to count windows
tell app "Chrome" to set bounds of window 1
```

## Clipboard

```bash
ssh bots 'pbpaste'                        # Get clipboard
ssh bots 'echo "Hello" | pbcopy'          # Set clipboard
ssh bots 'pbpaste > /tmp/clipboard.txt'   # Save clipboard to file
```

## Screen Reading with OCR

```bash
# Install tesseract (if not already)
ssh bots 'eval "$(/opt/homebrew/bin/brew shellenv)" && brew install tesseract'

# Screenshot + OCR
ssh bots 'screencapture /tmp/screen.png && /opt/homebrew/bin/tesseract /tmp/screen.png /tmp/screen_text'
ssh bots 'cat /tmp/screen_text.txt'

# OCR a specific region
ssh bots 'screencapture -R 100,200,600,400 /tmp/region.png && /opt/homebrew/bin/tesseract /tmp/region.png /tmp/region_text'
```

## Common Workflow: Screenshot → Analyze → Act

```bash
# 1. Take screenshot
ssh bots 'screencapture /tmp/screen.png'
scp bots:/tmp/screen.png ./screen.png

# 2. Read/analyze the screenshot (use vision capabilities)

# 3. Decide what to click/type based on what you see

# 4. Execute the action
ssh bots '/opt/homebrew/bin/cliclick c:X,Y'

# 5. Repeat
```

## Common Workflow: Type into a Text Field

```bash
ssh bots bash <<'EOF'
/opt/homebrew/bin/cliclick c:400,300            # Click the field
sleep 0.5
/opt/homebrew/bin/cliclick kd:cmd t:a ku:cmd    # Select all
sleep 0.2
/opt/homebrew/bin/cliclick t:"New text here"    # Type
sleep 0.2
/opt/homebrew/bin/cliclick kp:return            # Submit
EOF
```

## Timeouts

Always use `timeout` for osascript commands that might hang:

```bash
timeout 5 ssh bots 'osascript -e "..."' 2>&1 || echo "Command timed out"
```
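
The one-liner above prints "Command timed out" on any failure, not only on a timeout. A minimal sketch of a wrapper that tells the two apart is shown below. `run_with_timeout` is a hypothetical name, not part of this skill. The sketch assumes GNU `timeout`, which exits with status 124 when the time limit is hit, and falls back to `gtimeout` on the assumption that a macOS control machine has Homebrew coreutils installed under that name.

```bash
# Hypothetical wrapper (not part of this skill): run a command under a time
# limit and report a timeout separately from an ordinary failure.
run_with_timeout() {
  local secs="$1" bin rc=0
  shift

  # Assumption: GNU timeout is available as `timeout` (Linux) or
  # `gtimeout` (Homebrew coreutils on macOS).
  if command -v timeout >/dev/null 2>&1; then
    bin=timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    bin=gtimeout
  else
    echo "run_with_timeout: neither timeout nor gtimeout found" >&2
    return 127
  fi

  "$bin" "$secs" "$@" || rc=$?

  # GNU timeout uses exit status 124 to signal that the limit was reached.
  if [ "$rc" -eq 124 ]; then
    echo "Command timed out after ${secs}s" >&2
  fi
  return "$rc"
}
```

Usage might look like `run_with_timeout 5 ssh bots 'osascript -e "..."'`. Any other non-zero status is passed through unchanged, so SSH connection errors and osascript failures stay distinguishable from a hang.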
curl -s https://skills.skynet.ceo/api/skills/mac-gui-automation/skill.md