Where multiple locations are listed for this role, the position may be based in any of them, with preference given in the order listed.
We're looking for an engineer to work on the control layer - the system that translates an AI model's intent into precise, reliable actions on a real computer. This means mouse movements, keyboard input, window management, UI element detection, and error recovery across macOS, Windows, and Linux.
What you'll do
Work on the low-level computer control stack: mouse/keyboard injection, screen capture, coordinate mapping, input simulation (a small input-injection sketch follows this list)
Implement UI element detection using accessibility APIs (AXUIElement, UI Automation), DOM/a11y trees, and visual grounding (see the tree-walk sketch below)
Help build the abstraction layer that lets our agent operate across OS platforms and application types
Tackle reliability problems: element targeting under UI changes, window occlusion, resolution scaling, cross-app focus management
Contribute to feedback loops: how does the agent know its action worked? How does it recover when something unexpected happens? (An act-verify-retry sketch follows this list.)
Work closely with the model and planning team on the interface between intent and execution
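To make the input-injection bullet concrete, here is a minimal sketch of posting a synthetic click on macOS through the CGEvent tap. It assumes the pyobjc Quartz bindings are installed and the process has Accessibility permission; timing and error handling are simplified, and the coordinates are examples.

```python
import time
import Quartz  # pyobjc-framework-Quartz

def click(x: float, y: float) -> None:
    """Post a synthetic left click at (x, y) in global display points."""
    point = Quartz.CGPointMake(x, y)
    for event_type in (Quartz.kCGEventLeftMouseDown, Quartz.kCGEventLeftMouseUp):
        event = Quartz.CGEventCreateMouseEvent(
            None, event_type, point, Quartz.kCGMouseButtonLeft
        )
        Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)
        time.sleep(0.01)  # brief gap so the app registers down+up as one click

click(200, 300)  # example coordinates
```

The Windows (SendInput) and Linux (XTest/xdotool) equivalents differ in API but share the same shape: build an event, post it, pace the timing.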
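For the element-detection bullet, a hedged sketch of walking the macOS AXUIElement tree for a role/title match, again via pyobjc. The pid is a hypothetical placeholder and error handling is minimal; this is an illustration of the idea, not a hardened implementation.

```python
from ApplicationServices import (  # pyobjc-framework-ApplicationServices
    AXUIElementCreateApplication,
    AXUIElementCopyAttributeValue,
)

def ax_attr(element, name):
    """Fetch one accessibility attribute; returns None on any AXError."""
    err, value = AXUIElementCopyAttributeValue(element, name, None)
    return value if err == 0 else None

def find_element(element, role, title):
    """Depth-first search of the a11y tree for a matching role and title."""
    if ax_attr(element, "AXRole") == role and ax_attr(element, "AXTitle") == title:
        return element
    for child in ax_attr(element, "AXChildren") or []:
        found = find_element(child, role, title)
        if found is not None:
            return found
    return None

app = AXUIElementCreateApplication(12345)  # hypothetical pid of the target app
save_button = find_element(app, "AXButton", "Save")
```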
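And the feedback-loop bullet reduces to a pattern like the skeleton below; perform and check are hypothetical placeholders for an injected action and a fresh observation of UI state.

```python
import time

def act_with_verification(perform, check, retries: int = 3, settle: float = 0.5) -> bool:
    """Run an action, let the UI settle, confirm the effect, retry on failure."""
    for _ in range(retries):
        perform()            # e.g. click a button, type into a field
        time.sleep(settle)   # give the UI time to repaint before observing
        if check():          # re-observe: did the expected state change happen?
            return True
    return False             # caller escalates: re-plan, re-locate the element, etc.
```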
You might be a fit if
You've built OS-level input automation (CGEvent, SendInput, xdotool, or similar)
You understand accessibility frameworks - AXUIElement on macOS, UI Automation on Windows, AT-SPI on Linux
You've dealt with flaky element selectors, timing issues, and resolution-dependent coordinates (the scaling sketch after this list shows one classic pitfall)
You think carefully about reliability and edge cases
You've worked with tools like Playwright, Appium, PyAutoGUI, Hammerspoon, or similar
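One concrete instance of the coordinate problem from the list above: logical points vs. physical pixels under display scaling. A minimal sketch, assuming the backing scale factor is known.

```python
def to_physical(x_pts: float, y_pts: float, scale: float) -> tuple[int, int]:
    """Map logical point coordinates to pixel coordinates at a given scale."""
    return round(x_pts * scale), round(y_pts * scale)

# On a 2x (Retina) display, a click target at (200, 300) points sits at
# pixel (400, 600) in the captured screenshot -- mixing the two spaces
# is a classic source of off-by-2x targeting bugs.
assert to_physical(200, 300, 2.0) == (400, 600)
```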
Bonus
Experience with screen reader internals, remote desktop protocols (RDP/VNC), game automation, LLM agent tool-use systems, or mobile device automation (iOS UIAutomation / XCTest, Android UIAutomator / Accessibility).