This is one of the coolest hacks I've seen recently. Having done some much less involved macOS hacking, I can't help but wonder if we may finally see momentum behind some flavor of agent-friendly Linux/Android if Apple doesn't give us more ways to let agents interact with our machines.
Nice! Thanks for the technical writeup, ~2 weeks from me wondering how it's implemented [1] to being able to play with a replicated version!
[1] https://news.ycombinator.com/item?id=47799128
Thanks for starting that thread, I definitely drew some inspiration from it. But ultimately the secret sauce for the background click came from discovering yabai's window_manager_focus_window_without_raise https://github.com/asmvik/yabai/blob/f17ef88116b0d988b834bb2...
I tried out their Lume VM software a couple of months back. Worked well, fwiw. I'm not using it anymore because I decided to just give agents direct (supervised) access to my devices.
Thanks for trying out Lume! We definitely haven't given up on the idea of sandboxing GUI agents in local macOS VMs. Cua Driver is aimed at a different use case, though: letting coding agents and general agents use the Mac you're already on, asynchronously and in the background. That also makes the economics better, since multiple agents can share the same machine instead of each needing its own VM.
Ex-Apple engineer here. I really like your implementation. A few years ago I built a similar tool to help me automate the testing of some of my native macOS apps. Being able to run multiple UI automation tests simultaneously was the big win in my case.
My only criticism is enabling telemetry by default. I'm a fan of having people opt in.
The problem with opt-in telemetry is that 95% of users don't change defaults, and the 5% who do are your power users. They're not representative of the average user. And only a subset of them will turn it on.
Ironically enough the opposite happens with opt-out telemetry, for the same reason: a lot of power users will turn off telemetry, thus you will never see their usage patterns and will have to infer them. Dogfooding helps.
I'm confused.
You claim power users opt in to telemetry, and then immediately say power users opt out.
The problem with opt-in telemetry is that 95% of users are sick and tired of being spied on with every little thing they do.
Fair criticism. We took a similar approach to established dev tools like Homebrew, with anonymous, opt-out telemetry to understand install issues, crashes, and high-level usage. For cua-driver specifically, telemetry is limited to command/tool-level events and basic environment metadata. We don’t send screenshots, recordings, app contents, prompts, typed text, file paths, or tool arguments. That said, we should make the opt-out path clearer.
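To make the "tool-level events only" idea concrete, here is a minimal sketch of an allowlist filter. This is not cua-driver's actual schema or code: the field names and the `scrub_event` helper are hypothetical, chosen only to illustrate dropping everything that is not explicitly permitted.

```python
# Hypothetical allowlist: only these keys may ever be sent.
# The names are illustrative, not cua-driver's real telemetry fields.
ALLOWED_FIELDS = frozenset({"event", "tool", "os", "arch"})


def scrub_event(raw_event: dict) -> dict:
    """Keep only allowlisted keys; anything else is silently dropped."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


# Example: tool arguments and typed text never reach the payload.
raw = {
    "event": "tool_invoked",
    "tool": "launch_app",
    "os": "macOS",
    "arch": "arm64",
    "arguments": {"path": "/Users/me/notes.txt"},
    "typed_text": "hunter2",
}
scrubbed = scrub_event(raw)
```

An allowlist fails closed: a newly added field is excluded until someone deliberately permits it, which is the safer default for this kind of data.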
Would you be open to sharing what you built for running the automation tests? I could really use this right now.
We don't have a specific testing framework yet. cua-driver is closer to an automation interface than a test runner. That said, you could definitely build one on top of it. For reference, these are some of our integration tests: https://github.com/trycua/cua/tree/main/libs/cua-driver/Test...
One useful trick is to use cua-driver's 'launch_app' instead of the default 'open' or other osascript, since it can start the app without raising/focusing it, so the tests don't disturb your active desktop while they run.
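For comparison, plain `open` on macOS also has a `-g` flag that asks it not to bring the app to the foreground. The sketch below only builds that command line for a test helper. The `background_open_argv` name is hypothetical, and whether `-g` matches what 'launch_app' does is not something this thread establishes.

```python
from typing import List


def background_open_argv(app_name: str) -> List[str]:
    """Build an `open` command that launches an app without foregrounding it.

    -g: do not bring the application to the foreground
    -a: select the application by name
    """
    return ["open", "-g", "-a", app_name]


argv = background_open_argv("TextEdit")

# On a Mac, a test setup step could then run it like this:
#   import subprocess
#   subprocess.run(argv, check=True)
```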
It's looking great.
The audit trail question is interesting and I haven't seen it come up much. When an agent clicks through an ERP or edits a file, you've got logs, but how do you explain the "why" behind each decision to, say, a compliance team?
Curious if that's something you're thinking about or if it's too early.
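To make the question concrete, one possible shape is a log record that stores the agent's stated rationale next to each action. This is purely a sketch under my own assumptions: the `AuditRecord` fields and helper names are hypothetical and not anything cua-driver provides.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    timestamp: str  # ISO 8601, UTC
    action: str     # what the agent did, e.g. "click"
    target: str     # what it acted on, e.g. a button label
    rationale: str  # the agent's own stated reason for the step


def make_record(action: str, target: str, rationale: str,
                now: Optional[datetime] = None) -> AuditRecord:
    """Create a record, stamping it with the current UTC time by default."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return AuditRecord(moment.isoformat(), action, target, rationale)


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(
    "click",
    "Approve invoice",
    "Invoice total matched the purchase order",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
line = to_json_line(record)
```

A record like this captures what the agent said its reason was, which may or may not satisfy a compliance team as the actual "why".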