I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
To clarify - yt-dlp is a command line tool for downloading youtube videos, but it's in a constant arms race with the youtube website because they are constantly changing things in a way that blocks yt-dlp.
100% I'll response to this by Friday with link to Github.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
Google is so far behind agentic cli coding. Gemini CLI is awful. So bad in fact that it’s clear none of their team use it. Also MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in. Playwright with headless Chromium or headed chrome is what anyone serious is using and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
MCP is very much not dead. centralized remote MCP servers are incredibly useful. also bespoke CLIs still require guidance for models to use effectively, so it's clear that token efficiency is still an issue regardless.
I know it’s a bit of a tangent but man you’re right re. Gemini CLI. It’s woefully bad, barely works. Maybe because I was a “free” user trying it out at the time, but it was such a bad experience it turned me off subscribing to whatever their coding plan is called today.
it's not the CLI, it's the model. The model wasn't trained to do that kind of work, was trained to do one shot coding, not sustained back and forth until it gets it right like Claude and ChatGPT.
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
On one hand, cool demo, on the other, this is horrifying in more ways than I can begin to describe. You're literally one prompt injection away from someone having unlimited access to all of your everything.
Not the person you're replying to, but: I just use a separate, dedicated Chrome profile that isn't logged into anything except what I'm working on. Then I keep the persistence, but without commingling in a way that dramatically increases the risk.
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
Even without the bash escape risk (which can be mitigated with the various ways of only allowing yt-dlp to be executed), YT Music is a paid service gated behind a Google account, with associated payment method. Even just stealing the auth cookie is pretty serious in terms of damage it could do.
Agreed. I wouldn't cut loose an agent that's at risk of prompt injection w/ unscoped access to my primary Google account.
But if I understood the original commenter's use case, they're just searching YT Music to get the URL to a given song. This appears[0] to work fine without being logged in. So you could parameterize or wrap the call to yt-dlp and only have your cookie jar usable there.
Oh, that's true, even allows you to play without an account. I can swear that at some point it flat out refused any use unless you're logged in with an account that has YT Music (I remember having to go to regular YouTube to get the same song to send it to someone who didn't have it).
As long as it’s gated and not turned on by default, it’s all good. They could also add a warning/sanity check similar to “allow pasting” in the console.
Relying on warnings or opt-ins for something with this blast radius is security theater more than protection. The cleverest malware barely waits for you to click OK before making itself at home, so that checkbox is a speed bump on a highway.
Chrome's 'allow pasting' gets ignored reflexively by most users anyway. If this agent can touch DevTools the attack surface expands far faster than most people realize or will ever audit.
You get used to it :) And especially once you get used to the YOLO lifestyle, you end up realizing that practically any form of security is entirely worthless when you're dealing with a 200 IQ brainwashed robot hacker.
For now you are. All these things fall with time, of course. You will stop caring once you start feeling safe, we all do.
Also. AAarrgh, my new thing to be annoyed at is AI drivel written slop.
"No browser automation framework, no separate browser instance, no re-login."
Oh really, nice. No separate computer either? No separate power station, no house, no star wars? No something else we didn't ask for? Just one a toggle and you go? Whoaaaaaa.
Edit: lol even the skill itself is vibe coded:
Lightweight Chrome DevTools Protocol CLI. Connects directly via WebSocket — no Puppeteer, works with 100+ tabs, instant connection.
I feel like there's nothing fucking left on the internet anymore that is not some mean of whatever the LLM is trained to talk like now.
What can you do? I mentioned the use of AI on another thread, asking essentially the same question. The comment was flagged, presumably as off topic. Fair enough, I guess. But about 80% (maybe more) of posted blogs etc that I see on HN now have very obvious signs of AI. Comments do too. I hate it. If I want to see what Claude thinks I can ask it.
HN is becoming close to unusable, and this isn’t like the previous times where people say it’s like reddit or something. It is inundated with bot spam, it just happens the bot spam is sufficiently engaging and well-written that it is really hard to address.
To be clear, this isn't a skill for the devtools mcp, but an independent project. It doesn't look bad, but obviously browser automation + agents is a very busy space with lots of parallel efforts.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
It's probably not fully optimized and could be compacted more with just some effort, and further with clever techniques, but browser state/session data will always use up a ton of tokens because it's a ton of data. There's not really a way around that. AI's have a surprising "intuition" about problems that often help them guess at solutions based on insufficient information (and they guess correctly more often than I expect they should). But when their intuition isn't enough and you need to feed them the real logs/data...it's always gonna use a bunch of tokens.
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
Been using this one for a while, mostly with codex on opencode. It's more reliable and token efficient than other devtools protocol MCPs i've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
Unfortunately there are like a billion competitors to this right now (including Playwright MCP, Playwright CLI, the new baked-in Playwright feature in Codex /experimental, Claude Code for Chrome...) and I can never quite decide if or when I should try to switch. I'm still just using the ordinary Playwright MCP server in both Codex and Claude Code, for the time being.
I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwrite in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
Why even use Playwright for this? I feel like Claude just needs agent-browser and it can generate deterministic code from it.
Would this hypothetically be able to download arbitrary videos from youtube without the constant yt-dlp arms race?
Don’t know how this could be more stable than ytdlp. When issues come up they’re fixed really quickly.
> yt-dlp arms race
I don't know anything about yt-dlp.
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
To clarify - yt-dlp is a command line tool for downloading youtube videos, but it's in a constant arms race with the youtube website because they are constantly changing things in a way that blocks yt-dlp.
Yes, please do!
100% I'll response to this by Friday with link to Github.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
Another +1, it would be incredibly useful to play with this approach! (and fun)
Google is so far behind agentic cli coding. Gemini CLI is awful. So bad in fact that it’s clear none of their team use it. Also MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in. Playwright with headless Chromium or headed chrome is what anyone serious is using and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
MCP is very much not dead. centralized remote MCP servers are incredibly useful. also bespoke CLIs still require guidance for models to use effectively, so it's clear that token efficiency is still an issue regardless.
all you need is a simple skills.md and maybe a couple examples and codex picks up my custom toolkit and uses it.
I know it’s a bit of a tangent but man you’re right re. Gemini CLI. It’s woefully bad, barely works. Maybe because I was a “free” user trying it out at the time, but it was such a bad experience it turned me off subscribing to whatever their coding plan is called today.
it's not the CLI, it's the model. The model wasn't trained to do that kind of work, was trained to do one shot coding, not sustained back and forth until it gets it right like Claude and ChatGPT.
MCP is not just used for coding.
The DevTools MCP project just recently landed a standalone CLI: https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/m...
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
MCPs cost nothing in CC now with Tool Search.
Someone already made a great agent skill for this, which I'm using daily, and it's been very cool!
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
On one hand, cool demo, on the other, this is horrifying in more ways than I can begin to describe. You're literally one prompt injection away from someone having unlimited access to all of your everything.
Not the person you're replying to, but: I just use a separate, dedicated Chrome profile that isn't logged into anything except what I'm working on. Then I keep the persistence, but without commingling in a way that dramatically increases the risk.
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
Even without the bash escape risk (which can be mitigated with the various ways of only allowing yt-dlp to be executed), YT Music is a paid service gated behind a Google account, with associated payment method. Even just stealing the auth cookie is pretty serious in terms of damage it could do.
Agreed. I wouldn't cut loose an agent that's at risk of prompt injection w/ unscoped access to my primary Google account.
But if I understood the original commenter's use case, they're just searching YT Music to get the URL to a given song. This appears[0] to work fine without being logged in. So you could parameterize or wrap the call to yt-dlp and only have your cookie jar usable there.
[0]: https://music.youtube.com/search?q=sandstorm
[1]: https://music.youtube.com/watch?v=XjvkxXblpz8
Oh, that's true, even allows you to play without an account. I can swear that at some point it flat out refused any use unless you're logged in with an account that has YT Music (I remember having to go to regular YouTube to get the same song to send it to someone who didn't have it).
As long as it’s gated and not turned on by default, it’s all good. They could also add a warning/sanity check similar to “allow pasting” in the console.
Relying on warnings or opt-ins for something with this blast radius is security theater more than protection. The cleverest malware barely waits for you to click OK before making itself at home, so that checkbox is a speed bump on a highway.
Chrome's 'allow pasting' gets ignored reflexively by most users anyway. If this agent can touch DevTools the attack surface expands far faster than most people realize or will ever audit.
Of course I still watch it and have my finger on the escape key at all times :)
I am in awe of the confidence you have in your reflexes.
You get used to it :) And especially once you get used to the YOLO lifestyle, you end up realizing that practically any form of security is entirely worthless when you're dealing with a 200 IQ brainwashed robot hacker.
I think using the Pi coding agent really got me used to this way of thinking: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
For now you are. All these things fall with time, of course. You will stop caring once you start feeling safe, we all do.
Also. AAarrgh, my new thing to be annoyed at is AI drivel written slop.
"No browser automation framework, no separate browser instance, no re-login."
Oh really, nice. No separate computer either? No separate power station, no house, no star wars? No something else we didn't ask for? Just one a toggle and you go? Whoaaaaaa.
Edit: lol even the skill itself is vibe coded:
Lightweight Chrome DevTools Protocol CLI. Connects directly via WebSocket — no Puppeteer, works with 100+ tabs, instant connection.
I feel like there's nothing fucking left on the internet anymore that is not some mean of whatever the LLM is trained to talk like now.
What can you do? I mentioned the use of AI on another thread, asking essentially the same question. The comment was flagged, presumably as off topic. Fair enough, I guess. But about 80% (maybe more) of posted blogs etc that I see on HN now have very obvious signs of AI. Comments do too. I hate it. If I want to see what Claude thinks I can ask it.
HN is becoming close to unusable, and this isn’t like the previous times where people say it’s like reddit or something. It is inundated with bot spam, it just happens the bot spam is sufficiently engaging and well-written that it is really hard to address.
I hear you and I agree. I don't know. Gated communities?
To be clear, this isn't a skill for the devtools mcp, but an independent project. It doesn't look bad, but obviously browser automation + agents is a very busy space with lots of parallel efforts.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
It's probably not fully optimized and could be compacted more with just some effort, and further with clever techniques, but browser state/session data will always use up a ton of tokens because it's a ton of data. There's not really a way around that. AI's have a surprising "intuition" about problems that often help them guess at solutions based on insufficient information (and they guess correctly more often than I expect they should). But when their intuition isn't enough and you need to feed them the real logs/data...it's always gonna use a bunch of tokens.
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
Yes. CLI. Always CLI. Never MCP. Ever. You’re welcome.
That doesn't solve the issue here because the amount of data in the browser state dwarfs the MCP overhead.
I found this one working amazingly well (same idea - connect to existing session): https://github.com/remorses/playwriter
I found Firefox with https://github.com/padenot/firefox-devtools-mcp to work better then the default Chrome MCP, is seems much faster.
Been using this one for a while, mostly with codex on opencode. It's more reliable and token efficient than other devtools protocol MCPs i've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
I suggest to use https://github.com/simonw/rodney instead
Unfortunately there are like a billion competitors to this right now (including Playwright MCP, Playwright CLI, the new baked-in Playwright feature in Codex /experimental, Claude Code for Chrome...) and I can never quite decide if or when I should try to switch. I'm still just using the ordinary Playwright MCP server in both Codex and Claude Code, for the time being.
I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwrite in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Also works nicely together with agent-browser (https://github.com/vercel-labs/agent-browser) using --auto-connect
Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
[1]: https://chromedevtools.github.io/devtools-protocol/
Note that this is a mega token guzzler in case you’re paying for your own tokens!
I tell Claude to use playwright so I don't even need to do the setup myself.
Similarly, cursor has a built in browser and visit localhost to see the results in the browser. Although I don't use it much (I probably should).
I have been using Playwright for a fairly long time now. Do checkout
chrome-cli with remote developer port has been working fine this entire time.
Now that there's widespread direct connectivity between agents and browser sessions, are CAPTCHAs even relevant anymore?
Was already eye rolling about the headline. Then I realized it's from chrome.
Hoping from some good stories from open claw users that permanently run debug sessions.