Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.
A big reason we invest in this is because we want to keep free and logged-out access available for more users. My team’s goal is to help make sure the limited GPU resources are going to real users.
We also keep a very close eye on the user impact. We monitor things like page load time, time to first token and payload size, with a focus on reducing the overhead of these protections. For the majority of people, the impact is negligible, and only a very small percentage may see a slight delay from extra checks. We also continuously evaluate precision so we can minimize false positives while still making abuse meaningfully harder.
It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.
Nick, I understand the practical realities regarding why you'd need to try to tamp down on some bot traffic, but do you see a world where users are not forced to choose between privacy and functionality?
I am not Nick, but there's a few ways that world happens: the free tier goes away and what people pay for more correctly reflects what they use, this all becomes cheap enough that it doesn't matter, or we come up with an end to end method of determining usage is triggered by a person.
Doesn’t really make sense, because any service can just say “you must paste your human-attestation JWT here to use this service” and plenty of people will.
Brilliant! Just the thing we want: more hardware attestation, more deanonymization, less user control, all diligently orchestrated in a repository where the only contributor is Anthropic Claude [0]. Comes complete with a misaligned ASCII diagram in the README to show how much effort the humans behind it put in!
Yes, even their "humanifesto" is LLM output, and is written almost exclusively in the "it's not X <emdash> it's Y" style.
Those are all situationally-valid criticisms, but I've long thought the ability to have smartphones' cameras cryptographically sign photos is good when available. The use case is demonstrating a photo wasn't doctored, and that it came from a device associated with e.g. a journalist, who maintains a public key. Of course, it should be optional.
>It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.
What are you talking about? It works fine with firefox with RFP and VPN enabled, which is already more paranoid than the average configuration. There are definitely sites where this configuration would get blocked, but chatgpt isn't one of them, so you're barking up the wrong tree here.
I love the containers too. My current use case is to keep my YouTube account separate from my Google one. Google doesn't need all that behavioural data in one place.
It's a pity Firefox doesn't get the praise it deserves half as much as it cops criticism.
The possibilities with Firefox multi containers and automation scripts as well are truly endless.
It's also possible to make Firefox route each container through a different proxy which could be running locally even which then can connect to multiple different VPN's. I haven't tried doing that but its certainly possible.
It's sort of possible to run different browsers with completely new identities and sometimes IP within the convenience of one. It's really underrated. I don't use the IP part of this that I have mentioned but I use multi containers quite a lot on zen and they are kind of core part of how I browse the web and there are many cool things which can be done/have been done with them.
In the good old days Netflix had "Dynamic HTML" code that would take a DOM element which scrolled out of view port and move it to the position where it was about to be scrolled in from the other end. Hence he number of DOM elements stayed constant no matter how far you scroll and the only thing that grows is the Y coordinate.
They did it because a lot of devices running Netflix (TVs, DVD players, etc) were underpowered and Netflix was not keen on writing separate applications. They did, however, invest into a browser engine that would have HW acceleration not just for video playback but also for moving DOM elements. Basically, sprites.
Yeah just had this earlier today, I had to write my response in vscode and paste it in, there were literal seconds of lag for typing each character. Typical bloated React.
Hi! It's all perfectly understandable - after all, we use things like Anubis to protect our services from OpenAI and similar actors and keep them available to the real users for exactly the same reasons.
Great to hear from a first-party source. I'm a Pro subscriber and my team spends well over two thousand dollars per month on OpenAI subscriptions. However, even when I'm logged in with my Pro account, if I'm using a VPN provider like Mullvad, I often have trouble using the chat interface or I get timeout errors.
Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.
Yep, on logged-in users too. The reason is basically the same: we want scarce compute going to real people, not attackers. Being logged in is one useful signal, but it doesn’t fully prevent automation, account abuse, or other malicious traffic, so we apply protections in both cases.
Nothing you do can fully prevent automation. Someone who wants to automate requests badly enough will be able to do it, especially when the “protections” are as easy to decrypt and analyze as the OP proved.
Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”
sometimes I paste giant texts (think summarization) in the chatgpt (paid) webapp and I noticed that the CPU fans spin up for about 5 seconds after, as if the text is "processed" client side somehow. this is before hitting "submit" to send the prompt to the model.
I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?
Can you fix the resizing text box issue on Safari when a new line is inserted? When your question wraps to a newline Safari locks up for a few seconds and it's really annoying. You can test by pasting text too.
I shouldn't be giving ideas to your boss, but I bet he would be interested in making ChatGPT available only by paying customers or free for those whose who gets their eyes scanned by The Orb. Give 30 days of raised limits and we're all set to live in the dystopia he wants.
I really can't tell for sure (new user posting a ridiculously hypocritical corporate message on a Sunday) but if GP actually works for OpenAI the lack of self-awareness is seriously striking
No, it doesn't go places we "do not want it to go". What part of zero knowledge doesn't make sense? How precisely does a free, unlinkable, multi-vendor, open-source cryptographic attestation of recent humanity create something terrible?
The part where FAANG does usual Embrace, Extend, Extinguish, masses don't care/understand and we have yet another "sign in with... " that isn't open source nor zero-knowledge in practice and monetizes your every move. And probably at least one of the vendors has massive leak that shows half-assed or even flawed on purpose implementation.
The ZK part isn't the problem. The "attestation of recent humanity" part is. Who attests? What happens when someone can't get attested?
You've been to the doctor recently, right? Given them your SSN? Every identity system ever built was going to be scoped || voluntary. None of them stayed that way.
Once you have the identity mechanism, "Oh it's zero knowledge! So let's use it for your age! Have you ever been convicted?" which leads to "mandated by employers" which leads to...
We've seen this goddamn movie before. Let's just skip it this time? Please?
It's absurd how unusable Cloudflare is making the web when using a browser or IP address they consider "suspicious". I've lately been drowning in captchas for the crime of using Firefox. All in the interest of "bot protection", of course.
The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding. VPN users, privacy-first browsers, uncommon IP ranges, they all get flagged. The people most likely to get caught by these systems are exactly the ones who care most about their privacy, and not the bots that they are apparently targeting.
So the stable state here is all humans eventually being locked out? (Bots are getting better every day; I doubt the same is true for all humans, including those with weird browsers or networks unwilling to install some dystopian Cloudflare "Internet passport".)
But hey, at least some bots are also not making it past Cloudflare!
Which VPNs are people using that actually care about the user's privacy? Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.
I have yet to see a use case for VPNs for the casual internet audience, and for a tech savvy user, their better off renting through some datacenter or something, which at that point is hardly a VPN and more home IP obfuscation. All the same downsides, and at least you get real privacy.
I'm forced to use a VPN to occasionally check my US bank account, since a foreign IP address is obviously a harbinger of unspeakable evil (while the friendly Youtube advertised neighborhood VPN is obviously evidence of pure intentions).
>Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.
Source? I haven't seen any evidence that the major paid VPN providers engage in any of those things. At best it's vague implications something shady is happening because one of the key people was previously at [shady organization].
ProtonVPN with bitcoin which you get from a monero swap is a good idea for complete privacy if you want port forwarding.
MullvadVPN is also another great one.
I have heard some good things about AirVPN, but I can absolutely attest for mullvad and to a degree ProtonVPN (Just with Proton, depending upon your threat model, do make the necessary precautions like buying with monero for example)
There are others, but mostly its the 2-3 that I trust.
Yes they’ve successfully made us regress back 20+ years in terms of wait times. Like reCaptcha and other user hostile stuff, I try not to frequent and certainly not buy anything from websites that treat me like this. I actually cancelled my chat GPT subscription almost two years ago now because they were making me solve puzzles instead of letting me use the software I paid for.
It’s worth remembering how replaceable basically all websites and software are and how little value they add. There’s no reason to reward or tolerate those that noticeably degrade performance with couldflare et al
Most people are on a CGNAT these days, drowning in captchas is the new normal. You’re at the mercy of one of your neighbors not hosting a botnet from their home computer.
For better or for worse, CF's fingerprinting and traffic filtering is a lot more in-depth than just IP trend analysis. Kind of by necessity, exactly because of what you mention. So I'd think that's not as big a worry per se.
Yet here I am drowning in captchas every once in a while, so it's quite a big worry for me.
Maybe I just have to disable all ad blockers and Safari tracking prevention? Or I guess I could send a link to a scan of my photo ID in a custom request header like X-Please-Cloudflare-May-I-Use-Your-Open-Web?
Not even remotely true, I genuinely have no idea what you're talking about. The only time I get captcha'ed is when I sometimes VPN around, or do some custom browser stuff and etc. I'll even say I get captcha'ed less now than maybe 5 years ago.
Every so often, usually after a firefox update, CF will get into a "I'm convinced your a bot" mode with me. I can get out of it by solving 20 CAPTCHAs.
It's probably just a higher rate of autonomous vehicles needing stop signs and buses identified at that moment, and cognitive bias causes you to only remember when that happens when you recently performed an update. /s
My assumption is that CF has something like a SVM that it's feeding a bunch of datapoints into for bot detection. Go over some threshold and you end up in the CAPTCHA jail.
I'm certain the User-Agent is part of it. I know that for certain because a very reliable way I can trigger the CF stuff is this plugin with the wrong browser selected [1].
>It's probably just a higher rate of autonomous vehicles needing stop signs and buses identified at that moment
I can't tell whether you're serious but in case you are, this theory immediately falls apart when you realize waymo operates at night but there aren't any night photos.
Surprising really, because I'm a Firefox + Ublock Origin die hard and I never get Cloudflare captchas. Wonder what the difference is? I have CGNAT turned off, if that matters at all (probably not).
I could definitely imagine a public IPv4 with lots of good, logged-in Cloudflare traffic to act as a positive signal for their heuristics, possibly even overriding the Firefox penalty.
I recently had the insane experience of filling out 15 consecutive captchas, after, I had checked out and entered my payment information into the payment processor widget. I just wanted to submit the order. I was logged in to their website, and the bank even needed a one time code for payment. If the bank is pretty sure I am human then your ecomm site can figure it out surely.
At least outside the US, there's 3DS as an (admittedly often high friction) high quality cardholder verification method, but in the US, that's of course considered much too consumer-hostile, so "select 87 overpasses" it is.
A while back I was buying tickets for a gondola for a trip in Europe and the checkout process failed during payment because their site didn't load their analytics/tracking stuff with proper error-handling, so when my ad-blocker prevented the tracking stuff, their checkout process failed to handle my CC's 2-factor auth and the checkout would fail. Had to contact my CC company and work with the gondola company to tell them what they're doing wrong so they could fix their website code. Pretty sad to know whoever built their stuff actually shipped a checkout flow (for a VERY popular tourist destination) without testing with ad-blockers enabled.
To be fair, this sometimes seems on the ad blocker. I've definitely seen mine accidentally nuke part of the payment Javascript (or maybe the 3DS iframe?) because some substring of it matched some common ad URL, which is obviously unrecoverable for the site itself.
In what way would that not be fair? Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.
No, but it's entirely within TSA's hands to make that process as frictionless as possible.
(It's a different question whether zero friction is actually desired, or whether some security theater is actually part of the service being provided, but that's a different question.)
The "quality" of TSA's screening seems be pretty bad too given how many people have to go through secondary screening vs how many terrorist they catch (0?)
they caught 11 million by now (just as arbitrary as your 0 but probably more accurate since we haven’t had a large terrorist attack since they got the gig to serve and protect and before we lost thousands of lives…)
No, using a stupid authentication/verification method with lots of false positives is always on whoever deploys it.
Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.
If it's just as easy to spoof being Chrome as it is to spoof being Firefox, then it is indeed fair to blame Cloudflare if they give Firefox users more CAPTCHAs than Chrome users.
I’ve been getting it in safari too. It’s ridiculous frankly. My residential ip must have been flagged or something. The part that’s really annoying is its trivial for bots to bypass.
I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.
Maybe that's the point? But then again, doesn't Cloudflare run part of it!? And wasn't there some "privacy-preserving captcha replacement" that iOS devices should already be opting me in to? So many questions, nobody there to answer them, because they can get away with it.
> The part that’s really annoying is its trivial for bots to bypass.
Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.
These days I just close sites that show that "checking if you're a bot" shit. If this is how the web is going to be now, I don't care, I'll just not use it. I didn't need to see that article or post that badly anyways. I'm tired of paying the price for the sociopathic, greedy actions of others. It's especially bad for anyone who uses an open source OS like Linux or *BSD (to the extent many sites just block me automatically with a 403 Forbidden simply for using OpenBSD + Firefox, completely free pass if I try the same site from a Windows or Linux computer).
We use Cloudflare to protect our content, but at the same time our machines mostly run Linux / Firefox so it really is quite a frustrating relationship. It really bums me out how much of Turnstile boils down to these two questions:
is it Linux (or similar)?
is it Firefox?
If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.
I'm building Safebox and Safecloud, where this won't be the case anymore. Not only will you have a decentralized hosting network that can sideload resources (e.g. via a browser extension that looks at your "integrity" attribute on websites) but also the websites will require you to be logged in with a HMAC-signed session ID (which means they don't need to do any I/O to reject your requests, and can do so quickly)... so the whole thing comes down to having a logged in account.
As far as server-to-server requests, they'll be coming from a growing network of cryptographically attested TPMs (Nitro in AWS, also available in GCP, IBM, Azure, Oracle etc.) so they'll just reject based on attestations also.
In short... the cryptographically attested web of trust will mean you won't need cloudflare. What you will need, however, to prevent sybil attacks, is age verification of accounts (e.g. Telegram ID is a proxy for that if you use Telegram for authentication).
But do they do it whether you're logged in or not?
I noticed the ChatGPT app also checks Play Integrity on Android (because GrapheneOS snitches on apps when they do this), probably for the same reason. Claude's app doesn't, by the way, but it also requires a login.
Coincidentally about an hour ago, I wanted to look something up in ChatGPT and I happened to be in a browser window I don’t normally use, with no logged in accounts. I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.
I used to mostly use chatgpt in an incognito tab, logged out. Until I notice it seems to have some context of my logged in session, and of the logged out as well. It may be paranoia or prompt deduction as well but that felt strange.
> These properties only exist if the ChatGPT React application has fully rendered and hydrated. A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them. A bot framework that stubs out browser APIs but doesn't actually run React won't have them.
> This is bot detection at the application layer, not the browser layer.
I kind of just assumed that all sophisticated bot-detectors and adblock-detectors do this? Is there something revealing about the finding that ChatGPT/CloudFlare's bot detector triggers on "javascript didn't execute"?
Perhaps the author should have made it clearer why we should care about any of this. OpenAI want you to use their real react app. That’s… ok? I skimmed the article looking for the punchline and there doesn’t seem to be one.
A bunch of the points in this AI generated blog post were like that. Makes me feel dirty when I'm 1/3rd of the way through and I realise how off it is.
I just don't understand why bot owners can't just run a complete windows 11 VM running Google Chrome complete with graphics acceleration.
You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That's 1 cent per thousand page loads on AWS. Damn cheap.
There are myriad providers competing to offer this, nicely packaged with all the accoutrements (IP rotation, location spoofing, language settings, prebuilt parsers, etc.) behind an easy to use API.
Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.
I assume your concern with GPU passthrough is that each VM needs a whole GPU?
You can use GPU-PV to split your GPU between VM instances.
Then the main bottleneck becomes how thin you split out your VRAM.
Does anyone know how this is integrated on the Cloudflare side and across the app? Is this beyond standard turnstile? Is this custom/enterprise functionality? Something else?
If you have AI write a blog post for ya, when you think it's set, check word count (can c+p to google docs if AI can't pull it off with built in tools), and ask it to identify repetitions if it's over 1000.
Also, you can have it spotcheck colors: light orange on light background is unreadable, ask it to find the L*[1] of colors and dark/lighten as necessary if gap < 40 (that's minimum gap for yuge header text on background, 50 for text on background, these have gap of 25)
I haven't tried this yet, but, maybe have it count word count-per-header too. It's got 11 headers for 1000 words currently, makes reading feel really stacatto and you gotta evaluate "is this a real transition or vibetransition"
It doesn't look like it in the full sense of "free". But part of how one pays these services is by running a permissive modern browser which allows the corporation to spy on you even when you already paid in currency. In a sense by depriving them of the ability to easily spy on your this workaround is closer to "free".
>My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API
There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client is marginal and probably isn't worth the engineering effort.
They see everything your doing because you send the text. But this is talking about everything about your computer system. You would not normally be sending this to them or having it involved at all. This workaround allows you to not involve unneeded information about your computer setup. It is not about avoiding sending prompt text.
And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.
I mean, I can easily get them to behaving defensively for not being abused. But MBP with M5 here, my chatgpt tab always get stucked when I hit some prompt.
Really really bad user experience, wondering about when they will leave this approach.
Why does ChatGPT slow down so much when the conversations get long, while Claude does compaction?
My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API –- when it should have been running quantized models on its own server.
My theory is that "AI" doesn't really have any long term paying customers and the majority of the "users" are people who have cooked up some clever hack to effectively siphon computing power from these providers in an effort to crank out the lowest effort ad supported slop imaginable.
Every provider seems to have been plauged by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.
Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.
A big reason we invest in this is because we want to keep free and logged-out access available for more users. My team’s goal is to help make sure the limited GPU resources are going to real users.
We also keep a very close eye on the user impact. We monitor things like page load time, time to first token and payload size, with a focus on reducing the overhead of these protections. For the majority of people, the impact is negligible, and only a very small percentage may see a slight delay from extra checks. We also continuously evaluate precision so we can minimize false positives while still making abuse meaningfully harder.
It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.
Nick, I understand the practical realities regarding why you'd need to try to tamp down on some bot traffic, but do you see a world where users are not forced to choose between privacy and functionality?
I am not Nick, but there's a few ways that world happens: the free tier goes away and what people pay for more correctly reflects what they use, this all becomes cheap enough that it doesn't matter, or we come up with an end to end method of determining usage is triggered by a person.
Meet me in a cafe and I will sign a JWT saying you're not a bot. You can submit this to whoever will accept it.
Doesn’t really make sense, because any service can just say “you must paste your human-attestation JWT here to use this service” and plenty of people will.
If apple approves it, ive got a solution: A keyboardthat attests to your humanity https://typed.by/magicseth/2451#2NyGLfAQxmqRiAOTlaX7ma3G4d1o...
Brilliant! Just the thing we want: more hardware attestation, more deanonymization, less user control, all diligently orchestrated in a repository where the only contributor is Anthropic Claude [0]. Comes complete with a misaligned ASCII diagram in the README to show how much effort the humans behind it put in!
Yes, even their "humanifesto" is LLM output, and is written almost exclusively in the "it's not X <emdash> it's Y" style.
[0]: https://github.com/magicseth/keywitness/graphs/contributors
Those are all situationally-valid criticisms, but I've long thought the ability to have smartphones' cameras cryptographically sign photos is good when available. The use case is demonstrating a photo wasn't doctored, and that it came from a device associated with e.g. a journalist, who maintains a public key. Of course, it should be optional.
Somewhere there is someone 3D printing a keyboard cover that an llm can type with.
Sounds like we’re bringing back the PGP key signing parties
The sooner we do the better.
>It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.
What are you talking about? It works fine with firefox with RFP and VPN enabled, which is already more paranoid than the average configuration. There are definitely sites where this configuration would get blocked, but chatgpt isn't one of them, so you're barking up the wrong tree here.
Firefox multicontainers are pretty cool. But it’s an advanced process that most people wouldn’t do or do correctly.
I love the containers too. My current use case is to keep my YouTube account separate from my Google one. Google doesn't need all that behavioural data in one place.
It's a pity Firefox doesn't get the praise it deserves half as much as it cops criticism.
The possibilities with Firefox multi containers and automation scripts as well are truly endless.
It's also possible to make Firefox route each container through a different proxy which could be running locally even which then can connect to multiple different VPN's. I haven't tried doing that but its certainly possible.
It's sort of possible to run different browsers with completely new identities and sometimes IP within the convenience of one. It's really underrated. I don't use the IP part of this that I have mentioned but I use multi containers quite a lot on zen and they are kind of core part of how I browse the web and there are many cool things which can be done/have been done with them.
Don’t know if it’s related to the article, but the chats ui performance becomes absolutely horrendous in long chats.
Typing the chat box is slow, rendering lags and sometimes gets stuck altogether.
I have a research chat that I have to think twice before messaging because the performance is so bad.
Running on iPhone 16 safari, and MacBook Pro m3 chrome.
In the good old days Netflix had "Dynamic HTML" code that would take a DOM element which scrolled out of view port and move it to the position where it was about to be scrolled in from the other end. Hence he number of DOM elements stayed constant no matter how far you scroll and the only thing that grows is the Y coordinate.
They did it because a lot of devices running Netflix (TVs, DVD players, etc) were underpowered and Netflix was not keen on writing separate applications. They did, however, invest into a browser engine that would have HW acceleration not just for video playback but also for moving DOM elements. Basically, sprites.
The lost art of writing efficient code...
Same. It’s wild how bad it can get with just like a normal longer running conversation
Yeah just had this earlier today, I had to write my response in vscode and paste it in, there were literal seconds of lag for typing each character. Typical bloated React.
Just because a web application uses React and is slow, it does not follow that it is slow because of React.
It's perfectly possible to write fast or slow web applications in React, same as any other framework.
Linear is one of the snappiest web applications I've ever used, and it is written in React.
Hi! It's all perfectly understandable - after all, we use things like Anubis to protect our services from OpenAI and similar actors and keep them available to the real users for exactly the same reasons.
Great to hear from a first-party source. I'm a Pro subscriber and my team spends well over two thousand dollars per month on OpenAI subscriptions. However, even when I'm logged in with my Pro account, if I'm using a VPN provider like Mullvad, I often have trouble using the chat interface or I get timeout errors.
Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.
It's interesting to me that OpenAI considers scraping to be a form of abuse.
>These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.
Can you share these mitigations so we can mitigate against you?
It's just Cloudflare. Bypassing it is a whole industry.
Flaresolverr is one way. Isn’t perfect but bypasses a lot.
Can't have those bots or scrapers running amok can we...
But is the title true, is typing specifically blocked? Or does it just block submitting the text?
I ask because I have seen huge variations in load time. Sometimes I had to wait seconds until being able to type. Nowadays it seems better though.
> because we want to keep free and logged-out access
But don't you run these checks on logged-in users too?
Yep, on logged-in users too. The reason is basically the same: we want scarce compute going to real people, not attackers. Being logged in is one useful signal, but it doesn’t fully prevent automation, account abuse, or other malicious traffic, so we apply protections in both cases.
Nothing you do can fully prevent automation. Someone who wants to automate requests badly enough will be able to do it, especially when the “protections” are as easy to decrypt and analyze as the OP proved.
Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”
Yeah, that's it. Also, it is a bit amusing to me - "We want to prevent automation", says the employee of Let's Automate Inc.
Mad bro?
Y'all just salty that DeepSeek et al are training their LLMs on yours
sometimes I paste giant texts (think summarization) in the chatgpt (paid) webapp and I noticed that the CPU fans spin up for about 5 seconds after, as if the text is "processed" client side somehow. this is before hitting "submit" to send the prompt to the model.
I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?
Can you fix the resizing text box issue on Safari when a new line is inserted? When your question wraps to a newline Safari locks up for a few seconds and it's really annoying. You can test by pasting text too.
I shouldn't be giving ideas to your boss, but I bet he would be interested in making ChatGPT available only by paying customers or free for those whose who gets their eyes scanned by The Orb. Give 30 days of raised limits and we're all set to live in the dystopia he wants.
Tangential question: are there chatgpt app devs on X? There are a few from Codex team but I couldn’t find guys from “ordinary” chatgpt.
Also if you could pass this over: it takes 5 taps to change thinking effort on ios and none (as in completely hidden) on macos.
If I were to guess it seems that you were trying to lower the token usage :-). Why the effort is only nicely available on web and windows is beyond me
You’re doing gods work sir, thank you!
Have you given any thought to what we trade when big tech elects one corporation as the gatekeeper for vast swaths of the Internet?
"abuse like bots, scraping, fraud, and other attempts to misuse the platform"
This has to be a joke, right?
I really can't tell for sure (new user posting a ridiculously hypocritical corporate message on a Sunday) but if GP actually works for OpenAI the lack of self-awareness is seriously striking
How?
> OpenAI: These checks are part of how we protect products from abuse like bots, scraping, and other attempts to misuse the platform.
This would be fucking HILARIOUS if it wasn't so tragic.
Manifest destiny for me, border enforcement for thee.
It can be both
We really need ZKPs of humanity
No, we really don't. We don't need worldcoin, we don't need papers, please. We just don't.
"Prove your humanity/age/other properties" with this mechanism quickly goes places you do not want it to go.
No, it doesn't go places we "do not want it to go". What part of zero knowledge doesn't make sense? How precisely does a free, unlinkable, multi-vendor, open-source cryptographic attestation of recent humanity create something terrible?
The part where FAANG does usual Embrace, Extend, Extinguish, masses don't care/understand and we have yet another "sign in with... " that isn't open source nor zero-knowledge in practice and monetizes your every move. And probably at least one of the vendors has massive leak that shows half-assed or even flawed on purpose implementation.
The ZK part isn't the problem. The "attestation of recent humanity" part is. Who attests? What happens when someone can't get attested?
You've been to the doctor recently, right? Given them your SSN? Every identity system ever built was going to be scoped || voluntary. None of them stayed that way.
Once you have the identity mechanism, "Oh it's zero knowledge! So let's use it for your age! Have you ever been convicted?" which leads to "mandated by employers" which leads to...
We've seen this goddamn movie before. Let's just skip it this time? Please?
the irony of your statement is hilarious, disappointing, and infuriating.
It's absurd how unusable Cloudflare is making the web when using a browser or IP address they consider "suspicious". I've lately been drowning in captchas for the crime of using Firefox. All in the interest of "bot protection", of course.
The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding. VPN users, privacy-first browsers, uncommon IP ranges, they all get flagged. The people most likely to get caught by these systems are exactly the ones who care most about their privacy, and not the bots that they are apparently targeting.
>The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding.
That's... exactly expected? It's a cat and mouse game. People running botnets or AI scrapers aren't diligently setting the evil bit on their packets.
So the stable state here is all humans eventually being locked out? (Bots are getting better every day; I doubt the same is true for all humans, including those with weird browsers or networks unwilling to install some dystopian Cloudflare "Internet passport".)
But hey, at least some bots are also not making it past Cloudflare!
That’s obviously because they’re not being “evil”
Which VPNs are people using that actually care about the user's privacy? Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.
I have yet to see a use case for VPNs for the casual internet audience, and for a tech savvy user, their better off renting through some datacenter or something, which at that point is hardly a VPN and more home IP obfuscation. All the same downsides, and at least you get real privacy.
> Which VPNs are people using that actually care about the user's privacy?
Mullvad.
It has been proven in a court of law that when Mullvad says "no logging", they mean it.
They also regularly have security audits and publish the results[2][3]
[1]https://mullvad.net/en/blog/mullvad-vpn-was-subject-to-a-sea... [2]https://mullvad.net/en/blog/new-security-audit-of-account-an... [3]https://mullvad.net/en/blog/successful-security-assessment-o...
Second for Mullvad, I am quite distrusting in general but more I know about Mullvad, more I am convinced they really are serious about user privacy
I'm forced to use a VPN to occasionally check my US bank account, since a foreign IP address is obviously a harbinger of unspeakable evil (while the friendly Youtube advertised neighborhood VPN is obviously evidence of pure intentions).
Using any popular datacenter's IP range for a personal VPN is likely to be outright blocked.
Also you only get 1 IP so its not really anonymous and you definitely would have a fingerprint.
you just rotate it?
>Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.
Source? I haven't seen any evidence that the major paid VPN providers engage in any of those things. At best it's vague implications something shady is happening because one of the key people was previously at [shady organization].
ProtonVPN with bitcoin which you get from a monero swap is a good idea for complete privacy if you want port forwarding.
MullvadVPN is also another great one.
I have heard some good things about AirVPN, but I can absolutely attest for mullvad and to a degree ProtonVPN (Just with Proton, depending upon your threat model, do make the necessary precautions like buying with monero for example)
There are others, but mostly its the 2-3 that I trust.
Yes they’ve successfully made us regress back 20+ years in terms of wait times. Like reCaptcha and other user hostile stuff, I try not to frequent and certainly not buy anything from websites that treat me like this. I actually cancelled my chat GPT subscription almost two years ago now because they were making me solve puzzles instead of letting me use the software I paid for.
It’s worth remembering how replaceable basically all websites and software are and how little value they add. There’s no reason to reward or tolerate those that noticeably degrade performance with couldflare et al
Maybe check your network isn't sending web traffic you're not aware of?
I'm running firefox and seeing the normal amount.
Most people are on a CGNAT these days, drowning in captchas is the new normal. You’re at the mercy of one of your neighbors not hosting a botnet from their home computer.
For better or for worse, CF's fingerprinting and traffic filtering is a lot more in-depth than just IP trend analysis. Kind of by necessity, exactly because of what you mention. So I'd think that's not as big a worry per se.
Yet here I am drowning in captchas every once in a while, so it's quite a big worry for me.
Maybe I just have to disable all ad blockers and Safari tracking prevention? Or I guess I could send a link to a scan of my photo ID in a custom request header like X-Please-Cloudflare-May-I-Use-Your-Open-Web?
Not even remotely true, I genuinely have no idea what you're talking about. The only time I get captcha'ed is when I sometimes VPN around, or do some custom browser stuff and etc. I'll even say I get captcha'ed less now than maybe 5 years ago.
Every so often, usually after a firefox update, CF will get into a "I'm convinced your a bot" mode with me. I can get out of it by solving 20 CAPTCHAs.
It's probably just a higher rate of autonomous vehicles needing stop signs and buses identified at that moment, and cognitive bias causes you to only remember when that happens when you recently performed an update. /s
My assumption is that CF has something like a SVM that it's feeding a bunch of datapoints into for bot detection. Go over some threshold and you end up in the CAPTCHA jail.
I'm certain the User-Agent is part of it. I know that for certain because a very reliable way I can trigger the CF stuff is this plugin with the wrong browser selected [1].
[1] https://addons.mozilla.org/en-US/firefox/addon/uaswitcher/
>It's probably just a higher rate of autonomous vehicles needing stop signs and buses identified at that moment
I can't tell whether you're serious but in case you are, this theory immediately falls apart when you realize waymo operates at night but there aren't any night photos.
Thanks for the comment. Lack of seriousness is now appropriately indicated.
Maybe you allow tracking and cookies?
I don't, and I rarely have issues with firefox. Private + blockers + VPN causes, expected, issues but otherwise i'm usually fine?
I'm with a slightly older Firefox and can't use many websites at all anymore because the Cloudflare cancer.
Of course then you got sites like gnu.org too that block you because your slightly outdated user agent.
Surprising really, because I'm a Firefox + Ublock Origin die hard and I never get Cloudflare captchas. Wonder what the difference is? I have CGNAT turned off, if that matters at all (probably not).
I could definitely imagine a public IPv4 with lots of good, logged-in Cloudflare traffic to act as a positive signal for their heuristics, possibly even overriding the Firefox penalty.
I recently had the insane experience of filling out 15 consecutive captchas, after, I had checked out and entered my payment information into the payment processor widget. I just wanted to submit the order. I was logged in to their website, and the bank even needed a one time code for payment. If the bank is pretty sure I am human then your ecomm site can figure it out surely.
That's my favorite combination: Shitty bot detection meeting shitty payment security systems.
At least outside the US, there's 3DS as an (admittedly often high friction) high quality cardholder verification method, but in the US, that's of course considered much too consumer-hostile, so "select 87 overpasses" it is.
A while back I was buying tickets for a gondola for a trip in Europe and the checkout process failed during payment because their site didn't load their analytics/tracking stuff with proper error-handling, so when my ad-blocker prevented the tracking stuff, their checkout process failed to handle my CC's 2-factor auth and the checkout would fail. Had to contact my CC company and work with the gondola company to tell them what they're doing wrong so they could fix their website code. Pretty sad to know whoever built their stuff actually shipped a checkout flow (for a VERY popular tourist destination) without testing with ad-blockers enabled.
To be fair, this sometimes seems on the ad blocker. I've definitely seen mine accidentally nuke part of the payment Javascript (or maybe the 3DS iframe?) because some substring of it matched some common ad URL, which is obviously unrecoverable for the site itself.
Is that because botnets spoof being Firefox? It's not really fair to blame Cloudflare it is. That's on the bots.
In what way would that not be fair? Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.
>Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.
Is it TSA's "fault" that non-terrorists are subject to screening?
No, but it's entirely within TSA's hands to make that process as frictionless as possible.
(It's a different question whether zero friction is actually desired, or whether some security theater is actually part of the service being provided, but that's a different question.)
We're discussing the quality of screening here, not the act/necessity of screening itself.
>We're discussing the quality of screening here
The "quality" of TSA's screening seems be pretty bad too given how many people have to go through secondary screening vs how many terrorist they catch (0?)
they caught 11 million by now (just as arbitrary as your 0 but probably more accurate since we haven’t had a large terrorist attack since they got the gig to serve and protect and before we lost thousands of lives…)
They are failing to meet there quotas of shooting innocent people in the face, so ICE is helping out.
No, using a stupid authentication/verification method with lots of false positives is always on whoever deploys it.
Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.
If it's just as easy to spoof being Chrome as it is to spoof being Firefox, then it is indeed fair to blame Cloudflare if they give Firefox users more CAPTCHAs than Chrome users.
Not really, there's camoufox but the vast majority use modified chrome/chromium
I’ve been getting it in safari too. It’s ridiculous frankly. My residential ip must have been flagged or something. The part that’s really annoying is its trivial for bots to bypass.
> I’ve been getting it in safari too.
I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.
Maybe that's the point? But then again, doesn't Cloudflare run part of it!? And wasn't there some "privacy-preserving captcha replacement" that iOS devices should already be opting me in to? So many questions, nobody there to answer them, because they can get away with it.
> The part that’s really annoying is its trivial for bots to bypass.
Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.
Exactly. For the most part all this bot protection is only protecting these websites against humans.
I don't do free work. I'm not going to label 50 images of crosswalks and motorcycles for free.
> For the most part all this bot protection is only protecting these websites against humans.
Curious how do you know this?
These days I just close sites that show that "checking if you're a bot" shit. If this is how the web is going to be now, I don't care, I'll just not use it. I didn't need to see that article or post that badly anyways. I'm tired of paying the price for the sociopathic, greedy actions of others. It's especially bad for anyone who uses an open source OS like Linux or *BSD (to the extent many sites just block me automatically with a 403 Forbidden simply for using OpenBSD + Firefox, completely free pass if I try the same site from a Windows or Linux computer).
We use Cloudflare to protect our content, but at the same time our machines mostly run Linux / Firefox so it really is quite a frustrating relationship. It really bums me out how much of Turnstile boils down to these two questions:
is it Linux (or similar)?
is it Firefox?
If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.
Well, that's for the public internet.
I'm building Safebox and Safecloud, where this won't be the case anymore. Not only will you have a decentralized hosting network that can sideload resources (e.g. via a browser extension that looks at your "integrity" attribute on websites) but also the websites will require you to be logged in with a HMAC-signed session ID (which means they don't need to do any I/O to reject your requests, and can do so quickly)... so the whole thing comes down to having a logged in account.
https://github.com/Safebots/Safecloud
As far as server-to-server requests, they'll be coming from a growing network of cryptographically attested TPMs (Nitro in AWS, also available in GCP, IBM, Azure, Oracle etc.) so they'll just reject based on attestations also.
In short... the cryptographically attested web of trust will mean you won't need cloudflare. What you will need, however, to prevent sybil attacks, is age verification of accounts (e.g. Telegram ID is a proxy for that if you use Telegram for authentication).
Wow, if Seinfeld can have a soup nazi, I think it's within reason for you to be called the internet nazi.
"No s̶o̶u̶p̶ internet for you!"
Good luck!
This was sarcasm, right?
Presumably this is all because OpenAI offers free ChatGPT to logged out users and don't want that being abused as a free API endpoint.
Using 5.2 at 20 a month would also be a steal. Other shoe will drop on codex sooner or later
But do they do it whether you're logged in or not?
I noticed the ChatGPT app also checks Play Integrity on Android (because GrapheneOS snitches on apps when they do this), probably for the same reason. Claude's app doesn't, by the way, but it also requires a login.
Because accounts are free, and could still be used to abuse as a free endpoint, with a little trickiness.
Yup.
Coincidentally about an hour ago, I wanted to look something up in ChatGPT and I happened to be in a browser window I don’t normally use, with no logged in accounts. I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.
>I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.
They allowed anonymous requests for months now, maybe even a year.
I used to mostly use chatgpt in an incognito tab, logged out. Until I notice it seems to have some context of my logged in session, and of the logged out as well. It may be paranoia or prompt deduction as well but that felt strange.
Yeah it works but it's a dumber model. Prob mini
Its probably same for copilot.microsoft.com and their cloudfart usage
> These properties only exist if the ChatGPT React application has fully rendered and hydrated. A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them. A bot framework that stubs out browser APIs but doesn't actually run React won't have them.
> This is bot detection at the application layer, not the browser layer.
I kind of just assumed that all sophisticated bot-detectors and adblock-detectors do this? Is there something revealing about the finding that ChatGPT/CloudFlare's bot detector triggers on "javascript didn't execute"?
Perhaps the author should have made it clearer why we should care about any of this. OpenAI want you to use their real react app. That’s… ok? I skimmed the article looking for the punchline and there doesn’t seem to be one.
That's because the article is AI slop.
> A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them.
this is meaningless btw. A browser headless or not does execute javascript.
I disagree, a browser can have javascript execution disabled (and this is somewhat common in scraping to save time/resources).
I read it to mean: "A browser that doesn't execute the JavaScript bundle won't have [the rendered React elements]." Which is true.
A bunch of the points in this AI generated blog post were like that. Makes me feel dirty when I'm 1/3rd of the way through and I realise how off it is.
and chatgpt was then used to write this article. at least try to clean it up a bit
Ah yes, the timeless hallmark of web blogs: a draft so messy even a language model would ask for a second pass.
I just don't understand why bot owners can't just run a complete windows 11 VM running Google Chrome complete with graphics acceleration.
You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That's 1 cent per thousand page loads on AWS. Damn cheap.
There are myriad providers competing to offer this, nicely packaged with all the accoutrements (IP rotation, location spoofing, language settings, prebuilt parsers, etc.) behind an easy to use API.
Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.
there are scraping subreddits.
if you browse them you will see that bot writers are very annoyed if they can't scrape a site with a headless browser.
you can do what you suggested, but with Linux VMs/containers. windows is too heavy, each VM will cost you 4 GB of RAM
If you know of a simple way to run a Windows 11 VM with good graphics acceleration (no GPU passthrough), please contact me.
I assume your concern with GPU passthrough is that each VM needs a whole GPU? You can use GPU-PV to split your GPU between VM instances. Then the main bottleneck becomes how thin you split out your VRAM.
More info here:
https://web.archive.org/web/20231107182321/https://mu0.cc/20...
https://youtu.be/XLLcc29EZ_8?t=570
https://github.com/jamesstringer90/Easy-GPU-PV
Does anyone know how this is integrated on the Cloudflare side and across the app? Is this beyond standard turnstile? Is this custom/enterprise functionality? Something else?
I imagine to stop web automation from getting free API like use of the model
If you have AI write a blog post for ya, when you think it's set, check word count (can c+p to google docs if AI can't pull it off with built in tools), and ask it to identify repetitions if it's over 1000.
Also, you can have it spotcheck colors: light orange on light background is unreadable, ask it to find the L*[1] of colors and dark/lighten as necessary if gap < 40 (that's minimum gap for yuge header text on background, 50 for text on background, these have gap of 25)
I haven't tried this yet, but, maybe have it count word count-per-header too. It's got 11 headers for 1000 words currently, makes reading feel really stacatto and you gotta evaluate "is this a real transition or vibetransition"
[1] L* as in L*a*b*, not L in Oklab
So are you able to get free inference now that you decrypted this?
It doesn't look like it in the full sense of "free". But part of how one pays these services is by running a permissive modern browser which allows the corporation to spy on you even when you already paid in currency. In a sense by depriving them of the ability to easily spy on your this workaround is closer to "free".
>My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API
There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client is marginal and probably isn't worth the engineering effort.
They already see everything I’m doing because I send my prompts to them. What “workaround” are you referring to?
They see everything your doing because you send the text. But this is talking about everything about your computer system. You would not normally be sending this to them or having it involved at all. This workaround allows you to not involve unneeded information about your computer setup. It is not about avoiding sending prompt text.
And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.
But isn't ChatGPT access free through the browser? What do you mean already paid in currency?
I mean, I can easily get them to behaving defensively for not being abused. But MBP with M5 here, my chatgpt tab always get stucked when I hit some prompt.
Really really bad user experience, wondering about when they will leave this approach.
Why does ChatGPT slow down so much when the conversations get long, while Claude does compaction?
My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API –- when it should have been running quantized models on its own server.
AI-written article?
Yep. I flag these as spam at this point.
Ok... so... ?
Imagine if they'd put as much effort into making a decent frontend experience.
I am shocked openai collects data about it's users before users have the opportunity to send the same data to openai servers!
Another AI-slop article.
Sick.
ai slop analysis finding CF detects non javascript capable browsers with no punchline
My theory is that "AI" doesn't really have any long term paying customers and the majority of the "users" are people who have cooked up some clever hack to effectively siphon computing power from these providers in an effort to crank out the lowest effort ad supported slop imaginable.
Every provider seems to have been plauged by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.
What's the word? Schadenfreude?