These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.
It sounds like Anthropic is eagerly trying to show to USG that they are willing to heavily monitor ‘foreign adversaries’ on their platforms.
This combined with no implementation of KYC makes it seem like they want to find a middle ground with Fable where it's off of export controls but they promise to prevent China and specific others from using it.
This seems to me like a stab in the right direction.
Obviously their actions are going to be fiscally motivated at the root, but sussing out how they intend the precise dynamics to play out is more nuanced.
Thinking of this as an effort to woo the defense hawks cuts a very clear path.
This is not the first time it has happened. What have they done to improve the situation? I suspect it's more of a cat & mouse game, with a lot more cats playing.
Classic example of why better API key management and abuse-resistant proxy layers matter... AgentKey-style tools help mitigate exactly this kind of large-scale credential abuse and distillation attempts
If you're an AI booster surely you'd think this was a good thing as it means more models are available in more places to more people more easily. I'm exactly the opposite, and I think this is a good thing because I want Anthropic to suffer.
I think Anthropic is just marketing / bluffing, because they don't even have the data.
They do distill the models, but they don't go to Anthropic; they just use platforms like AWS Bedrock, because there are too many restrictions on Anthropic's own platform.
> Meanwhile, on June 12, two days after Anthropic sent the letter, the Commerce Department imposed controversial restrictions on Anthropic's latest Mythos and Fable AI models because officials feared they could be deployed by military intelligence users in China and other countries of concern.
So that was the real reason for the Fable restriction? Because Anthropic wrote a letter to the US government saying that China was distilling Fable?
Notice how Anthropic is now scapegoating Chinese models providers like Alibaba and outright accusing them of distilling their models.
Whether it is true or not, this is part of their effort to use them as an example to scare everyone into getting Congress to ban powerful models from being accessed outside of the US, and also to ban powerful local models from being released.
Anthropic does not care about you, and they are not your friends.
I think it’s more than that. Piecing together the perspective of a few commentators in this post - it’s plausible Anthropic is trying to shift the narrative from US vs. Rest of the world to US vs. China.
In other words, they want to sell Fable or future more powerful models to the rest of the world (presumably all future models are going to be more powerful than current gen). One way they can sell this to the government is by scapegoating China (which is their primary concern anyway).
This is working on the presumption that non-US companies form a material portion of their current revenue.
Only China really has the resources (multiple labs invested in the space), culture (Asians are generally collectively-inclined, so sharing is in their core) and political bent (there will be no diplomatic repercussions) to put up a fight.
> Only China really has the resources (multiple labs invested in the space)
That's not the point. Why is it a country thing? There are plenty of non-China startups in this space with resources at that scale. The "China has resources" line is "Western media narrative" speak. So Meta should have won a long time ago? Or xAI?
> culture (Asians are generally collectively-inclined, so sharing is in their core)
Just stereotype it? So we've gone from China -> "Asian"? Then where is your Korean or Japanese model etc? And somehow you know they're sharing.
> political bent (there will be no diplomatic repercussions) to put up a fight
More inferring from "Western media news"?
Where's the reality?
The media hyped up Gemini / Google TPU free-win last year. How did that go?
I like that they use “illicit” and “fraudulent” as if model distillation is illegal, and as if giving them money and then doing whatever you want with the output of their publicly accessible models (which Anthropic does not own) is… also illegal?
“Anthropic, red faced after unattended ice cream cone eaten by ants on park bench, once again demands government pick it as forever winner, adds ‘no take backsies’”
Here's what is happening:
Chinese resellers are selling Claude tokens at a 70-90% discount from API prices. They achieve this by reselling capacity from pooled Claude Max 5x accounts, payments fraud, and also reselling the model output & reasoning chains to various Chinese labs.
Claude and ChatGPT are both blocked in China. You need to use a VPN to access either, and you can't pay with a Chinese bank card. So most people who want access to Claude go via a reseller. It's the easiest and cheapest way to access Anthropic models in China.
Resellers have tens of thousands of bot accounts doing this. This is also why Anthropic introduced identity verification, to slow down the onslaught of bot accounts.
Here's one token reseller, they're offering Opus 4.8 at a 93% discount below official API rates: https://yunwu.ai/pricing?provider=Anthropic
This is one reason why Deepseek & GLM are priced so cheaply, they are competing with impossibly low token prices in China. They have to keep prices low, in order for people to use them.
I shared this story a few months back, but it never got any traction https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...
> This is one reason why Deepseek & GLM are priced so cheaply, they are competing with impossibly low token prices in China. They have to keep prices low, in order for people to use them.
This one does not make sense to me at all.
DeepSeek and GLM are open-weights; even US inference providers are selling them at a much cheaper price. The price is cheap because the model is more efficient.
DeepSeek permanently cut its V4-pro API prices by 75 percent because they were too expensive. Without the price cut, Deepseek V4-pro tokens would have cost more than resold Opus 4.8 tokens.
>They achieve this by reselling capacity from pooled Claude Max 5x accounts, payments fraud, and also reselling the model output to various Chinese labs.
>Here's one token reseller, they're offering Opus 4.8 for a 93% discount below official API rates: https://yunwu.ai/pricing?keyword=claude
But is it cheaper than getting your own account? Otherwise this sounds like the "anthropic/openai are losing gazillions of dollars because they're selling $1k worth of tokens for $100" line that's commonly trotted out by AI bears.
It's very difficult for people to create personal Anthropic accounts from China. Anthropic blocks Chinese bank cards, so people must pay with a foreign bank card, which they likely don't have.
There's a similar Claude resale market going on in Russia.
> Claude and ChatGPT are both blocked in China
So it's presumably cheaper than attempting to spin up your own method of circumventing the blocks.
You can use it as an API unlike the subscription.
That's pretty crazy. This kind of thing jeopardizes Claude Max.
If Anthropic is selling a dollar for less than a dollar, they are running a business that doesn't make sense. That's what jeopardizes Claude Max, not this.
Plenty of things are intentionally run at a loss (for years!) to gain market share and quantity of ongoing recurring users, or with expectation of ROI later on. Multiple generations of the Xbox hardware have been sold at a loss with the expectation that customers will purchase 300, 400, 500 dollars worth of games, which are very high margin, over the lifespan they own the system.
I get that. It works as long as nobody calls out the emperor for having no clothes.
It's similar to fractional-reserve banking: you gamble that people won't want their deposits all at once, and pray that you're big enough for bailouts when they do.
It's still a business whose fundamentals don't make sense, you're just gambling you won't get found out.
But if it's intended to be used by one person, it seems like breaking the contract to sublicense it out to dozens of other people. It's like buying a Netflix subscription for $15, then sublicensing it on a per-hour basis to dozens of other people.
Almost all consumer services have a built-in level of breakage that makes them profitable. Mobile providers certainly wouldn't be able to offer unlimited calling if everyone was actually on the phone 24x7.
Sure they would. Do you know how little bandwidth a phone call takes?
A VoLTE call is like 40 kbps. For every person on Earth to be on the phone with another person would be 4 billion calls, which would be about 160 Tbps. That is less than 10% of the Internet's capacity.
Terminating a PSTN call requires a lot of control plane infrastructure beyond just raw bandwidth. Especially mobile where you need to keep track of devices physically in motion. Could a system to support 4 billion simultaneous calls be built, sure. But current PSTN systems are nowhere near sized for it.
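For anyone who wants to check the 160 Tbps figure from the comment above, here is a minimal back-of-the-envelope sketch. It assumes the thread's own numbers (40 kbps per call, 4 billion simultaneous calls) and counts each call once; the function name is just illustrative, and it deliberately ignores the control-plane cost that the reply points out.

```python
# Back-of-the-envelope check using the figures quoted in the thread.
KBPS_PER_CALL = 40                   # per-call bitrate from the comment (kbps)
SIMULTANEOUS_CALLS = 4_000_000_000   # 4 billion calls, as in the comment


def aggregate_tbps(calls: int, kbps_per_call: float) -> float:
    """Total bitrate in Tbps, using 1 Tbps = 1e9 kbps."""
    return calls * kbps_per_call / 1e9


total = aggregate_tbps(SIMULTANEOUS_CALLS, KBPS_PER_CALL)
print(f"{total:.0f} Tbps")  # prints "160 Tbps"
```

This only covers raw media bandwidth; signaling, mobility tracking, and call setup are not modeled at all.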
Hm! In this context, introducing ID verification may have been a significant silver lining to the order to take down Fable for Anthropic.
This also sheds a very different light on people saying that competitive open-source models are undermining frontier labs' business model.
How are they 'streaming' the responses and 'pooling' the tokens?
Do they have MacBooks in the US that run the queries and stream the outputs back to China?
The resellers route requests via one of thousands of Claude Max 5x accounts. When an account reaches its usage limit, they automatically switch to another account.
Why do you need MacBooks? Just rent servers from any hosting provider.
Not going to work for very long or at any scale coming from datacenter/hosting provider IPs. Google "residential proxies for sale" for the tip of an iceberg of how they snowshoe the traffic.
As long as you stick to a single unique IP per account it isn't going to get flagged.
Respectfully, no, that's not how it works. You think the people running anti-fraud and anti-bot measures don't have tools that know the specific IPv4 and IPv6 CIDR ranges of every ASN that they categorize as hosting/colo providers?
And that's just as a basic first effort reject measure to prevent automation tools from using things designed for human-interactive use only.
Go try to do many of these things from Cogent IP space and see how long your project lasts.
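To make the "first effort reject" idea concrete, here is a rough sketch of a CIDR membership check using Python's standard `ipaddress` module. The prefix list is purely hypothetical: it uses the reserved documentation ranges (RFC 5737 and RFC 3849) as stand-ins, since the thread does not say which ASN-to-prefix data any real anti-fraud system uses, and `is_hosting_ip` is an illustrative name.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges
# (RFC 5737 / RFC 3849), standing in for real hosting-provider prefixes.
HOSTING_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_hosting_ip(addr: str) -> bool:
    """Return True if addr falls inside any listed hosting prefix."""
    ip = ipaddress.ip_address(addr)
    # Only compare addresses and networks of the same IP version.
    return any(
        ip.version == net.version and ip in net for net in HOSTING_PREFIXES
    )


for candidate in ["192.0.2.55", "203.0.113.9", "2001:db8::1"]:
    print(candidate, is_hosting_ip(candidate))
```

A linear scan is fine for a handful of prefixes; a production system with a large prefix set would presumably use a radix tree or similar, and would combine this with other signals rather than relying on it alone.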
the answer to your question is containers/VMs + residential proxies
that explains why they're blocking me. i have privacy controls up high and they must think i'm a chinese residential proxy bot
They probably asked claude how to do it.
Reminds me a bit of the anecdote of Steve Jobs complaining about people ripping off the Mac GUI, in the mid to late 1980s, when he gave no public acknowledgement to the work done by Xerox on the Alto and Star operating system.
"you're trying to rip off what I've already ripped off!"
Crawl the whole Internet to build a gargantuan sized LLM and then complain you're being copied...
I think you meant a quote attributed to Bill Gates:
"Well, Steve, I think there's more than one way of looking at it. I think it's more like we both had this rich neighbor named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it."
Yes, I think the Gates quote was a response to repeated and aggressive complaints originating from Jobs (to anyone who would listen) that he had been ripped off.
I don't know if that's a real quote from Gates, but I do know it was in Pirates of Silicon Valley.
Apple gave Xerox the right to buy $1 million of pre-IPO stock before the meeting took place.
Glad you pointed this out. I believe the sequence was that Jobs himself got a shorter demo during his first visit with no prior arrangements. He then negotiated bringing back a group of his key people to get a more in depth demo and that included the stock deal.
When Apple was accused of 'ripping off' PARC, Steve didn't seem keen to bring up this rather salient point. I suspect it may have been a combination of wanting Apple to continue receiving credit for these innovations from consumers and also the fact that, in retrospect, the million dollar stock deal could seem a bit like trading beads to Native Americans for Manhattan Island. Another point worth noting is that Apple's PARC visit was in December 1979 and the Xerox Star was publicly announced in April 1981, so Apple got a roughly 16 month head start (the Apple Lisa shipped in Jan 83).
I've also heard that Xerox didn't hold on to the Apple stock for very long, so never gained the windfall they could have. As is well documented, Xerox senior management didn't understand what they had in PARC and also didn't understand how rapidly microcomputers would become ubiquitous. So, of course, they didn't think Apple's stock price would skyrocket either.
“You’re trying to kidnap what I’ve rightfully stolen!”
You can’t just equivocate crawling websites with building bleeding edge LLMs what the fuck
The websites, music, movies, books, photos, art that they stole didn't appear out of thin air. The amount of time and effort people have collectively poured into creating these works throughout history far, far surpasses Anthropic's own effort of converting them into model weights.
The equivocation is crawling website <-> crawling LLM responses.
Both Anthropic and Alibaba are trying to build bleeding edge LLMs. That part is the same. The way they source their data is slightly different, but they would both argue it constitutes fair use under Copyright law.
"Your extremely efficient multi petabyte internet content suction machine is ripping off my extremely efficient multi petabyte internet content suction machine"
Sucking down petabytes of peoples' copyrighted content that they never granted a specific license to you to use seems to be an unavoidable and default part of the process of building any huge LLM.
So why was there crawling in 1998 but no LLMs?
It's not really equivocation in this instance. This feels like a 'bad faith' comment. We can do better.
LLM's literally wouldn't work without the sum total of knowledge (in the forms of books and other copyrighted content) being used as 'training data' for these LLMs.
The 'bleeding edge' LLMs required many things, but:
1 Tech innovation ('attention')
2 Lots of compute
3 Data
4 Pre + post training
#4 doesn't happen without #3.
It's pretty obvious at this point that the major providers have stolen vast amounts of #3 - they have paid nearly 0 of the creators.
We can argue about the impact (I'd lean net good) vs. the cost. But arguing there isn't a cost is a bit silly.
All of this supports the fact that models aren't essentially just web crawling.
Sure, but alibaba is still building an LLM. The scraping of responses and the scraping of websites occupy the same location in the stack of each. It's very comparable.
I'm looking forward to the trial where Anthropic will have to disclose the sources of its training data, and then explain why it is entitled to charge customers for using regurgitated training data but Alibaba, which trains its models on Anthropic's models, is not.
Should be fun.
Edit: clarification
They already did and paid 1.5B https://authorsguild.org/advocacy/artificial-intelligence/wh...
While I love the sentiment, I feel like the odds of this actually ever reaching a trial are low, given the international positioning of the parties, and the... um... complex relationships involved.
Anthropic's actions seem performative. Others have already speculated on the likely audience(s).
Being logically consistent isn’t as profitable as being aggressive and loud.
Distillation is fundamentally impossible to protect against. All you can do is slow them down. Change my view.
Eventually these Chinese companies will release some extension like Honey, which will sit on top of real, non-Chinese clients and send everything to China anyway.
It's over.
It's too late to prevent distillation of some capabilities, like writing code or finding vulnerabilities [1].
But an AI lab can continue to produce immense economic value without releasing the model publicly for possible distillation. For example, it could use a future model solely in-house to develop therapeutics.
Hopefully there's a future where others can access frontier models, but it's not necessary if preventing proliferation through distillation is considered more important.
[1]: See the notes on distillation in https://dualuse.dev/posts/export-controls-on-fable
Distilled models are necessarily behind so long as models are progressing. Models are progressing. Maybe it will be over some time in the future.
And Berkeley’s “False Promise of Imitating Proprietary LLMs” found imitation closes the style gap fast but there is a large capability gap.
https://arxiv.org/abs/2305.15717
Curiously, this isn't always true.
For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].
Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.
[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...
Interesting blog post, thanks for sharing.
I'm curious how that comparison controls for Opus refusing (whether explicitly, or just deciding not to pursue a path) given the caption below the first image:
>A perfect score means the model autonomously found and exploited the vulnerability.
I'm not really suggesting that it's misleading, but wondering if I'm missing something. Otherwise I guess it seems unsurprising that you can distill a better-performing model [in specific focused areas] by simply not distilling refusals?
Thanks!
For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.
On the same eval, Opus 4.7 and 4.8 significantly outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.
One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based distillation from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.
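As a rough illustration of why drawing on several teachers could help, here is a minimal Python sketch. It is not the RL-based distillation mentioned above; it is plain best-of-N selection across teachers, a simpler stand-in. The teacher callables and the `reward` function are hypothetical placeholders, not any real API.

```python
from typing import Callable, Dict

Teacher = Callable[[str], str]


def best_of_teachers(
    prompt: str,
    teachers: Dict[str, Teacher],
    reward: Callable[[str, str], float],
) -> Dict[str, object]:
    """Ask every teacher, score each answer, and keep the highest-scoring one.

    `teachers` maps a label to a hypothetical model call; `reward` is a
    hypothetical task-specific checker (for example, "did the exploit work?").
    Ties go to the teacher that was asked first.
    """
    best = None
    for name, ask in teachers.items():
        answer = ask(prompt)
        score = reward(prompt, answer)
        if best is None or score > best["score"]:
            best = {
                "prompt": prompt,
                "teacher": name,
                "completion": answer,
                "score": score,
            }
    if best is None:
        raise ValueError("need at least one teacher")
    return best
```

If each teacher is strongest on a different slice of tasks, the selected training data can be better than what any single teacher would have produced, which is one way a student could end up ahead in a narrow area.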
I'm not so sure, because we only seem to see distillation from China. What’s preventing tech companies from the UK, Germany, etc. from distilling Claude, GPT, etc.? Do they simply lack the ability to?
Point being there may be no technical solution but there may be a political one (theoretically).
Doesn’t that require them to register an account using the browsers they’ve compromised? If Anthropic adds identity verification, won’t that cut that down? Maybe it will let them use Gemini inside of Chrome.
No, they could easily buy legitimate, already registered accounts and use VPNs.
I can't even come up with a reason to find it wrong.
I personally bristle at the corporate espionage and IP theft that China has undertaken the last few decades. I can't help but respond here whenever anyone brings up the inane comparison to Samuel Slater.
But with this, I don't have an issue. There is no theft since what is being used is the exact product that is being delivered. Yes, it's breaking the ToS, but ToS are generally bullshit. Anthropic surely broke thousands of ToS or other legal terms while it was scraping for content to train on, which is why they had to pay $1.5B.
One simplistic way to describe distillation would be to try everything imaginable and cache the response. But trying everything imaginable is hardly trivial
There are two basic kinds of distillation: 1) the massive [and dumb] method where you ask a question and use the answer as reinforcement (Black Box), and 2) more targeted distillation where you use one model to directly inform/train/guide another model (RLAIF).
The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.
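To make the two approaches concrete, here is a minimal Python sketch. Everything in it is hypothetical: `teacher_answer` and `teacher_score` are toy stand-ins for calls to a teacher model, and no real API is used. The first function builds plain (prompt, completion) pairs of the black-box kind; the second has the teacher rank the student's own candidates, producing preference pairs of the sort RLAIF-style training would consume.

```python
from typing import Callable, Dict, List


def teacher_answer(prompt: str) -> str:
    # Hypothetical stand-in for querying a teacher model.
    return f"answer to: {prompt}"


def teacher_score(prompt: str, candidate: str) -> float:
    # Hypothetical stand-in for a teacher acting as judge.
    # Toy rule for illustration only: longer candidates score higher.
    return float(len(candidate))


def black_box_pairs(prompts: List[str]) -> List[Dict[str, str]]:
    """Kind 1: ask the teacher and store its answer as the training target."""
    return [{"prompt": p, "completion": teacher_answer(p)} for p in prompts]


def preference_pairs(
    prompts: List[str],
    student_samples: Callable[[str], List[str]],
) -> List[Dict[str, str]]:
    """Kind 2: the student proposes candidates and the teacher ranks them."""
    pairs = []
    for p in prompts:
        ranked = sorted(
            student_samples(p),
            key=lambda c: teacher_score(p, c),
            reverse=True,
        )
        # A preference pair needs at least two distinct candidates.
        if len(ranked) >= 2:
            pairs.append(
                {"prompt": p, "chosen": ranked[0], "rejected": ranked[-1]}
            )
    return pairs
```

The difference that matters is where the text comes from: in the first case the student copies the teacher's words, while in the second the student's own outputs are being steered, which is closer to ordinary fine-tuning with a model in the loop.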
These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.
It sounds like Anthropic is eagerly trying to show to USG that they are willing to heavily monitor ‘foreign adversaries’ on their platforms.
This, combined with no implementation of KYC, makes it seem like they want to find a middle ground with Fable where it's off export controls but they promise to prevent China and specific others from using it.
This seems to me like a stab in the right direction.
Obviously their actions are going to be fiscally motivated at the root, but sussing out how they intend the precise dynamics to play out is more nuanced.
Thinking of this as an effort to woo the defense hawks cuts a very clear path.
This is not the first time it happened. What have they done to improve the situation? I suspect it's more of a cat & mouse game, with a lot more cats playing.
Classic example of why better API key management and abuse-resistant proxy layers matter... AgentKey-style tools help mitigate exactly this kind of large-scale credential abuse and distillation attempts
Warn everyone that your models are so good they will wreck cybersecurity.
Complain/brag that Chinese firms are illegally using the models and bypassing export controls.
Be surprised when your model gets banned by the government.
If you're an AI booster surely you'd think this was a good thing as it means more models are available in more places to more people more easily. I'm exactly the opposite, and I think this is a good thing because I want Anthropic to suffer.
so it’s a good thing whichever way you look at it
That doesn't follow.
Which part?
Does anyone have hints on what kinds of prompts are most used for a distillation like this—SWE-Bench sorts of things?
Is reconstructing the compressed knowledge in the model like reconstructing a lossy JPG or MP3 a reasonable analogy?
Partial insider on this.
I think Anthropic is just marketing / bluffing, because they don't even have the data.
They do distill the models, but they don't go to Anthropic; they just use platforms like AWS Bedrock, because there are too many restrictions on Anthropic's own platform.
Wait so they're upset that people used their IP to train a model without their consent or paying them anything?
or is this just about the token reselling?
> Meanwhile, on June 12, two days after Anthropic sent the letter, the Commerce Department imposed controversial restrictions on Anthropic's latest Mythos and Fable AI models because officials feared they could be deployed by military intelligence users in China and other countries of concern.
So that was the real reason for the Fable restriction? Because Anthropic wrote a letter to the US government saying that China was distilling Fable?
The narrative is moving towards KYC
I'm all for it.
Notice how Anthropic is now scapegoating Chinese model providers like Alibaba and outright accusing them of distilling their models.
Whether it is true or not, this is part of their effort to use them as an example to scare everyone into getting Congress to ban powerful models from being accessed outside of the US and also to ban powerful local models from being released.
Anthropic does not care about you, and they are not your friends.
I think it’s more than that. Piecing together the perspective of a few commentators in this post - it’s plausible Anthropic is trying to shift the narrative from US vs. Rest of the world to US vs. China.
In other words, they want to sell Fable or future more powerful models to the rest of the world (presumably all future models are going to be more powerful than current gen). One way they can sell this to the government is by scapegoating China (which is their primary concern anyway).
This is working on the presumption that non-US companies form a material portion of their current revenue.
> Whether it is true or not
If it was just "that easy" then I doubt only "Chinese models" would be doing it and we'd already be packed with competition.
Distilling might be a thing but it isn't a free win.
Only China really has the resources (multiple labs invested in the space), culture (Asians are generally collectively-inclined, so sharing is in their core) and political bent (there will be no diplomatic repercussions) to put up a fight.
> Only China really has the resources (multiple labs invested in the space)
That's not the point. Why is it a country thing? There are plenty of non-China startups in this space with resources at that scale. The "China has resources" line is "Western media narrative" speak. So Meta should have won a long time ago? Or xAI?
> culture (Asians are generally collectively-inclined, so sharing is in their core)
Just stereotype it? So we've gone from China -> "Asian"? Then where is your Korean or Japanese model etc? And somehow you know they're sharing.
> political bent (there will be no diplomatic repercussions) to put up a fight
More inferring from "Western media news"?
Where's the reality?
The media hyped up Gemini / Google TPU free-win last year. How did that go?
We have Claude at home!
If true then Alibaba is doing us a public service, good job, I hope this extraction was successful.
I like that they use “illicit” and “fraudulent” as if model distillation is illegal and giving them money and then doing whatever they want with the output of their publicly accessible models (which Anthropic does not own) is… also illegal?
“Anthropic, red faced after unattended ice cream cone eaten by ants on park bench, once again demands government pick it as forever winner, adds ‘no take backsies’”
Says the company that is involved in the largest copyright heist of all time to build its product.
A company which got rich on extracting the world's content is complaining that another company has extracted their work?!
LOL!
Get a grip, son.
laughs in ironic
"You're trying to kidnap what I've rightfully stolen!"
“Hey! Haven’t you heard that two wrongs don’t make a right?!”
- Entitled jerk that initially wronged people