I never want to hear from developers again that they are not susceptible to marketing. I often see meetups specifically about Claude.
Modern Tupperware party.
A colleague was convinced Claude is better, so we played a game. We used the Claude Code and Codex harnesses, I implemented some PRs they needed with GPT-5.5 and Opus 4.7, and I asked them to identify which came from which from the code alone.
Couldn’t tell.
Ah that's always SO fun. It doesn't matter how "smart" the person actually is (or thinks they are), we are ALL susceptible to influence, and blind tests are shockingly simple to implement.
Convinced you can distinguish A from B? Ok! No problem, let's try! It can be at the dinner table with fancy wine or with agents, it's all the same: you try one option, then another, maybe the same option every time, and if you reliably can't tell, well, kudos, you are just like the rest of us!
It's easy to "know" in retrospect, but a blind test is where a genuine difference can be found. Or not.
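For what it's worth, scoring a blind test like this takes only a few lines. Here is a minimal sketch, assuming each trial is a two-way guess with a 50% chance of being right by luck. The function name and the sample counts are made up for illustration and are not taken from any test described in this thread.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value.

    Returns the probability of getting `correct` or more answers right
    out of `trials` by pure guessing, where each guess succeeds with
    probability `chance`.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical results: 10 blind samples, guesser labels each as A or B.
print(p_value_at_least(7, 10))  # 0.171875
print(p_value_at_least(9, 10))  # 0.0107421875
```

With ten samples, getting 7 right is still quite compatible with guessing, while 9 right is much harder to explain by luck. The sample size matters as much as the hit rate.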
I can’t tell the difference between code written in vim or vs code but it matters substantially to the person writing the code. There’s stuff beyond just the output that goes into tool choice.
Your argument is fine but different from the claim the OP is making. You cannot simply claim that (model + harness) X is better than Y and then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim than "Claude is better", and the latter has a higher bar of proof.
I'd bet I could tell with a result somewhat better than random chance.
While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that this predisposition, the requirement to invest additional energy, will bias the results towards attention to detail and pure minimalism.
> There’s stuff beyond just the output that goes into tool choice.
Yup, like billions of capex. Unlike vim.
This is like saying you gave a Taylor Swift fan sheet music from 1984 and from Michael Jackson’s Thriller and they couldn’t tell the difference.
I have a strong affinity for Claude Code because of the interaction experience and overall tone / vibe / process. I am 100% willing to believe the code it produces is identical or possibly less good than Codex.
I enjoy working with Claude in a way I just don’t get from OpenAI. YMMV, you may feel just the opposite. But it’s a mistake to look at the produced code as the only dimension of these products.
This is my point. The harness itself creates feelings that are positive, but the artifacts produced are similar.
It is like the employee who is slightly worse but is a brownnoser getting promoted more often.
And what do you know, that is what is happening. It is like the Coke commercial with the nice music and the beautiful person in the back.
Speaking of which, remember the Pepsi Challenge? Coke lovers are like the Claude Code lovers.
The creative output, and the time it takes to direct and deliver thanks to the flow, will also be different.
And it really depends on the task. Is it a typical well-defined bug, or is it simple CRUD? Or does it require research, combining different sources of data in complex and creative ways?
This is also why benchmarks never show reality, and the only real understanding comes if you actually try to build something.
But what they're pointing out is user experience, not marketing.
That's a weird way to look at it. Any car gets you to your destination, but some people prefer driving a sports car or an SUV. They get something out of it that isn't just a marketing delusion, but subjective joy from the interaction with one product over another.
Luxury cars are indeed a good comparison. The subjective joy is a result of the delusion. That is why so much money is spent on such marketing to begin with. The analogous comparison would be if a blindfolded passenger turned out to prefer the Sienna to the 911.
I would actually say it is a luxury car where you have your personal driver and you are free to work on other tasks, and it gets you to the destination faster. Time, to me at least, is the most valuable thing.
I think for developers the distinction is that ChatGPT is this commercial all-in-one solution for normies and Claude is specifically for developers. In reality, as you say, the results for normal developers are indistinguishable.
Maybe some people think that but there’s not really any meaningful difference in their offerings
FWIW most of the normies I know are using Claude
Pretty easy to tell depending on what the code is. GPT follows a pattern of using maybe_something names and uppercase constants by default. Claude is a little more natural but tends to include more fallbacks than gpt5.5.
Modern Tupperware party. 100% agree! That’s the best framing I’ve heard in a long time!
It's crazy hearing devs on this site claim Claude is 10x better than all other AI solutions. I think it is fomo. Claude $LATEST_VERSION is perceived as the best and anything else is "missing out". New version comes out? Suddenly the old version is worthless, how on earth did anyone get work done with that?
Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best". Never mind the diminishing returns trying to max out PC settings (3-4x performance hit for an almost imperceptible increase in graphics, ignoring DLSS) - it's the psychological cost of having to move a slider down a notch.
I've been using Google and now DeepSeek v4 and I am having absolutely no problems and it's a fraction of the cost. I'd love for Claude to be 10x better but it just isn't, for my use case anyway.
I’ve been using DeepSeek V4 in OpenCode exclusively for about a month.
I think it’s great, but coming from Claude Code it did feel like going back in time by ~6 months in model capabilities. This isn’t a big deal to me for what I do, but the difference is definitely there.
Hey, at least the superior performance of a 4090 or a 5090 can be objectively measured.
Everyone can be propagandised. It's a matter of pushing the right buttons.
Not everyone. Some are very strong mentally and not so easily malleable.
I don’t think that applies to most on here tho.
Seeing yourself as immune to propaganda probably makes you more susceptible to propaganda.
Edit: Oh they’re trolling, nm. :-/
I RAN to downvote this Dunning-Kruger of a comment.
I don't think it's marketing, for quite a long time Claude was clearly better and not everyone has adapted to the new reality where they have similar capabilities.
I use both, enough to reach Codex's highest personal sub limits, and Claude is stronger to me specifically because of how the flow of building feels. So the PR for any random task would be irrelevant to me.
Should've used DeepSeek. That would have been interesting.
Yes, which means that in the long run this looks ugly.
So much faith and money in this idea, and seeing how fragile it is, does not look good.
Which model produced code that ran faster, with fewer bugs, etc.?
Tribalism at its worst. It's like the Coke and Pepsi comparisons from years past.
Been to an Anthropic event in Paris last summer.
They served caviar. It probably had good ROI.
I don't think it's marketing, it's the "nobody got fired for buying IBM" effect applied to software developers choosing tools.
It's the same reason why most of the software out there keeps using bloated technologies that are most of the time the wrong fit for the product.
And the same applies to tooling. Nothing new.
Claude was the best for the longest time. GPT5.5 challenges that, but inertia is real
You're comparing apples to oranges. Claude is a frontend overall product name, GPT5.5 is a specific model. Which model within Claude's offerings are you referring to? Opus 4.7, Sonnet 4.6, or something else?
for me personally it's two reasons:
1) Brockman ($25M) and Altman ($1M) both personally donated to Trump/MAGA.
2) Anthropic pushed back against DOD's demand for unrestricted use of AI to kill people while OpenAI eagerly said "please use ours!".
> Couldn’t tell.
I can tell. It's night and day.
Last year I used a bunch of models to try to generate Rust code. They all sucked.
This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff, the code is verbatim just as I would have written.
I then tried other models. Total disappointment.
I've continued to repeat this experiment. Opus is the only model that can write Rust reasonably.
Codex produces junk to this day. It passes variables that aren't needed, it abuses pointers, it creates overly verbose monstrosities...
I don't want any single company to win. I want OpenAI to be competitive. I want open source models to win. But right now, Claude Code and Opus are it.
I recently tried with C# code and Avalonia on Linux. Total disaster. Could only get things to run after 10 attempts or so, and was only trying a very basic example. For some of the experiments I actually gave up.
Claude has an "End Conversation" tool that it can trigger on its own, forcing your interaction to a close based on its own feelings towards the conversation.
I have no idea how this wasn't the end of Anthropic's positive public perception.
Luckily this doesn’t come up while writing code. It tends to be if you are chatting it up in friend mode, and ask for a bomb recipe.
OpenAI’s models could be materially better than Anthropic’s and I still wouldn’t use them because I don’t want to support Altman.
Do you think Amodei is different?
Codex gpt-5.5 is far superior to Opus 4.7 when working on large projects.
I'm experiencing the same. Codex gpt-5.5 has more brilliant intuitions and writes less code, i.e. it identifies the exact point where the modification should be made. Nevertheless, there are huge improvements in personality from Opus 4.7 (it was too accommodating) to Opus 4.8.
GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.
Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.
If you can afford to, I recommend juggling both.
I find arguing that a complex weighted graph has a taste is interesting.
This is not a jab, but a genuine curiosity of mine.
The roulette pockets for the model are bigger for some outputs than others. Draw a big enough black box around it and a different one around humans and it's indistinguishable.
I think the long-winded way to say it is that the taste the complex weighted graph was trained on was better for one than for the other.
Great analysis and follows my experience as well. Codex is better when you know how you want the design and the architecture and you drive the agent a lot more aggressively. Claude Code feels like more autopilot so executives and users who didn’t code before AI like it a lot more.
But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off on Opus. It’s like asking an F1 driver to sit in a taxi.
You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.
In what ways? LM Arena has Opus 4.7 w/ 1567 -/+ 7 vs. 1505 -/+ 10 from GPT-5.5 Codex in code. I'm currently using both.
Admittedly my recent experience tilts toward Opus, now 4.8, but you and others have piqued my interest re: GPT-5.5 Codex, so I'm trying that more now.
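To put that rating gap in perspective, here is a rough sketch. It assumes the Arena scores behave like standard Elo ratings (base 10, 400-point scale), which is my assumption and not something stated above. The helper name is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected head-to-head win rate of A over B under the standard Elo formula.

    Assumption: the leaderboard scores follow the usual logistic curve
    with base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores quoted above: Opus 4.7 at 1567, GPT-5.5 Codex at 1505.
win_rate = elo_expected_score(1567, 1505)
print(f"{win_rate:.2f}")  # 0.59
```

Under that assumption, a 62-point lead means winning about 59% of head-to-head votes. That is a real edge but a modest one, and the quoted error bars (7 and 10 points) leave some room around it.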
Not everyone is a developer...
Soon none of us will be! right?
And 4.7 is so last week..
GPT 5.5 still invents facts rather than looking them up, and manages to come across both as condescending and sycophantic. It feels like talking to a used car salesman.
Funny cause I'm quite literally having this exact issue with 4.8 as we speak. I've been going back and forth with Claude since yesterday afternoon on chopping up, stabilizing and facilitating recovery on a flaky mega-pipeline. Not 5 minutes ago, I had to remind it that two of the solutions it proposed were not possible because the target technology doesn't allow what it wanted to do, despite my having pointed it to the very docs that say it can't be done in the first place.
As far as tone goes... Both feel sycophantic as hell to me. To be honest, they all just feel that way.
> GPT 5.5 still invents facts rather than looking them up
So does Claude, what’s your point?
I used it and ChatGPT this week to help troubleshoot a complex DB-related issue, and Claude had to apologise no fewer than three times, admitting it had been talking complete shit.
Just one example of the kind of shit it dribbled:
> I need to be upfront with you. I should not have claimed X as if I knew that for a fact. That was overreach on my part.
Opus 4.7 is not the current version of Opus.
qazinform.com seems to be shadow-banned (and posted only by OP): https://news.ycombinator.com/from?site=qazinform.com
Pointless article (like much of the AI marketing hotness and spin room).
> The new valuation is nearly three times higher than the company’s February valuation, when Anthropic was estimated to be worth around $380 billion.
> In March, OpenAI was valued at $852 billion following a record $122 billion funding round.
Basically, today (Late May) we're declaring Anthropic the most valuable. They've nearly tripled in value since February. But also, OpenAI was $852B in March and presumably has grown since then.
In a few weeks we'll either have a new round of funding for OpenAI or they'll announce their IPO and the hype train will be abuzz that they're now the most valuable.
Unicorns, strapped with rockets, too busy looking at each other to realise the Earth is far gone.
They'll kill us all, or they'll kill each other. They sure as hell ain't making the world a better place, like they promised.
Either they are getting fleeced or they are getting very good terms for the investments
Investors of both should read this: https://open.substack.com/pub/sublius/p/srt-introspect-why-c...
How much dilution? Who’s getting the value?
It’s because the programming works.
OpenAI spent its resources on AGI whilst Claude worked on making programming work.
Google Gemini is out of the race entirely; its programming AI is a joke.
All overvalued.
The models aside, my impression is that Anthropic is winning in large part because of very pragmatic and high-velocity product development on top of them; like with Claude Code.
Like actually iterating hard to make them useful. Many, many details matter here.
I haven't tested the similar OpenAI/Google tools in detail lately though. Previously I found them way too generic and unpolished to be useful.
Is there something to this?
My impression as well. OpenAI was riding the high of ChatGPT with a very confusing and seemingly unfocused offering beyond that. Anthropic was always laser focused on business use cases. Claude Code being the big one. Finance seems to be their next target.
Anthropic has much narrower capabilities. No image generation, no video generation, no 3D world models, barely any voice stuff. But they know who their target customers are, and their API has a model selection anyone can understand and pricing that rarely changes. Focus and predictability.
Start what?
ChatGPT dropped the ball for a while, so most devs and technical people went to Claude for a year or more. They probably still have the most normie market share, and they are at least trying to win back some of that ground with their latest model, so it'll be interesting to see.
The "normie" market doesn't pay for enterprise features though. They might cost more in inference than they make back from advertising.