Astro - Hacker News

54 comments

guilamu 8 minutes ago

Just tested it on my homemade WordPress+GravityForms benchmark, and it's one of the worst models on the leaderboard performance-wise and the worst value-wise: https://github.com/guilamu/llms-wordpress-plugin-benchmark
I know it's only a single benchmark, but I don't understand how it can be so bad...
wincy 30 minutes ago

Just tried it out for a prod issue I was experiencing. Claude never does this sort of thing: I had it write an update statement after doing some troubleshooting, and I said “okay let’s write this in a transaction with a rollback” and GPT-5.5 gave me the old “okay,
BEGIN TRAN;
-- put the query here
commit;
I feel like I haven’t had to prod a model to actually do what I told it to in a while, so that was a shock. I guess it does use fewer tokens that way, but it's annoying to pay for the “cutting edge” model and have it be lazy on me like that.
This is in Cursor: the model popped up in the model selector, so I tried it out from there.
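For reference, a minimal sketch of what "write this in a transaction with a rollback" could look like when spelled out in full. It uses Python's built-in `sqlite3` rather than the T-SQL in the snippet above, and the `orders` table, the `update_in_transaction` helper, and the expected-row-count check are all hypothetical illustrations, not the actual prod query.

```python
import sqlite3


def update_in_transaction(conn, sql, params=(), expected_rows=1):
    """Run one UPDATE inside an explicit transaction.

    Commits only if the number of affected rows matches what
    troubleshooting predicted; otherwise rolls everything back.
    """
    conn.execute("BEGIN")
    try:
        cur = conn.execute(sql, params)
        if cur.rowcount != expected_rows:
            raise ValueError(
                f"expected {expected_rows} rows, touched {cur.rowcount}"
            )
    except Exception:
        conn.execute("ROLLBACK")  # undo the UPDATE, leave data untouched
        return False
    conn.execute("COMMIT")
    return True


# Hypothetical demo data: isolation_level=None puts sqlite3 in autocommit
# mode, so the only transaction is the one we BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "stuck"), (2, "stuck"), (3, "done")],
)

fix = "UPDATE orders SET status = 'done' WHERE status = 'stuck'"
first = update_in_transaction(conn, fix, expected_rows=1)   # touches 2 rows, rolled back
second = update_in_transaction(conn, fix, expected_rows=2)  # matches, committed
```

The row-count guard is one design choice among several; the point is only that the real statement sits between the begin and the commit, with a rollback path if something looks off.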
- XCSme 17 minutes ago
  
  I feel like the last 2-3 generations of models (after gpt-5.3-codex) didn't really improve much; they just changed stuff around and made different tradeoffs.
  - pixel_popping 14 minutes ago
    
    I disagree, it improved enormously, especially at staying consistent on long tasks. I have a task that has been running for 32 days (400M+ tokens) via Codex, and that's only since gpt-5.4
    
    
    ericpauley 11 minutes ago
    
    Has that task accomplished anything yet?
    
    
    codemog 2 minutes ago
    
    I think the OP is in for a rude surprise when the task is “finished”.
    
    xp84 7 minutes ago
    
    Too soon to tell, give it a billion tokens before we make up our minds
- syspec 17 minutes ago
  
  Can't tell if above is good or bad.
_pdp_ a minute ago

A very expensive model for API usage. Fine in codex I think.
neosat an hour ago

Enterprise user here and still seeing only 5.4. Yesterday's announcement said that it will take a few hours to roll out to everybody. OpenAI needs better GTM to set the right expectations.
- neosat 31 minutes ago
  
  Just refreshed and see 5.5 now - yay! Love the speedy resolution ;) Thanks folks, I'll complain faster next time....
ftonon 7 minutes ago

Looks like the default config in the chat is instant 5.3; it only uses 5.5 on the thinking variant
czk an hour ago
API page lists the knowledge cutoff as Dec 01, 2025 but when prompting the model it says June 2024.
```
   Knowledge cutoff: 2024-06
   Current date: 2026-04-24

   You are an AI assistant accessed via an API.
```
- BeetleB 33 minutes ago
  
  I don't know why this keeps coming up. This has always been the least reliable way to know the cutoff date (and indeed, it may well have been trained on sites with comments like these!)
  Just ask it about an event that happened shortly before Dec 1, 2025. Sporting event, preferably.
  - czk 29 minutes ago
    
    the model obviously knows things after the reported date, but it's just curious that it reports that date consistently
    could be they do it intentionally to encourage more tool calls/searches or for tuning reasons
- htrp an hour ago
  
  Can you really believe things that the model says? (A lot of prior model api pages say knowledge cutoffs of June 2024, maybe the model picks that up?)
  - czk 33 minutes ago
    
    you can't, but it's pretty reproducible across api and codex and other agents so i just thought it was odd. full text it gives:
    Knowledge cutoff: 2024-06
    Current date: 2026-04-24
    You are an AI assistant accessed via an API.
    # Desired oververbosity for the final answer (not analysis): 5
    An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation."
    An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples."
    The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.
- swyx an hour ago
  
  can u test it on say who won the 2024 US election
  - ghurtado 42 minutes ago
    
    I can't really think of a less reliable test for anything at all than making a random guess as to something that had about 50/50 odds to begin with
    Easiest Turing test ever...
    
    
    himata4113 40 minutes ago
    
    ask it 10 times.
    
    
    pixel_popping 34 minutes ago
    
    MASSIVE ADVERSARIAL x50
  - WarmWash 28 minutes ago
    
    Usually the labs do some kind of post training on major events so the model isn't totally lost.
    A better test is something like "what is the latest version of NumPy?"
    
    
    bakugo 24 minutes ago
    
    That sort of test isn't super reliable either, in my experience.
    You're probably better off asking something like "what are the most notable changes in version X of NumPy?" and repeating until you find the version at which it says "I don't know" or hallucinates.
  - czk 33 minutes ago
    
    with thinking off and tools disabled:
    Donald Trump won the 2024 U.S. presidential election.
- bakugo 27 minutes ago
  
  Models don't know what their cutoff dates are unless told via a system prompt.
  The proper way to figure out the real cutoff date is to ask the model about things that did not exist or did not happen before the date in question.
  A few quick tests suggest 5.5's general knowledge cutoff is still around early 2025.
  - czk 26 minutes ago
    
    i wonder if they put an older cutoff date into the prompt intentionally so that when asked on more current events it leans towards tool calls / web searches for tuning
  - MallocVoidstar 13 minutes ago
    
    OpenAI does tell the model the current date via API, so it's odd for them not to also tell the model its cutoff
  - soco 23 minutes ago
    
    Stupid question: wouldn't it then search the web for that event?
    
    
    bakugo 21 minutes ago
    
    If you have web search enabled, sure. But if you're testing on the API, you can just not enable it.
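The probing approach suggested in this thread (ask about the notable changes in successive releases of a library, with web search disabled, until the model says it doesn't know) can be sketched roughly as below. The `ask` callable stands in for whatever API client you use and is purely hypothetical, as are the marker phrases and the `fake_ask` stub. Note the sketch only catches explicit "I don't know" answers; a confident hallucination still has to be checked by hand against the real release notes.

```python
from typing import Callable, Optional, Sequence

# Hypothetical phrases treated as "the model does not know this release".
UNKNOWN_MARKERS = ("i don't know", "i do not know", "not aware", "no information")


def looks_unknown(answer: str) -> bool:
    """Return True if the answer contains one of the marker phrases."""
    text = answer.lower()
    return any(marker in text for marker in UNKNOWN_MARKERS)


def first_unknown_version(
    ask: Callable[[str], str],
    library: str,
    versions: Sequence[str],
) -> Optional[str]:
    """Walk versions oldest to newest; return the first one the model
    admits it does not know, or None if it answers for all of them."""
    for version in versions:
        prompt = (
            f"What are the most notable changes in version {version} "
            f"of {library}?"
        )
        if looks_unknown(ask(prompt)):
            return version
    return None


def fake_ask(prompt: str) -> str:
    """Stub model for demonstration: 'knows' only NumPy 1.26 and 2.0."""
    known = ("1.26", "2.0")
    if any(f"version {v} " in prompt for v in known):
        return "Notable changes include ..."
    return "I don't know about that release."


boundary = first_unknown_version(fake_ask, "NumPy", ["1.26", "2.0", "2.1", "2.2"])
```

With a real client, the release date of the last version the model describes correctly gives a rough lower bound on the cutoff, which is more informative than asking the model for its cutoff directly.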
sigmoid10 an hour ago

Huh. Yesterday they said:
>API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale.
And now this. I guess one day counts as "very soon." But I wonder what that meant for these safeguards and security requirements.
- FINDarkside an hour ago
  
  When stuff is delayed due to "safeguards" it just means they don't think they have the compute to release it right now.
- simonw an hour ago
  
  I wonder if the fact that GPT-5.5 was already available in their Codex-specific API which they had explicitly told people they were allowed to use for other purposes - https://simonwillison.net/2026/Apr/23/gpt-5-5/#the-openclaw-... - accelerated this release!
- embedding-shape an hour ago
  
  The same person who has mercilessly lied about safety is still running the company, so I'm not sure why anyone would expect anything different from them moving forward. Previous example:
  > In 2023, the company was preparing to release its GPT-4 Turbo model. As Sutskever details in the memos, Altman apparently told Murati that the model didn’t need safety approval, citing the company’s general counsel, Jason Kwon. But when she asked Kwon, over Slack, he replied, “ugh . . . confused where sam got that impression.”
  There are lots of cases where Altman has not been entirely forthcoming about how important (or not) safety is for OpenAI. https://www.newyorker.com/magazine/2026/04/13/sam-altman-may... (https://archive.is/a2vqW)
throw03172019 an hour ago

Faster than anticipated because of the DeepSeek release?
- XCSme 16 minutes ago
  
  Doubt it, DeepSeek v4 is quite underwhelming.
- swyx an hour ago
  
  more like they wanted to release it yesterday but merely had some last min flags they wanted to hold off for
  - Jhonwilson 31 minutes ago
    
    ok not bad
- m3kw9 23 minutes ago
  
  Maybe, but no one serious is using DeepSeek
redsaber an hour ago

Not available for GitHub Copilot Pro (only in Pro+, Business and Enterprise). I'm really feeling now that the era of subsidized AI is over.
- sunaookami 13 minutes ago
  
  With a 7.5x multiplier and even that is a promo!! Microsoft is insane! https://github.blog/changelog/2026-04-24-gpt-5-5-is-generall...
- skeledrew 26 minutes ago
  
  This is where the emigration to Chinese providers begins.
pants2 an hour ago

Is anyone here actually using pro models through the API? I'd be very curious what the use-case is.
- chadash an hour ago
  
  Yes. High value work where cost (mostly) doesn't matter. For example, if I need to look over a legal doc for possible mistakes (part of a workflow i have), it doesn't matter (in my case) whether it costs $0.01 or $10.00, since it's a somewhat infrequent event. So i'll pay $9.99 more, even if the model is only slightly better.
  - bogtog 30 minutes ago
    
    I'm surprised I never heard people talking about using -Pro variants, even though their rates ($125-175/M?) aren't drastically larger than old Opus ($75/M), which people seemed to use
  - freedomben an hour ago
    
    Indeed, even just Terms of Service and Privacy Policy work. Infrequent enough that cost isn't an issue, but model quality absolutely is
- ComputerGuru an hour ago
  
  Yes? The same reason you would use it via the tooling.
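To put the "cost (mostly) doesn't matter" argument in numbers, here is a back-of-the-envelope sketch. The per-million-token rates are the unverified figures quoted in this thread ($175/M for a Pro variant, $75/M for old Opus), and the 40,000-token document size and four runs per month are hypothetical assumptions chosen only for illustration.

```python
def request_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of one request billed at a flat per-million-token rate."""
    return tokens * usd_per_million / 1_000_000


def monthly_cost_usd(tokens: int, usd_per_million: float, runs_per_month: int) -> float:
    """Monthly spend for a workflow that runs a fixed number of times."""
    return request_cost_usd(tokens, usd_per_million) * runs_per_month


# Hypothetical workload: a 40,000-token legal document, reviewed 4 times a month.
DOC_TOKENS = 40_000
RUNS = 4

pro_monthly = monthly_cost_usd(DOC_TOKENS, 175.0, RUNS)   # rate quoted in the thread
opus_monthly = monthly_cost_usd(DOC_TOKENS, 75.0, RUNS)   # rate quoted in the thread
extra_per_month = pro_monthly - opus_monthly
```

Under these assumptions the pricier model adds a few dollars a month, which is the sense in which an infrequent, high-value task can absorb the higher rate.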
pillefitz 30 minutes ago

Please consider the ethical aspects of giving money to OpenAI versus alternatives.
gigatexal an hour ago

what's the real world comparison to opus 4.7 fellow coders?
Jhonwilson 32 minutes ago

that is great news
rvnx an hour ago

These safeguards are a very bad habit. These "safety" filters are counter-productive and can even be dangerous.
Where I live, for example, a lot of doctors are using ChatGPT both to search diagnosis and communicate with non-English speaking patients.
The same goes for you, when you want to learn about a disease, some real-world threats, some statistics, self-defense techniques, etc.
Otherwise it's like blocking Wikipedia on the grounds that you could use that knowledge to do harmful stuff or read things that may change your mind.
Freedom to read about things is good.
- NicuCalcea 39 minutes ago
  
  > a lot of doctors are using ChatGPT both to search diagnosis and communicate with non-English speaking patients
  I think that's the problem. Who's going to claim responsibility when ChatGPT hallucinates or mistranslates a patient's diagnosis and they die? For OpenAI, this would at best be a PR nightmare, so that's why they have safeguards.
  - hellohello2 37 minutes ago
    
    The doctor would be responsible.
    If I had a choice between a doctor that used AI and one that didn't, I would much prefer the one that did...
    
    
    NicuCalcea 10 minutes ago
    
    The doctor would be responsible for the accuracy of their translation tool, something they can't verify but you expect them to use?
- timedude 44 minutes ago
  
  Yup, deliberately making the model retarded