I've always wondered at the motivatons of the various string routines in C - every one of them seems to have some huge caveat which makes them useless.
After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.
strncpy is fairly easy, that's a special-purpose function for copying a C string into a fixed-width string, like typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.
Yes, these were also common in several wire formats I had to use for market data/entry.
You would think char symbol[20] would be inefficient for such performance sensitive software, but for the vast majority of exchanges, their technical competencies were not there to properly replace these readable symbol/IDs with a compact/opaque integer ID like a u32. Several exchanges tried and they had numerous issues with IDs not being "properly" unique across symbol types, or time (intra-day or shortly before the open restarts were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison.
As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs.
Back when strncpy was written there was no undefined behaviour (as the compiler interprets it today). The result would depend on the implementation and might differ between invocations, but it was never the "this will not happen" footgun of today. The modern interpretation of undefined behaviour in C is a big blemish on the otherwise excellent standards committee, committed (hah) in the name of extremely dubious performance claims. If "undefined" meaning "left to the implementation" was good enough when CPU frequency was measured in MHz and nobody had more than one, surely it is good enough today too.
Also I'm not sure what you mean with C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least last I checked that was the case, correct me if I am wrong. Go, Java and C# all have sane behaviour, but those are much higher level.
You don’t do that by accident. Fixed-width strings are thoroughly outdated and unusual. Your mental model of them is very different from regular C strings.
Ignore the prefix and always treat strncpy() as a special binary data operation for an era where shaving bytes on storage was important. It's for copying into a struct with array fields or direct to an encoded block of memory. In that context you will never be dependent on the presence of NUL. The only safe usage with strings is to check for NUL on every use or wrap it. At that point you may as well switch to a new function with better semantics.
> To make sure that the size checks cannot be separated from the copy itself we introduced a string copy replacement function the other day that takes the target buffer, target size, source buffer and source string length as arguments and only if the copy can be made and the null terminator also fits there, the operation is done.
... And if the copy can't be made, apparently the destination is truncated as long as there's space (i.e., a null terminator is written at element 0). And it returns void.
I'm really not sold on that being the best way to handle the case where copying is impossible. I'd think that's an error case that should be signaled with a non-zero return, leaving the destination buffer alone. Sure, that's not supposed to happen (hence the DEBUGASSERT macro), but still. It might even be easier to design around that possibility rather than making it the caller's responsibility to check first.
I'm surprised curlx_strcopy doesn't return success. Sure you could check if dest[0] != '/0' if you care to, but that's not only clumsy to write but also error prone, and so checking for success is not encouraged.
This is especially bizarre given that he explains above that "it is rare that copying a partial string is the right choice" and that the previous solution returned an error...
So now it silently fails and sets dest to an empty string without even partially copying anything!?
assert() is always only compiled if NDEBUG is not defined. I hope DEBUGASSERT is just that too because it really sounds like it, even more so than assert does.
But regardless of whether the assert is compiled or not, its presence strongly signals that "in a C program strcpy should only be used when we have full control of both" is true for this new function as well.
> It has been proven numerous times already that strcpy in source code is like a honey pot for generating hallucinated vulnerability claims
This closing thought in the article really stood out to me. Why even bother to run AI checking on C code if the AI flags strcpy() as a problem without caveat?
It's not quite as black and white as the article implies. The hallucinated vulnerability reports don't flag it "without caveat", they invent a convoluted proof of vulnerability with a logical error somewhere along the way, and then this is what gets submitted as the vulnerability report. That's why it's so agitating for the maintainers: it requires reading a "proof" and finding the contradiction.
Because these people who run AI checks on OSS code and submit bogus bug reports either assume that AIs don't make mistakes, or just don't care if the report is legit or not, because there's little to no personal cost to them even if it isn't.
Its weird though because looking through the hackone reports in the slop wiki page there aren't actually reproduction steps. It's basically always just a line of code and an explanation of how a function can be mis-used but not a "make a webserver that has this hardcoded response".
So like why doesn't the person iterate with the AI until they understand the bug (and then ultimately discover it doesn't exist)? Like have any of this bug reports actually paid out? It seems like quickly people should just give up from a lack of rewards.
As long as the number of people newly being convinced that AI generated bounty demands are a good way to make money equals or exceeds the number of people realising it isn't and giving up, the problem remains.
Not helped, I imagine, that once you realise it doesn't work, an easy pivot is to start convincing new people that it'll work if they pay you money for a course on it.
Congrats on the completion of this effort! C/C++ can be memory safe but take some effort.
IMHO the timeline figure could benefit in mobile from using larger fonts. Most plotting libraries have horrible font size defaults. I wonder why no library picked the other extreme end: I have never seen too large an axis label yet.
Apart from Daniel Sternberg's frequent complaints about AI slop, he also writes [1]
> A new breed of AI-powered high quality code analyzers, primarily ZeroPath and Aisle Research, started pouring in bug reports to us with potential defects. We have fixed several hundred bugs as a direct result of those reports – so far.
It's a symptom of complete failure of this industry that maintainers are even remotely thinking about, much less implementing changes in their work to stave off harassment over false security impact from bots.
Nonce and websockets don't appear at all in the blog post. The only thing the ai slop got right is that by removing strcpy curl will get less issues [submitted about it].
I don't see a problem with that, but for the record, the title on the site is lower-case for me (both browser tab title, and the header when in reader mode).
I've always wondered at the motivatons of the various string routines in C - every one of them seems to have some huge caveat which makes them useless.
After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.
Something like this: https://github.com/msteinert/bstring
It's from a time before computer viruses no?
But also all of this book-keeping takes up extra time and space which is a trade-off easily made nowadays.
Yes, in the old times if you crashed a program or whole computer with invalid input, it was your fault.
Viruses did exist, and these were considered users' fault too.
strncpy is fairly easy, that's a special-purpose function for copying a C string into a fixed-width string, like typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.
A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en
Yes, these were also common in several wire formats I had to use for market data/entry.
You would think char symbol[20] would be inefficient for such performance sensitive software, but for the vast majority of exchanges, their technical competencies were not there to properly replace these readable symbol/IDs with a compact/opaque integer ID like a u32. Several exchanges tried and they had numerous issues with IDs not being "properly" unique across symbol types, or time (intra-day or shortly before the open restarts were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison.
strncpy doesn’t handle overlapping buffers (undefined behavior). Better to use strncpy_s (if you can) as it is safer overall. See: https://en.cppreference.com/w/c/string/byte/strncpy.html.
As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs.
Back when strncpy was written there was no undefined behaviour (as the compiler interprets it today). The result would depend on the implementation and might differ between invocations, but it was never the "this will not happen" footgun of today. The modern interpretation of undefined behaviour in C is a big blemish on the otherwise excellent standards committee, committed (hah) in the name of extremely dubious performance claims. If "undefined" meaning "left to the implementation" was good enough when CPU frequency was measured in MHz and nobody had more than one, surely it is good enough today too.
Also I'm not sure what you mean with C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least last I checked that was the case, correct me if I am wrong. Go, Java and C# all have sane behaviour, but those are much higher level.
Isn't strlcpy the safer solution these days?
A big footgun with strncpy is that the output string may not be null terminated.
Yeah but fixed width strings don’t need null termination. You know exactly how long the string is. No need to find that null byte.
Until you pass them as a `char *` by accident and it eventually makes its way to some code that does expect null termination.
There’s languages where you can be quite confident your string will never need null termination… but C is not one of them.
You don’t do that by accident. Fixed-width strings are thoroughly outdated and unusual. Your mental model of them is very different from regular C strings.
Good luck though remembering not to pass one to any function that does expect to find a null terminator.
Ignore the prefix and always treat strncpy() as a special binary data operation for an era where shaving bytes on storage was important. It's for copying into a struct with array fields or direct to an encoded block of memory. In that context you will never be dependent on the presence of NUL. The only safe usage with strings is to check for NUL on every use or wrap it. At that point you may as well switch to a new function with better semantics.
Seriously. We have type systems and compilers that help us to not forget these things. It's not the 70s anymore!
> To make sure that the size checks cannot be separated from the copy itself we introduced a string copy replacement function the other day that takes the target buffer, target size, source buffer and source string length as arguments and only if the copy can be made and the null terminator also fits there, the operation is done.
... And if the copy can't be made, apparently the destination is truncated as long as there's space (i.e., a null terminator is written at element 0). And it returns void.
I'm really not sold on that being the best way to handle the case where copying is impossible. I'd think that's an error case that should be signaled with a non-zero return, leaving the destination buffer alone. Sure, that's not supposed to happen (hence the DEBUGASSERT macro), but still. It might even be easier to design around that possibility rather than making it the caller's responsibility to check first.
I'm surprised curlx_strcopy doesn't return success. Sure you could check if dest[0] != '/0' if you care to, but that's not only clumsy to write but also error prone, and so checking for success is not encouraged.
This is especially bizarre given that he explains above that "it is rare that copying a partial string is the right choice" and that the previous solution returned an error...
So now it silently fails and sets dest to an empty string without even partially copying anything!?
I guess the idea is that if the code does not crash at this line:
it means it succeeded. Although some compilers will remove the assertions in release builds.I would have preferred an explicit error code though.
assert() is always only compiled if NDEBUG is not defined. I hope DEBUGASSERT is just that too because it really sounds like it, even more so than assert does.
But regardless of whether the assert is compiled or not, its presence strongly signals that "in a C program strcpy should only be used when we have full control of both" is true for this new function as well.
From the article:
> It has been proven numerous times already that strcpy in source code is like a honey pot for generating hallucinated vulnerability claims
This closing thought in the article really stood out to me. Why even bother to run AI checking on C code if the AI flags strcpy() as a problem without caveat?
It's not quite as black and white as the article implies. The hallucinated vulnerability reports don't flag it "without caveat", they invent a convoluted proof of vulnerability with a logical error somewhere along the way, and then this is what gets submitted as the vulnerability report. That's why it's so agitating for the maintainers: it requires reading a "proof" and finding the contradiction.
Because these people who run AI checks on OSS code and submit bogus bug reports either assume that AIs don't make mistakes, or just don't care if the report is legit or not, because there's little to no personal cost to them even if it isn't.
Because people are stupid and use AI for things it is not good at.
> people are stupid
people overestimate AI
Its weird though because looking through the hackone reports in the slop wiki page there aren't actually reproduction steps. It's basically always just a line of code and an explanation of how a function can be mis-used but not a "make a webserver that has this hardcoded response".
So like why doesn't the person iterate with the AI until they understand the bug (and then ultimately discover it doesn't exist)? Like have any of this bug reports actually paid out? It seems like quickly people should just give up from a lack of rewards.
As long as the number of people newly being convinced that AI generated bounty demands are a good way to make money equals or exceeds the number of people realising it isn't and giving up, the problem remains.
Not helped, I imagine, that once you realise it doesn't work, an easy pivot is to start convincing new people that it'll work if they pay you money for a course on it.
Congrats on the completion of this effort! C/C++ can be memory safe but take some effort.
IMHO the timeline figure could benefit in mobile from using larger fonts. Most plotting libraries have horrible font size defaults. I wonder why no library picked the other extreme end: I have never seen too large an axis label yet.
Yes, the graph font-sizes seem intended for printing them on a single sheet of paper, vs squeezed into a single column in a blog.
Removing strcpy from your code does not make it memory safe.
Removing strcpy from your code does make it a little memory safer.
A weird Annex-K like API. The destination buffer size includes space for the trailing nul, but the source size only includes non-nul string bytes.
I don't really think this adds anything over forcing callers to use memcpy directly, instead of strcpy.
Apart from Daniel Sternberg's frequent complaints about AI slop, he also writes [1]
> A new breed of AI-powered high quality code analyzers, primarily ZeroPath and Aisle Research, started pouring in bug reports to us with potential defects. We have fixed several hundred bugs as a direct result of those reports – so far.
[1] https://daniel.haxx.se/blog/2025/12/23/a-curl-2025-review/
That's very interesting! It links to:
https://daniel.haxx.se/blog/2025/10/10/a-new-breed-of-analyz...
and its HN discussion:
https://news.ycombinator.com/item?id=45449348
So? Those are automated analysis tools and by "slop" he seems to refer to careless reports crafted using AI, solely for collecting bounties:
https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d...
The AI chatbot vulnerability reports part sure is sad to read.
Why is this even a thing and isn't opt-in?
I dread the idea of starting to get notifications from them in my own projects.
Making a strcpy honeypot doesn’t sound like a bad idea…
Some clever obfuscation would make this even more effective.It's a symptom of complete failure of this industry that maintainers are even remotely thinking about, much less implementing changes in their work to stave off harassment over false security impact from bots.
Because humans generate and relay the slop-reports in the hopes of being helpful
s/being helpful/making money.
LMAO
After all this time the initial AI Slop report was right:
https://hackerone.com/reports/2298307
?
Nonce and websockets don't appear at all in the blog post. The only thing the ai slop got right is that by removing strcpy curl will get less issues [submitted about it].
Title is :
No strcpy either
@dang
I don't see a problem with that, but for the record, the title on the site is lower-case for me (both browser tab title, and the header when in reader mode).
I think the submission originally had a typo ("strpy", with no C)
Ah.