I would assume that this is easy enough to implement that it will likely appear in a minor update to the upcoming Visual Studio version. MS kept updating the compiler since VS 2022, too.
The thing that always irks me about c++ is this sort of thing:
> Explanation
1) Searches for the resource identified by h-char-sequence in implementation-defined manner.
Okay, so now I have to make assumptions that the implementation is reasonable, and won't go and "search" by asking an LLM or accidentally revealing my credit card details to a third party, right?
And even if the implementation _is_ reasonable the only way I know what "search" means in this context is by looking at an example, and the example says "it's basically a filename".
So now I think to myself: if I want to remain portable, I'll just write a python script to do a damn substitution to embed my file, which is guaranteed to work under _any_ implementation and I don't have to worry about it as soon as I have my source file.
You're not the only one who feels that way, but IMHO it's not a valid complaint.
The C++ standard says implementation defined because the weeds get very thick very quickly:
- Are paths formed with forward slash or backslash?
- Case sensitive?
- NT style drive letter or Posix style mounts?
- For relative paths, what is it relative to? When there are multiple matches, what is the algorithm to determine priority?
- What about symlinks and hard links?
- Are http and ftp URIs supported (e.g. an online IDE like godbolt). If so, which versions of those protocols? TLS 1.3+ only? Are you going to accept SHA-1?
- Should the file read be transactional?
People already complain that the C++ standard is overly complicated. So instead of adding even more complexity by redefining the OS semantics of your build platform in a language spec, they use "implementation defined" as a shorthand for "your compiler will call fopen" plus some implementation wiggle room like command line options for specifying search paths and the strategy for long paths on Windows
What if #embed steals my credit card data is a pointless strawman. If a malicious compiler dev wanted to steal your credit card data, they'd just inject the malicious code; not act like a genie, searching the C++ spec with a fine comb for a place where they could execute malicious code while still *technically* being standards conformant. You know that, I know what, we all know that. So why are we wasting words discussing it?
The real reason why this stuff in underspecified in the spec is that some mainframe operating systems don't have file systems in the common modern sense, but support C++. Those vendors push back a lot against narroed definitions as far as I know.
Including files also opens up some potential security issues that the standards committee just didn't want to prescribe solutions to. Compiler explorer hides easter eggs around the virtual filesystem, for example:
> So now I think to myself: if I want to remain portable, I'll just write a python script
How can you know that your Python implementation won't send your credit card details to an LLM when it runs your script? It does not follow an ISO standard that says it can't do that. You're not making assumptions about it's behavior, are you?
This doesn't sound like the kind of portability anyone is really worried about. I get that the docs on the linked site are written in standards-ese and are complicated by macro replacement, but I don't think the outcome of sending your credit card details away is gonna be an outcome. If it was, an uncharitable implementation with access to your card details would be free to do that any time you gave it input invoking undefined behaviour (which is of course not uncommon, especially in incorrect code).
If you want to remain portable, write your code in the intersection of the big 3 - GCC, Clang and MSVC - and you’ll be good enough. Other implementations will either be weird enough that many things you’d expect to work won’t or are forced to copy what those 3 do anyway.
...what? What are you talking about? In what world would a compiler implement a preprocessor directive to ever use an llm, the internet, or your credit card details (from where would it get those)??? There are always implementation defined things in every language, for example, ub behavior. Do you get worried that someone will steal your bitcoin every time you use after free? Of course not! Even in Python when you OOM -- at least in CPython -- you crash with undefined behavior.
let me know when my embedded target's compiler is C23 compliant (i mean, i whish. we may be getting C11 or even C17 some times next year but i'm not holding my breath)
You can also do it using ld - it's something like ld -r --format binary -o out.o <file>, although you do want some build system assistance to generate header files allowing you to access the thing (somewhat similar to the assembly example here). It's a bit of a performance but I strongly prefer it to generating header files in the earlier options - those header files can end up being _very_ large (they generally multiply up the size of the embedded file by 2-4x) and slow to compile.
All a bit less relevant now since recent C++ versions have this built in by default. Generally something languages have been IMO too slow on (e.g. Go picked this up four or so years ago, after a bunch of less nice home-grown alternatives), it's actually just really useful to make things work in the real world, especially for languages that you can distribute as single-file binaries (which IMO should be all of them, but sadly it's not always).
You can apply `#` to __VA_ARGS__, which won’t preserve the exact whitespace, but for many languages it’s good enough. biggest issue is you can’t have `#` in the text.
Don't know what you mean, it works fine here. Python is too large and unreliable a dependency for something so trivial (which can be accomplished using standard POSIX utilities if need be).
Indeed, even writing this utility in C is trivial and has 0 extra dependency for a pure C/C++ project. Avoiding #embed also removes the dependency to a C++23 capable compiler, which might not be available in uncommon scenarios.
Python is pretty much mandatory for Linux systems nowadays, unless you're dealing with something really minimalist or trying to be very portable it's safe to rely on.
Outdated, modern solution is baked in now
https://en.cppreference.com/w/c/preprocessor/embed
That's good to know, but I've noticed it was added in C++26 and seems to be supported in GCC 15 and Clang 19, but not MSVC.
I think in a few (3-4?) years it will be safe to use, but in any case not now.
Still, good to know that it exists.
I would assume that this is easy enough to implement that it will likely appear in a minor update to the upcoming Visual Studio version. MS kept updating the compiler since VS 2022, too.
It will be at least a decade before I can rely on that in systems software that needs to be portable.
The thing that always irks me about c++ is this sort of thing:
> Explanation 1) Searches for the resource identified by h-char-sequence in implementation-defined manner.
Okay, so now I have to make assumptions that the implementation is reasonable, and won't go and "search" by asking an LLM or accidentally revealing my credit card details to a third party, right?
And even if the implementation _is_ reasonable the only way I know what "search" means in this context is by looking at an example, and the example says "it's basically a filename".
So now I think to myself: if I want to remain portable, I'll just write a python script to do a damn substitution to embed my file, which is guaranteed to work under _any_ implementation and I don't have to worry about it as soon as I have my source file.
Does anyone else feel this way or is it just me?
You're not the only one who feels that way, but IMHO it's not a valid complaint.
The C++ standard says implementation defined because the weeds get very thick very quickly:
- Are paths formed with forward slash or backslash?
- Case sensitive?
- NT style drive letter or Posix style mounts?
- For relative paths, what is it relative to? When there are multiple matches, what is the algorithm to determine priority?
- What about symlinks and hard links?
- Are http and ftp URIs supported (e.g. an online IDE like godbolt). If so, which versions of those protocols? TLS 1.3+ only? Are you going to accept SHA-1?
- Should the file read be transactional?
People already complain that the C++ standard is overly complicated. So instead of adding even more complexity by redefining the OS semantics of your build platform in a language spec, they use "implementation defined" as a shorthand for "your compiler will call fopen" plus some implementation wiggle room like command line options for specifying search paths and the strategy for long paths on Windows
What if #embed steals my credit card data is a pointless strawman. If a malicious compiler dev wanted to steal your credit card data, they'd just inject the malicious code; not act like a genie, searching the C++ spec with a fine comb for a place where they could execute malicious code while still *technically* being standards conformant. You know that, I know what, we all know that. So why are we wasting words discussing it?
The real reason why this stuff in underspecified in the spec is that some mainframe operating systems don't have file systems in the common modern sense, but support C++. Those vendors push back a lot against narroed definitions as far as I know.
Including files also opens up some potential security issues that the standards committee just didn't want to prescribe solutions to. Compiler explorer hides easter eggs around the virtual filesystem, for example:
https://godbolt.org/z/KcqTM5bTr
> So now I think to myself: if I want to remain portable, I'll just write a python script
How can you know that your Python implementation won't send your credit card details to an LLM when it runs your script? It does not follow an ISO standard that says it can't do that. You're not making assumptions about it's behavior, are you?
This doesn't sound like the kind of portability anyone is really worried about. I get that the docs on the linked site are written in standards-ese and are complicated by macro replacement, but I don't think the outcome of sending your credit card details away is gonna be an outcome. If it was, an uncharitable implementation with access to your card details would be free to do that any time you gave it input invoking undefined behaviour (which is of course not uncommon, especially in incorrect code).
If you want to remain portable, write your code in the intersection of the big 3 - GCC, Clang and MSVC - and you’ll be good enough. Other implementations will either be weird enough that many things you’d expect to work won’t or are forced to copy what those 3 do anyway.
...what? What are you talking about? In what world would a compiler implement a preprocessor directive to ever use an llm, the internet, or your credit card details (from where would it get those)??? There are always implementation defined things in every language, for example, ub behavior. Do you get worried that someone will steal your bitcoin every time you use after free? Of course not! Even in Python when you OOM -- at least in CPython -- you crash with undefined behavior.
Sorry for being so aggressive. I suppose I'm just very confused at where you're coming from.
let me know when my embedded target's compiler is C23 compliant (i mean, i whish. we may be getting C11 or even C17 some times next year but i'm not holding my breath)
You can also do it using ld - it's something like ld -r --format binary -o out.o <file>, although you do want some build system assistance to generate header files allowing you to access the thing (somewhat similar to the assembly example here). It's a bit of a performance but I strongly prefer it to generating header files in the earlier options - those header files can end up being _very_ large (they generally multiply up the size of the embedded file by 2-4x) and slow to compile.
All a bit less relevant now since recent C++ versions have this built in by default. Generally something languages have been IMO too slow on (e.g. Go picked this up four or so years ago, after a bunch of less nice home-grown alternatives), it's actually just really useful to make things work in the real world, especially for languages that you can distribute as single-file binaries (which IMO should be all of them, but sadly it's not always).
The special ld argument is a gnu thing, it's not portable and at least lld doesn't support it.
https://github.com/jcalvinowens/ircam-viewer/commit/17b3533b...
My current workaround until it arrives in all C++ compilers
``` inline constexpr auto bootstrap = #include "bootstrap.lua" ;
// ... later
lua.script(bootstrap, "@bootstrap"); ```
The lua code ``` R"( -- your code here )"; ```
surely the preprocessor method doesn't work in the general case, since the data can contain commas or parentheses.
Regardless all of the methods suggested are terrible. If you don't have access to #embed, just write a trivial python script.
You can apply `#` to __VA_ARGS__, which won’t preserve the exact whitespace, but for many languages it’s good enough. biggest issue is you can’t have `#` in the text.
How is `xxd -i' terrible?
It's still lacking content that goes before/after the output.
Just write a Python script that does the whole thing.
Don't know what you mean, it works fine here. Python is too large and unreliable a dependency for something so trivial (which can be accomplished using standard POSIX utilities if need be).
Indeed, even writing this utility in C is trivial and has 0 extra dependency for a pure C/C++ project. Avoiding #embed also removes the dependency to a C++23 capable compiler, which might not be available in uncommon scenarios.
Python is pretty much mandatory for Linux systems nowadays, unless you're dealing with something really minimalist or trying to be very portable it's safe to rely on.
> it's safe to rely on
Is there any guarantee they won't break backwards compatibility again?
Why don't you #embed?
Because the linked article is from 2013.