Personally, I'm against anything that goes against the standard LLM data formats of JSON and MD. Any perceived economy is outweighed by confusion when none of these alternative formats exist in the training data in any real sense and every one of them has to be translated (by the LLM) to be used in your code or to apply to your real data.
Any tokens you saved will be lost 3x over in that process, as well as introducing confusing new context information that's unrelated to your app.
Fair point, but I'd push back on "none of these alternative formats exist in training data."
ISON isn't inventing new syntax. It's CSV/TSV with a header - which LLMs have seen billions of times. The table format:
table.users
id name email
1 Alice alice@example.com
...is structurally identical to markdown tables and CSVs that dominate training corpora.
On the "3x translation overhead" - ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.
If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.
The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.
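The overhead gap is easy to eyeball without a tokenizer. Here's a rough sketch, assuming the ISON table layout shown in this thread (`table.<name>` header, column row, whitespace-separated values); character count is only a proxy for tokens, but it makes the structural savings visible:

```python
import json

# Hypothetical records, mirroring the "50 user records" scenario above.
users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com"}
    for i in range(1, 51)
]

# Standard JSON serialization.
as_json = json.dumps(users)

# ISON-style table, as inferred from the examples in this thread.
lines = ["table.users", "id name email"]
lines += [f'{u["id"]} {u["name"]} {u["email"]}' for u in users]
as_ison = "\n".join(lines)

# Real savings depend on the tokenizer, but the braces/quotes/keys
# repeated per record in JSON dominate the difference.
print(len(as_json), len(as_ison))
```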
Just use CSV at this point :D
Ha, fair. CSV gets you 80% there.
The 20% ISON adds:
- Multiple named tables in one doc
- Cross-table references
- No escaping hell (quoted strings handled cleanly)
- Schema validation (ISONantic)
If you're stuffing one flat table into context, CSV works fine. When you have users + orders + products with relationships, ISON saves you from JSON's bracket tax.
So CSV with a “typed” header?
Essentially yes, but with a few additions CSV lacks:
1. Multiple tables in one document (table.users, table.orders)
2. References between tables (:user:42 links to id 42)
3. Object blocks for config/metadata
4. Streaming format (ISONL) for large datasets
The type annotations are optional - they help LLMs understand the schema without inference.
You could think of it as "CSV that knows about relationships" - which is exactly what multi-agent systems need when passing state around.
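For the multi-agent case, emitting this format from structured state is a few lines. A sketch, using the table layout from my examples (the real ISON spec handles quoting and type annotations, which this ignores):

```python
def to_ison(tables: dict[str, list[dict]]) -> str:
    """Render multiple named tables in one ISON-style document.

    Mirrors the format shown in this thread: a table.<name> header,
    a column row, then whitespace-separated value rows.
    """
    blocks = []
    for name, rows in tables.items():
        cols = list(rows[0].keys())
        lines = [f"table.{name}", " ".join(cols)]
        lines += [" ".join(str(r[c]) for c in cols) for r in rows]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

doc = to_ison({
    "users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}],
    "orders": [{"id": 101, "user_id": ":1", "product": "Widget"}],
})
print(doc)
```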
Got it. Thanks.
Any data on how LLMs like this format? Are they able to make the associations etc?
Yes - I ran a 300-question benchmark comparing ISON, JSON, JSON-COMPACT, and others on the same tasks.
ISON: 88.3% accuracy
JSON: lower (can share exact numbers if interested)
Tested across Claude, GPT-4, DeepSeek, and Llama 3.
The key finding: LLMs handle tabular formats natively because they've seen billions of markdown tables and CSVs in training. No special prompting needed.
For associations, I tested with multi-table ISON docs like:
table.users
id name
1 Alice
2 Bob
table.orders
id user_id product
101 :1 Widget
102 :2 Gadget
Prompt: "What did Alice order?"
All models correctly resolved :1 → Alice → Widget without explicit instructions about the reference syntax.
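The resolution the models perform implicitly is the same join you'd write by hand. A toy parser for the minimal format in my example above (the real ISON grammar, with quoting, type annotations, and `:table:id` references, is more involved):

```python
def parse_ison(doc: str) -> dict[str, list[dict]]:
    """Parse the minimal whitespace-separated table format used above."""
    tables, name, cols = {}, None, None
    for line in doc.strip().splitlines():
        if not line.strip():
            continue  # blank lines separate tables
        if line.startswith("table."):
            name = line.split(".", 1)[1]
            tables[name], cols = [], None
        elif name and cols is None:
            cols = line.split()  # first row after the header is the columns
        elif name:
            tables[name].append(dict(zip(cols, line.split())))
    return tables

doc = """table.users
id name
1 Alice
2 Bob
table.orders
id user_id product
101 :1 Widget
102 :2 Gadget"""

t = parse_ison(doc)
users = {u["id"]: u["name"] for u in t["users"]}
# ":1" references users.id == 1, per the convention in this thread.
orders_by_name = {users[o["user_id"].lstrip(":")]: o["product"]
                  for o in t["orders"]}
print(orders_by_name["Alice"])  # Widget
```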
The 30-70% token savings come from removing JSON's structural overhead (braces, quotes, colons, commas) while keeping the same semantic density.
Haven't published formal benchmarks on this yet - that's good feedback. I should.
tried this with msgpack last year. accuracy tanked. models have seen a trillion json examples, like 12 of whatever format you invent