The lack of naming seems to indicate a fundamental misunderstanding of how LLM coding agents succeed, and it makes me doubt that anything about this project is useful or workable.
> The empirical literature shows that models are particularly vulnerable to naming-related errors like choosing misleading names, reusing names incorrectly, and losing track of which name refers to which value.
I think Vera might be missing something here. In my experience, LLMs code better the less of a mental model a language requires and the more of the program's state is spelled out in text on the page.
Go: very little hidden, everything in text on the page, and LLMs are great at it. Java is similar. But writing Haskell they're pretty bad, and Erlang isn't wonderful either. You need much more of a mental model for those languages.
For Vera, not having names removes key information that the model would have, and replaces it with mental modelling of the stack of arguments.
I'm curious what issues you had with Haskell? I've had the opposite experience and find them dreadful at Java et al.
Surely, denser languages should be better for LLMs?
> But writing Haskell, it's pretty bad,
I'm surprised by this. Most likely significant whitespace is a big part of the problem (LLMs seem horrible at whitespace). Functional with types has been a win for me with Gleam.
> There are no variable names. @Int.0 is the most recent Int binding; @Int.1 is the one before.
You already lost me here. There's a reason variable names are a thing in programming: they convey semantic meaning. This matters whether the code is written by a human or an LLM.
I think Hindley–Milner (for decidability) + Linear Types (for resource management) + Refinement Types (for lightly asserting invariants) + Delimited-Continuation-based Effects (for tracking effects) + Unison-style Content Addressability (for corralling code changes, documentation, and tests) would make a really nice language for an LLM.
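To give a flavour of the linear-types piece, here's a minimal sketch of "a resource that must be consumed exactly once" in current GHC (9.0+), assuming -XLinearTypes; Token, acquire, and release are just names made up for illustration:

```haskell
{-# LANGUAGE LinearTypes #-}
module LinearSketch where

-- A token standing in for a resource that must be consumed exactly once.
data Token = Token

-- The continuation receives the token at multiplicity 1.
acquire :: (Token %1 -> r) -> r
acquire use = use Token

-- Pattern matching consumes the token.
release :: Token %1 -> ()
release Token = ()

ok :: ()
ok = acquire (\t -> release t)   -- accepted: t is consumed exactly once

-- leaked :: ()
-- leaked = acquire (\t -> ())   -- rejected: t is never consumed

-- doubled :: ()
-- doubled = acquire (\t -> release t `seq` release t)   -- rejected: t consumed twice
```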
This isn't my project, but I shared it here because it has a few important ideas I've been thinking about in my own work. Effect type systems in particular are a really good fit for LLMs because they allow you to reason very precisely about a program's capabilities before runtime (basically, using the type system for capability proofs). This helps you trust agent-created code (for example, you know it can't do IO), or, if the code does require certain capabilities, run it in a sandbox (e.g., mock network or filesystem). This kind of language design also provides a safer foundation for complex meta-systems of agents-that-create-agents, depending on how the runtime is implemented, though Vera may be somewhat limited in that particular respect.
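To make the capability point concrete, here is a minimal hand-rolled sketch of the idea in Haskell. It is not Vera's mechanism and not any particular effects library; Cmd, Prog, and runPure are made-up names:

```haskell
{-# LANGUAGE GADTs #-}
module CapabilitySketch where

-- The only operations this little language is allowed to request.
data Cmd a where
  ReadLine  :: Cmd String
  WriteLine :: String -> Cmd ()

-- A program is either a result or a command plus a continuation.
data Prog a where
  Done :: a -> Prog a
  Step :: Cmd b -> (b -> Prog a) -> Prog a

-- Because Prog can only ever contain ReadLine/WriteLine, we know before
-- running it that it cannot touch the filesystem or network, and we can
-- interpret it completely purely: scripted input in, transcript out.
runPure :: [String] -> Prog a -> (a, [String])
runPure _     (Done x)               = (x, [])
runPure input (Step ReadLine k)      = case input of
  (l:ls) -> runPure ls (k l)
  []     -> runPure [] (k "")
runPure input (Step (WriteLine s) k) =
  let (x, out) = runPure input (k ()) in (x, s : out)

-- An example "agent-written" program.
greet :: Prog ()
greet = Step ReadLine (\name -> Step (WriteLine ("hello " ++ name)) Done)

main :: IO ()
main = print (runPure ["world"] greet)   -- ((), ["hello world"])
```

The "run it in a sandbox" point then just means swapping the interpreter: a different one could route WriteLine to a mock filesystem without touching the program.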
The major design decision I'm a little skeptical about is removing variable names; it would be interesting to see empirical data on that as it seems a bit unintuitive. I would expect almost the opposite, that variable names give LLMs some useful local semantics.
> Division by zero is not a runtime error — it is a type error. The compiler checks every call site to prove the divisor is non-zero.
Elaborate a little here.
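For illustration only: one conventional way to get that kind of guarantee is to make "non-zero" its own type that callers must produce before they can divide. A rough Haskell sketch (NonZero and safeDiv are made-up names, and Vera's actual mechanism may well differ):

```haskell
module NonZeroSketch where

-- In real code the constructor would not be exported, so the only way to
-- obtain a NonZero is through the checked smart constructor below.
newtype NonZero = NonZero Int

nonZero :: Int -> Maybe NonZero
nonZero 0 = Nothing
nonZero n = Just (NonZero n)

-- Total by construction: no runtime divide-by-zero is possible here.
safeDiv :: Int -> NonZero -> Int
safeDiv x (NonZero d) = x `div` d

main :: IO ()
main = case nonZero 4 of
  Just d  -> print (safeDiv 12 d)          -- 3
  Nothing -> putStrLn "divisor was zero"
```

A refinement-type system can do the same check with far less ceremony, which is presumably what the quoted claim amounts to.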
This is exactly the wrong approach. LLMs are good at writing programming languages they already know, the ones well represented in their training data, not languages they have never seen before, where you have to include the entire language manual and lots of example code in every prompt.
Is there any evidence that using structural references rather than names allows large language models to generate better code? This bit just feels like obfuscation for obfuscation's sake.
I've read the FAQ (https://github.com/aallan/vera/blob/main/FAQ.md) that provides the justification for this, and it is, IMO, fairly weak. The main argument is that misleading names can confuse models. I have no problem believing this, but I'm not sure why we should assume code will have misleading names. In fact, the same document says that in tests LLMs have mixed up the indices, which is exactly the problem I would foresee. It seems especially messy that the name for the same variable changes in different places in the code. The utility of De Bruijn indices is easy substitutability of expressions, which seems like totally the wrong thing to optimize for in a programming language.
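For anyone unfamiliar, here's roughly what De Bruijn indices buy you: terms that differ only in bound-variable names become literally identical, so comparison and substitution need no renaming. A toy Haskell sketch (Term, DB, and toDB are made-up names):

```haskell
module DeBruijnSketch where

import Data.List (elemIndex)

-- Ordinary named lambda terms vs. the same terms with De Bruijn indices.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

data DB = DVar Int | DLam DB | DApp DB DB
  deriving (Eq, Show)

-- Convert a closed named term to De Bruijn form.
toDB :: [String] -> Term -> DB
toDB env (Var x)   = case elemIndex x env of
  Just i  -> DVar i
  Nothing -> error ("unbound variable: " ++ x)
toDB env (Lam x b) = DLam (toDB (x : env) b)
toDB env (App f a) = DApp (toDB env f) (toDB env a)

main :: IO ()
main = do
  let id1 = Lam "x" (Var "x")
      id2 = Lam "y" (Var "y")
  print (id1 == id2)                  -- False: the names differ
  print (toDB [] id1 == toDB [] id2)  -- True: indices abstract the names away
```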
Edit: the more I think about it, the more this seems like a really bad idea. Three more issues come to mind: 1) it becomes impossible to grep for a variable, which I know agents do all the time; 2) editing code at the top of a function, say introducing a new variable, can require editing all the code in the rest of the function even if it is semantically unchanged; 3) they say it's less context for the LLM to track, but now, instead of just having to know the name of one variable, you have to keep track of every other variable in the function.