I do programming as a side project, and Marimo has been a huge unlock for me. Part of it has been just watching the videos, which are both updates about the software and little examples of how to think about data science. Marimo also helps curate useful Python stuff to try.
Starting to use AI in Marimo, I was able both to 'learn polars' for speed and to create a custom AnyWidget, so I could make a UI I could imagine that wouldn't work with the standard UI features.
Giving an LLM more context will be fab for me. Now if I could just teach Claude that this really is the 'graph' and it can't ever re-assign a variable. It's a gotcha of Marimo vs. Python, worth the hassle for the interactivity. But it makes me feel a bit like I'm writing C and the compiler is telling me I need a semicolon at the end of the line. I've made that error so many times...
Really glad to hear that! The graph can get complex for big notebooks and maintaining a full picture of variable dependencies across cells is a lot to ask a model to do correctly and hold in context. (It took us a little bit to get the parsing right in marimo!) With pair, it doesn't have to.
The model just "lives" in the environment, and when marimo says "you can't reuse that variable," it renames it and moves on. Hope you give pair a spin!
The idea of an agent having actual working memory inside a live notebook session rather than just firing off ephemeral scripts is genuinely clever — this feels like a much more natural way for humans and models to collaborate.
Very cool!
We’ve been exploring a similar direction too, but with a plain REPL and a much thinner tool surface. In our case, it’s basically one tool for sending input, with interrupts and restarts handled through that same path. Marimo seems to expose much richer notebook structure and notebook-manipulation semantics, which is a pretty different point in the design space.
It seems like the tradeoff is between keeping the interaction model simple and the context small, versus introducing notebook structure earlier so the model works toward an artifact at the same time it iterates and explores. Curious how you think about that balance.
Repo: https://github.com/posit-dev/mcp-repl
Thank you for this!
I am a big fan of Marimo and was trying to use it as my agent’s “REPL” a while back, because it’s naturally so good at describing its own current state and structure. It made me think that it would make a better state-preserving environment for the agent to work. I’m very excited to play with this.
Thanks for the kind words.
We've had the same thought, and are experimenting in this direction in the context of recursive language models.
Let us know if you have feedback!
One of the authors here, happy to answer questions.
Building pair has been a different kind of engineering for me. Code mode is not a versioned API. Its consumer is a model, not a program. The contract is between a runtime and something that reads docs and reasons about what it finds.
We've changed the surface several times without migrating the skill. The model picks up new instructions and discovers its capabilities within a session, and figures out the rest.
You could wrap a Python object via a proxy that controls context and have AI have a go at it. You can customise that interface however you want, with a stable surface that does things like
`
proxy.describe()
proxy.list_attrs()
proxy.get_attr("columns")
proxy.sample_rows(limit=5)
proxy.children(path=["model", "layers"], limit=10)
`
This way you get a general interface for AI to interact with your data, while still keeping it very fluid.
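A minimal sketch of what such a proxy could look like, using nothing beyond the standard library. The method names (`describe`, `list_attrs`, `get_attr`, `sample_rows`) mirror the interface above; everything else is illustrative, not a real library API:

```python
class Proxy:
    """Mediates AI access to an object through a small, stable surface."""

    def __init__(self, obj):
        self._obj = obj

    def describe(self):
        # A one-line summary instead of dumping the whole object.
        return f"{type(self._obj).__name__} with {len(dir(self._obj))} attrs"

    def list_attrs(self, limit=20):
        # Public attributes only, truncated to keep model context small.
        return [a for a in dir(self._obj) if not a.startswith("_")][:limit]

    def get_attr(self, name):
        # Returns another Proxy, so the model never holds a raw object.
        return Proxy(getattr(self._obj, name))

    def sample_rows(self, limit=5):
        # Works for anything sliceable (lists, DataFrames, arrays).
        return self._obj[:limit]


p = Proxy([{"x": i} for i in range(100)])
print(p.describe())
print(p.sample_rows(limit=2))  # [{'x': 0}, {'x': 1}]
```

The key design choice is that every return path either yields a small, printable summary or another `Proxy`, so the mediated view is all the model (or the human using the same API) ever sees.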
I built a custom kernel for notebooks with PDB and a similar interface. The trick is to also have access to the same API yourself (preferably with some extra views for humans), so you see the same mediated state the AI sees.
By 'wrap' I mean build a capability-based, effect-aware, versioned-object system on top of objects (execs and namespaces too) instead of giving models direct access. Not sure if your specific runtime constraints make this easier or harder. Does this sound like something you'd be moving towards?
Really interesting idea! Part of the ethos here is that models are already really good at writing Python, and we want to bet on that rather than mediate around it. Python has the nice property of failing loudly (e.g., unknown keywords, type errors, missing attributes) so models can autocorrect quickly. And marimo's reactivity adds another layer of guardrails on top when it comes to managing context/state.
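As a small illustration of that "failing loudly" property (plain Python behavior, not code from pair itself), each kind of mistake surfaces as a specific exception whose message a model can read and act on:

```python
# A missing attribute fails immediately, naming the attribute and the type.
try:
    "abc".uppercase()  # the real method is str.upper
except AttributeError as e:
    print(e)

# An unknown keyword argument fails before any work is done.
try:
    sorted([3, 1, 2], reversed=True)  # the real keyword is reverse=
except TypeError as e:
    print(e)
```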
Anecdotally working on pair, I've found it really hard to anticipate what a model might find useful to accomplish a task, and being too prescriptive can break them out of loops where they'd otherwise self-correct. We ran into this with our original MCP approach, which framed access to marimo state as discrete tools (list_cells, read_cell, etc.). But there was a long tail of more tools we kept needing, and behind the scenes they were all just Python functions exposing marimo's state. That was the insight: just let the model write Python directly.
So generally my hesitation with a proxy layer is that it risks boxing the agent in. A mediated interface that helps today might become a constraint tomorrow as models get more capable.
How do you teach the model to use this new API? Wouldn't it be more effective just using the polars/pandas API, which it has been well trained on?
Codex just picks it up. The surface is basically a guarded object model, so pandas/polars-style operations stay close to the APIs the model already knows. There are some extra tricks, but they're probably out of scope for an HN comment.
In practice, the Pandas/Polars API would lower to:
`
proxy -> attr("iloc") -> getitem(slice(1, 10, None))
`
Looks cool. I love notebooks.
I built something similar a while back with plain CLI agent harnesses for Jupyter.
It supports Codex subscriptions and pi (it used to support Claude subscriptions, and might still be okay since I didn't modify the system prompt).
It has some bugs and needs some work, but getting help and code changes inline in Jupyter is way better than copy-pasting hard-to-select text from cells and cell output all day.
https://github.com/madhavajay/cleon
This is cool. Do you still use this? There have been ideas thrown around to add "prompt" cells to marimo that can similarly create outputs or downstream cells, with the prompts serialized to the notebook .py file and made part of the DAG.