I use Qwen 3 Coder Next daily on my Mac as my main coding agent. It is incredibly capable, and it's strange how you're painting this picture as if it's a fringe use case; there are whole communities that have popped up around running local models.
Can I doubt your claim? I have had such terrible luck with AI coding on <400B models. Not to mention, I imagine your codebase is tiny, or you're working for some company that isn't keeping track of your productivity.
I am trying super hard to use cheap models, and outside of SOTA models they have been more trouble than they're worth.
Yesterday, I got Qwen-Coder-Next to build a Python script that reads a Postman collection, pulls the data from it to build a request to one of the endpoints, downloads a specific group of files whose URLs were buried in the JSON payload from that endpoint, then transforms them all to a specific size of PNG, all without breaking a sweat. I didn't even have to tell it to use Pillow, but it did everything to a T.
Use case means everything. I doubt this model would fare well on a large codebase, but this thing is incredible.
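For flavor, the skeleton of that kind of task is mostly recursive JSON traversal. A toy sketch, with a made-up payload shape (the field names and URLs here are hypothetical, and the Pillow resize step is only indicated in a comment):

```python
import json

# Hypothetical, simplified payload: the real script walked a Postman
# collection to find the endpoint, called it, then dug the file URLs
# out of the JSON response before resizing each image with Pillow.
payload = json.loads("""
{
  "items": [
    {"name": "logo", "assets": {"download_url": "https://example.com/a.png"}},
    {"name": "icon", "assets": {"download_url": "https://example.com/b.png"}}
  ]
}
""")

def extract_urls(node):
    """Recursively collect any string value whose key looks like a URL field."""
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("url") and isinstance(value, str):
                urls.append(value)
            else:
                urls.extend(extract_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(extract_urls(item))
    return urls

urls = extract_urls(payload)
print(urls)  # → ['https://example.com/a.png', 'https://example.com/b.png']
# Each URL would then be fetched (e.g. with urllib) and resized,
# e.g. PIL.Image.open(...).resize((w, h)), before saving as PNG.
```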
I managed to get qwen2.5-coder:14B working under Ollama on an Nvidia 2080 Ti with 11GB of VRAM, using the Ollama CLI, outputting what looks like 200 words per minute to my eye.
It has been useful for education ("What does this Elixir code do? <paste file>" ... <general explanation> ... "What does this line mean?").
as well as getting a few basic tests written when I'm unfamiliar with the syntax. ("In Elixir Phoenix, given <subject under test, paste entire module file> and <test helper module, paste entire file> and <existing tests, pasted in, used both for context and as examples> , what is one additional test you would write?")
This is useful in that I get a single test I can review, run, and paste in, and I'm not using any quota. Generally I have to fix it, but that's just a matter of reading the actual test and throwing the test failure output to the LLM to propose a fix. Some human judgement is required, but once I got going, adding a test took 10 minutes despite being relatively unfamiliar with Elixir Phoenix.
It's a nice loop, I'm in the loop, and I'm learning Elixir and contributing a useful feature that has tests.
Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn't have it write code, but instead had it use Rodney (thanks for making it, btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would've had Haiku do this, but I might switch to doing it locally from now on.
I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
When you say you use local model in OpenCode, do you mean through the ollama backend? Last time I tried it with various models, I got issues where the model was calling tools in the wrong format.
I've used OpenCode, and the remote free models it defaults to aren't awful, but they're definitely not on par with Gemini CLI or Claude. I'm really interested in trying to find a way to chain multiple local high-end consumer Nvidia cards into an alternative to the big labs' offerings.
Kimi K2.5 is pretty good, you can use it on OpenRouter. Fireworks is a good provider, they were giving free access to the model on OpenCode when it first released.
I think this is directing coders towards self-sufficiency and that's a good thing. If they don't end up using it for agentic coding, they can use it for running tests, builds, non-agentic voice controlled coding, video creation, running kubernetes, or agent orchestration. So no, it's not evil, even if it doesn't go quite as expected.
I really hope this doesn't hinder development too much. As Simon says, Qwen3.5 is very impressive.
I've been testing Qwen3.5-35B-A3B over the past couple of days and it's a very impressive model. It's the most capable agentic coding model I've tested at that size by far. I've had it writing Rust and Elixir via the Pi harness and found that it's very capable of handling well-defined tasks with minimal steering from me. I tell it to write tests and it writes sane ones, ensuring they pass without cheating. It handles the loop of responding to test and compiler errors while pushing towards its goal very well.
I've been playing with 3.5:122b on a GH200 the past few days for Rust/React/TS, and while it's clearly sub-Sonnet, with tight descriptions it can get small-to-medium tasks done OK, and as well as Sonnet if the scope is small.
The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked, at which point I find it has stripped all the preliminary support infrastructure for the new feature out of the code.
I've been testing the same with some Rust, and it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself. It seems a little more likely to jam up than some other models I've experimented with.
It's also driving itself crazy with deadpool & deadpool-r2d2, which it chose during the planning phase.
That said, it does seem to be doing a very good job in general; the code it has created is mostly sane other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than other models I've been able to self-host so far.
> it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself.
I think this is part of the model’s success. It’s cheap enough that we’re all willing to let it run for extremely long times. It takes advantage of that by being tenacious. In my experience it will just keep trying things relentlessly until eventually something works.
The downside is that it’s more likely to arrive at a solution that solves the problem I asked but does it in a terribly hacky way. It reminds me of some of the junior devs I’ve worked with who trial and error their way into tests passing.
I frequently have to reset it and start it over with extra guidance. It’s not going to be touching any of my serious projects for these reasons but it’s fun to play with on the side.
Some of the early quants had issues with tool calling and looping. So you might want to check that you're running the latest version / recommended settings.
Are you running it locally with llama.cpp? If so, is it working without any tweaking of the chat template? The tool calls fail for me when using the default chat template, however it seems to work a whole lot better with this: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/discussions/9#69...
Have you tried the '--jinja' flag in llama-server?
What hardware do you have it running on? Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.
> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
Probably not yet, but it's really good at composing shell commands. For scripting or one-liner generation, the A3B is really good. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
what's your take between Qwen3.5-35B-A3B and Qwen3-Coder-Next?
In my experience Qwen3.5 is better even at smaller distillations. From what I understand the Qwen3-next series of models was just a test/preview of the architectural changes underpinning Qwen3.5. So Qwen3.5 is a more complete and well trained version of those models.
We don't have a Qwen3.5-Coder to compare with, but there is a chart comparing Qwen3.5 to Qwen3 including Qwen3-Next[0].
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rivckt/visuali...
In my experience Qwen 3 Coder Next is better. I ran quite a few tests yesterday and it was much better at utilizing tool calls properly and understanding complex code. For its size, though, 3.5 35B was very impressive. Coder Next is an 80B model, so I think it's just a size thing. Also, for whatever reason, Coder Next is faster on my machine. The only model that is competitive in speed is GLM 4.7 Flash.
What do you use as the orchestrator? By this I mean opencode, or the like. Is that the right term?
I use the term "harness" for those - or just "coding agent". I think orchestrator is more appropriate for systems that try to coordinate multiple agents running at the same time.
This terminology is still very much undefined though, so my version may not be the winning definition.
What is the meaning of 'A3B'?
It's the number of active parameters for a Mixture of Experts (misleading name IMO) model.
Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers, very roughly 35GB of data at one byte per parameter, which are all loaded into memory at once.
But... on any given pass through the model weights only 3 billion of those parameters are "active" aka have matrix arithmetic applied against them.
This speeds up inference considerably because the computer has to do fewer operations for each token that is processed. It still needs the full amount of memory, though, as the 3B active parameters it uses are likely different on every iteration.
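The routing idea can be sketched in a few lines of numpy. This is a toy illustration of top-k expert selection, not Qwen's actual architecture; the expert count, dimensions, and top-k value here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 experts total, only the top 2 "active" per token.
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    logits = x @ router                   # router scores every expert
    chosen = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    scores = np.exp(logits[chosen])
    weights = scores / scores.sum()       # softmax over just the chosen experts
    # Only the chosen experts' matrices do any matrix arithmetic for this
    # token; the other experts sit idle in memory, but all of them must stay
    # loaded because the next token will likely pick a different subset.
    y = sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))
    return y, set(int(i) for i in chosen)

out, used = moe_forward(rng.standard_normal(d_model))
print(f"{len(used)} of {n_experts} experts did work for this token")
# → 2 of 8 experts did work for this token
```

The compute saving per MoE layer is roughly the ratio of active to total experts, which is why a 35B-A3B model can decode at something like dense-3B speed while still needing memory for all 35B weights.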
I wonder how a US lab hasn't dumped truckloads of cash into various laps to ensure these researchers have a place at their lab
ICE has been detaining Chinese people in my area (and going door to door in at least one neighborhood where a lot of Chinese and Indians live). I was hearing about this just last week as word spread amongst the Chinese community here (Ohio) to make sure you have some legal documentation beyond just your driver's license on you at all times for protection. People will hear about this through the grapevine and it has a massive (and rightly so) chilling effect. US labs can try but with US government behaving like it is I don't think they will have much luck.
Yeah, the Hyundai factory fiasco kind of dashed the idea that the enforcement would spare people working in favored industries setting up in the US.
Are the people being detained in the country illegally?
Sometimes; oftentimes no. They have detained multiple US citizens.
Yes. Yes, so true. And the PhD types building these models are probably even scared in China that ICE will fly there to deport them.
This thread is about bringing these people to the US.
What the US has done is dumped truckloads of cash to make it likely that as a legal immigrant you will be abducted and sent to a camp.
This is FUD. The US has dumped truckloads of cash to make it likely that masked men with no cameras and little training will parade around abducting anyone they even suspect of being an illegal immigrant, after even Yale admitted it's likely that 22M+ people came here illegally. https://insights.som.yale.edu/insights/yale-study-finds-twic...
It'd be good if Congress could do something to remove the masks, put cameras on these agents, and for the local governments to stop fighting removal of all people who are here illegally so we can pretend we have borders again.
I feel like we would disagree on the role of immigration in the US but I really appreciate you calling out how the current administration’s approach is only effective at making viral clips online. Meta comment, but it’s refreshing to talk with people who have different goals while still referencing a shared reality. Removing the masks and adding cameras shouldn’t be controversial unless your goal really is to make a paramilitary force for the president.
The unstated but obvious (to me?) goal of what ICE is doing is not to get large numbers of people out of the country, but to drive costs down for migrant labor by further disenfranchising them, making them scared, marginal, etc.
If they actually thoroughly evicted non-status migrant workers they'd have an outright revolt on their hands from farmers and other businesses that depend on them.
Instead those businesses can now take further advantage of the fear of harassment and/or deportation to drive down compensation and rights.
Contrast with countries like Canada that have a legal temporary foreign agriculture worker program that provides a regulated source of seasonal migrant farm worker labour under a non-citizen temporary status, but with some rights (still often abused). It's notable to me as a Canadian that I don't see this being advocated on any large scale by either party in the US.
Anyways, all this just to say that the jackboot clown theater is the point, not a side effect.
Surely you know that this is an extreme misrepresentation? There are >35 million legal immigrants in the US. It's far from "likely" that as one of them you're abducted and sent to a camp.
Give them time; they've only just started. They do waste a lot of time abducting random US citizens, though.
I think it would be a useful exercise to look at all the revocations of legal status in the US, and then do the division to see how much we've increased the likelihood of becoming illegal, and therefore targeted.
I don't think you're as right as you want to believe. Certainly not as right as I want you to believe.
Unfortunately, the most extreme outcome is the new normal: there's now a >0 chance that someone, whether they are a US citizen or not, child or adult, can end up in a camp with no due process.
Indeed; or, Europe badly needs a competitive model to hedge against US political nonsense.
Offering a "you are welcome" relocation package to Anthropic might be a good idea.
Anthropic has gone out of their way to make a point about how much they love and admire the US state and its defense sector, only drawing the line at a very far point. And even when they drew the line, it came with a big statement about how they believe in the American defense sector, blah blah blah.
In any case, there's no way Anthropic's investors in Silicon Valley would countenance such a move.
Also, I'm biased, but the logical place is Canada, not Europe. Much of the foundational research on LLMs, and a large part of the talent, came from universities in Canada anyway.
Given how American govt. has treated Anthropic, I think you might be right. EU truly has a remarkable opportunity to make Anthropic/Claude European.
Competitive models are illegal in the EU.
China is also giving them dump trucks full of cash, though. Plus you have to contend with the nationalism reason (unfortunately this has died off in America for too many). The idea of building your country is valued by most Chinese I have met. Plus China is incredibly nice to live in, especially if you have lots of money and/or connections. So you can work in China, get paid lots of money, and feel like you are doing good. Or in America you can get paid lots of money and get yelled at by people online because the government wants to use your model.
China city life is amazingly convenient. Trains and subways are just such an enormous quality of life boost. Add to that the relative cleanliness of having nearly zero homelessness and you’ve got something very compelling.
I will say we are winning in accessibility. China doesn’t have much of a ramp game
All very true.
I wonder if you max out your options in China. It seems the Party is suspicious of ambition and high profile winners. I'm sure you can live comfortably, but there's a ceiling.
I got an offer out of the blue for a consulting gig in ML, offering USD 400/hr in China. Assuming this was legit (the offeror seemed legit), it looks like China is also throwing a lot of Benjamins around...
> China is incredibly nice to live in
I'm sure it's a very nice place to live if you're content to just stay quiet in society and never put a political sign in your yard or even just talk about the wrong thing with a friend on WeChat.
> Or In America you can get paid lots of money, and get yelled at by people online because the Government wants to use your model.
Isn't it just straight-up illegal in China to stop the government from using your model? The USA isn't perfect, but at least it has active discourse.
Damn that social conscience, huh?
They probably have tried, but you'd have to offer more cash than those researchers feel they can get by starting their own lab. When you consider that their new startup lab would have the entire nation of China as, in effect, a captive market, you start to see how almost any amount of money would be too little to convince them not to make a run at that new startup. If money is their aim.
I think Alibaba needs to just give these guys a blank check. Let them fill it in themselves. Absent that, I'm pretty sure they'll make their own startup.
I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.
> I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.
That's very likely to happen once the gap with OpenAI/Anthropic has been closed and they've managed to pop the bubble.
I don’t know, the EV bubble deflated and Chinese firms are still pumping them out with subsidies like their life depends on it.
There has been tension between Qwen's research team and Alibaba's product team over, say, the Qwen app. And recently, Alibaba tried to impose DAU as a KPI. It's understandable that a company like Alibaba would force a change of product strategy for any number of reasons. What puzzles me is why they would push out the key members of their research team. Doesn't the industry have a shortage of model researchers and builders?
Getting a bit of whiplash going from "AI is replacing people" to "AI is dead without (these specific) people." Surely we're far enough ahead that AI can take it from here?
Wild times!
Who is suggesting "AI is dead without (these specific) people"? People are wondering what it means specifically for the Qwen model family.
We've gone from AGI goals to short-term thinking via Ads. That puts things better in perspective, I think.
Claude is incapable of producing a native application for itself, and is bad enough with web ones to justify Anthropic acquiring Bun.
I'm hopeful they will pick up their work elsewhere and continue on this great fight for competitive open weight models.
To be honest, it's sort of what I expected governments to be funding right now, but I suppose Chinese companies are a close second.
I would second that Qwen3.5 is exceptionally good. As a calibration, I ran it (the 35B variant) locally on an Ada NextGen 24GB with easy-llm-cli, doing the same things as gemini-cli + Gemini 3 Pro, and they were at par. Really impressive, and it ran pretty fast.
I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to evals and not broadly capable.
I've noticed that open weight models tend to hesitate to use tools or commands unless they appeared often in the training or you tell them very explicitly to do so in your AGENTS.md or prompt.
They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.
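To be concrete about the "very explicit" part: something like this made-up AGENTS.md excerpt (tool names and commands are just examples, adapt to your stack) is what seems to get open models to actually reach for CLIs instead of hand-writing files:

```markdown
## Tools

- Always scaffold new projects with the official CLI
  (e.g. `npm create vite@latest my-app`) instead of writing
  config files from scratch.
- Run the test suite (`mix test`, `cargo test`, etc.) after
  every change; do not mark a task done until it passes.
- Prefer `rg` over `grep` when searching the repository.
```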
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.
Does anyone know when the small Qwen 3.5 models are going to be on OpenRouter?
they're already there ?? https://openrouter.ai/qwen/qwen3.5-27b
There are smaller ones on HuggingFace https://huggingface.co/models?other=qwen3_5&sort=least_param... with 0.8B, 2B, 4B and 9B parameters.
Like 4B, 2B, 9B. Supposedly they are surprisingly smart.
More discussion:
https://news.ycombinator.com/item?id=47246746
My conspiracy-theory hat says that investors who also hold a stake in OpenAI are somehow sabotaging this, like they did when kicking Emad out of Stability AI.
More likely some high ranking party member's nepobaby from Gemini sniffed success with Qwen and the original folks just walked away as their reward disappeared.
Apples vs. oranges. The latter is true: Emad did get sabotaged (for not being able to raise money in time, about eight months before he left). Junyang didn't have that long an arc of incidents.
> me stepping down. bye my beloved qwen.
the qwen is dead, long live the qwen.
inb4 qwen is less of a supply chain risk than anthropic
Were they kneecapped by Anthropic blocking their distillation attempts?
>I’m hearing positive noises about the 27B and 35B models for coding tasks that still fit on a 32GB/64GB Mac
Isn't it interesting that you never see someone say "I used this on my Mac and it was useful"?
Instead we get "you could put this on your Mac" or "I tried it, and it worked but it was too slow"
I feel like these people are doing harm when they make suggestions that lead others to waste money.
I use Qwen 3 Coder Next daily on my Mac as my main coding agent. It is incredibly capable, and it's strange how you are painting this picture as if it's a fringe use case; there are whole communities that have popped up around running local models.
Can I doubt your claim? I have had such terrible luck with AI coding on <400B models. Not to mention, I imagine your codebase is tiny, or you are working for some company that isn't keeping track of your productivity.
I am trying super hard to use cheap models, and outside SOTA models, they have been more trouble than they are worth.
Yesterday, I got Qwen-Coder-Next to build a Python script that reads a Postman collection, pulls the data from it to build a request to one of the endpoints, downloads a specific group of files whose URLs were buried in the JSON payload from that endpoint, then transforms them all to a specific size of PNG, all without breaking a sweat. I didn't even have to tell it to use Pillow, but it did everything to a T.
Use case means everything. I doubt this model would fare well on a large codebase, but this thing is incredible.
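I don't have the exact script handy, but the core of it, digging the file URLs out of an arbitrarily nested JSON payload, was roughly this shape (field names and the sample payload here are made up; the real script also downloaded each file and resized it with Pillow):

```python
import json

def find_file_urls(node, suffix=".png"):
    """Recursively walk a parsed JSON payload and collect URLs
    ending in the given suffix, wherever they are nested."""
    urls = []
    if isinstance(node, dict):
        for value in node.values():
            urls.extend(find_file_urls(value, suffix))
    elif isinstance(node, list):
        for item in node:
            urls.extend(find_file_urls(item, suffix))
    elif isinstance(node, str) and node.startswith("http") and node.endswith(suffix):
        urls.append(node)
    return urls

# Made-up payload standing in for the endpoint's response
payload = json.loads(
    '{"items": [{"meta": {"href": "https://example.com/a.png"}},'
    ' {"files": ["https://example.com/b.png", "not-a-url"]}]}'
)
print(find_file_urls(payload))
# → ['https://example.com/a.png', 'https://example.com/b.png']
```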
I managed to get qwen2.5-coder:14B working under ollama on an Nvidia 2080 Ti with 11GB of VRAM, using ollama cli, outputting what looks like 200 words-per-minute to my eye
It has been useful for education ("What does this Elixir code do? <Paste file>" ..... <general explanation> ..... then "What does this line mean?"),
as well as getting a few basic tests written when I'm unfamiliar with the syntax. ("In Elixir Phoenix, given <subject under test, paste entire module file> and <test helper module, paste entire file> and <existing tests, pasted in, used both for context and as examples> , what is one additional test you would write?")
This is useful in that I get a single test I can review, run, paste in, and I'm not using any quota. Generally I have to fix it, but that's just a matter of reading the actual test and throwing the test failure output to the LLM to propose a fix. Some human judgement is required, but once I got going, adding a test took 10 minutes despite being relatively unfamiliar with Elixir Phoenix.
It's a nice loop, I'm in the loop, and I'm learning Elixir and contributing a useful feature that has tests.
The thing I'm most excited about is the moment that I run a model on my 64GB M2 that can usefully drive a coding agent harness.
Maybe Qwen3.5-35B-A3B is that model? This comment reports good results: https://news.ycombinator.com/item?id=47249343#47249782
I need to put that through its paces.
Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn’t have it write code but instead use Rodney (thanks for making it btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would’ve had Haiku do this, but I might switch to doing it locally from now on.
I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
I've not really settled on one yet. I've tried OpenCode and Codex CLI, but I know I should give Pi a proper go.
So far none of them have been useful enough at first glance with a local model for me to stick with them and dig in further.
When you say you use local model in OpenCode, do you mean through the ollama backend? Last time I tried it with various models, I got issues where the model was calling tools in the wrong format.
I've used OpenCode, and the remote free models it defaults to aren't awful, but they're definitely not on par with Gemini CLI or Claude. I'm really interested in finding a way to chain multiple high-end consumer Nvidia cards into an alternative to the big labs' offerings.
Kimi K2.5 is pretty good, you can use it on OpenRouter. Fireworks is a good provider, they were giving free access to the model on OpenCode when it first released.
I think this is directing coders towards self-sufficiency and that's a good thing. If they don't end up using it for agentic coding, they can use it for running tests, builds, non-agentic voice controlled coding, video creation, running kubernetes, or agent orchestration. So no, it's not evil, even if it doesn't go quite as expected.