> This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies also need access to these tools in order to identify key vulnerabilities in their systems; not just enterprises.
So this is the same policy that Anthropic and OpenAI have; it is just based on your criteria rather than theirs.
I think the policy universally makes sense: who would want to give a tool like this to bad actors? But it does leave a big section of the market underserved, particularly when Mythos was made accessible to very large orgs and then Fable was pulled on export grounds.
Do you think bad actors can't make something like this? What are you even talking about?
The problem is that it is a fool's errand to try to keep software tools from 'bad actors'. It is as pointless now as it was during the Crypto Wars. Information is simply too easy to move.
https://en.wikipedia.org/wiki/Crypto_Wars
The policy is repugnant. Whoever first delivers to the world an open-weights frontier model that lacks these moral guardrails will win.
Stop thinking you know morals better than your users, or get out of the way so a competitor who respects your users more can serve them!
It's really absurd to think any of these models can be protected _by commercial interests_. They couldn't keep from hiring North Koreans any more than they'll stop bad actors from operationalizing these models.
A lot of bad actors are both technically sophisticated and have more than enough resources to post-train their own model. Morally I think it's still the right choice, but consequence-wise I doubt it's going to make a big difference.
I actually wonder how valuable this verbiage is
To me it looks like copycat marketing more than a strongly held stance
Artificial scarcity, membership club criteria to make members feel special
Perhaps there is an organization that rewards this “responsibility” behavior; the EU comes to mind, but that's not lucrative enough
As far as engagement farming goes, it got us to engage and boost its reach, for something we might otherwise ignore with more benign language
Once I get the answers I will execute
As soon as I read that I literally scoffed. Doublethink at its finest. Doubleplusungood.
Fantastic. Could you share more details on what it was like post-training a model?
What was your approach to benchmarking an adversarial agent?
This is an open problem that I came across (in a different domain), as the search space can be really wide. It's hard to measure results for non-trivial tasks.
Would be really interested if you can share your eval approach :)
Show HN: We told Claude to generate a marketing page for a theoretical pentesting model
The tool is live, you can test it.
No, you can’t. This page is a sales funnel to schedule a 30 minute video chat with Cosine.ai or argusred or whatever. The thing you can test is not the thing that the headline is talking about.
It’s just more “We’re so smart we invented the boogeyman, trust us” slop marketing that’s been happening since GPT-2.
Did you follow the link? There is a binary you can install with brew and test. It's live.
> Gated because the security implications are real; access is via booking
If I wanted to show off a “model that pen tests” I’d at least include a gif of it running against Juice Shop or something before the spooky language and “schedule a sales call”
Why create an offensive tool rather than a repo-scanning tool?
I can't think of any way to safely release an offensive tool publicly.
At my job we have tooling that scans our code repos with Opus. Yes, it can find stuff; however, it doesn’t find everything.
I am able to get Opus and Sonnet to function as a red team agent. We don’t have some crazy special sauce, just a lot of trial and error. Basically, we add enough context proving we own the code and the running services that it will attempt to compromise our services.
It found tons of stuff that was not found by just scanning the code. It found serious security issues that had been in production for years and that humans never found. They weren’t accessible externally, but they were serious enough that we are thrilled to have these tools.
I can say that Fable did refuse to function with our harness. I am worried that soon you will have to be in the special club to do this stuff with the SOTA models. A small company like ours doesn’t get accepted to their programs that remove guardrails, even though our CEO has found and disclosed vulnerabilities to multiple companies and holds a patent around federated authentication.
They are only protecting corporate interests in insecure code bases by doing this. If everyone could have Mythos in their pockets, all the poorly written, bottom-dollar, rush-developed software would be rightfully shown to be the trash it always was. It would spur engineering liability legislation for commercial software and operations: speed-release poor insecure code --> corporate bankruptcy and maybe even prison for the software PE who signed off on it. Software, infrastructure, and hardware security won't improve massively until the "bad actors" start running rampant on the steaming pile!
You need both: scanning of your own code, and pen testing to actually prove the vulnerabilities. Otherwise it can be very noisy; one of the things most tools currently suffer from is that they give you too many false positives. For the moment, we have gated the pen testing until we resolve the safety debate.