In case the topic of memory safety is interesting to anyone, I've been experimenting with using AI agents to port common web infra projects to safe, performant Rust. Somewhat inspired by the Bun port; I was thinking that at some point memory safety might be such a big deal that people just need drop-in replacements.
- Valkey/Redis port here https://github.com/ianm199/valdr (passes ~99% of single node test suite, real prod features like replication/clustering/HA early or not implemented)
- Further along port of Lua 5.1-5.5 https://github.com/ianm199/lua-rs-port/tree/main
- I have a less developed nginx version that would be the north star
- These projects are very alpha at the moment
If anyone is interested in getting involved in this or has done similar experiments, I'd love to collaborate! There is so much variation in how you can run these large-scale agent fleets that I don't think anyone has a perfect system yet.
Respectfully, as an OSS maintainer (not to the scale of nginx or valkey, of course)... if a third-party used an AI agent to rewrite my software in a different language, that gives me absolutely no reason to support that new project.
It is in all respects foreign code in a language I may or may not be familiar with, and worse yet, if I were to take over, I'd be responsible for maintaining the whole black box forever more?
Thank you but no thanks.
I love Rust, but porting others' software to Rust (or any language) is a mixed bag. I'm a strong believer that good software requires deep domain knowledge to build and maintain. Porting code you don't understand by hand already risks still not understanding it afterwards; doing it in an automated fashion all but guarantees it.
All that to say, I think these automated ports are interesting experiments. However, if you want to build something people can trust, people need to be able to trust that you fully understand what is built, and why it's built the way it is.
If someone is porting such disparate projects as Valkey and Lua it is just for show and will be pre-alpha forever.
No one wants Bun in Rust, no one wants the rsync vibe code additions. This is just the only pro-AI comment, so the AI people voted it to the top.
It’s clear that Anthropic has run out of the compute capacity needed to serve Mythos publicly.
They’re using security concerns to mask their inability to deliver the model at scale, while still trying to maintain their lead over OpenAI. As a result, they’ve chosen to release it privately under the banner of an “ethical” rollout.
I find this line of reasoning highly dubious.
Yes, Anthropic is compute constrained, even after the SpaceX Colossus deal.
But supply constraints are the normal operating mode of any market. Anthropic could choose to serve whatever models it pleases at whatever price points it chooses and let the market decide where the value is.
If Mythos at $X overwhelms their capacity, they could just charge $X+1. If still overwhelmed, there are larger prices as well.
I had to patch my Linux boxes daily at some point in the past couple of months. I’d like Mythos to stay unreleased to the public for as long as that is economically feasible for Anthropic. I hope they have a gentleman’s agreement with OpenAI and DeepMind about this, too.
Chinese labs will force their hands; until then, let’s hope the maximum number of projects get patched at a reasonable pace.
It is not "clear", as your comment suggests; it's hidden, which is semantically the opposite of clear. As for your theory, it might be true or it might be false, but it's highly speculative.
They started Glasswing before they struck that $1.25B/month deal with xAI/SpaceX for their (notoriously dirty) Memphis data centers.
So they have a whole lot more compute now than they did last month.
So why is OpenAI also releasing 5.5-Cyber in a private manner? Are they also out of compute?
I bet Huawei and co would be happy to sell them some cheapo chips for inference!
The security concerns argument would have worked better if a forum full of people hadn't promptly obtained access by the extremely sophisticated tactic of guessing its URL...
Also, they just want to jack up the price by creating a sensation.
Probably. This is an 8-12 trillion-parameter model, which is why it costs so much. That size is also a major reason, besides RL and synthetic data, why it suddenly gained these new capabilities. They claim it was not fine-tuned or trained specifically for cybersecurity, but is instead a general purpose model.
Here's my big fear: Even IF (and that's a BIG if) we get all critical vulnerabilities fixed in tech (before adversarial/state-actors turn up with open attack models) - we still have (in at least a year) models that will be so good at social engineering that they can still (given enough tokens) gain access to whatever system they want.
If society can't trust banks and other institutions to safely control their data, what follows?
Do we collectively switch off the internet?
Social engineering as a problem goes away when anybody can get a model to do it for them for $5. It stops being possible; it's really the bank's problem when they can't have a minimum wage call center or a robot responsible for people's data.
The government should be in charge of ID Provider infrastructure and has local offices (postal) that can establish physical identity (and already do for people who need to travel abroad), but the religiously affiliated NWO conspiracy theorists have made this politically infeasible in the US, so we have unsavory private sector providers like World ID stepping in.
GPT-5.5-Cyber has already at least hit, if not surpassed, Mythos's capability in cyber tasks. The only reason they're holding back is that once it's out, everyone would realize that its capabilities were a step change in March but are not anymore, yet it costs significantly more and is much slower.
How did you go about assessing this?
So you believe one marketing department more than the other?
I think Anthropic is being performative here, creating hype for Mythos and not releasing it. I guess this is all a marketing thing to sell a security-specialized AI to enterprises and startups at a way larger cost, because the security market is deep in money.
This is just cover for being sore that you don’t have access yet <- see what I did there?
People and organizations can have mixed motivations. It’s often not “just” one thing.
They should share it with Meta.
https://www.0xsid.com/blog/meta-account-takeover-fiasco
This feels more and more like a marketing/scarcity play for the largest global corps.
Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.
To me this makes little sense. I can’t imagine the orgs they have limited this rollout to don’t already have Claude subscriptions and integrations. And sure, this may play nicely into branding and build a mystique around the model, but ultimately they are missing out on a ton of revenue and risking being totally front-run now that model performance parameters are out and people have firsthand experience. Feels more like a fairly genuine attempt to be responsible. They could have easily rolled out an update and done some PR to absolve themselves of responsibility.
Urgency x scarcity, unbeatable marketing move.
Is there any evidence Mythos is qualitatively better than Opus 4.x?
I'm afraid that the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.
Is this why both Anthropic and OpenAI are rushing for IPOs this year?
From what I've read so far it's less about Mythos being much better at tasks in isolation.
Security wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.
So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?
I still find it funny that GPT-5.5 is just as good as Mythos and yet Anthropic likes to make things worse than they actually are.
How "altruistic" of them. If only Anthropic extended this level of care to the environment or the economy.
What's currently the open source project that comes closest to Mythos capabilities?
No single open weights model comes close to either Mythos or GPT 5.5.
Nonetheless, running many of the open weights models over a codebase, with an appropriate harness, can provide about the same vulnerability coverage (i.e. each of the open weights models would find a subset of what Mythos or GPT 5.5 could find, but the subsets are not the same).
Despite needing more runs and more time, this may be significantly cheaper, especially if the models are self hosted.
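To make the coverage argument concrete, here is a minimal sketch in Python. The model names, finding IDs, and the `reference` set are all made up for illustration; nothing here is measured data. It only shows why the union of several partial result sets can approach a stronger model's coverage when the subsets differ.

```python
def combined_coverage(findings_by_model):
    """Return the union of the finding IDs reported by every model."""
    combined = set()
    for findings in findings_by_model.values():
        combined |= set(findings)
    return combined


# Hypothetical finding IDs per open weights model (invented for illustration).
findings_by_model = {
    "model_a": {"BUG-1", "BUG-2"},
    "model_b": {"BUG-2", "BUG-3"},
    "model_c": {"BUG-4"},
}

# Hypothetical set of findings a stronger single model would report.
reference = {"BUG-1", "BUG-2", "BUG-3", "BUG-4", "BUG-5"}

covered = combined_coverage(findings_by_model) & reference
coverage_ratio = len(covered) / len(reference)
print(sorted(covered), coverage_ratio)
```

With these invented numbers, no single model finds more than two of the five reference bugs, but together they cover four of them.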
Based on what Anthropic said about Mythos, they also use a quite elaborate harness for finding bugs and vulnerabilities, i.e. not a simple prompt like "find the bugs".
They run Mythos repeatedly on each file of the codebase, many times. They start with more generic prompts, used to determine whether a more thorough analysis of that file is worthwhile. Then they use more specific prompts to detect various classes of bugs. Once it becomes probable that a certain bug exists, they do a final run where the prompt requests a confirmation of the already known bug, perhaps together with a proposed patch or a PoC exploit.
Therefore the efficiency of finding vulnerabilities depends a lot on the harness, not only on the LLM. Also, searching for vulnerabilities in a big codebase when paying per token is very expensive, because it requires many runs of the LLM.
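The staged approach described above can be sketched roughly as follows. This is not Anthropic's harness: the prompts, the `BUG_CLASSES` list, the `scan_file` function, and the `fake_model` stand-in are all assumptions made up for illustration. `ask` is any callable that sends a prompt to a model and returns its text reply.

```python
from typing import Callable, Dict, List

# Hypothetical bug classes; a real harness would use its own taxonomy.
BUG_CLASSES = ["memory-safety", "injection", "auth-bypass"]


def scan_file(path: str, source: str, ask: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run a three-stage scan of one file and return confirmed findings."""
    # Stage 1: a generic triage prompt decides whether deeper analysis is worthwhile.
    triage = ask(f"TRIAGE\nIs a deeper security review of {path} worthwhile? "
                 f"Answer YES or NO.\n{source}")
    if not triage.strip().upper().startswith("YES"):
        return []

    findings = []
    for bug_class in BUG_CLASSES:
        # Stage 2: one targeted prompt per bug class.
        suspicion = ask(f"CLASS {bug_class}\nDescribe any {bug_class} bug in {path}, "
                        f"or answer NONE.\n{source}")
        if suspicion.strip().upper() == "NONE":
            continue
        # Stage 3: a final run asks for confirmation of the suspected bug.
        verdict = ask(f"CONFIRM\nSuspected bug: {suspicion}\n"
                      f"Answer CONFIRMED or REJECTED, with a proposed patch if confirmed.\n"
                      f"{source}")
        if verdict.strip().upper().startswith("CONFIRMED"):
            findings.append({"file": path, "class": bug_class, "detail": suspicion})
    return findings


calls = []  # records every prompt, to show how quickly model runs add up


def fake_model(prompt: str) -> str:
    """Canned stand-in for a real model call, so the sketch runs offline."""
    calls.append(prompt)
    if prompt.startswith("TRIAGE"):
        return "YES" if "strcpy" in prompt else "NO"
    if prompt.startswith("CLASS memory-safety"):
        return "unbounded strcpy into a fixed-size buffer"
    if prompt.startswith("CLASS"):
        return "NONE"
    return "CONFIRMED: the copy is not length-checked"


results = scan_file("vuln.c", "strcpy(buf, input);", fake_model)
print(results, len(calls))
```

Even this toy run makes five model calls for a single tiny file (one triage, three class prompts, one confirmation), which is the per-token cost concern raised above.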
Anthropic has the marketing of a weight loss product.
- They still claim 10000 issues, but they found only one in curl.
- They did not find rsync issues; rather, Claude introduced rsync issues.
- Facebook is a member of this cult program but Mythos did not find the account takeover flaw.
- Mythos did not find the issues in Anthropic's own Bun rewrite.
They will not release Mythos because it would be exposed as a fraud before the IPO.
They keep writing like they stand to profit from this or something. Too many “coulds” in there for me too: this could be an amazing advancement and it could be nothing… Normally we look at data, and the last headline I saw was 25 “high” vulnerabilities at the cost of $1 million in tokens.
No comparison to human teams, and I’m sure that $1 million in tokens was used by humans, in a team. So like most AI, they’ve developed a tool that capable people can use to be better, but unlike most tools, they’re claiming this to be outright magic. The magic is the hype train.
Step 1: claim you created a tool so dangerous you can't release it
Step 2: offer to test it, but only for the biggest companies in the world
Step 3: onboard those big players on your tooling and product
Step 4: profit
This is genius.
And all you have to do is demonstrate unique value during the pilot phase!
Err... wait... that was already the hard part... hmm
It’s true that providing security services to so many organizations will likely put them in a position to earn lots of money. It makes them an essential service, sort of like what happened with Cloudflare and denial-of-service attacks. (There are competitors, but they’re the first company people think of.)
But I think that downplays the importance of having a good product. If the product didn’t work, this would be a good way to lose trust with a lot of organizations in a hurry.
With a trillion dollars at stake, they can hire the best of the best in sales and marketing. And unlike some hardcore hackers, who may have ethics that do not always move in the direction of more money, sales and marketing people are highly motivated by opportunities to make more money.
These companies are surely already onboarded…? They claim like 10k verified and high severity CVEs. Would you have preferred they just rolled it out like another Opus update? You wouldn’t be insinuating in that situation that they were careless and reckless? They risk missing a boatload of revenue if OpenAI front-runs them for a public launch. In what world is this some sort of scam??
Seems like they're not even close to step 4.
And put Chris Olah, Anthropic co-founder, sitting next to Pope Leo XIV presenting his first encyclical, Magnifica Humanitas, at the Vatican.
>can't release it
can't release it to the plebs
“Mythos Preview continues a long-term trend that we’ve been warning about for some time: within 6 to 12 months […]”
The only trend Mythos continues is Anthropic’s trend of warning that disaster is always 6 to 12 months away.
> The organizations in this new group are based in more than 15 countries
I mean, most Nasdaq tech companies would be in 13+ countries. Why are they writing this like it's a big number when it's hilariously small?
I assume they're using a more candid definition where they're not counting all the countries a company may be based in, but rather the primary country it's based in.
I don't think they're trying to flex this as a large number. They don't want to give an exact number, as that may change etc / is fuzzy, but also want to give you an idea of the scale.
They say "In the future, we intend to expand our geographical reach much further". I imagine this commentary is somewhat related to the concerns that AI will create an even worse "global underclass". AI developments are first accessible to Americans, then allies, and then later the whole world.
They're writing it in contrast to the previous scope, which doesn't seem to have been available to any organizations based outside the US. (There was news a few weeks ago about how Japanese banks were going to gain access, but based on the timing I think this announcement is that access.)
That’s fine as long as I can identify and reject any Mythos derived patch as being irreproducible.
Why would it not be reproducible?
How can a patch be "reproducible"? The testcases are reproducible.
It would have been nice to have a list of the 150, but I guess it would make them a hacking target?
Expanding Project Glasswing (IPO)
Maybe it is just me: I feel Anthropic's most recent product announcements resemble more and more what IBM's tactics were at its height. For instance, the Watson AI hype after it defeated Kasparov. The difference is that IBM actually wanted businesses to buy and use Watson, and let them, as opposed to the time-released approach Anthropic takes to boost the hype even higher.
Deep Blue defeated Kasparov. The Watson hype was about winning Jeopardy, which is still kind of the only use case for current AI.