> We found that independent multi-agent systems (agents working in parallel without talking) amplified errors by 17.2x
The paper sounds too shallow. The error data doesn't seem to come with a rationale, or to correlate with the architecture. Specifically, what makes the SAS architecture have the lowest error rates while the similar architecture with independent agents has the highest? The conclusion doesn't seem well grounded in reasoning.
I’ve been building a lot of agent workflows at my day job. Something that I’ve found a lot of success with when deciding on an orchestration strategy is to ask the agent what it recommends as part of the planning phase. This technique of using the agent to help you improve its performance has been a game changer for me in leveraging this tech effectively. YMMV of course. I mostly use Claude Code so who knows with the others.
This is a neat idea but there are so many variables here that it's hard to make generalizations.
Empirically, the setup I've seen put up the best results in various heterogeneous environments is a top-level orchestrator that calls out to a planning committee, then generates a task DAG from the plan, which gets orchestrated in parallel where possible. As models evolve, crosstalk may become less of a liability.
Reasoning is recursive - you cannot isolate where it should be symbolic and where it should be LLM-based (fuzzy/neural). This is the idea that started https://github.com/zby/llm-do - there is also RLM: https://alexzhang13.github.io/blog/2025/rlm/ RLM is simpler, but my approach also has some advantages.
I only agree with that statement if you're drawing from the set of all possible problems a priori. For any individual domain I think it's likely you can bound your analytic. This ties into the no free lunch theorem.
It's crazy how 30 years back 2 grad students lucked out with an algorithm and now we have to listen to all the thought bros in this company.
Gonna take this with a grain of salt because I have been rather unimpressed with Google's AI products, save direct API calls to Gemini
The rest is trash they are forcing down our throats
Yeah, AlphaGo and Zero were lame. The earth foundation model - that's just ridiculous.
That's sarcasm
---
Your "direct Gemini calls" is maybe the least impressive
edit: This paper is mostly a sort of "quantitative survey". Nothing to get too excited about, and nothing that requires a grain of salt
The underlying models are impressive, Gemini included (via direct API calls, as opposed to the app or search). I would put AlphaGo, AlphaFold, etc. in that category too
The products they build, where the agentic stuff is, are what I find unimpressive. The quality is low, the UX is bad, and they are forced into every product. Notable examples: search in GCloud, gemini-cli, and antigravity (not theirs technically, $2B whitelabel deal with Windsurf iirc)
So yes, I see it as perfectly acceptable to be more skeptical of Google's take on agentic systems when I find their real world applications lackluster
I agree with you in general re "agentic systems". Though they might deliberately not be trying to compete in the "agent harness" space yet.
The antigravity experiment, yes, was via Windsurf - probably nobody expected that to take off, but maybe it was work that may have surfaced some lessons worth learning from.
My hunch is that Google is past its prime, all the good PMs are gone, and now it looks like a chicken hydra with all its heads cut off, trying to run in multiple directions.
There is no clear vision, coherence, or confidence that the products will be around in another year
Kind of a weird take given they are one of the strongest AI providers who are the most vertically integrated. Sure, maybe the company isn’t as healthy as it once was, but none of them are - late stage capitalism is rotting most foundations
I'm saying this as a big, but dimming, Google-stan
Their poor product decisions have driven me away, but that doesn't mean I'm not still very impressed with everything under that. I'm building my custom agent on their open source Agent Development Kit and the Gemini family.