There are a few bits of information from the original sources that are left out:
- The METR paper surveyed just 16 developers to arrive at its conclusion. Not sure how that got past review. [0]
- The finding from the MIT report can also be viewed from a glass-5%-full perspective:
> Just 5% of integrated AI pilots are extracting millions in value.
> Winning startups build systems that learn from feedback (66% of executives want this), retain context (63% demand this), and customize deeply to specific workflows. They start at workflow edges with significant customization, then scale into core processes. [1]
[0] https://arxiv.org/abs/2507.09089
[1] https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Bus...
AI creates the most spectacular happy path demos. It’s hard not to extrapolate to infinity when you see it.
People have a bias to want to believe something works in all cases when it seemingly offers them benefits, especially when there's a sunk investment involved.
>Back in 2024, 54% of researchers used AI — that figure jumped up to 84% this year
That kind of makes me doubt the pullback. Maybe the hype is dying, but it's settling in as an everyday tool?
I'm not saying AI is living up to the "hype" or "expectations" - that largely depends on how you quantify the hype or expectations. The most rational approach would be to compare how much money is funneled in against the ROI it produces within some time range, e.g. 10 years. A wise investor would look ahead 10 years and balance benefits, potential, and risks. By that metric it could be too early to say whether it's paying off, even if it's currently, objectively, generating 10x more expense than income.
But the metrics and facts in that article don't mean much without context or deeper explanation.
> 95% of AI pilots didn’t increase a company’s profit or productivity
If 5% do, that could very well be enough to justify it, depending on why and after how much time the pilots are failing. It's widely touted that only 5% of startups succeed, yet startups overall have brought immense technological and productivity gains to the world. You could live in a hut, be happy, and argue none of it is needed, but nonetheless the gains, by some metrics, are here, despite 95% failing.
The article throws out numbers to make the point it wanted to make, but fails to account for any nuance.
If there's a promising new technology, it makes sense that there will be many failed attempts to make use of it, and it makes sense that a lot of money will be thrown at it. If 5% of attempts succeed, each attempt costs $1 million, and a success is worth $1 billion, that's already a 50x expected return.
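To make that back-of-the-envelope math concrete, a minimal sketch in Python, using the same made-up figures as above (not real industry numbers):

```python
# Expected-value math for the 5%-success argument, with hypothetical figures.
cost_per_attempt = 1_000_000        # $1M per pilot/startup attempt
success_rate = 0.05                 # 5% of attempts succeed
payoff_if_success = 1_000_000_000   # $1B of value if an attempt succeeds

expected_payoff = success_rate * payoff_if_success   # $50M per attempt
multiple = expected_payoff / cost_per_attempt        # 50x
print(f"Expected return per attempt: {multiple:.0f}x")
```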
In my personal experience, used correctly it increases my productivity a lot, and I've been using AI daily ever since the GPT-3.5 release. I would say I use it during most of what I do.
> AI Pullback Has Officially Started
So I'm not seeing this at all, based on how much I personally pay for AI, how much I use it, and how I see it iteratively improving while it's already so useful to me.
We are building and seeing things that weren't realistic or feasible before now.
What if switching to 3-ply toilet paper (from the dreaded 1-ply) in employee bathrooms increased productivity in 5% of companies? By the same logic, those 5% could also produce 50x returns.
I'm not sure what your point is, but I do absolutely think that getting proper toilet paper offers 50x returns and everyone should prioritize it.
The clock is ticking and time is running out. You do realise future numbers have to be discounted heavily at the required rate of return, given the risk, right?
This means the free cash flow to the firm that OAI generates will have to be huge, given the cash outflows to date.
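To illustrate the discounting point, a minimal sketch with invented numbers (not OpenAI's actual financials):

```python
# Discount a hypothetical future cash flow at a risk-adjusted required return.
# All figures are made up for illustration only.
def present_value(cash_flow: float, rate: float, years: int) -> float:
    """Value today of a single cash flow received `years` from now."""
    return cash_flow / (1 + rate) ** years

fcf_in_10_years = 100e9   # suppose $100B of free cash flow a decade out
required_return = 0.25    # a steep discount rate reflecting the risk

pv = present_value(fcf_in_10_years, required_return, 10)
print(f"Present value today: ${pv / 1e9:.1f}B")  # roughly $10.7B
```

Even a very large payoff a decade out shrinks dramatically once the risk is priced in, which is the point here.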
5% succeeding is abysmal for an industry in which a trillion dollars or more has been invested.
And that's ignoring the rampant copyright infringement, the exploding power use and the accompanying acceleration of climate change, the harm it already does to people who can't cope with a sycophantic lying machine, and the huge amounts of extremely low-quality text, code, and social media clips it produces. Oh, and the further damage it is going to do to civil society: we already struggled with fake news, and this turns the dial not to 11 but to 100.
There are a lot of people invested in AI, so they are cheerleaders. There are way more people who didn't invest, who have sour grapes and want to see it fail. I'm neither of those people, but it's a democracy after all. I think AI is due for another winter.
I think it's not so much a democracy as a market. We sometimes say people vote with their wallets on products, so I see where you're coming from.
Still, in this case I think a market analogy fits better. There are people who want it and people who don't. If the people with a lot of money (to manage for companies) want it, that will move the balance; whether it eventually moves it enough remains to be seen. Decisions can be made with too much excitement and based on overpromises, but eventually someone will draw a bottom line under (generative) AI, the area into which huge amounts of money are currently being pumped. Either it will generate value that people pay for and the investors make a profit, or it won't. Bubbles and misconceptions can delay the moment when the line is drawn, but eventually it will be.
Whether LLMs and generative AI in general create value or not, I cannot say. I am sure that the more specialised AI solutions, better described as machine learning, do create this value in their specific use cases and will stay.
I've been using AI coding systems for quite some time, and have worked in neural networks since the '90s. The advancements are, frankly, almost as crazy as what '90s neural-net devotees like me were claiming could eventually be possible.
That said, the non-tech-executive/product-management take on AI has often been an utter failure to recognize key differences between problems and systems. I spend an inordinate amount of time framing questions in terms of promises to customers, completeness, reproducibility, and contextual complexity.
However, for someone in my role, building and ideating in innovation programs, the power of LLM-assisted coding is hard to pass up. It may only get things 50% of the way there before collapsing into a spiral of sloppy, overwrought code, but we often only need 30-40% fidelity to exercise an idea. Ideation is a great space for vibe coding. However, one enormous risk in these approaches is in overpromising the undeliverable. If folks don't keep a sharp eye on the nature of the promises they're making, they may be in for a pretty wild ride, with the last "20%" of the program taking more than 90% of the calendar time due to compression of the first "80%" and complication of the remainder.
We’re going to need to adjust. These tools are here to stay, but they’re far from taking over the whole show.
We should expect pullbacks, fuckups, plans failing, and rollouts getting canned. It's part of how humans do things. It's actually a pretty effective optimization algorithm.
I'd bet that some sort of "exponentiate the learning rate until shit goes haywire, then roll back the weights" is actually a fairly decent algorithm (something like backtracking line search).
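A loose toy sketch of that idea in Python (my own illustration, in the spirit of a backtracking line search, with made-up names rather than anyone's actual optimizer):

```python
import numpy as np

def grow_then_backtrack(loss_fn, x, direction, step=1e-2,
                        grow=2.0, shrink=0.5, max_iters=30):
    """Grow the step size along a descent direction while the loss keeps improving,
    then shrink it back once a trial step makes things worse."""
    best_step, best_loss = 0.0, loss_fn(x)
    for _ in range(max_iters):
        trial_loss = loss_fn(x + step * direction)
        if trial_loss < best_loss:   # still improving: keep it and get greedier
            best_step, best_loss = step, trial_loss
            step *= grow
        else:                        # went haywire: roll back toward the last good step
            step *= shrink
    return x + best_step * direction, best_loss

# Toy usage: one search along the negative gradient of f(x) = ||x||^2.
f = lambda x: float(np.sum(x ** 2))
x0 = np.array([3.0, -4.0])
print(grow_then_backtrack(f, x0, direction=-2 * x0))
```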
AI (LLMs) are useful for coding, and I use them to look up various articles or websites and summarize them.
Use it where it works; ignore the agent hype and other bullshit peddled by 19-year-old dropouts.
Unlike the 19-year-old dropouts of the 2010s, these guys have brain rot, and I don't trust them after having talked to such people at startup events and heard their black-pill takes. They have products that don't work, and they lie about the numbers.
I'll trust people like Karpathy and others who are genuinely smart af, not Kumon products.
On this website, in another thread, there is a principal software engineer at Microsoft who wrote an essay on how agent systems are massively amplifying all of the employees' productivity, even on large, complex tasks.
“Principal” at Microsoft (or Oracle) is “Senior” or “Staff” everywhere else just fyi
Now the question is whether that's true (and thus should be objectively measurable) or whether he is bullshitting because Microsoft has invested so much money in it that it just has to work.
People roll out a complex and powerful technology without fully understanding it, without knowing what evals are, and without updating their processes to account for it, and the rollout fails. News at 11.
Seriously though, "AI fucks up" is a known thing (as is humans fuck up!) and the people who are using the tech successfully account for that and build guardrails into their systems. Use version control, build automated tests (e2e/stress, not just unit), update your process so you're not incentivizing dumb shit like employees dumping unchecked AI prs, etc.
If the tech only worked for coding, it would be one thing. But it's advertised as a cure for anything and everything, so people are using it for that. And you can't build automated tests for that.
I am a big AI booster, but I agree that letting agents run unsupervised, without either rigorous oversight or strong automated constraints, is a mistake.
Imagine comparing a human fuck up to an AI one. Lol.