Low-level numerical optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)
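To make that concrete, here's a tiny Python sketch (purely illustrative, nothing Intel-specific) of how a pairwise/vectorized-style reduction can disagree with a plain sequential loop over the exact same numbers:

```python
# Sequential vs. pairwise (SIMD/multithread-style) float summation.
# Same multiset of values, different reduction trees, different answers.

def sequential_sum(xs):
    """Left-to-right accumulation, one element at a time."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc

def pairwise_sum(xs):
    """Tree reduction, the shape a vectorized or parallel sum tends to use."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

vals = [1.0, 1e20, -1e20, 1.0]
print(sequential_sum(vals))  # 1.0 -- the last 1.0 is added after the 1e20s cancel
print(pairwise_sum(vals))    # 0.0 -- both 1.0s are absorbed into the 1e20 terms
```

The values are chosen to exaggerate the effect; in real kernels the divergence is usually in the low bits, but the mechanism is the same.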
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
Yes, "floating-point accumulation isn't associative" is a mantra everyone should have in their head, and when I first read this article, I was champing at the bit to dismiss it out of hand for that reason.
But, what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
I clicked hoping this would be about how old graphing calculators are generally better math companions than phones.
The best way to do math on my phone I know of is the HP Prime emulator.
I run a TI 83+ emulator on my Android phone when I don't have my physical calculator at hand. Same concept, just learned a different brand of calculators.
I could not parse most of this article. So it's all vibe-coded and you're blaming the silicon?
I love to see real debugging instead of conspiracy theories!
Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)
Is it clickbait? Can't it do math, or can't it run an LLM?
Well, it seems that, these days, instead of SUM(expense1, expense2) you ask an LLM to "make an app that will compute the total of multiple expenses".
If I read most of the news on this very website, this is "way more efficient" and "it saves time" (and those who don't do it will lose their jobs).
Then, when it produces wrong output AND it is obvious enough for you to notice, you blame the hardware.
I mean, Apple's LLM also doesn't work on this device, plus the author compared the outputs from each iterative calculation on this device vs. others, and they diverge from every other Apple device. That's a pretty big sign both that something is different about that device and that the same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible"? Who knows; there's no smoking gun, but it does seem like something is genuinely wrong.
I don't get the snark about LLMs overall in this context; this author uses an LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and ran an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?
LLMs are applied math, so… both?
Somewhere along the line, the tensor math that runs an LLM became divergent from every other Apple device. My guess is that there's some kind of accumulation issue here (remembering that floating-point accumulation isn't associative, so summation order matters), but it seems genuinely broken in an unexpected way given that Apple's own LLM also doesn't seem to work on this device.
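For anyone who hasn't been bitten by this: a minimal Python demonstration that summing the same values in a different order gives a different float result (it's associativity, not commutativity, that fails), which is exactly why a different accumulation order in a tensor kernel can make one device diverge from every other:

```python
# Floating-point addition is order-sensitive: reordering the same values
# changes the rounded result, so any change in accumulation order
# (different kernel, different tiling, different hardware) can diverge.

vals_a = [1e20, 1.0, -1e20]   # the 1.0 is absorbed into 1e20, then cancelled away
vals_b = [1e20, -1e20, 1.0]   # the big terms cancel first, so the 1.0 survives

print(sum(vals_a))  # 0.0
print(sum(vals_b))  # 1.0

# Even "nice" decimals show it:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

That said, this explains tiny per-op differences, not a silently broken LLM, which is why the behavior here still looks like a genuine defect.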
You don't buy Apple products because of the quality; you buy them because they're more expensive than their value. It's a demonstration of wealth. This is called a Veblen good, a phenomenon called out as early as Thomas Hobbes.
What you need to do is carry two phones: a phone that does the job, and a phone for style.
I didn't invent the laws of nature, I just follow them.
This is a conclusion that comes with some personal baggage you should identify and consider addressing.
Perfect conclusion: my expensive and rather new phone is broken by design, so I just buy an even newer and more expensive one from the same vendor.
The heroic attempt at debugging this, though, makes me sympathize with all of those engineers who must be doing low-level LLM development these days and getting just noise out of their black boxes.
This is a vibe-coded slop app.