Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities
Apple's AI research team has uncovered significant weaknesses in the reasoning abilities of large language models, according to a newly published study. MacRumors: The study, published on arXiv [PDF], outlines Apple's evaluation of a range of leading language models, including those from OpenAI, Meta, and other prominent developers, to determine how well these models handle mathematical reasoning tasks. The findings reveal that even slight changes in the phrasing of a question can cause major discrepancies in model performance, undermining the models' reliability in scenarios that require logical consistency.
Apple draws attention to a persistent problem in language models: their reliance on pattern matching rather than genuine logical reasoning. In several tests, the researchers demonstrated that adding irrelevant information to a question -- details that should not affect the mathematical outcome -- can lead to vastly different answers from the models.
Uh - duh? (Score:2, Redundant)
AI does not reason. It predicts word ordering. Reasoning requires knowledge bases with semantic knowledge and analysis. Word ordering just puts jumbles of symbols in order.
Re: (Score:3)
Turns out, a lot. But it is still fundamentally limited by the whole starting point of building the best auto-complete.
Re: (Score:1, Troll)
"Trained" is itself the wrong terminology. "Training" implies learning, which implies intelligence. LLMs are a giant statistical-probability database with an impressive depth of connection between each individual tokenized node, but nowhere in there does any actual intelligence or reasoning ability exist.
The whole term "artificial intelligence" is the problem. It, and the use of terms like "training," lead people to anthropomorphize what they shouldn't.
Re: (Score:2)
Bit of an old man yelling at clouds here. Programming relies on a lot of metaphors to help us understand the purpose of things.
I do not think semaphores are using little colored flags to control my threads,
which I do not believe to be strings bound on spools to divide my jobs,
which I do not believe to be gainful employment on the part of my code.
And:
Objects are not things I can hold.
Models are not toy planes
Servers don't bring you your food
Links are not part of a chain
Calling functions does not require a p
Re: Uh - duh? (Score:2)
The whole term "artificial intelligence" is the problem
It's a term that never really had any practical meaning other than "a program that responds to inputs." In the 80s and 90s, AI was your chess opponent, which basically did fancy heuristics over a static ruleset. It never was intelligent, and still isn't. When most companies describe their product as AI, it's not even an LLM; it's just a variation of the ol' chess opponent.
Though I'd have to slightly disagree with your training comment. For LLMs, yes, it's not training so much as just adding data points for the d
Re: Uh - duh? (Score:2)
Bleh, correction: Solely NN
Re: (Score:2)
"Training" is an accurate and correct term to use here. Not only is it the common terminology in the machine learning field for decades, but it describes what is happening.
LLM models aren't just databases, they're weighed neural networks that will produce a given result based on a given input. The training is to adjust the weights to properly produce the result. Without the training, the model produces gibberish.
Reason (Score:1)
There is no reasoning. It's pattern matching based on keywords and weights feeding into Markov chains. Most LLMs also have some inferencing ability hardwired in there by humans, but they don't make those inferences on their own.
Re:Reason (Score:5, Insightful)
The funny thing is... somehow our ability to reason is an emergent property of weighted connections in a network. Because we don't understand how that happens, we don't know why it isn't happening with the AI we have created, or whether it's even possible with the setups we're using. We also don't know if it's impossible for a sufficiently complex version of an existing AI system to do it.
Probably impossible; I suspect there's more to it than "embiggen it and it will happen."
Re: (Score:2)
Idiocracy bucket problems (Score:2)
If I gave you a 5 gallon bucket and a 2 gallon bucket, how many buckets did I give you?
Re: (Score:2)
There is no reasoning.
Correct.
It's pattern matching based on keywords and weights feeding into Markov chains.
Incorrect. LLMs are non-Markovian.
Dupe di-dupe di-dupe di-dupe dupe dupe (Score:4, Informative)
And please stop claiming "faults" in "LLM reasoning abilities". LLMs have no reasoning abilities and pattern matching is not a valid substitute.
Re: (Score:2)
Hmm. I do admit I sometimes forget the low "reasoning ability" level many people operate on.
Re:Dupe di-dupe di-dupe di-dupe dupe dupe (Score:4, Interesting)
Question is, do Slashdot editors have enough reasoning abilities, considering the dupefest here?
Re: (Score:2)
Maybe Apple could do a study revealing the critical flaws in Slashdot editors' "reasoning" abilities?
Re: (Score:2)
The headline/description is garbage. ... but Apple needs to temper people's expectations when Sam Altman is writing things like:
"it's very possible that creativity and what we think of as human intelligence are just an emergent property of a small number of algorithms operating with a lot of compute power"
and
"We decry current machine intelligence as cheap tricks, but perhaps our own intelligence is just the emergent combination of a bunch of cheap tricks."
Even Mira Murati's papers point in th
This needed a study? (Score:2, Offtopic)
I expect we'll see a response from Sam Altman and his ilk within days about how reasoning ability is overrated anyway, and how artificial intelligence is superior to supposed "real" intelligence on such a level that we simply aren't equipped to understand the reasoning of such a superior creation.
My god, this is stupid. Reasoning ability in LLMs? You might as well say every database in existence has reasoning ability just because you can type in a somewhat English-looking phrase (SELECT * FROM $f
Re: (Score:1)
Sorry, but you went there. I couldn't resist.
https://xkcd.com/327/ [xkcd.com]
Non deterministic (Score:1)
Re: (Score:3)
What "non deterministic nature"? And why are "guarantees" of "results" important?
"for example, If the AI sends a 1 in a million mass email that is highly offensive, the AI producer/maintainer probably has language stating they're not liable."
They'll have that anyway. It's a problem of legal accountability, not a characteristic of LLMs that you cannot accurately describe.
Re: (Score:2)
I believe that they are fully deterministic, but generation runs are seeded with random numbers intentionally.
Re: (Score:1)
2+2=4. Fully deterministic: it always yields the same result given "2+2=?" as input.
Vs.
An LLM given the same user input multiple times yielding different results each time? Non-deterministic.
By definition, if a random number generator is a key part of your algorithm, it is not deterministic. This should be self-evident.
Re: (Score:2)
I'll use image generators as an example because even though it's a different algorithm, they work in a lot of the same ways.
You put in a text prompt and get a different image every time, right? No. You can re-run the same prompt with the same seed and get exactly the same picture out of it. You just have to have control over the model to enable that. So maybe not Bing Image Generator but definitely Stable Diffusion.
It's pseudorandom numbers, so yes - it's deterministic.
Re: (Score:1)
$x = time();  // current Unix timestamp: changes every second you run it
print $x;
Deterministic?
An LLM using time() or rand()... deterministic?
Again? (Score:2)
They did the same a couple of days ago:
https://apple.slashdot.org/sto... [slashdot.org]
Re: (Score:2)
Apple is thorough.
Slashdot editors, not so much.
Re: (Score:2)
"Hey, LLM, has this article been posted already?"
See, AI could improve /. Maybe it's only as smart as a cat, but if that cat can spot dupes, that's something the editors miss.
Humans use cats to hunt mice, too. Not because cats are good at anything other than being mean, but they excel at that. Same with LLM pattern matching.
Apple always shits on tech they're way behind on -- until they "revolutionize" it and it's the next best thing. Remember when fanbois were worshiping the Lightning cable?
They'll snap to on AI.
Not able to reason. (Score:2)
"Generative AI" is simply not capable of what we would universally consider reasoning. LLMs and other "reflexive" pattern-matching systems may be a stepping stone on the way to AGI, or, they may be a cul-de-sac, and won't have anything at all to do with AGI, if such a thing ever comes to be.
I really question this. (Score:2)
I mean, take any formal math proof. You have a set of existing statements and a set of transformations you can apply to them, and you apply those transformations until you get the form you want. All of this is realizable within a neural network, so any output can only be the product of an input plus a transformation.
More Discussion on this from 2 Days Ago (Score:2)
https://apple.slashdot.org/sto... [slashdot.org]
The critical flaw is that... (Score:2)
...they have NO reasoning ability.
It's all statistics and clever math.
I tried it, they are right (Score:1)
I asked how much 3+5 is.
If I change just one character, the '3' to a '4', I get a completely different answer.
Novel thought (Score:2)
IQ Test (Score:3)