What LLMs are and what they are not

category: general [glöplog]
A lot of us talk about LLMs (AI) here, but we don't seem to have a good definition. My long-running conversation with Tomasz is what prompted me to open this subject. I would like this thread kept "academic", in the sense that it shouldn't delve deep into technicalities.

I'm looking for more insight. Here's where I stand: when Tomasz says "an AI model is a procedure", I think: "No it's not". I'll proceed with my understanding of the matter, showing first what I think LLMs are not. Chime in folks, correct me if needed, so we can better understand each other.

An AI model is not a procedural machine. It's not meant to execute step-by-step procedures.
"Procedural" can also be described as "algorithmic". An AI model is NOT an algorithm. An AI model is the output of an algorithm that has been applied to a dataset.

I'll try to give an example: you can train an AI model (let's go with text-to-image for the example's sake) on a bunch of properly tagged pictures of circles. Then, when you put "circle" in a prompt, it will give you a picture of a circle. See, but it's still oblivious to any (mathematical, geometrical) formula that describes a circle. When you enter "circle" in the prompt again, it generates another one from scratch.

Algorithms and procedures are like blueprints for building. An AI LLM builds nothing. It doesn't even reverse engineer anything. All it does is mimic: it tries to emulate the surface look of reality with a lossy codec. That's why it needs all that power and all those resources. Look how many circles it has to be trained on just to mimic one imperfect circle via a prompt. Meanwhile, a ZX Spectrum can render a perfect circle with no training at all, and pretty quickly, even via (slow) BASIC, with one millionth of the CPU power and resources. And then another circle. And another one, slightly bigger or smaller. Because it follows a procedure, a blueprint, a step-by-step formula. That's the definition of "procedural", ain't it?
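For contrast, here's what that "blueprint" looks like in practice - a minimal Python sketch (my own toy example, not from any real renderer) of a circle drawn purely from its parametric formula, no training data involved:

```python
# A circle from a formula alone: the parametric equation
# x = cx + r*cos(t), y = cy + r*sin(t) is the "blueprint"
# a procedural renderer follows step by step.
import math

def circle_points(cx, cy, r, n=360):
    """Return n (x, y) points on a circle of radius r centered at (cx, cy)."""
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]

pts = circle_points(0.0, 0.0, 10.0)
# Every point is exactly radius r from the center - a perfect circle,
# reproducible at any size by changing one parameter.
```

Change `r` and you get another circle, slightly bigger or smaller, for free.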
added on the 2024-05-15 09:13:12 by 4gentE
hey how are you doing trololo man
added on the 2024-05-15 09:19:15 by havoc
Not too shabby trololo king, not too shabby at all.
added on the 2024-05-15 09:20:03 by 4gentE
pfft, yo momma is so fat, she can only hold 65536 files in a directory
added on the 2024-05-15 09:23:30 by havoc
If you liken LLMs to lossy codecs AND say LLMs are not algorithms, it follows that lossy codecs aren't either.

Pretty sure you can find a lot of references that say that codecs of any kind are, well, algorithms.

And again, framing challenge. Which point do you want to prove so adamantly?
added on the 2024-05-15 10:25:18 by Krill
My code also got sucked in by Copilot and supports bad programmers in creating shit software all over the world now.
added on the 2024-05-15 10:28:17 by NR4

OK, I blabber too much, forget the comparisons.
So, you tell me, are LLMs algorithms?
added on the 2024-05-15 10:45:34 by 4gentE
4gentE: sorry for calling you a troll. The more accurate term is actually compulsive posting disorder: the urge to compulsively engage in online discussions via heated arguments. It's addictive, and that's why it's hard to stop.

Anyways, LLMs, or any ML models, are very much algorithms. Some variants can even be Turing complete - search for Neural Turing Machines. You can also use iterative refinement with LLMs, if you think procedure = iterations. We can argue over definitions forever, but you can generalize the notion of an algorithm or procedure to a computable function.
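To illustrate the "computable function" view, here's a toy sketch (the weights are made up for illustration, not from any real model): a trained model is just fixed parameters plus a fixed forward-pass procedure, and the same input always maps to the same output.

```python
# A trained model, viewed as a computable function: fixed weights
# (the output of training) applied to an input by a fixed procedure
# (the forward pass). Weights and biases here are hypothetical.
def forward(x, weights, biases):
    """One dense layer with a ReLU: y_i = max(0, W_i . x + b_i)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

W = [[0.5, -1.0], [2.0, 0.25]]   # hypothetical trained parameters
b = [0.1, -0.2]
y = forward([1.0, 2.0], W, b)    # same input + same weights -> same output
```

Training is one algorithm (it produces `W` and `b`); inference is another (it applies them). Both are step-by-step procedures.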
added on the 2024-05-15 11:19:05 by tomkh
Search https://en.wikipedia.org/wiki/Large_language_model for "algorithm", i.e., "yes".
added on the 2024-05-15 11:20:44 by Krill
and there i thought a large language model is all the boring posts in that previous AI thread combined, luckily it continues here!
Search https://en.wikipedia.org/wiki/Large_language_model for "algorithm", i.e., "yes".

OK. So an AI model is an algorithm. I usually tend to trust IBM on the topic of AI, so I was evidently misled by this quote:

Algorithms vs. models
Though the two terms are often used interchangeably in this context, they do not mean quite the same thing.
Algorithms are procedures, often described in mathematical language or pseudocode, to be applied to a dataset to achieve a certain function or purpose.
Models are the output of an algorithm that has been applied to a dataset.

Source: https://www.ibm.com/topics/ai-model

Going by this quote, I was excluding the output of an AI (LLM) model from the "procedural" pool, because the dataset is an integral part of it, and my logic was that "procedural" means code-based, not data-based.

So, if I create a synthetic terrain via algorithms - it's procedurally generated.
And if I use an AI model which was trained on a large volume of real-world terrain scans - is it still/also procedurally generated?
And please understand that I'm not arguing nor trolling, I'm sincerely asking.
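For what it's worth, the IBM distinction can be sketched in miniature (toy data, with ordinary least squares standing in for training): the *algorithm* is the fitting procedure, and the *model* is its output - here, just two numbers.

```python
# IBM's distinction, in miniature: the *algorithm* is the training
# procedure (here, closed-form least squares for y = a*x + b);
# the *model* is its output. Data is made up for illustration.
def fit_line(xs, ys):
    """Training algorithm: least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx            # the "model": learned parameters

def predict(model, x):
    """Applying the model is a second, separate procedure."""
    a, b = model
    return a * x + b

model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # toy data: y = 2x + 1
```

But note that `predict` is itself an algorithm too - the model is data *plus* a fixed procedure for applying it.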
added on the 2024-05-15 13:18:06 by 4gentE
It's not exactly the same thing for sure, but both are algorithms or procedures.

You, as a creator, develop a meta-algorithm that, via training, produces an actual algorithm, so it's not an automatic/mindless process.

Moreover, in a sense, hand-picked parameters for procedural generation also come from the data - you see the natural terrain and "tune" your noise layers to look alike. A naive ML model may just mimic one particular terrain look, but a more advanced generative model may also capture the essence of terrain-like qualities. If anything, making a generative model like this seems harder than hard-coding only one representation of the terrain.
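A toy sketch of the hand-tuned approach (parameter values are arbitrary, picked for illustration): a few octaves of deterministic value noise, summed - the octave count and persistence are exactly the kind of knobs you tune by eye against real terrain.

```python
# Hand-tuned procedural terrain in one dimension: octaves of
# deterministic value noise, summed. Constants are arbitrary.
import math

def hash01(i, seed=1337):
    """Deterministic pseudo-random value in [0, 1) for integer i."""
    h = (i * 374761393 + seed * 668265263) & 0xFFFFFFFF
    h = (h ^ (h >> 13)) * 1274126177 & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 2**32

def value_noise(x, seed=1337):
    """Smoothly interpolate hashed values at integer lattice points."""
    i, f = int(math.floor(x)), x - math.floor(x)
    t = f * f * (3 - 2 * f)                      # smoothstep fade
    return hash01(i, seed) * (1 - t) + hash01(i + 1, seed) * t

def terrain_height(x, octaves=4, persistence=0.5):
    """Sum octaves of noise: each doubles frequency, halves amplitude."""
    return sum(value_noise(x * 2**o) * persistence**o for o in range(octaves))

heights = [terrain_height(x / 10) for x in range(100)]
```

Same input, same heightfield, every time - the "data" the result resembles only ever entered through the human tuning the parameters.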
added on the 2024-05-15 14:02:51 by tomkh
I apologize to everyone who was bored to death.

Are LLMs then completely deterministic, in the sense that if, for example, DALL-E gives out a picture, this resulting picture has a unique "seed" and can be identically reproduced by calling on this "seed"? What seems like randomness in LLM output is confusing. Is this randomness due to noise?
added on the 2024-05-15 14:33:54 by 4gentE
Some models (like LLMs) return multiple possible outputs (e.g. tokens) with associated probabilities. If you take the most probable output, it's fully deterministic, but you can also sample from a few of the top most-probable outputs to get non-determinism (usually controlled by a temperature setting). Diffusion models are different: they are literally de-noisers, so you just start with noise. Again, it's up to you: if you provide the same noise/seed as input over and over, the result is deterministic; otherwise you just randomize it each time.
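A minimal sketch of that sampling step (the logits are made up): temperature rescales the model's raw scores before turning them into probabilities - low temperature concentrates nearly all probability on the top token (approaching deterministic greedy decoding), higher temperature spreads it out; and a fixed random seed makes even the sampled path reproducible.

```python
# Temperature sampling over made-up token scores ("logits").
import math, random

def softmax_with_temperature(logits, temperature=1.0):
    """Scores -> probabilities; temperature sharpens or flattens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]                 # hypothetical scores for 3 tokens
greedy = max(range(len(logits)), key=logits.__getitem__)  # deterministic
```

With `rng=random.Random(42)` the "random" choices replay identically, which is the same idea as reproducing an image from its seed.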
added on the 2024-05-15 15:14:59 by tomkh
get a room you two!
added on the 2024-05-15 18:23:25 by v3nom
This is like watching two ChatGPTs argue indefinitely
added on the 2024-05-15 19:22:49 by skrebbel
another thread that could've been an email!