What LLMs are and what they are not
category: general [glöplog]
A lot of us talk about LLMs (AI) here, but we don't seem to have a good definition. My long-lasting conversation with Tomasz is what prompted me to open this subject. I would like this thread kept "academic" in the sense that it shouldn't delve deep into technicalities.
I'm looking for more insight. Here's where I stand: when Tomasz says "an AI model is a procedure", I think "no, it's not". I'll proceed with my understanding of the matter, showing first what I think LLMs are not. Chime in folks, correct me if needed, so we can better understand each other.
An AI model is not a procedural machine. It's not meant to execute step-by-step procedures.
"Procedural" can also be described as "algorithmic". An AI model is NOT an algorithm. An AI model is the output of an algorithm that has been applied to a dataset.
I'll try to give an example: you can train an AI model (let's go with text-to-image for the example's sake) on a bunch of pictures of circles, properly tagged. Then, when you put "circle" in a prompt, it will give you a picture of a circle. See, but it's still oblivious to any (mathematical, geometrical) formula that describes a circle. When you enter "circle" in the prompt again, it generates another one for you from scratch.
Algorithms and procedures are like blueprints for building. An AI LLM builds nothing. It doesn't even reverse engineer anything. All it does is mimic, trying to emulate the surface look of reality like a lossy codec. That's why it needs all that power / resources. Look how many circles it has to be trained on to mimic one imperfect circle via a prompt, while a ZX Spectrum can render a perfect circle with no training at all, and pretty quickly, via (slow) BASIC, with one millionth of the CPU power / resources. And then another circle. And another one, slightly bigger or smaller. Because it follows a procedure, a blueprint, a step-by-step formula. That's the definition of "procedural", ain't it?
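To make the "blueprint" point concrete, here is a minimal sketch (plain Python instead of ZX BASIC, purely illustrative and not tied to any real model) of drawing a circle from its parametric formula, with no training data involved:
Code:
# Draw a circle of any radius from the parametric formula
# x = cx + r*cos(t), y = cy + r*sin(t) -- a pure step-by-step procedure.
import math

def circle_points(cx, cy, r, steps=64):
    pts = []
    for i in range(steps):
        t = 2.0 * math.pi * i / steps
        pts.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return pts

# Another circle, slightly bigger, costs nothing extra:
print(circle_points(0, 0, 10)[:4])
print(circle_points(0, 0, 11)[:4])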
hey how are you doing trololo man
Not too shabby trololo king, not too shabby at all.
pfft, yo momma is so fat, she can only hold 65536 files in a directory
If you liken LLMs to lossy codecs AND say LLMs are not algorithms, it follows that lossy codecs aren't either.
Pretty sure you can find a lot of references that say that codecs of any kind are, well, algorithms.
And again, framing challenge. Which point do you want to prove so adamantly?
My code also got sucked in by Copilot and supports bad programmers in creating shit software all over the world now.
@Krill:
OK, I blabber too much, forget the comparisons.
So, you tell me, are LLMs algorithms?
4gentE: sorry for calling you a troll. The more accurate term is actually compulsive posting disorder: the urge to compulsively engage in online discussions via heated arguments. It's addictive, that's why it's hard to stop.
Anyway, LLMs, or any ML models, are very much algorithms. Some variants can even be Turing-complete (search for Neural Turing Machines). You can also use iterative refinement with LLMs if you think procedure = iterations. We can argue over definitions forever, but you can generalize the notion of an algorithm or procedure to a computable function.
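To illustrate why "procedure" still fits (a toy sketch with a made-up stand-in for the forward pass, not any real model's API): generation is literally a loop of deterministic arithmetic plus a pick-the-next-token step.
Code:
# Toy sketch of autoregressive decoding: the trained weights are fixed data,
# but *running* them is an ordinary step-by-step procedure (a loop).
def toy_next_token_scores(context):
    # Stand-in for the real forward pass (matrix multiplies over trained weights).
    # Here: score each candidate token by a trivial deterministic rule.
    vocab = ["circle", "square", "is", "a", "."]
    return {tok: (len(tok) + sum(context.count(t) for t in vocab)) % 7 for tok in vocab}

def generate(prompt_tokens, n_steps=5):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        scores = toy_next_token_scores(tokens)
        tokens.append(max(scores, key=scores.get))  # greedy pick = fully deterministic
    return tokens

print(generate(["a", "circle"]))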
Search https://en.wikipedia.org/wiki/Large_language_model for "algorithm", i.e., "yes".
and there i thought a large language model is all the boring posts in that previous AI thread combined, luckily it continues here!
Quote:
Search https://en.wikipedia.org/wiki/Large_language_model for "algorithm", i.e., "yes".
OK. So an AI model is an algorithm. I usually tend to trust IBM on the topic of AI, therefore I was obviously misled by this quote:
Quote:
Algorithms vs. models
Though the two terms are often used interchangeably in this context, they do not mean quite the same thing.
Algorithms are procedures, often described in mathematical language or pseudocode, to be applied to a dataset to achieve a certain function or purpose.
Models are the output of an algorithm that has been applied to a dataset.
Source: https://www.ibm.com/topics/ai-model
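In code, IBM's distinction can be sketched roughly like this (a toy example of my own, not from the article): the training loop is the algorithm, and the number it spits out after seeing the dataset is the model.
Code:
# Toy illustration of the wording above: the *algorithm* is the fitting procedure,
# the *model* is what it outputs after being applied to a dataset.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

def train(data, lr=0.01, epochs=500):        # <-- algorithm (procedure)
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x    # gradient step for squared error
    return w                                 # <-- model (output of the algorithm)

w = train(data)
print("model:", w, "prediction for x=4:", w * 4)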
Going by this quote, I was excluding the output of an AI (LLM) model from the "procedural" pool, because a dataset is an integral part of it, and I was going by the logic that "procedural" means code-based, not data-based.
So, if I create a synthetic terrain via algorithms - it's procedurally generated.
And if I use an AI model that was trained on a large volume of real-world terrain scans - is it still/also procedurally generated?
And please understand that I'm not arguing nor trolling, I'm sincerely asking.
It's not exactly the same thing for sure, but both are algorithms or procedures.
You, as a creator, develop a meta-algorithm that via training produces an actual algorithm, so it's not an automatic/mindless process.
Moreover, in a sense, hand-picked parameters for procedural generation also come from data - you look at natural terrain and "tune" your noise layers to look alike. A naive ML model may just mimic one particular terrain look, but a more advanced generative model may also capture the essence of terrain-like qualities. If anything, making a generative model like this seems harder than hard-coding only one representation of the terrain.
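A minimal sketch of that point (hypothetical code, simple 1D value noise rather than anything fancy): the hand-tuned amplitudes below are exactly the kind of numbers you could also fit from real-world height data, and either way the generator itself stays a step-by-step procedure.
Code:
# Toy 1D "terrain": layered value noise with hand-picked amplitudes.
import math, random

def value_noise(x, seed, freq):
    # deterministic pseudo-random value per lattice point
    def rnd(i):
        random.seed(seed * 1000003 + i)
        return random.random()
    i = math.floor(x * freq)
    t = x * freq - i
    return rnd(i) * (1 - t) + rnd(i + 1) * t   # linear interpolation

def terrain_height(x, amplitudes=(40.0, 15.0, 4.0), seed=1):
    # amplitudes: tuned by eye here, but they could just as well be fitted to scans
    return sum(a * value_noise(x, seed + k, 2 ** k) for k, a in enumerate(amplitudes))

print([round(terrain_height(x * 0.1), 1) for x in range(8)])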
I apologize to everyone who was bored to death.
@tomkh:
Interesting.
Are LLMs then completely deterministic, in the sense that if, for example, DALL-E gives out a picture, this resulting picture has a unique "seed" and can be identically reproduced by calling on this "seed"? What seems like randomness in LLM output is confusing. Is this randomness due to noise?
Some models (like LLMs) return multiple possible outputs (e.g. tokens) with associated probabilities. If you take the most-probable output, it's fully deterministic, but you can also sample from a few of the top most-probable outputs to get non-determinism (usually controlled by a temperature setting). Diffusion models are different: they are literally de-noisers, so you just start with noise. Again, it's up to you: if you provide the same noise/seed as input over and over, the result is deterministic, or you can just randomize it each time.
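A rough sketch of the two knobs mentioned above (toy logits, not any real API): temperature-scaled sampling over token probabilities, with a fixed seed making even the "random" part reproducible.
Code:
# Toy token sampler: temperature ~0 collapses to the most-probable token
# (deterministic); higher temperature spreads probability mass; a fixed
# seed makes the random choice reproducible.
import math, random

def sample(logits, temperature=1.0, seed=None):
    if temperature <= 1e-6:
        return max(logits, key=logits.get)            # greedy: fully deterministic
    rng = random.Random(seed)                         # same seed -> same "random" pick
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    probs = {t: math.exp(s - m) for t, s in scaled.items()}
    z = sum(probs.values())
    r, acc = rng.random() * z, 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok

logits = {"circle": 2.0, "square": 1.0, "blob": 0.5}
print(sample(logits, temperature=0.0))           # always "circle"
print(sample(logits, temperature=1.0, seed=42))  # reproducible with the same seed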
get a room you two!
This is like watching two ChatGPTs argue indefinitely
another thread that could've been an email!