this post was submitted on 13 Aug 2023
Technology
For the last time: these language models are just regurgitating what people have said. They don't analyze or reason.
That's not entirely true.
LLMs are trained to predict the next word given context, yes. But in order to do that, they develop an internal model that minimizes error across a wide range of contexts - and an emergent feature of this process is that the model DOES do more than pure compression of the training data.
For example, GPT-3 can correctly solve addition and subtraction problems that didn't appear in its training data. This suggests the model learned how to perform addition and subtraction, likely because that was easier or more efficient than separately storing every example from the training data.
This is an easy example to measure, but it's enough to suggest that LLMs can extrapolate from the training data and do more than just stitch relevant parts of the dataset together.
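One way to see why this is evidence of generalization rather than lookup: the space of random multi-digit addition problems is so large that fresh random draws are very unlikely to appear verbatim in any training corpus. A rough sketch of such a test; the `model` callable here is a hypothetical stand-in for whatever LLM API you'd actually query:

```python
import random

def make_addition_prompts(n_problems, digits=5, seed=0):
    """Generate random addition problems. With 5-digit operands there are
    roughly 8.1e9 distinct pairs, so random draws are unlikely to have been
    memorized from a corpus."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n_problems)]

def exact_match_accuracy(model, problems):
    """Score a model callable (prompt string -> answer string) by exact match."""
    correct = 0
    for a, b in problems:
        answer = model(f"What is {a} plus {b}?").strip()
        correct += answer == str(a + b)
    return correct / len(problems)

# Sanity check with a perfect "oracle" standing in for a real LLM:
def oracle(prompt):
    # Parse "What is A plus B?" and compute the true sum.
    words = prompt.rstrip("?").split()
    return str(int(words[2]) + int(words[4]))

accuracy = exact_match_accuracy(oracle, make_addition_prompts(100))  # 1.0 for the oracle
```

A pure lookup table over training text would score near zero on prompts like these; the GPT-3 paper's §3.9.1 results sit well above that floor.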
That's interesting, I'd be curious to read more about that. Do you have any links to get started with? Searching this type of stuff on Google yields less than ideal results.
In my comment I've been referencing https://arxiv.org/pdf/2005.14165.pdf, specifically section 3.9.1, where they summarize the results of the arithmetic tasks.
Check out this one: https://thegradient.pub/othello/
In it, researchers trained a custom language model to play the board game Othello purely by predicting the next move in a sequence of moves, with no input at all about the game state. They found evidence of an internal representation of the current board state, even though the model had never been told what that state looks like.
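The technique behind that finding is (roughly) probing: freeze the trained model, collect its hidden activations, and fit a small classifier that tries to read the board state back out of them. A toy illustration of the idea on synthetic data; no real Othello model is involved here, and the "activations" encode the board by construction, which is exactly the hypothesis a real probe would test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend setup: 64 board squares, each -1/0/+1; the "model's" hidden state
# is a 256-dim vector that (by construction) linearly encodes the board.
n_samples, n_squares, hidden_dim = 2000, 64, 256
boards = rng.integers(-1, 2, size=(n_samples, n_squares)).astype(float)
encoder = rng.normal(size=(n_squares, hidden_dim))  # fixed fake "model"
hidden = boards @ encoder + 0.1 * rng.normal(size=(n_samples, hidden_dim))

# Linear probe: least-squares map from hidden states back to board values.
probe, *_ = np.linalg.lstsq(hidden, boards, rcond=None)
recovered = np.rint(hidden @ probe)

accuracy = (recovered == boards).mean()  # close to 1.0: the probe can read the board
```

In the Othello work the probe is fit on the real network's activations; the probe's high accuracy there was the evidence that the model had built an internal board representation it was never shown.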
Isn't GPT famously bad at math problems?
GPT-3 is pretty bad at it compared to the alternatives (although it's hard to compete with calculators in that field), but if it were just repeating the training dataset it would be far worse - see section 3.9.1 of the study I linked in my other comment: https://arxiv.org/pdf/2005.14165.pdf
I know. I just thought it was a bit ironic seeing such a strongly worded response from it.
Exactly. They’re great bullshitting machines, that’s it.
Same as humans.
LLMs do replicate a small subset of human cognition, but not the full scope. This can result in human-like behavior, but it’s important to be aware of the limitations.
The biggest limitation is the misalignment in goals. LLMs won’t perform a very deep analysis of their input because they don’t need to. Their goal isn’t honest discussion, a pursuit for truth, or even having a coherent set of beliefs about the world. Their only goal is to sound plausible. And, as it turns out, it’s not too hard to just bullshit your way through the Turing test.
Could you share your source?
Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That's how they're defined.
Source: the paper that defined the transformer architecture and formulas for large language models, which has been cited in academic sources 85,000 times alone https://arxiv.org/abs/1706.03762
Hey, that comment's a bit off the mark. Transformers don't just memorize chunks of text; they're more sophisticated than that. They use attention mechanisms to figure out which parts of the text are important and how they relate to each other. It's not about memorizing, it's about capturing patterns and relationships. The paper you linked doesn't say anything about these models just regurgitating information.
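For reference, the core operation from that paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V: each output is a weighted mix of value vectors, not a lookup of stored chunks. A minimal numpy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # weighted mix of value vectors

# Tiny example: 3 token positions, dimension 4.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 4)
```

Because the softmax weights are a convex combination, every output row lies inside the span of the values it attends over; which rows get weight depends on the content of the query and keys, computed fresh for every input.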
I believe your "They use attention mechanisms to figure out which parts of the text are important" is just a restatement of my "break it into contextual chunks", no?
As far as I understand it, such a model is more like a program than a database. How do you see it?