this post was submitted on 29 Jan 2024
Technology
The thing is, I’m not sure at all that it’s even physically possible for an LLM to be trained like a four-year-old; they learn in fundamentally different ways. Even very young children quickly learn by associating words with concepts and objects, not by forming a statistical model of how often one meaningless string of characters comes after every other meaningless string of characters.
Similarly, when it comes to image classifiers, a child can often associate a word with a concept or object after a single example, rather than needing to be shown hundreds of thousands of examples before building up a wide variety of pixel-value mappings based on statistical association.
Moreover, a very large amount of the “progress” we’ve seen in the last few years has only come by simplifying the transformers and using ever larger datasets. For instance, GPT-4 is a big improvement on GPT-3, but about the only major difference between the two models is that they threw nearly the entire text internet at GPT-4, compared to GPT-3’s smaller dataset.
My point is that the current approach - statistical association - is so crude that it'll probably get ditched in the near future anyway, with or without licencing matters. And that those better models (that won't be LLMs or diffusion-based) will probably skip this issue altogether.
The comparison with 4yos is there mostly to highlight how crude it is. I also don't think it's viable to "train" models in the same way as we'd train a human being.