AI models feeding on AI data will lead to 'model collapse', researchers say (web.archive.org)

submitted 1 year ago by 0x815@feddit.de to c/technology

Using model-generated content in training causes irreversible defects, a team of researchers says. "The tails of the original content distribution disappears," writes co-author Ross Anderson from the University of Cambridge in a blog post. "Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions."

Here is the study: http://web.archive.org/web/20230614184632/https://arxiv.org/abs/2305.17493
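
For anyone who wants to see the "Gaussians converge to delta functions" idea in action, here is a toy sketch. It is not the experiment from the study, just a minimal illustration under my own assumptions: each "generation" fits a Gaussian to a small sample drawn from the previous generation's fit. The function name `simulate_collapse`, the sample size of 10, and the 200 generations are all made up for the example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples; return sigma after each generation.

    Hypothetical toy setup (not taken from the study): generation 0 is a
    standard normal, and every later generation only ever sees samples
    produced by the generation before it.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    sigmas = [sigma]
    for _ in range(generations):
        # "Train" on a finite sample drawn from the previous generation's model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # Fit the next model by maximum likelihood (population std, no correction).
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: sigma = {history[gen]:.3e}")
```

With a small sample per generation, the fitted standard deviation tends to shrink over time: rare values in the tails are undersampled, so each refit is on average a little narrower than the last, and the distribution drifts toward a spike. Larger samples slow this down but, in this toy setup, do not stop it.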

Lowbird 9 points 1 year ago

I think that's a tremendously tall order. Current LLMs are straightforwardly Large Language Models: they have zero ability to understand language and only sort it using statistical patterns that can only be gleaned from a vast heap of data. Reducing the size of any dataset increases the likelihood of bias and blind spots, no matter what you do.

At the least, an LLM cannot talk about anything (like news events, new inventions, new political ideas) until humans have talked about it first AND their talking about it has been put into the dataset. If something's not in the dataset, an LLM simply can't invent it. At absolute best, it'll spit out plausible-sounding bullshit.

Inventing actual, truly intelligent AI is a project very far removed from what we have now. It'd take the invention of entirely different systems, not just an iterative improvement of an LLM.