this post was submitted on 13 May 2024
Technology
I'd like to share your optimism, but what you suggest leaving us to "deal with" isn't "AI" (which has been present in web search for decades as increasingly clever summarization techniques...) but LLMs, a very specific and especially inscrutable class of AI which has been designed for "sounding convincing", without care for correctness or truthfulness. Effectively, more of people's time will be wasted reading invented or counterfeit stories (with no easy way to tell them apart); first-hand information will be harder to find and credit as it gets increasingly diluted in the AI-generated noise.
I also haven't seen any practical advantage to using LLM prompts vs. traditional search engines in the general case: you end up typing more, for the sake of "babysitting" the LLM, and get more to read as a result (which is, again, aggravated by the fact that you are now given a single-source, one-sided view on the matter, with no citation, reference, or reproducible steps leading to the conclusion).
Last but not least, LLMs are an environmental disaster in the making: the computational cost is enormous (in new hardware and electricity). We are at a point where all companies partaking in this new gold rush are selling us a solution in need of a problem, every one of them having to justify the expenditure (so far, none is making a profit out of it, which is the first step towards offsetting the incurred pollution).
I think that I'd put it in a slightly less-loaded way, and say that an LLM just produces content that has similar properties to its training content.
The problem is real. Frankly, while I think that there are a lot of things that existing LLM systems are surprisingly good at, I am not at all sure that replacing search engines will be one of them (though I am confident that in the long run, some form of AI system will be).
What you can't do with systems like the ones today is take two data sources that have conflicting information, have the LLM-using AI create a "deep understanding" of each, and then evaluate which is more likely to be truthful in the context of other things that have been accepted as true. Humans do something like that (and the human approach isn't infallible either, though I'd call it a lot more capable).
But that doesn't mean that you can't use heuristics for estimating the accuracy of data, and that might be enough to solve a lot of problems. Like, I might decide that 4Chan should maybe have less weight as a source, or that text that ranks highly on a "sarcastic" sentiment analysis program should have less weight. And I can train the AI to learn such weightings based on human scoring of the text that it generates.
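To make that kind of heuristic concrete, here's a rough sketch of weighting passages by source and by a sarcasm score. Everything in it is an assumption for illustration: the trust numbers are made up, the `Passage` type and the `weight` and `rank` functions are hypothetical, and the sarcasm score is assumed to come from some separate classifier. The weights are hard-coded here rather than learned from human scoring.

```python
from dataclasses import dataclass

# Hypothetical trust priors per source; the numbers are invented for
# illustration and would really be tuned or learned from human scoring.
SOURCE_TRUST = {"4chan.org": 0.2, "en.wikipedia.org": 0.8}
DEFAULT_TRUST = 0.5  # assumed prior for sources we know nothing about


@dataclass
class Passage:
    text: str
    source: str
    sarcasm: float  # 0.0 (sincere) to 1.0 (sarcastic), from an assumed classifier


def weight(p: Passage) -> float:
    """Combine source trust and sarcasm score into a single weight in [0, 1]."""
    trust = SOURCE_TRUST.get(p.source, DEFAULT_TRUST)
    return trust * (1.0 - p.sarcasm)


def rank(passages: list[Passage]) -> list[Passage]:
    """Order passages from most to least heavily weighted."""
    return sorted(passages, key=weight, reverse=True)
```

A sincere passage from an unknown site would outrank a heavily sarcastic one from a trusted site here, which may or may not be what you want; that's exactly the kind of trade-off human scoring would have to settle.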
Also, I'm pretty sure that an LLM-based system could attach a "confidence rating" to text it outputs, and that might also solve a lot of issues.
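One crude way to get such a rating, assuming the system can report a log-probability for each token it generated (some LLM APIs do expose this), is to take the geometric mean of the token probabilities. The `confidence` function below is a hypothetical sketch, not how any particular product does it, and it only measures how "expected" the text was to the model, not whether it is true.

```python
import math


def confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability, in (0, 1].

    token_logprobs: natural-log probabilities of each generated token,
    assumed to be supplied by whatever model produced the text.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # exp(mean(log p)) is the geometric mean of the probabilities.
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Averaging in log space keeps long answers from being penalized just for their length, which a plain product of probabilities would do.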
I'm currently wondering what their plans are for updating these LLMs.
Who wants to create the content that feeds these machines without recognition, compensation, or at least the sense of having done some 'good'? If I maintained a blog on some mildly obscure but important topic, would I devote the time just to have ChatGPT or Copilot make a summary of it?
Now, the LLMs need to ingest a lot more than 'one blog'... If someone knows, please let me know.
I doubt this crazy effort with such resource consumption is to create a snapshot of what the internet was in the 2020s.