this post was submitted on 03 Jul 2023
101 points (100.0% liked)

Technology




An update to Google's privacy policy suggests that the entire public internet is fair game for its AI projects.

all 29 comments
[–] Powderhorn 54 points 1 year ago (1 children)

People who are alive can have a company steal their entire corpus without recompense, while the descendants of people who died decades ago can still get paid for content created by their ancestors.

Right.

[–] Peanutbjelly@sopuli.xyz 6 points 1 year ago

But how else could Disney afford to own everyone else's rights and properties? Why not think about the little guy! (Mickey Mouse is little, right?)

That being said, I find it weird that people are going after training data for LLMs after completely ignoring the models built specifically to compete with and take advantage of people's unconscious habits and lifestyles.

AI in general will be very important to comfortably survive the near future as a species. Data is an important part of that.

We absolutely need to do something about the megacorps funneling every new societal gain into increasing the already absurd wealth divide. The technology is good. The general web scraping isn't bad if the tool is not specifically evil in function. We just need, as a global community, to demand that the technology be used to benefit everyone equally as it continues to be developed.

[–] alcasa@lemmy.sdf.org 38 points 1 year ago (1 children)

Glad that I can contribute to making the next Google Bard even dumber

[–] Zapp 8 points 1 year ago* (last edited 1 year ago)

Yeah. Now the stupidity I post online has a purpose.

Someday a T-800 will be closing in on a freedom fighter, but will have an intrusive thought interrupt it at a key vulnerable moment. And that intrusive thought will be some random pun we posted to DadJokes. You're welcome, future freedom fighters.

[–] SmallAlmond@lemmy.dbzer0.com 23 points 1 year ago (1 children)

They have probably been doing this for ages

[–] MayonnaiseArch 9 points 1 year ago

Exactly, they don't give a fuck. Counting on being too big for anyone to handle

[–] Rentlar 21 points 1 year ago (1 children)

I, as the proprietor of my comments, condone Google AI scraping my publicly shared content for their own use, on the condition that they condone scraping of their publicly accessible content including YouTube videos. :P

[–] deCorp0@lemmy.dbzer0.com 3 points 1 year ago

Google is going to continue boiling the frog until everyone using Gmail, YT, Drive, etc… is paying subscriptions for access to these services. It’s going to be interesting to see how much people are willing to pay to hold on to a Gmail account they’ve been using for 20 years. I should buy Alphabet stock now.

[–] CreativeTensors 18 points 1 year ago (2 children)

I just kind of assumed that they, as well as anyone else in the space, were doing that already.

Whether that means that we all collectively have ownership over the outputs of these models if they're trained on content that we produced over the years is another thing. As someone who uses AI tools a fair bit I would be totally fine with generated content being public domain unless a threshold for human intervention is met.

That threshold is where the messy legal work lies.

[–] YuzuDrink 9 points 1 year ago

Would maybe be funny if a law were passed saying that you could only charge people for access to your AI content if you can prove that their own content wasn’t used to help train the AI…

[–] MagicShel@programming.dev 4 points 1 year ago

I agree with this. Human knowledge grows on the shoulders of others, and should collectively belong to all of us.

[–] millie 15 points 1 year ago (2 children)

Crazy that Google feeds on all our data and has for years, but when OpenAI puts the benefit of that data back into the hands of users it catches flak.

[–] Rentlar 8 points 1 year ago

Perhaps we lived in blissful ignorance all this time. Before large language models became what they are today, Google Translate was where most of the data was going, and it was mainly about getting an adequate translation. Now the data is being used to answer questions on all different subjects using parts of real people's answers, which could be more frightening to people.

[–] shanghaibebop 3 points 1 year ago* (last edited 1 year ago) (1 children)

I think it's a problem of value capture.

People had no problem posting on Reddit and spending tons of hours helping strangers solve their problems. But now that Reddit puts that information behind a paywall, people will have massive issues with that.

Similarly, Google scraped data, but didn't APPEAR (and I can't emphasize that enough) to use that data to deliver value that cannot be shared by the people who created that data. Most of the time the value is aligned: you give up your "data" to Google so that Google can either provide you with better traffic through its search engine, or better ads to generate revenue for you.

OpenAI does not benefit the original publisher of that information whatsoever.

[–] millie 5 points 1 year ago

I don't know about that. When's the last time you looked something up on Google and the first link was driving traffic to a website rather than scraping one and presenting it in-engine?

[–] jcarax 13 points 1 year ago

I choose to take this as an admission that they should be paying into a global UBI fund.

[–] AndrewZabar 7 points 1 year ago

Google does what Google wants. Lawsuits are the only remedy to any of their indulgent transgressions. And not everyone can sue.

Years ago I had to have a lawyer file a motion in court in order to get Google to erase private medical documents they had inadvertently gotten access to and then cached. It’s one thing to index everything, and it’s another if they temporarily have access to restricted data because of a security lapse. But to COPY data as cache is something that should be absolutely illegal.

But as I said, Google does what Google wants.

[–] trekz 5 points 1 year ago* (last edited 1 year ago) (1 children)

Is this even new though? Google has always had a stronghold over any public data on the internet. It's a search engine 😄. Its sole purpose is to scrape and store everything it possibly can on the web.

[–] that_one_guy 5 points 1 year ago

It's not

Previously, Google said the data would be used “for language models,” rather than “AI models,” and where the older policy just mentioned Google Translate, Bard and Cloud AI now make an appearance.

This is mainly just an update to more modern terms; it doesn't really seem like they're adding anything new to their policies.

[–] Lost_Wanderer 4 points 1 year ago

Just feels icky knowing that.