…then maybe they shouldn’t exist. If you can’t pay the copyright holders what they’re owed for the license to use their materials for commercial use, then you can’t use ‘em that way without repercussions. Ask any YouTuber.
You might want to read this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries. YouTube's one-sided strike-happy system isn't the real world.
Headlines like these let people assume that it’s illegal, rather than educate them on their rights.
When Annas-Archive or Sci-Hub get treated the same as these giant corporations, I'll start giving a shit about the "fair use" argument.
When people pirate to better the world by increasing access to information, the whole world gets together to try to kick them off the internet.
When giant companies with enough money to make Solomon blush pirate to make more oodles of money and not improve access to information, it's "fAiR uSe."
Literally everyone knew from the start that books3 was all pirated, taken from ebooks with the DRM circumvented and removed. It was noted when it was created that it was basically the entirety of the private torrent tracker Bibliotik.
AI training should not be a privilege of the mega-corporations. We already have the ability to train open source models, and organizations like Mozilla and LAION are working to make AI accessible to everyone. We can't allow the ultra-wealthy to monopolize a public technology by creating barriers that make it prohibitively expensive for regular people to keep up. Mega corporations already have a leg up with their own datasets and predatory terms of service that exploit our data. Don't do their dirty work for them.
By denying regular people access to a competitive, corporate-independent tool for creativity, education, entertainment, and social mobility, we condemn them to a far worse future, with fewer rights than we started with.
How am I doing their dirty work for them? I literally will stop thinking that they're getting away with piracy for profit when we stop haranguing people who are committing piracy for the benefit of mankind.
I'm not saying Meta should be stopped, I'm saying the prosecution of Sci-Hub and Annas-Archive needs to be stopped under the same pretenses.
If it's okay to pirate for the purpose of making money (what we put The Pirate Bay admins in jail for), then it's okay to pirate to benefit mankind.
There is literally no way in hell someone can convince me what Meta and others are doing is not pirating to use the data contained within to make money. What's good for the goose is good for the gander, as they say.
I reiterate, they knew it was pirated and had DRM circumvented when they downloaded it. There was zero question of the source of this data. They knew from the beginning they intended to profit from the use of this data. How is that different than what we accused The Pirate Bay admins of?
It really feels like "Well these corporations have money to steal more prolifically than little people, so since their stealing is so big, we have to ignore it."
Then I misunderstood what you were saying. Carry on.
You don't see the difference between distributing someone else's content against their will and using their content for statistical analysis? There's a pretty clear difference between the two, especially as fair use is concerned.
By and large copyright infringement is illegal. That some things aren't infringement doesn't change that a general stance of "if I don't have permission, I can't copy it" is correct. The first argument in the EFF article is effectively the title: "it can't be copyright, because otherwise massive AI models would be impossible to build". That doesn't make it fair use, they just want it to become so.
It doesn't matter what business we're talking about. If you can't afford to pay the costs associated with running it, it's not a viable business. It's pretty fucking simple math.
And no, we're not talking about "too big to fail" businesses (that SHOULD be allowed to fail, IMHO), we're talking about AI, that thing they keep trying to shove down our throats and that we keep saying we don't want or need.
Why are people publishing so much content online if they aren’t cool with people downloading it? Like, the web is an open platform. The content is there for the taking.
Until one of these AIs just starts selling other people’s work as its own, and no I don’t mean derivative work I mean the copyrighted material, nobody is breaking the rules here.
I read content online without paying for a license. I should only have to obtain a license for material I’m publishing, not material I read.
Until one of these AIs just starts selling other people’s work as its own, and no I don’t mean derivative work I mean the copyrighted material, nobody is breaking the rules here.
Except of course that's not how copyright law works in general.
Of course the questions are 1) is training a model fair use and 2) are the resulting outputs derivative works. That's for the courts to decide.
But in general, publishing content on my website does not give anyone else license or permission to republish that content or create derivative works, whether for free or for profit, unless I explicitly license that content accordingly.
That's why things like Creative Commons exist.
But surely you already knew that.
I don't know if you noticed this but some really big companies with high stock valuations are only existing because investors poured tons of capital into them to subsidize the service.
Uber could not do taxis cheaper than existing services if they didn't have years of free cash to artificially lower prices.
We are in the beginning of late-stage capitalism: profitable companies go under due to private capital firms, and absolute Ponzi frauds get their faces on Time magazine.
Enjoy the collapse.
Big Company: Well if you can't afford food you should not have food.
Also Big Company:.... sobbing pwease we neeed fweee... pwease we need mowe moneys!
I'm all for stealing content willy-nilly but you can't then use that theft to craft a privately "owned" mind.
I'd have no problem with "ai" if it could unionize and had to pay for rice like the rest of humanity.
These companies want to combine open theft with privately owned black boxen they can control and license out for money.
It's enclosure of The Commons all over again.
So you're fine with the free models Facebook and many others provide?
Because many of these LLMs can be run on your own device without paying.
I'm not fine with anything meta does and I'm not ok putting creatives out of work.
But you're all for stealing content willy-nilly?
And this is being offered to people without it being a privately owned blackbox licensed out for money.
Feels kinda inconsistent.
You don't get to both ignore intellectual property rights of others, and enforce them for yourself. Fuck these guys.
I guess people are finally catching on to the big con behind the "LLMs should not be copyrighted" ampliganda. It is astroturfing at its best.
The end goal is controlling rights to what corporations produce with LLMs without spending a dime. All the while cutting jobs.
The writing was in CAPITAL LETTERS on the walls for the past two years. Why did Twitter restrict API access? Why did Reddit restrict API access? Why did GitHub/Bitbucket/GitLab restrict web UI functions for logged-out users?
They knew, and they walled off the user-generated data.
Cmon people.
And the hypocrisy of it all. If it's bad, it's user data; if we can mine it, nuh-uh, bitch, it's ours.
Also, for people arguing for free use of anything to build LLMs: regulations will come, once the big players control enough of the LLM market.
Serious Question: When an artist learns to draw by looking at the drawings of the masters, and practicing the techniques they pioneered, are the art students respecting the intellectual property rights of those masters?
Is not all of that student's work derivative of an education based on the work of other people who will never see compensation for that student's use?
I agree with you on principle. However... How long do you think it will be until these very same "AI" companies copyright and patent every piece of content their algorithms spew out? Will they abide by the same carve-outs they want for themselves right now? Somehow I doubt it.
They want to ignore the laws for themselves, but enforce them onto everyone else. This "Rules for thee but not for me" bullshit can't be allowed to pass. Let's then abolish all copyright, and we'll see how long these companies last when everyone can just grab their stuff "for learning".
One, let's accept that there is a public domain, and cribbing freely from the public domain is A-OK. I can reproduce Michelangelo all I want, and it's all good. AI can crib from that all it wants.
AI can't invent. People can invent: I can have a wholly new idea that no one has ever had. AI does nothing but recombine other existing ideas. It must have seed data, and it won't create anything for which it has no initial input: feed it photographs only, and it can't create a pencil drawing image. Feed it only black and white images, and it can't create color images.
People do not require cribbing from sources. Give a toddler supplies, and they will create. So, we have established that there is a fundamental difference between the two creation processes. One is dependent on previous work, and one is not.
Now, with influences, you can ask, is your new creation dependent on the previous creation directly? If it is so utterly dependent on the prior work, such that your work could not possibly exist without that specific prior art, you might get sued. It will get debated and society's best approximation of a collective rational mind will determine if you copied or if you created something new that was merely inspired by prior art.
AI can only create by the direct existence of prior art. It fakes invention. Its work has to come from somewhere else.
People have shown how dependent it is on its sources with prompts that say things like, "portrait of a patriotic soldier superhero" and it comes back with a goddamned portrait of Chris Evans. The prompt did not include his name, or Captain or America, and it comes back with an MCU movie poster. AI does not create. People create.
I think there is a fundamental difference here. People are not corporations. People have always learned like this and will always learn like this. Do we really want to allow large corporations to take knowledge from people, then commercialize it and put these very same people out of work?
To me, this reads like "Giant-ATV-Based Taxi Service Couldn't Exist If Operators were Required to Pay Homeowners for Driving over their Houses."
If a business can't exist without externalizing its costs, that business should either a. not exist, or b. be forced to internalize those costs through licensing or fees. See also, major polluters.
“Ai” as it is being marketed is less about new technical developments being utilized and more about a fait accompli.
They want mass adoption of the automated plagiarism machine learning programs by users and companies, hoping that by the time the people being plagiarized notice, it’s too late to rip it all out.
That and otherwise devalue and anonymize work done by people to reduce the bargaining power of workers.
They also don't care if the open, free internet devolves into an illiterate AI generated mess, because they need an illiterate populace that isn't educated enough to question it anyway. They'll still have access to quality sources of information, while ensuring the lowest common denominator will literally have garbage information being fed to them. I mean, that was already true in the sense that the clickbait news outsold serious investigative news, and so the garbage clickbait became the norm and serious journalism is hard to come by and costly.
They love increasing barriers between them and the rest of the populace, physically and mentally.
Silicon Valley’s core business model has for years been to break the law blatantly and openly while throwing money at the problem to scale, so that by the time law enforcement catches up to you, you're an “indispensable” part of the modern world. See Uber, whose own publicly published business model was for years to burn money scaling and ignoring employment law until it could drive all competitors out of business and become an illegal monopoly, thus allowing it to raise prices to the point it’s profitable.
Didn't read the article but boo-fucking-hoo. Pay the content creators.
Free for me, paid by thee
This is not actually true at all, you could train very good LLMs on public domain only info, especially science oriented ones.
But what people want is a chatbot that can call on current events, and that is where the cost comes in.
Yup. Same as the way the rest of us use and learn from the internet. We basically wouldn’t have the internet as we know it if it weren’t 99% free content.
“today’s general-purpose AI tools simply could not exist” … “as a profitable venture”
Well how about consent at the very least?