this post was submitted on 02 Nov 2023
218 points (100.0% liked)
Technology
you are viewing a single comment's thread
Reference is the wrong word.
They learn the patterns that exist in data and are able to predict future patterns.
They don't actually reference the source material during generation (barring overfitting, which can happen and is roughly akin to a human memorizing something and reproducing it).
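A toy analogy (nothing to do with actual diffusion math, just an illustration of "learning a pattern without storing the data"): fit a straight line to a set of points. The trained "model" is two numbers, the slope and intercept; it predicts unseen inputs but contains none of the training points.

```python
# Least-squares fit of a line to points on y = 2x + 1.
# Illustrative only: shows that a trained model can be far smaller
# than its training data and still predict new inputs.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = list(range(100))            # 100 "training examples"
ys = [2 * x + 1 for x in xs]
slope, intercept = fit_line(xs, ys)
# The model is 2 parameters no matter how many points we trained on,
# and it generalizes to inputs it never saw.
print(slope, intercept)          # ~2.0, ~1.0
print(slope * 1000 + intercept)  # prediction for an unseen input
```

The model here can't "reproduce" any individual training point except by re-deriving it from the learned pattern, which is the distinction the comment is drawing.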
Whether or not the copyrighted data shows up in the final model is utterly irrelevant, though. It is illegal to use copyrighted material outside of fair use, period, and this is most certainly not fair use. This is civil law, not criminal, so the standard is "more likely than not" rather than "beyond a reasonable doubt." If a company cannot provide reasonable evidence that they created the model entirely with material they own the rights to use for that purpose, then it is a violation of the law.
Math isn’t a person, doesn’t learn in anything approaching the same way beyond some loosely borrowed terminology, and has none of the legal rights that we afford to people. If it did, then this would by definition be a kidnapping and child abuse case, not a copyright case.
Yeah, it is. Even assuming fair use applies, fair use is largely a question of how much a work is transformed, and (a billion images) -> AI model is just about the most transformative use case out there.
And this assumes it matters when they're literally not copying the original work (barring overfitting). It's a public internet download. The "copy" is made by Facebook or whoever you uploaded the image to.
The model doesn't contain the original artwork or parts of it. Stable Diffusion literally has about one byte of weights per image of training data.
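The back-of-the-envelope arithmetic behind that claim, using approximate public figures (a roughly 2 GB fp16 Stable Diffusion v1 checkpoint and the roughly 2 billion images of its LAION training set; both numbers are ballpark assumptions, not from this thread):

```python
# Rough bytes-of-model-weights per training image for Stable Diffusion v1.
# Both figures are approximate, order-of-magnitude public numbers.
checkpoint_bytes = 2 * 10**9   # ~2 GB fp16 checkpoint (assumption)
training_images = 2 * 10**9    # ~2 billion LAION images (assumption)

bytes_per_image = checkpoint_bytes / training_images
print(bytes_per_image)         # ~1 byte per image
```

Since a single training image is typically tens of kilobytes even heavily compressed, the model plainly cannot store the images themselves; it can only store statistical patterns distilled from them.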
The number of bytes per image doesn't necessarily mean there's no copying of the original data. There are examples of some images being "compressed" (lossily) by Stable Diffusion; in that case the images were specifically sought out, but I think it does show that overfitting is an issue, even if the model is small enough to ensure it doesn't overfit for every image.
Overfitting is an issue for the images that were overfit. But note that in that article, those images mostly appeared many times in the data set.
People who own the rights to one of those images have a valid argument. Everyone else doesn't.