this post was submitted on 02 Jul 2023
110 points (100.0% liked)

Interesting decision

Dominic 1 point 1 year ago

I half-agree.

I do think that companies should clarify how they’re training their models and on what datasets. For one thing, this will allow outside researchers to gauge the risks of particular models. (For example, is this AI trained on “the whole Internet,” including unfiltered hate-group safe-havens? Does the training procedure adequately compensate for the bias that the model might learn from those sources?)

However, knowing that a model was trained on copyrighted sources will not be enough to prevent the model from reproducing copyrighted material.

There’s no good way to sidestep the issue, either. We have a relatively small amount of data that is (verifiably) public-domain. It’s probably not enough to train a large language model on, and even if it is, the resulting model probably won’t be very useful in 2023.