this post was submitted on 12 Jun 2023
13 points (100.0% liked)

Experienced Devs



cross-posted from: https://lemmy.world/post/76533

One of the arguments made for Reddit's API changes is that Reddit is now the go-to place for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven't seen a whole lot of discussion around this and would like to hear people's opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?

(I will post my personal opinion in a comment so it can be up/down voted separately)

[–] framboos@programming.dev 12 points 1 year ago

Reddit provides a platform where regular users create the data. Moderators add value by ensuring its quality. Without either of these parties, there is no valuable data. Of course there is a cost to running the platform, but Reddit should, as far as possible, avoid charging users and especially moderators for using it.

Then there are search engines and third-party apps. They also add value. Search engines use the data, and in return they attract new contributors. Third-party apps also attract regular users, and by providing a better experience they keep those users active for longer. Neither should be charged more than is required to keep the platform running and is reasonable relative to their profits.

LLM trainers do not fit in this picture. They use large amounts of data, but do not provide anything in return that is valuable to the users, moderators or platform. Therefore, I absolutely support charging them more for accessing training data.

Users of the platform who provide value in return should not have to pay more than is reasonable and required to keep the platform running. LLM trainers do not provide value in return, and I support charging them more. It is unreasonable not to differentiate between third-party app developers and LLM trainers.