this post was submitted on 27 Jan 2023
Fediverse
A community dedicated to fediverse news and discussion.
Fediverse is a portmanteau of "federation" and "universe".
With respect to your thoughts: just because the (corporate) internet works this way now, doesn't mean it should. I don't want people scraping my posts. I find it creepy. The fediverse (some parts of it, at least) was, for many people and for a long time, a place they could go to connect with people without needing to argue about the legal definition of consent. The fact that people can technically get away with scraping my posts isn't permission to do so. And, obviously, just turning off your computer isn't an option, because, at least in the global north-west, you need to have an online presence to be involved in society.
Nobody is claiming that the web is a place for healthy relationships with corporations. It isn't. The web is a place corporations constructed to make more money. This is about working together to build something better.
I'm happy that you're comfortable with this model, but I don't want people who operate like this to intrude on the spaces we're building to get away from it. It's just, like, a courtesy thing. Will there need to be protocol changes to technologically force people not to do this? Probably. Should there have to be? I really wish I could say there didn't need to be.
The web worked this way before there was a large corporate presence. Scraping was common during the blogosphere period, and robots.txt was the solution everyone at the time agreed on; it has been the standard ever since.
We're not intruding on this space. We've been in the fediverse for just as long or longer; the fediverse has been scrapable since 2008.
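For anyone who hasn't seen what honoring robots.txt looks like from the scraper's side, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the crawler name ExampleArchiver, and the host example.social are all made up for illustration; this is not how any particular fediverse crawler is known to work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# everyone else is only kept away from /private/.
ROBOTS_TXT = """\
User-agent: ExampleArchiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would instead call
    # set_url(...) and read() to download the site's own robots.txt.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleArchiver", "https://example.social/post/1"),
        ("SomeOtherBot", "https://example.social/post/1"),
        ("SomeOtherBot", "https://example.social/private/notes"),
    ]:
        verdict = "allowed" if may_fetch(agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

Note that this is purely opt-in on the crawler's part: nothing in robots.txt stops a scraper that simply never checks it, which is the gap between courtesy and enforcement being argued about here.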
Totally. And while it was scrapable, and scraped a lot, I wish there had been a lot more systematic public scraping of the "federated social web" (as it was called before the terrible name "fediverse" was adopted) back then - I had a lot of public conversations on identi.ca and StatusNet which I wish I could still see, but they now exist only in a bunch of private databases I don't have access to. 😢
I think Besse makes a great point here:
I tried to single out the world wide web, as opposed to the internet at large, because the two are not synonymous. It's rather absurd to publicly serve webpages to any querying IP address and maintain that the receiving computer is not to save said pages to disk.
All this to say: I find it difficult to argue that web publications should or could be exempt from aggregation and archival (or scraping, to put it another way). I understand that the ease with which bots do this can be disconcerting, however.
If we stay with the cafe bulletin board, getting a detailed overview of all the postings on the board is akin to scraping the whole thing. If we extend our analogy instead to a somewhat more significant example, library catalogs do the same with books, magazines, and movies.
This is the cost of publishing, be that in print or online. It must be expected that some person has a copy of every- and anything one has ever written or posted publicly, and perhaps even catalogued it. A way around this might be to move away from the web to another part of the internet, like Matrix, as alma suggested.
I assume the non-consensual collection of various (meta-)data is what you refer to when talking about intrusion and money making. Lemmy, like many projects, seeks to offer an alternative to corporate, data-gobbling social media sites, but doesn't eliminate the ability to search through its webpages.