this post was submitted on 18 Jun 2023
332 points (100.0% liked)

Technology

[–] asjmcguire@kbin.social 15 points 1 year ago (4 children)

Reddit has been going for like a billion years, and you only got 80 GB - I mean, even zipped, that can't be more than a fraction of the data, surely?

[–] ddnomad@infosec.pub 19 points 1 year ago* (last edited 1 year ago)

Depends on what kind of data. If it's mostly internal documents / dumps of whatever communication systems they use, etc., it would not be too large (mostly because of retention policies on that software).

If it is actually data straight from Reddit's production databases, then 80 GB does sound questionable. But then what kind of data are we talking about? Is it actually valuable?

Anyway, this is big (if true).

[–] eighty@lemmy.one 12 points 1 year ago

I'd be surprised if the data was just content. Memes and text posts aren't particularly valuable.

However, data that can be used for tracking or building user profiles - such as what users are subscribed to, how active they are, and how they all link to one another - is especially useful for competitors and marketers. Plus any personal data, such as emails and profiles. I wouldn't be surprised if you could fit a huge amount of data under 80 GB if it's just text (think how much an 80 GB Excel sheet would hold).
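To put the "just text" point in perspective, here's a rough back-of-the-envelope sketch. The 200-byte average record size is purely an assumption for illustration, not a figure from the thread:

```python
# Rough estimate: how many short text records fit in an 80 GB dump?
DUMP_GB = 80
AVG_RECORD_BYTES = 200  # assumed: one email address / profile row / short comment

records = DUMP_GB * 1024**3 // AVG_RECORD_BYTES
print(f"~{records:,} records")  # ~429,496,729 records
```

Even with a generous average record size, 80 GB of plain text is on the order of hundreds of millions of rows - more than enough for the kind of profile and advertiser data discussed here.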

[–] eggnog@sopuli.xyz 8 points 1 year ago

Internal documents, source code, employee data, and limited data about the company's advertisers.

https://www.bleepingcomputer.com/news/security/blackcat-ransomware-gang-behind-reddit-breach-from-february/

[–] Trebach@kbin.social 6 points 1 year ago

I could get 80 GB of Reddit data in a day. ArchiveTeam has uploaded 2.97 PB (1 PB is 1024 TB, or 1,048,576 GB) so far trying to back up all of Reddit to the Internet Archive, and they're still not finished!
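The scale gap in that comment is easy to verify. A quick sketch, using the figures from the comment and assuming binary (1024-based) units throughout:

```python
PB_IN_GB = 1024**2  # 1 PB = 1,048,576 GB in binary units

archived_gb = 2.97 * PB_IN_GB  # ArchiveTeam's Reddit backup so far
breach_gb = 80                 # size of the claimed breach dump

print(f"{archived_gb:,.0f} GB archived")                  # 3,114,271 GB
print(f"~{archived_gb / breach_gb:,.0f}x the breach size")  # ~38,928x
```

In other words, the public archive of Reddit content is already tens of thousands of times larger than the 80 GB claimed here, which supports the view upthread that the dump is more likely internal data than site content.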