ThatBlokeFromNZ

joined 1 year ago
 

I made a comment about how the wealth of knowledge available on Reddit is what makes it so useful. Whilst there are cached pages on Google and the Wayback Machine (though I've found the latter doesn't have a copy of a lot of pages I want to view), I have some fear of these disappearing eventually. On top of that, people are going back and scrubbing their old comments and posts in an effort to remove their content from Reddit and, I suppose, devalue the platform, which is a shame as the information stored there is pretty useful.

I came across this dump of Reddit submissions and comments from 2005-2022 for the top 20K subs: https://academictorrents.com/details/c398a571976c78d346c325bd75c47b82edf6124e

It says it's about 1.66TB. I haven't downloaded it to have a look yet because I have no space (lol), but I plan to, so I can hopefully preserve it and make use of it. When I have time I might write something to index the data so I can search it for what I need.
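A rough sketch of what that indexing could look like, in Python. The files haven't been inspected here, so this assumes each one decompresses to one JSON object per line with the text in a `body` field; both the layout and the field name are assumptions, and `build_index`, `search` and `fetch` are made-up helper names. It keeps a word-to-byte-offset map in memory, which is fine for a sample but would need an on-disk store for anything close to the full dump.

```python
import io
import json
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return TOKEN.findall(text.lower())


def build_index(stream, text_field="body"):
    """Map each word to the byte offsets of the lines that contain it.

    `stream` is a binary file-like object with one JSON object per line.
    `text_field` is an assumed field name, not confirmed against the dump.
    """
    index = defaultdict(set)
    offset = 0
    for raw in stream:
        try:
            record = json.loads(raw)
        except ValueError:
            record = None  # skip lines that are not valid JSON
        if isinstance(record, dict):
            for token in tokenize(str(record.get(text_field, ""))):
                index[token].add(offset)
        offset += len(raw)
    return index


def search(index, query):
    """Return sorted offsets of lines containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


def fetch(stream, offset):
    """Re-read a single record from the stream by its byte offset."""
    stream.seek(offset)
    return json.loads(stream.readline())


# Tiny made-up sample standing in for a decompressed dump file.
sample = io.BytesIO(
    b'{"subreddit": "newzealand", "body": "Best pies in Wellington?"}\n'
    b'{"subreddit": "selfhosted", "body": "Archive old threads before they vanish"}\n'
    b'not json\n'
    b'{"subreddit": "datahoarder", "body": "Archive the archive, then index the threads"}\n'
)
index = build_index(sample)
for hit in search(index, "archive threads"):
    print(fetch(sample, hit)["subreddit"])
```

Storing byte offsets rather than the records themselves means the index stays small and the original file stays the source of truth; the same idea would carry over to a database-backed index later.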

Just thought I'd share the dump anyway for anyone with similar concerns.

[–] ThatBlokeFromNZ@lemmy.nz 1 points 1 year ago (4 children)

Unfortunately, a lot of valuable knowledge and information has been built up over the life of Reddit, so I'll still use it for the wealth of info it has on even the most niche of topics, but I probably won't mindlessly browse r/all for hours on end like I used to. I'll try not to anyway lol.

[–] ThatBlokeFromNZ@lemmy.nz 4 points 1 year ago (1 children)

But if you didn’t circle it, there’d be no joke so is it actually useless? A paradox? 🤔