this post was submitted on 10 Jun 2023
713 points (100.0% liked)
Lemmy
Everything about Lemmy; bugs, gripes, praises, and advocacy.
For discussion about the lemmy.ml instance, go to !meta@lemmy.ml.
founded 4 years ago
I think we really need to address the scaling issue. One option could be to use ClickHouse instead of Postgres.
This gives me MongoDB flashbacks
That's certainly not the right kind of storage system for a site like this.
I think a pluggable storage backend is probably the best move. For example, any cloud-hosted instance could use a native document store such as DynamoDB, which is often quite cheap or free for small use cases.
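To make "pluggable storage backend" a bit more concrete, here is a rough sketch of the idea in Python. Everything in it (`StorageBackend`, `Comment`, `InMemoryBackend`, `make_backend`) is a hypothetical name made up for illustration, not Lemmy's actual code or API. The point is only that the application talks to a small interface, and each instance picks an implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Comment:
    # Hypothetical, minimal comment record for illustration only.
    thread_id: str
    comment_id: str
    author: str
    body: str


class StorageBackend(Protocol):
    """Hypothetical interface the application would code against."""

    def put_comment(self, comment: Comment) -> None: ...

    def comments_for_thread(self, thread_id: str) -> list[Comment]: ...


class InMemoryBackend:
    """Toy backend; a Postgres or DynamoDB one would expose the same methods."""

    def __init__(self) -> None:
        self._by_thread: dict[str, list[Comment]] = {}

    def put_comment(self, comment: Comment) -> None:
        self._by_thread.setdefault(comment.thread_id, []).append(comment)

    def comments_for_thread(self, thread_id: str) -> list[Comment]:
        # Return a copy so callers cannot mutate the backend's state.
        return list(self._by_thread.get(thread_id, []))


def make_backend(kind: str) -> StorageBackend:
    # An instance admin would choose the backend by name in config.
    backends = {"memory": InMemoryBackend}
    return backends[kind]()
```

The catch is that the interface has to be narrow enough for every backend to implement well, which is exactly where listing and search get awkward.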
Bit of a pain to store in Dynamo, though. You'd need to write a bunch of different views, I think.
One comment thread makes sense as a partition, but listing threads is going to be awkward, and search is basically a no-no.
Not necessarily a pain, you just have to model the data very differently in something like DynamoDB. Those views are secondary indexes.
Search, though, you're right. You'd be running Elasticsearch alongside it, and the cost and complexity start to go up. Or just abandon having a functional search entirely, like Reddit did...
Yeah, but you need an index for each thread, then some kind of time-partitioned thread index for each community, and the same again for All.
Then you need to query all comments or posts by user, so that's another index. Then you need some way of querying for hot, controversial, or what have you.
It's doable, but fiddly. Tempted to have a go though!
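A rough sketch of what that key layout might look like, simulated with plain Python dicts rather than real DynamoDB calls. The key formats (`COMMUNITY#...`, `ALL#...`, `USER#...`), the monthly bucket size, and the function names are all assumptions for illustration. Each partition holds items sorted by timestamp, which is roughly what a partition key plus sort key gives you.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone

# partition key -> list of (sort key, thread id), kept sorted by sort key
table: dict[str, list[tuple[str, str]]] = defaultdict(list)


def put_thread(community: str, thread_id: str, author: str, ts: datetime) -> None:
    ts = ts.astimezone(timezone.utc)
    bucket = ts.strftime("%Y-%m")  # assumed monthly time partition
    sort_key = ts.strftime("%Y-%m-%dT%H:%M:%S")  # sorts lexicographically by time
    # One entry per access pattern. Here the copies are written by hand;
    # in DynamoDB, secondary indexes would maintain them.
    insort(table[f"COMMUNITY#{community}#{bucket}"], (sort_key, thread_id))
    insort(table[f"ALL#{bucket}"], (sort_key, thread_id))
    insort(table[f"USER#{author}"], (sort_key, thread_id))


def newest_threads(partition: str, limit: int = 10) -> list[str]:
    # Read the tail of one partition, newest first.
    items = table.get(partition, [])
    return [thread_id for _, thread_id in reversed(items[-limit:])]
```

Listing a community then becomes `newest_threads("COMMUNITY#lemmy#2023-06")`, and paging further back means walking to the previous month's bucket. Hot or controversial ordering is not covered here; it would need yet another index with a score as the sort key, and the score changes over time, which is the fiddly part.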
I just mentioned Dynamo as an idea without thinking about it too much.
Dynamo works well for one- and two-dimensional data structures, but for more complex things you probably want a regular database. I expect it could be done efficiently, but not cheaply and not without tons of technical difficulty.
Indeed, PostgreSQL is not designed for large-scale horizontal sharding with eventual consistency. ClickHouse, meanwhile, is designed for OLAP workloads, which likely makes it even less suitable.
Regardless of database choice, Lemmy is still centralized. Discussion groups are cached across instances but not truly distributed. This is the big blocker.