this post was submitted on 21 Jun 2023

85 points (100.0% liked)

Technology

37724 readers

48 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

TheRtRevKaiser@kbin.social

Lemmy Federation Architecture Change Proposal (github.com)

submitted 1 year ago by xtremeownage@lemmyonline.com to c/technology

74 comments fedilink hide all child comments

https://github.com/LemmyNet/lemmy/issues/3245

I posted far more details on the issue then I am putting here-

But, just to bring some math in- with the current full-mesh federation model, assuming 10,000 instances-

That will require nearly 50 million connections.

Each comment. Each vote. Each post, will have to be sent 50 million seperate times.

In the purposed hub-spoke model, We can reduce that by over 99%, so that each post/vote/comment/etc, only has to be sent 10,000 times (plus n*(n-1)/2 times, where n = number of hub servers).

The current full mesh architecture will not scale. I predict, exponential growth will continue to occur.

Let's work on a solution to this problem together.

you are viewing a single comment's thread
view the rest of the comments

[–] Fauxreigner 7 points 1 year ago (2 children)

Federation isn't working well, but it's not working well because the big instances aren't able to keep up with all of the inbound/outbound messages, and if a message fails, that's it. Right now there's no automated way to resync and catch up on missed activity.

[–] xtremeownage@lemmyonline.com 1 points 1 year ago* (last edited 1 year ago) (1 children)

So- what if, we can delegate a proxy/hub server, for managing all of the inbound/outbound messages, to offload that from the main instance server.

ie, main instance sends/receives its messages through the proxy/hub server, the proxy/hub server then follows a pub/sub topology for sending and receiving.

(Don't imagine a centralized hub server, but, just imagine a localized proxy/hub server for your particular instance. Lets also assume, its designed where you can support multiple hub/proxy servers, in the event one gets overloaded)

[–] Fauxreigner 2 points 1 year ago (1 children)

That doesn't do anything to fix the problem. If a server can only handle 5k updates per minute (a completely made up number), it doesn't matter if those 5k updates come from one server or a thousand. In theory you could cut down on outbound messages a bit if you could tell a "hub server" that post #123456 got another upvote, so please tell instances A, B, C, D, and E. But the total number of messages would increase, so even if the hub instance can handle more updates, it may eventually hit capacity again.

The core of the problem is that if an instance doesn't process an update (inbound or outbound), it doesn't ever retry, the instances are just out of sync for that post forever.

[–] xtremeownage@lemmyonline.com 1 points 1 year ago (1 children)

The core of the problem is that if an instance doesn’t process an update (inbound or outbound), it doesn’t ever retry, the instances are just out of sync for that post forever.

With the pub/sub method- that should be able to be minimized.

At least, with my experience of messing with rabbitmq- A message stays in the queue, until I have told rabbitMQ, Hey, I have processed this message.

If I accept a message, an encounter an exception mid-way through, that message returns back to the queue, until It has been processed, or dead-letter logic handles it.

Granted, there is a hard-coded timeout somewhere in lemmy, where, older messages cannot be processed. That would need to be adjusted.

[–] Fauxreigner 2 points 1 year ago (1 children)

If you ensure that all messages are queued until processed, with retries on failure, what's the point of the hub model? As pointed out elsewhere, the large instances would be acting as hubs already.

[–] xtremeownage@lemmyonline.com 1 points 1 year ago (1 children)

Just removing that load from the main instance server, allowing it to just handle serving its local user-base.

In short- splitting the load into multiple components, rather than everything being handled by just the single instance server.

[–] Fauxreigner 1 points 1 year ago (1 children)

I'm just not seeing a benefit here, I think this is a solution to the wrong problem. Your proposal in theory cuts outbound updates from the big hubs, but in reality they're only updating a subset of other instances for any given update, and it doesn't do anything to help with inbound updates. And to do that, you have to solve a pretty tricky problem.

If my instance gets an update from Beehaw, I can validate that they're allowed to do so, because Beehaw has a TLS certificate that says "Yep, this is actually Beehaw." If you introduce a hub system, I need some way to determine that the hub system that's telling me "Beehaw has an update for you" is allowed to send updates on behalf of Beehaw.

[–] xtremeownage@lemmyonline.com 2 points 1 year ago

To clarify-

After feedback/comments, I have modified the idea- this would be a optional local proxy/hub/delegation server/service, hosted by the instance owners.

https://github.com/LemmyNet/lemmy/issues/3245#issuecomment-1601585922

Ie- you can optionally scale your federation updates, independent of your main application server.

[–] cyd@vlemmy.net 1 points 1 year ago

How was syncing done in Usenet? It has a very similar decentralized model, and I don't recall there being problems of data loss due to desyncing between servers.