One more thing I forgot to mention. The nginx 500 errors people are getting on multiple Lemmy sites could improve shortly with the release of 0.18, which stops using websockets. Right now the Lemmy webapp is passing those through nginx for every web browser client.
BitOneZero
something like Apache Kafka
Not that I see. A database like PostgreSQL can work, but you have to be really careful how new data flows into the database, because writing to the database involves record locking and invalidates the cache for output.
Or changing to something that can be scaled, like cockroach db or neondb?
Taking the bulk data, comments and postings, outside PostgreSQL would help. Especially since what most people are reading on a Reddit-like website is content from the last 48 hours... and your caching potential drops way down as people move on to the newer content.
The comments alone are the primary problem: there are a lot of them on each posting and they are bulky data. Also, comments are unique data.
I doubt it is anything that level. The problem is the data itself, in the database.
A reddit-like website is like email, every load from the database has unique content. You really have to be very careful when designing for scalability when almost all the data is unique. Especially in modern times where users block other users, and even 2 people loading the same posting do not get the same comments. It's anti-cache, and you have to really work hard to design that to run efficiently on small servers.
As opposed to a website like Amazon where the listing for a toothbrush is not unique on every page load. There aren't new comments and new votes altering the toothbrush listing every time a user refreshes the page. And people aren't switching brands of toothbrush every 24 hours like the front page of Reddit abandons old data and starts with fresh data.
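To make the anti-cache point concrete, here is a minimal sketch (not Lemmy code, every name and the sample data are made up for illustration). Only the raw comment list for a posting is the same for everyone, so that is the only part that can be cached. The per-user block filter has to run on every request, which is why two people loading the same posting get different output.

```python
from functools import lru_cache

# Hypothetical in-memory data standing in for the database.
COMMENTS = {
    101: [("alice", "first"), ("bob", "second"), ("carol", "third")],
}
BLOCKS = {"dave": {"bob"}, "erin": set()}  # user -> authors that user has blocked
db_reads = 0  # counts how often the "database" is actually hit


@lru_cache(maxsize=1024)
def load_comments(post_id):
    """Shared part: identical for every user, so it can be cached."""
    global db_reads
    db_reads += 1
    return tuple(COMMENTS[post_id])


def comments_for(user, post_id):
    """Per-user part: filtered on every request, not cacheable as one page."""
    blocked = BLOCKS.get(user, set())
    return [(author, body) for author, body in load_comments(post_id)
            if author not in blocked]


dave_view = comments_for("dave", 101)  # bob's comment is filtered out
erin_view = comments_for("erin", 101)  # sees all three comments
```

Both calls share one cached read, but the final output still differs per user. And the moment a new comment or vote arrives, the cached entry is stale and has to be thrown away.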
Lemmy is kind of the reason some apps go with a NoSQL design.
The problems I see with Lemmy performance all point to SQL being poorly optimized. In particular, federation is doing database inserts of new content from other servers - and many servers can be incoming at the same time with their new postings, comments, votes. Priority is not given to interactive webapp/API users.
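As a rough idea of what giving priority to interactive users could look like, here is a small sketch with a priority queue. This is an assumption about one possible approach, not how Lemmy schedules work today, and the names `INTERACTIVE`, `FEDERATION`, `submit` and `next_job` are hypothetical.

```python
import heapq
import itertools

INTERACTIVE, FEDERATION = 0, 1  # lower number is served first (hypothetical levels)
_counter = itertools.count()    # tie-breaker keeps FIFO order within one priority
_pending = []


def submit(priority, job):
    """Queue a unit of database work with a priority."""
    heapq.heappush(_pending, (priority, next(_counter), job))


def next_job():
    """Return the highest-priority pending job, or None if nothing is queued."""
    if not _pending:
        return None
    return heapq.heappop(_pending)[2]


submit(FEDERATION, "insert remote vote")
submit(FEDERATION, "insert remote comment")
submit(INTERACTIVE, "load front page for logged-in user")
order = [next_job() for _ in range(3)]
```

The interactive request comes out first even though it was submitted last, and the federation inserts keep their arrival order behind it.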
Using a SQL database as the backend of a website with unique data all over the place is very tricky. You really have to program the app to avoid touching the database, and create cached output, incoming queues and such where you can. Reddit (at least 9 years ago, when they open sourced it) is also based on PostgreSQL - and you will see they do not do live SQL inserts into comments like Lemmy does - they queue them using something other than the main database, then insert them in batches.
Email MTA apps I've seen do the same thing: they queue files to disk before putting them into the main database.
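The queue-then-batch idea can be sketched in a few lines. This is only an illustration under assumptions: `sqlite3` stands in for PostgreSQL so it runs anywhere, an in-process `queue.Queue` stands in for whatever durable queue a real site would use, and the table and function names are made up. It is not Reddit's or Lemmy's actual code.

```python
import queue
import sqlite3

# In-process queue as a stand-in for a durable queue outside the main database.
incoming = queue.Queue()


def enqueue_comment(post_id, author, body):
    """Accept a comment without touching the main database."""
    incoming.put((post_id, author, body))


def flush_batch(conn, max_batch=500):
    """Drain up to max_batch queued comments and insert them in one transaction."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(incoming.get_nowait())
        except queue.Empty:
            break
    if batch:
        with conn:  # commits once for the whole batch
            conn.executemany(
                "INSERT INTO comment (post_id, author, body) VALUES (?, ?, ?)",
                batch,
            )
    return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (post_id INTEGER, author TEXT, body TEXT)")
for i in range(3):
    enqueue_comment(1, f"user{i}", f"comment {i}")
inserted = flush_batch(conn)
row_count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

The point is that the request path only does a cheap enqueue, and the database sees one transaction per batch instead of one per comment. A worker would call `flush_batch` on a timer.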
I don't think nginx is the problem. The bottleneck is the backend of the backend: PostgreSQL doing all that I/O and record locking.
Front-end developers
There was a posting I saved about some people saying they were going to code on front-end: https://lemmy.ml/post/1199330
Same thing. Another heavily used Lemmy instance has reports of the same problem: https://lemmy.ml/post/1271936
The developers of Lemmy have an open bug on problems with servers communicating to each other: https://github.com/LemmyNet/lemmy/issues/3062
It's a known problem in the lemmy-ui webapp - other instances are reporting the same phantom updates. 0.18 is supposed to remove the use of websockets in the webapp, which hopefully fixes this and other data issues.
Is this just something where I need to give the server some time for all the data to propagate?
Yes. Propagation only goes forward in time from when the first person subscribes to a remote community. If you are the very first on Beehaw to join/subscribe, then only comments from that time forward will be copied over to Beehaw.
I do not believe there is any means to backfill postings and comments to peer instances.
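A toy model of that forward-only behavior, assuming it works the way I described above. The function name, field names and dates are invented for the example, and this is not the real federation code.

```python
from datetime import datetime, timezone


def delivered_to_instance(comments, first_subscribed_at):
    """Keep only comments published at or after the first local subscription."""
    return [c for c in comments if c["published"] >= first_subscribed_at]


# Made-up remote comments and subscription time, for illustration only.
remote_comments = [
    {"id": 1, "published": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "published": datetime(2023, 6, 10, tzinfo=timezone.utc)},
    {"id": 3, "published": datetime(2023, 6, 15, tzinfo=timezone.utc)},
]
first_subscribed_at = datetime(2023, 6, 9, tzinfo=timezone.utc)
visible_ids = [
    c["id"] for c in delivered_to_instance(remote_comments, first_subscribed_at)
]
```

Comment 1 was published before anyone on the local instance subscribed, so it never arrives. Only 2 and 3 show up.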
I tried to sign up 3 times. The captcha is nearly impossible to decode... and each time I just get a spinning button with no response.
it looks like there is no reddit alternative to a reliable subscription feed right now.
Lemmy was not built for scale, and everything from large-community moderation to federation message copying is going through problem identification and optimization.
The Beehaw.org website regularly malfunctions for me, showing the Lemmy 0.17.x problem of getting the wrong voting data on postings. Hopefully the forthcoming 0.18 removal of websockets will eliminate a lot of that.
Lemmy, as it stands today, really isn't ready for anything near the activity of the front page /r/all community on Reddit.
Federation is not reliably delivering comments and other Lemmy content between servers. People need to be looking for such problems; so far there isn't any tool to observe or track this.
https://github.com/LemmyNet/lemmy/issues/3101