I'd say run a local imap server rather than dealing with the weirdness of storage shares across multiple OS's.
I still have every email I've ever received, going back now more than 20 years. My solution isn't terribly fancy, but it gets the job done.
I have a Synology here at home running a mail server. You don't need a Synology specifically, just a simple mail server with access to a lot of disk space. The server isn't on the open web or anything and doesn't support SMTP. It's just running IMAP to serve the local mail around the house.
I connect to it from Thunderbird on my various machines. I also use Thunderbird to connect to my actual mail servers to do my day-to-day mail stuff.
Every six months or so, I move old mail messages from my actual mail servers over to the archival one. Generally, I keep the mail on the archival server in folders, one per year, which keeps the loading time to a minimum. For example, come January 1st 2024, I'll be moving mail from January 2023 - June 2023 to the `/2023` folder on the archive.
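The per-year sorting could also be scripted instead of done by hand in Thunderbird. Here is a minimal sketch, assuming the archive is reachable as a Maildir on disk (the setup above goes through IMAP, so this is a swapped-in approach). It uses only Python's standard `mailbox` module; `archive_by_year` and both directory arguments are hypothetical names.

```python
import mailbox
from email.utils import parsedate_to_datetime


def archive_by_year(source_dir, archive_dir, remove_from_source=False):
    """Copy each message into a per-year folder of the archive Maildir.

    Returns a dict mapping year -> number of messages copied.
    Messages without a parseable Date header are left where they are.
    """
    source = mailbox.Maildir(source_dir, create=False)
    archive = mailbox.Maildir(archive_dir, create=True)
    counts = {}
    for key in list(source.keys()):
        message = source[key]
        try:
            year = parsedate_to_datetime(str(message["Date"])).year
        except (TypeError, ValueError):
            continue  # missing or malformed Date header
        # add_folder() creates the folder if needed and returns it either way.
        folder = archive.add_folder(str(year))
        folder.add(message)
        counts[year] = counts.get(year, 0) + 1
        if remove_from_source:
            source.remove(key)
    return counts
```

By default this copies rather than moves, so nothing is lost if the run is interrupted; pass `remove_from_source=True` only after checking the archive.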
Searching is done via Thunderbird just like you search any mail account, and on my desktop machine, I let Thunderbird keep copies of the mail locally for quick searching. On my laptop though, I ask it to not keep copies to save disk space.
@danielquinn @crank That's pretty cool. How much storage does 20 years of personal emails actually take up?
It's actually not as crazy as you might think:
$ du -sh .Maildir/
13G .Maildir/
That's going back to ~~2000~~ 1995, both sent & received. The first email I have in there is from a friend of mine offering to send me an MP3 she downloaded.
You were downloading and sharing mp3s in 1995?? Didn’t the file extension only come out in 1995?
I think she would have gotten the file via hot... something, this little file sharing network that predates Napster.
Edit: It was probably Hotline, which was launched in '97, so there's probably some corruption to the received email date somewhere. I wasn't exactly tech savvy 25 years ago ;-)
So I think the way I would want to do this is with something like mailpiler (https://www.mailpiler.org/). It’s been on my long list of things to dive into for a while.
Well it is literally exactly what I was asking for. :) But as you allude to the setup is not trivial and would be a bit of a project. It is useful to know about because it could help find a somewhat simpler alternative. And I will add it to my own list in case I find none.
Edit: this led me to polo2ro/imapbox: "Dump imap inbox to a local folder in a regular backupable format: html, json and attachements." Which is a different take on the same problem. I am not sure if I like the email all being converted to html like this. It could be a really nice addition but somehow I feel that keeping more original-formatted emails would be wise too. It does also create for each message "A gziped version of the email in .eml format" alongside the html but I would have to look more into what can be done with that.
Yeah, I started working on it once a couple years ago and getting it spun up was a chore. Life got busy and I never finished.
That imapbox looks pretty interesting. Thanks for tracking that one down.
If you didn't already, see the rest of the comments on this thread.
I am currently working on this. Finally got the Docker working and am importing my 15GB mbox as we speak! I'll post back here about how it works out.
That’s awesome, I’ll definitely be interested to see how it all works out.
Did it work?
Alas, no! Things seemed to be going well: I got >90k messages imported from my Google Takeout mbox file before the import was interrupted (not mailpiler's fault). At this point, I logged into the "auditor" account and was able to see my emails and search them. But, then I resumed the import. By the end of today, the import was finished (~150k messages total). When I logged in with the auditor account, I got some error "No search results" and nothing I could do about it. This is actually what happened last time I tried mailpiler, too, now that I recall. All seemed fine, but, it seems, the database got corrupted or something along the way... So, now it's useless. I might try it one more time over the next few days. I'll keep y'all posted.
Oh no!
This kind of tool needs to be something you can rely on if it's to be used in the way I am intending. If there is a master copy of the mail (as it sounds like you are working from) it's not as big a deal as you can always go back to that. But if the application is relied upon to be doing its job, possibly in silence for long stretches, it can't just combust.
I am not sure I really like the word "database" in this context. I don't understand them and I can't fix them. I feel that maildir, where each email is simply a text file, should be the primary storage. If there is another tool that can index or interact with the maildir then that's handy, but the mail itself should stay in a plain, interoperable filetype. (Unless that is how mailpiler works? I might be misunderstanding.)
I also see that mailpiler encrypts everything. I do not love that. My hdd is already encrypted. I do not want things further encrypted because it also means I am unlikely to be able to fix any problems.
I think this application is too complex for me. I need something that I can easily administer. Hopefully set up and leave it to be for a long time and not have too much to relearn if something needs to be fixed. It is perhaps suitable for a more advanced user/admin.
Yes, I'm coming to similar conclusions myself. To be fair, encryption is a configurable option with Mailpiler. But, yes, it is all digested and stored in a MySQL database, which is definitely more opaque than plaintext in the filesystem. I might try the mutt + notmuch solution described by @marty_relaxes@discuss.technics.de below. Sounds like it might be a challenge to set up but would work great forever after. I'll need to figure out how to convert my mbox files to maildir, but Google suggests there are tools for that. Good luck to you, let us know what you ultimately figure out! I've been working on this off-and-on for a few months now without figuring out a good solution!
Edit: I guess, if you want fast full-text search, a database will have to enter the equation somewhere, though.
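For the mbox-to-maildir conversion mentioned above, Python's standard library may already be enough. This is a minimal sketch using the `mailbox` module, not any particular tool from a search result; `mbox_to_maildir` and its two path arguments are hypothetical names.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            # MaildirMessage() carries over the read/replied/flagged status
            # that mbox stores in Status and X-Status headers.
            dest.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        source.close()
    return copied
```

The mbox file is only read, never modified, so it can stay around as a master copy until the Maildir has been checked.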
Honestly I could live without fast. If it's a text file there is always grep, ripgrep, The Silver Searcher, etc. But there is nothing in my deleted email demanding immediate attention. Any situation I foresee would accommodate waiting hours or days. I was kind of hoping to continue interacting with it in a webmail kind of way, because piling up too many new things for something I won't be working on regularly is just asking for a mess.
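One caveat with plain grep is that bodies encoded as base64 or quoted-printable won't match as written. Below is a rough sketch of a slow, index-free search that decodes text parts first, using only Python's standard `mailbox` module. `message_text` and `search_maildir` are hypothetical names, and only the top-level Maildir is scanned (subfolders would need `list_folders()` and `get_folder()`).

```python
import mailbox


def message_text(message):
    """Return the subject plus all decodable text parts as one string."""
    chunks = [str(message.get("Subject", ""))]
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name Python does not know
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def search_maildir(maildir_path, needle):
    """Return (key, subject) pairs for messages containing needle, case-insensitively."""
    box = mailbox.Maildir(maildir_path, create=False)
    needle = needle.lower()
    hits = []
    for key, message in box.iteritems():
        if needle in message_text(message).lower():
            hits.append((key, str(message.get("Subject", ""))))
    return hits
```

This reads every message on every search, so it fits the "waiting is fine" case rather than interactive use.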
The mutt/notmuch proposal is a solid solution for the right person. To me, learning like 5 new major tools just for one project is a big risk. I played around with this stuff a couple years ago and failed at creating even a simple setup to do regular mail stuff. It is absolutely not clear.
So I might try one of the intermediate solutions mentioned elsewhere. A solution that digests mail would be acceptable as an add-on extra.
Well, I've solved it! I now have a web interface (accessible via VPN, although, in principle, I could expose it to the internet) that allows fast, full-text search of all my old emails. Here is the recipe:
- Maildir: I converted all my mbox files to maildir using this python script: https://superuser.com/questions/1169371/how-to-convert-mbox-mail-files-as-found-in-thunderbird-dir-to-maildir#1343019
- Installed notmuch via my distro's repository and set it up (`notmuch setup`, then `notmuch new`). This creates a new folder in your maildir directory containing full-text search info.
- Installed netviel via `python3 -m pip install netviel` and then ran it via `python3 -m netviel`.
That's it! This lets you search locally. I actually did a few more steps because I wanted to containerize this thing so I could run it on my NAS. I'd be happy to go into detail about that too, if you're interested. One hiccup was that, for some reason, netviel binds to 127.0.0.1 instead of 0.0.0.0, and there is no way to change that without compiling the project yourself. But, I found a workaround for my Docker container where you can use socat bound to 0.0.0.0 to redirect requests to netviel, so that requests from other computers appear local to netviel.
Anyway, that makes it all sound more complicated than it is. I am super-pleased to have solved this problem at last!
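For anyone curious what the socat workaround amounts to, here is a rough Python equivalent: a small TCP forwarder that listens on all interfaces and relays each connection to a service bound to localhost. This is a sketch of the general technique and not the socat invocation actually used; `start_forwarder` is a hypothetical name, and the port numbers in the comment at the end are placeholders, not netviel's documented defaults.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes one way until the sender closes, then close the receiver."""
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:
                break
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, target_host, target_port):
    """Connect to the local service and relay traffic in both directions."""
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(
            target_host, target_port
        )
    except OSError:
        client_writer.close()  # local service is not running
        return
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Start listening and return the asyncio server object."""
    async def on_connect(reader, writer):
        await handle_client(reader, writer, target_host, target_port)

    return await asyncio.start_server(on_connect, listen_host, listen_port)


# Example use inside a coroutine (ports are placeholders):
#     server = await start_forwarder("0.0.0.0", 8080, "127.0.0.1", 5000)
#     await server.serve_forever()
```

Because the forwarder itself connects from 127.0.0.1, the service behind it sees every request as local, which is the same effect described for socat.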
In looking up suggestions made already I found 2 other projects that might be useful. Does anyone have comments about these? I have just looked at them a little bit.
OfflineIMAP
OfflineIMAP is software that downloads your email mailbox(es) as local Maildirs. OfflineIMAP will synchronize both sides via IMAP.
There are a few different overlapping projects by the same developer(s). It is a bit messy.
- OfflineIMAP/offlineimap3 - I think this repo is the one with the most active and up to date version of the software
- OfflineIMAP - ArchWiki
- Home · OfflineIMAP/offlineimap Wiki
imapsync
Imapsync is an IMAP transfer tool. The purpose of imapsync is to migrate IMAP accounts or to backup IMAP accounts.
Imapsync is a command-line tool that allows incremental and recursive IMAP transfers from one mailbox to another, both anywhere on the internet or in your local network. Imapsync runs on Windows, Linux, Mac OS X. "Incremental" means you can stop the transfer at any time and restart it later efficiently, without generating duplicates.
- imapsync home page
- Mailbox Imapsync Online
- Imapsync issues and tips about archiving - FAQ.Archiving.txt
- Imapsync similar softwares and external services - if there is an answer it is probably here. Looks pretty comprehensive.
It is the POP3 workflow, not IMAP. Maybe set up your client to use POP3 and remove mails from the server after receiving? However, I don't recommend Thunderbird; its POP3 support was very buggy when I used it (many years ago). Try Sylpheed or Claws Mail, for example.
Thunderbird has actual funding now, so please test before advising against software
It is not a question of funding. Thunderbird has always had a number of long-standing bugs. As for rare use cases like this one, I don't think anyone cares about them. Anyway, I recommend software that I know worked correctly, not software that worked incorrectly and may have been fixed but requires further testing.
I want to keep mail on the server at about 80-90% of quota. Because when I am outside of my home, that will continue to be what I have access to. So the local copy will only be as a backup in case I delete something that I later realize I need to refer to. Since most emails are very small individually I should be able to keep the majority of them on the server. I will selectively delete either very large emails, or emails which there are so, so many of like notifications, which I will probably never need to look at.
I have used Sylpheed a bit in the past. I prefer it and a very similar project called Interlink to tbird. I just said tbird because I figured everyone would know it. But also I thought all of those were forks of tbird and wouldn't differ much in how they work. Do they have much different internals?
You are wrong, there are no widely used forks of Thunderbird AFAIK. Thunderbird is based on Mozilla and has a huge codebase that is very hard to maintain. All other popular email clients have totally different code and are based on other libraries. They can be similar in how they appear, but not in what bugs they have.
Now that I look, I see I am wrong.
A while ago I was trying out betterbird which actually is a TB fork and I guess I kinda just generalized from that. But looking through a list of linux email clients it is clear that only a couple are related to TB.
Look into isync / mbsync in combination with maildir utils (mu/mu4e) or notmuch.
Thanks I am looking at these. Do you think maildir format is the best to try to work with? When I was researching I find there are other formats such as mbox, or more program-specific formats. I was not having an easy time discerning which is the most portable, robust format.
I haven't looked into these other formats because maildir works for me. I can save local backups, remove mail from the server, and even put it back later. All plain text.
If you put your maildir on a (disk attached to a) raspberry pi, install mutt, and make that pi accessible by ssh you always have access to your mail.
Does mutt have search capabilities? Is it optimized such that it would be effective with large mailboxes? Thanks!
Mutt (and neomutt) has very nice search capabilities, supporting regex search within specific mailboxes. However, it is a relatively slow search - unbearably slow for full text search in large mailboxes.
Here, notmuch is usually used to complement mutt. It's a very fast (full-text) mail indexer, which can be directly integrated in mutt and allows much faster searching (among other things such as advanced mail tagging, virtual mailboxes and more).
It is generally a royal pain to set up with so many moving parts but once you do it is a very fast, comfortable mail environment if you're comfy with the terminal.
Thanks for this! I'm going to try to get this set up. It sounds perfect.
What I do is use Claws Mail with POP3, it has an option that allows a message to only be deleted from the server after a configurable period of time. So if you set it for 10 days for example the message will exist both locally on your PC and on the server for 10 days, after which it will only exist on the PC.
It works pretty well in general. The only account giving me some trouble is Yahoo, which I suspect has some quirks, which occasionally cause the messages to be downloaded again and duplicated. Thankfully it's easily fixed because Claws also has a feature to delete duplicates.
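If the local store is a Maildir, re-downloaded duplicates like these can also be spotted outside the mail client. The following is a minimal sketch that treats messages with the same Message-ID header as duplicates; it is not how Claws Mail implements its feature, and `find_duplicates` and `remove_duplicates` are hypothetical names.

```python
import mailbox


def find_duplicates(maildir_path):
    """Return keys of messages whose Message-ID was already seen."""
    box = mailbox.Maildir(maildir_path, create=False)
    seen = set()
    duplicates = []
    # Sorting the keys makes the choice of which copy to keep deterministic.
    for key in sorted(box.keys()):
        message_id = box[key].get("Message-ID")
        if message_id is None:
            continue  # cannot judge messages without an ID
        message_id = str(message_id).strip()
        if message_id in seen:
            duplicates.append(key)
        else:
            seen.add(message_id)
    return duplicates


def remove_duplicates(maildir_path):
    """Delete the duplicate copies and return how many were removed."""
    box = mailbox.Maildir(maildir_path, create=False)
    duplicates = find_duplicates(maildir_path)
    for key in duplicates:
        box.remove(key)
    return len(duplicates)
```

Running `find_duplicates` first and inspecting the result is the safer order, since `remove_duplicates` deletes files.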
This approach is different from IMAP, which would maintain a local offline cache of the live inbox, but you wouldn't be able to only keep local messages — any change in one side would be reflected in both.
However, Claws allows you to do both. You can have both a POP3 and an IMAP account connected to the same live box: use the POP3 for offline archival, and the IMAP for when you want to put something back on the server, or if you need to look at other folders on the server besides the inbox (POP3 can only see the inbox, not trash, sent, etc.).
Normally I only do folders locally on the PC, on the mailbox connected with POP3, so none of the organization is reflected on the live mailbox, which is inbox only. Every once in a while I connect via IMAP to recover emails from the sent folder, which I've sent with webmail or from mobile (using IMAP on mobile too).
If this doesn't fit your workflow, there are lots of IMAP syncing tools, as you've noticed. imapsync is pretty good.
The last step for my workflow would be to self host an IMAP server that will index the POP3 mailbox, and expose it read-only (without SMTP) through a webmail app, for archival and search only. I may have to look at Piler. The quirk here is that the Claws mailbox format is slightly different from IMAP, it's very similar to mbox but not identical, will have to see if any IMAP server will accept it.
Thunderbird is a no-go unfortunately: its main mailbox format keeps all messages in one big file instead of individual files, which complicates things a lot.