On Wed, 2017-11-15 at 14:44 +0000, Sebastian Arcus wrote:
> Thank you - that is an interesting idea. Do you use a software to
> extract the emails from the Sent archives, or do you add them to the
> database on-the-fly, when the sent emails go out through your MTA?
> If you have any links or example scripts available I would be very
> much interested.
>
I'm running Postfix as my MTA. An 'always_bcc=archiveUser' parameter
sends a copy of all incoming and outgoing mail to the mail archive. I
wrote a Java application that loads everything in the archive user's
mailbox into the database. This runs as a daily cronjob, so it 'just
happens'.
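For anyone wanting a starting point, the loading step described above
could look roughly like the sketch below. It is not my Java loader: it
is an illustration in Python, with SQLite standing in for PostgreSQL.
The table layout, the function names and the choice of mbox as the
mailbox format are all assumptions made for the example.

```python
import mailbox
import sqlite3
from email import message_from_bytes, policy
from email.utils import getaddresses

# Hypothetical schema: the real archive's tables are not described in the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    message_id TEXT PRIMARY KEY,
    sender     TEXT NOT NULL,
    subject    TEXT,
    raw        BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS recipient (
    message_id TEXT NOT NULL REFERENCES message(message_id),
    address    TEXT NOT NULL,
    PRIMARY KEY (message_id, address)
);
"""


def load_message(db, raw_bytes):
    """Archive one raw message; return False if it was skipped."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    if msg["Message-ID"] is None:
        return False  # nothing to key the row on
    message_id = str(msg["Message-ID"]).strip()
    sender_pairs = getaddresses([str(msg.get("From", ""))])
    sender = sender_pairs[0][1].lower() if sender_pairs else ""
    cur = db.execute(
        "INSERT OR IGNORE INTO message VALUES (?, ?, ?, ?)",
        (message_id, sender, str(msg.get("Subject", "")), raw_bytes),
    )
    if cur.rowcount == 0:
        return False  # already archived, so a re-run is harmless
    headers = [str(h) for name in ("To", "Cc") for h in msg.get_all(name, [])]
    for _, address in getaddresses(headers):
        if address:
            db.execute(
                "INSERT OR IGNORE INTO recipient VALUES (?, ?)",
                (message_id, address.lower()),
            )
    return True


def load_mbox(db, path):
    """Load every message in an mbox file (assumed format); return the count added."""
    box = mailbox.mbox(path, create=False)
    try:
        loaded = sum(load_message(db, m.as_bytes()) for m in box)
    finally:
        box.close()
    db.commit()
    return loaded
```

Keying on Message-ID with INSERT OR IGNORE means the daily run can
safely see the same message twice, which is the property that lets a
job like this 'just happen' unattended.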
The archive has a few more Java applications that do things like:

- incrementally dump the archive content. The loader can also rebuild
  the database from these dumps.
- retrieve e-mails from the database and send them to a nominated
  user's e-mail account as attachments to a carrier e-mail.
- manage the mail archive: remove senders and their e-mails and/or mark
  a sender as 'do not archive' and remove all their existing e-mails.

> I suppose one side risk is that if the domain of one of your regular
> correspondents gets compromised, the spam coming from it will almost
> be guaranteed to arrive in the Inbox?
>
True, but that's a problem with almost any reputation-based
whitelisting method. In this case I can remove the spam and retain the
sender's other messages just as easily as I can delete the sender or
mark him as 'do not archive'.

I initially decided that an archive was A Good Thing to have, simply
because retrieving mail from it should be a lot faster than searching
through huge mail folders. This turned out to be true in practice: the
archive currently holds 183,000 e-mails and a worst-case search takes
around 30 seconds to return a list of hits (running on a 3 GHz dual
Athlon system with 4 GB RAM and Fedora 25 as its OS).

Then I realised that it could also provide automatic whitelisting of
anybody I'd sent mail to, so I added the SA plugin and DB view needed
to implement it. I haven't measured its speed, but as it has no
noticeable effect on overall performance, it's at least as fast as the
average URIBL lookup, and it adds no network overhead since the
database runs on the same hardware as my MTA and SA.

Another aim was to avoid, as far as possible, any management and
maintenance workload, and I hit that target. I have to delete mail
senders etc. once or twice a year, and all backups etc. are handled as
part of the routine maintenance of the server as a whole. PostgreSQL
needs no regular maintenance or handholding.
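To make the whitelisting idea mentioned above more concrete: the lookup
amounts to asking whether the sender of an incoming message has ever
been a recipient of mail I sent. The sketch below is not the actual SA
plugin or PostgreSQL view. It is a Python and SQLite illustration, and
the table names, the 'local_address' table, the view name and the
function name are all hypothetical.

```python
import sqlite3

# Hypothetical schema and view; the real archive's structure is not shown in the post.
SCHEMA = """
CREATE TABLE message (message_id TEXT PRIMARY KEY, sender TEXT NOT NULL);
CREATE TABLE recipient (message_id TEXT NOT NULL, address TEXT NOT NULL);
CREATE TABLE local_address (address TEXT PRIMARY KEY);

-- Everyone who has received mail sent from one of my own addresses.
CREATE VIEW whitelist AS
    SELECT DISTINCT r.address
    FROM recipient r
    JOIN message m ON m.message_id = r.message_id
    WHERE m.sender IN (SELECT address FROM local_address);
"""


def is_whitelisted(db, from_address):
    """Return True if from_address has previously been sent mail from a local address."""
    row = db.execute(
        "SELECT 1 FROM whitelist WHERE address = ? LIMIT 1",
        (from_address.strip().lower(),),
    ).fetchone()
    return row is not None
```

A spam filter plugin would call something like is_whitelisted() with
the From address of each incoming message and apply a negative score on
a hit. Because the view is driven by the archive tables, deleting a
sender's messages or the sender from the archive also removes the
whitelist entry with no separate list to maintain.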
The only significant maintenance effort has been associated with major
PostgreSQL version upgrades: if these change the DB structure, the
database has to be reinitialised and reloaded. So far this has only
happened every 2-3 years and is no worse than I'd expect with any other
RDBMS.

Martin