On Wed, 2017-11-15 at 14:44 +0000, Sebastian Arcus wrote:
> Thank you - that is an interesting idea. Do you use a software to 
> extract the emails from the Sent archives, or do you add them to the 
> database on-the-fly, when the sent emails go out through your MTA?
> If you have any links or example scripts available I would be very
> much interested.
> 
I'm running Postfix as my MTA. An 'always_bcc=archiveUser' parameter
sends a copy of all incoming and outgoing mail to the mail archive.
I wrote a Java application that loads everything in the archive user's
mailbox into the database. It runs as a daily cron job, so it 'just
happens'.
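The loader itself isn't shown, so what follows is only a minimal sketch of what the loading step involves. It assumes the archive user's mailbox is a single mbox file, where each message starts with a "From " separator line and body lines beginning with "From " have been quoted as ">From ". It also uses an in-memory list in place of the PostgreSQL table. The class names `MboxLoader` and `ArchivedMessage` are hypothetical. A real loader would parse MIME properly and insert rows over JDBC.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical record of one archived message: its top-level headers
// (names lower-cased) plus the raw text, standing in for a database row.
class ArchivedMessage {
    final Map<String, String> headers;
    final String raw;

    ArchivedMessage(Map<String, String> headers, String raw) {
        this.headers = headers;
        this.raw = raw;
    }

    String header(String name) {
        return headers.getOrDefault(name.toLowerCase(Locale.ROOT), "");
    }
}

// Hypothetical loader: splits an mbox-format mailbox on its "From "
// separator lines and parses the headers of each message.
class MboxLoader {
    static List<ArchivedMessage> load(String mbox) {
        List<ArchivedMessage> out = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\r?\n", -1)) {
            if (line.startsWith("From ")) {          // start of a new message
                if (current != null) out.add(parse(current.toString()));
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) out.add(parse(current.toString()));
        return out;
    }

    // Reads headers up to the first blank line. Folded (indented) lines are
    // joined onto the header they continue; for repeated headers such as
    // Received only the first occurrence is kept.
    private static ArchivedMessage parse(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        String lastKey = null;
        for (String line : raw.split("\n", -1)) {
            if (line.isEmpty()) break;               // end of header block
            if (line.startsWith(" ") || line.startsWith("\t")) {
                if (lastKey != null) {
                    headers.put(lastKey, headers.get(lastKey) + " " + line.trim());
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) { lastKey = null; continue; }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (headers.containsKey(key)) {
                lastKey = null;                      // ignore later duplicates
            } else {
                headers.put(key, line.substring(colon + 1).trim());
                lastKey = key;
            }
        }
        return new ArchivedMessage(headers, raw);
    }
}
```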

The archive has a few more Java applications that do stuff like:
- incrementally dump the archive content. The loader can also rebuild
  the database from these dumps.
- retrieve e-mails from the database and send them to a nominated
  user's e-mail account as attachments to a carrier e-mail
- manage the mail archive: remove senders and their emails and/or mark
  a sender as 'do not archive' and remove all their existing e-mails.
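The dump and management tools can be illustrated with a hypothetical in-memory stand-in for the archive table. The names `MailArchive`, `dumpSince` and `removeSender` are invented for this sketch. The idea that incremental dumps work from an increasing message id is also an assumption, since the post doesn't say how the real dumps decide what is new.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory stand-in for the archive table. Each stored
// message gets an increasing id, which is what makes dumps incremental.
class MailArchive {
    static final class Entry {
        final long id;
        final String sender;
        final String subject;

        Entry(long id, String sender, String subject) {
            this.id = id;
            this.sender = sender;
            this.subject = subject;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Set<String> doNotArchive = new HashSet<>();
    private long nextId = 1;

    private static String norm(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }

    // Called by the loader; returns false if the sender is marked
    // 'do not archive', in which case nothing is stored.
    boolean add(String sender, String subject) {
        String s = norm(sender);
        if (doNotArchive.contains(s)) return false;
        entries.add(new Entry(nextId++, s, subject));
        return true;
    }

    // Incremental dump: every message stored after the previous dump.
    // The caller remembers the highest id it has already written out.
    List<Entry> dumpSince(long lastDumpedId) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.id > lastDumpedId) out.add(e);
        }
        return out;
    }

    // Delete all of a sender's messages and, optionally, mark the sender
    // 'do not archive' so future mail from them is skipped as well.
    int removeSender(String sender, boolean markDoNotArchive) {
        String s = norm(sender);
        if (markDoNotArchive) doNotArchive.add(s);
        int before = entries.size();
        entries.removeIf(e -> e.sender.equals(s));
        return before - entries.size();
    }

    int size() {
        return entries.size();
    }
}
```

Keeping the 'do not archive' list separate from the messages means the loader can check it before storing anything, so a banned sender's mail never reaches the table or the dumps.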

> I suppose one side risk is that if the domain of one of your regular 
> correspondents gets compromised, the spam coming from it will almost
> be guaranteed to arrive in the Inbox?
>
True, but that's a problem with almost any reputation-based
whitelisting method. In this case I can just remove the spam messages and
retain the sender's other messages just as easily as I can delete the
sender or mark him as 'don't archive'.

I initially decided that an archive was A Good Thing to have, simply
because retrieving mail from it should be a lot faster than searching
through huge mail folders. This turned out to be true in practice: the
archive currently holds 183,000 emails and a worst case search takes
around 30 seconds to return a list of hits (running on a 3 GHz dual
Athlon system with 4GB RAM and Fedora 25 as its OS). 

Then I realised that it could also provide automatic whitelisting of
anybody I'd sent mail to, so I added the SA plugin and DB view needed
to implement it. I haven't measured its speed, but since it has no
noticeable effect on overall SA performance, it's at least as fast as
the average URIBL lookup, and it adds no network overhead because the
database runs on the same hardware as my MTA and SA.
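The whitelisting logic can be sketched as follows. This is not the actual SA plugin or DB view, neither of which is shown here; SpamAssassin plugins are written in Perl and would query the view in PostgreSQL. It is a hypothetical in-memory Java equivalent. It assumes the view amounts to "every To/Cc address on archived mail sent from one of my own addresses". It also uses a deliberately simple regex instead of proper RFC 5322 address parsing.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the "people I have written to" DB view:
// every address in To/Cc of archived messages sent by a local address.
class SentMailWhitelist {
    // Simplistic address matcher; real code should parse RFC 5322 headers.
    private static final Pattern ADDR =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+");

    private final Set<String> correspondents = new HashSet<>();
    private final Set<String> localAddresses = new HashSet<>();

    SentMailWhitelist(Set<String> localAddresses) {
        for (String a : localAddresses) {
            this.localAddresses.add(a.toLowerCase(Locale.ROOT));
        }
    }

    // Extracts lower-cased addresses from a header value.
    static Set<String> addresses(String headerValue) {
        Set<String> out = new HashSet<>();
        Matcher m = ADDR.matcher(headerValue);
        while (m.find()) out.add(m.group().toLowerCase(Locale.ROOT));
        return out;
    }

    // Feed one archived message; only outgoing mail adds to the whitelist.
    void addArchived(String from, String to, String cc) {
        Set<String> senders = addresses(from);
        senders.retainAll(localAddresses);
        if (senders.isEmpty()) return;             // incoming mail: ignore
        correspondents.addAll(addresses(to));
        correspondents.addAll(addresses(cc));
        // Never whitelist our own addresses, so forged "from me" mail
        // gets no free pass.
        correspondents.removeAll(localAddresses);
    }

    // The question the SA rule asks: have I ever sent mail to this sender?
    boolean isWhitelisted(String fromHeader) {
        for (String a : addresses(fromHeader)) {
            if (correspondents.contains(a)) return true;
        }
        return false;
    }
}
```

Only outgoing mail feeds the whitelist, so a spammer who merely writes to you never gets whitelisted. The compromised-correspondent case discussed above is still a gap, because that address really is in the set.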

Another aim was to avoid, as far as possible, any management and
maintenance workload, and I hit that target. I have to delete mail
senders etc. once or twice a year, and all backups etc. are handled as
part of the routine maintenance of the server as a whole. PostgreSQL
needs no regular maintenance or handholding. The only significant
maintenance effort has come with major PostgreSQL version upgrades:
if a new version changes the on-disk data format, the database has to be
reinitialised and reloaded. So far this has only happened every 2-3
years and is no worse than I'd expect with any other RDBMS.

Martin
