On 16/11/17 12:16, Martin Gregorie wrote:
On Thu, 2017-11-16 at 09:15 +0000, Sebastian Arcus wrote:
On 15/11/17 18:11, Martin Gregorie wrote:
On Wed, 2017-11-15 at 14:44 +0000, Sebastian Arcus wrote:

</snip>

I initially decided that an archive was A Good Thing to have,
simply because retrieving mail from it should be a lot faster than
searching through huge mail folders. This turned out to be true in
practice: the archive currently holds 183,000 emails and a worst-case
search takes around 30 seconds to return a list of hits
(running on a 3 GHz dual Athlon system with 4GB RAM and Fedora 25
as its OS).

Thank you for the details. How do you search the archive? With grep
directly on the server?

Using SQL queries.

The two main tables in the database hold e-mail addresses and messages
respectively. The many-to-many links between them are implemented with
a third table that also holds the link type ('To' or 'From'). An
additional table contains subject text and has a one-to-many
relationship with the messages.
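
For readers who want to picture that layout, here is a minimal sketch of the four tables. It is only a guess at the shape: every table, column and function name below is hypothetical, and SQLite (via Python's standard sqlite3 module) stands in for whatever database the archive really uses.

```python
import sqlite3

# Hypothetical schema: all names are illustrative, not the archive's real ones.
SCHEMA = """
CREATE TABLE address (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE subject (
    id           INTEGER PRIMARY KEY,
    subject_text TEXT NOT NULL UNIQUE
);
CREATE TABLE message (
    id         INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subject(id),  -- one subject, many messages
    body       TEXT
);
CREATE TABLE link (                                      -- many-to-many link table
    address_id INTEGER NOT NULL REFERENCES address(id),
    message_id INTEGER NOT NULL REFERENCES message(id),
    link_type  TEXT NOT NULL CHECK (link_type IN ('To', 'From')),
    PRIMARY KEY (address_id, message_id, link_type)
);
"""


def build_archive():
    """Create an empty in-memory archive with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _get_id(cur, table, column, value):
    # table and column come only from the fixed call sites below,
    # never from user input, so the f-string is safe here.
    cur.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    cur.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,))
    return cur.fetchone()[0]


def add_message(conn, subject, sender, recipients, body=""):
    """Store one message with its 'From' and 'To' links; return the message id."""
    cur = conn.cursor()
    subject_id = _get_id(cur, "subject", "subject_text", subject)
    cur.execute("INSERT INTO message (subject_id, body) VALUES (?, ?)",
                (subject_id, body))
    message_id = cur.lastrowid
    links = [(sender, "From")] + [(r, "To") for r in recipients]
    for email, link_type in links:
        cur.execute(
            "INSERT INTO link (address_id, message_id, link_type) VALUES (?, ?, ?)",
            (_get_id(cur, "address", "email", email), message_id, link_type),
        )
    conn.commit()
    return message_id


def search_by_sender(conn, email):
    """Return (message id, subject text) for every message sent From an address."""
    return conn.execute(
        """SELECT m.id, s.subject_text
           FROM message m
           JOIN subject s ON s.id = m.subject_id
           JOIN link l    ON l.message_id = m.id
           JOIN address a ON a.id = l.address_id
           WHERE a.email = ? AND l.link_type = 'From'
           ORDER BY m.id""",
        (email,),
    ).fetchall()
```

A search then becomes an indexed join rather than a scan through mail folders: for example, search_by_sender(conn, "alice@example.org") after a few add_message() calls.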

The SA plugin just looks at the From header in the message being
checked and, if it finds that address in the database, sees if there
are any 'To' links associated with it. If there are, then the message
gets negative points. As I said, this SQL query is actually run against
a database view that combines the address and link tables. Since the
rows on these tables are small and the tables are indexed on address
and link type, the query is very fast.
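
The lookup described above might look roughly like the sketch below. SpamAssassin plugins are written in Perl, so this Python is only an illustration of the query, not the plugin itself. The view name address_link, the column names and the helper functions are all assumptions, and SQLite again stands in for the real database.

```python
import sqlite3
from email.utils import parseaddr

# Hypothetical tables and view: names are illustrative only.
LOOKUP_SCHEMA = """
CREATE TABLE address (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE            -- UNIQUE also gives an index on email
);
CREATE TABLE link (
    address_id INTEGER NOT NULL REFERENCES address(id),
    message_id INTEGER NOT NULL,
    link_type  TEXT NOT NULL CHECK (link_type IN ('To', 'From'))
);
CREATE INDEX link_address_type ON link (address_id, link_type);
CREATE VIEW address_link AS
    SELECT a.email, l.link_type, l.message_id
    FROM address a
    JOIN link l ON l.address_id = a.id;
"""


def build_lookup_db():
    """Create an empty in-memory database holding the address/link view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LOOKUP_SCHEMA)
    return conn


def has_to_link(conn, from_header):
    """True if the archive holds a 'To' link for the address in a From header,
    i.e. mail has previously been sent to that address."""
    _name, email = parseaddr(from_header)   # "Alice <a@b.c>" -> "a@b.c"
    if not email:
        return False
    row = conn.execute(
        "SELECT 1 FROM address_link WHERE email = ? AND link_type = 'To' LIMIT 1",
        (email.lower(),),
    ).fetchone()
    return row is not None
```

A caller would subtract points from the spam score whenever has_to_link() returns True. The sketch assumes addresses are stored in lower case. The size of the score adjustment is not given in this thread, so it is left out.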

If you want to know more about the archive, look here:
http://www.libelle-systems.c3487738.myzen.co.uk/mailarchive/

Ignore the licensing stuff: I initially thought I might be onto a
revenue source, but remarkably few people use mail archives. I should
remove the license management code and open source the archive but so
far haven't got round to doing that.

Thank you for the info. I hadn't considered it before, but it makes sense to store large mail archives in SQL databases. I suppose it is one of the few ways to search such a large volume of data efficiently, and much faster than searching Maildir or MBOX archives.

I guess one less-than-ideal aspect is that users couldn't be given access to the archive through their normal mail software, such as Thunderbird.
