On Sun, 22 Mar 2015 12:44:26 -0400
Alex Regan <mysqlstud...@gmail.com> wrote:

[...]

> So instead of trying to figure out the proper expiry period, you just 
> start over completely every two weeks?

No, we use a two-week sliding window to construct our Bayes DB.  We don't learn
for two weeks and then dump everything; rather, we take all the mail tokenized
in the last 14 days and build the database from that.
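
For illustration only, here is a minimal sketch of the sliding-window idea in Python. It is hypothetical and not Roaring Penguin's implementation. The TokenizedMessage record and the build_bayes_db() function are made-up names, and the sketch assumes each message has already been tokenized and labelled spam or ham.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # two-week window, as described in the thread


@dataclass
class TokenizedMessage:
    # Hypothetical record: when the message was seen, its tokens, its label.
    received: datetime
    tokens: frozenset
    is_spam: bool


def build_bayes_db(messages, now, window=WINDOW):
    """Rebuild per-token spam/ham counts from scratch, using only messages
    received within `window` of `now`. Nothing is expired incrementally;
    old messages simply fall outside the window on the next rebuild."""
    spam_counts, ham_counts = Counter(), Counter()
    nspam = nham = 0
    cutoff = now - window
    for msg in messages:
        if msg.received < cutoff:
            continue  # older than the window: ignored entirely
        if msg.is_spam:
            spam_counts.update(msg.tokens)
            nspam += 1
        else:
            ham_counts.update(msg.tokens)
            nham += 1
    return {"spam": spam_counts, "ham": ham_counts, "nspam": nspam, "nham": nham}


# Example: the 20-day-old message is outside the window and is not counted.
now = datetime(2015, 3, 22, 12, 0)
msgs = [
    TokenizedMessage(now - timedelta(days=20), frozenset({"viagra"}), True),
    TokenizedMessage(now - timedelta(days=3), frozenset({"viagra", "cheap"}), True),
    TokenizedMessage(now - timedelta(days=1), frozenset({"meeting"}), False),
]
db = build_bayes_db(msgs, now)
print(db["spam"]["viagra"], db["nspam"], db["nham"])  # prints: 1 1 1
```

Because each rebuild starts from nothing, there is no expiry period to tune. A token that stops appearing drops out once its last message is older than the window. The rebuild can run on any schedule, for example twice a day.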

[...]

> That just sounds seriously labor-intensive.

Nope.  It's completely automated.  There's no labor involved at all.  In
fact, we've started doing it twice a day to make our Bayes more reactive.

We have a white paper on how this all works:
https://www.roaringpenguin.com/files/rptn.pdf

The Bayes votes themselves are crowdsourced.

> So there is no differentiation between domains or networks in your
> bayes database?

Nope.

> Is the header and attachments part of the learning, or does bayes
> only consider the body?

Some header data and other metadata are considered (HELO name, geolocated
client location, SMTP client OS information obtained from p0f, etc.).
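
One common way to feed metadata like this into a Bayes classifier is to turn it into prefixed pseudo-tokens alongside the body words. The Python sketch below is hypothetical and is not a description of Roaring Penguin's tokenizer. The SmtpContext fields, the "helo:", "geo:" and "os:" prefixes, and the naive body tokenizer are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class SmtpContext:
    # Hypothetical connection metadata; field names are illustrative only.
    helo: str       # HELO/EHLO name announced by the client
    country: str    # client location from a GeoIP lookup, e.g. "CA"
    client_os: str  # OS guess from passive fingerprinting (e.g. p0f)


def tokenize_body(body):
    """Deliberately naive body tokenizer: lowercased words of 3+ letters."""
    return {w.lower() for w in re.findall(r"[A-Za-z]{3,}", body)}


def metadata_tokens(ctx):
    """Prefix each metadata value so it cannot collide with a body word
    and accumulates its own spam/ham statistics."""
    return {
        "helo:" + ctx.helo.lower(),
        "geo:" + ctx.country.upper(),
        "os:" + ctx.client_os.lower().replace(" ", "_"),
    }


def tokenize_message(body, ctx):
    # Body words and metadata pseudo-tokens go into one token set.
    return tokenize_body(body) | metadata_tokens(ctx)


ctx = SmtpContext("mail.example.com", "ca", "Linux 2.6")
print(sorted(tokenize_message("Cheap pills now", ctx)))
# prints: ['cheap', 'geo:CA', 'helo:mail.example.com', 'now', 'os:linux_2.6', 'pills']
```

The prefixes matter. A client whose HELO name happens to be "cheap" would otherwise be counted as the body word "cheap".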

Regards,

David.
