On Sun, 22 Mar 2015 12:44:26 -0400 Alex Regan <mysqlstud...@gmail.com> wrote:
[...]

> So instead of trying to figure out the proper expiry period, you just
> start over completely every two weeks?

No, we use a two-week sliding window to construct our Bayes DB. We don't
learn for two weeks and then dump everything; rather, we take all the
mail tokenized in the last 14 days and build the database from that.

[...]

> That just sounds seriously labor-intensive.

Nope. It's completely automated. There's no labor involved at all.
In fact, we've started doing it twice a day to make our Bayes more
reactive.

We have a white-paper on how this all works:

https://www.roaringpenguin.com/files/rptn.pdf

The actual Bayes votes themselves are crowdsourced.

> So there is no differentiation between domains or networks in your
> bayes database?

Nope.

> Is the header and attachments part of the learning, or does bayes
> only consider the body?

Some header and other meta-data is considered (HELO name, geo-located
client location, SMTP client OS info obtained from p0f, etc.)

Regards,

David.
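
P.S. To make the sliding-window idea above concrete, here is a minimal Python sketch. It is only an illustration and not the code we actually run: the function name `build_bayes_db`, the `(timestamp, is_spam, tokens)` tuple layout, and the sample tokens are all assumptions made for the example. The point is that nothing is ever expired explicitly; each rebuild starts from empty counts and includes only mail from the last 14 days.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # window length taken from the message above


def build_bayes_db(messages, now):
    """Rebuild token counts from scratch using only mail inside WINDOW.

    `messages` is an iterable of (timestamp, is_spam, tokens) tuples.
    This layout is hypothetical, chosen only for the sketch.
    """
    cutoff = now - WINDOW
    spam, ham = Counter(), Counter()
    for timestamp, is_spam, tokens in messages:
        if timestamp < cutoff:
            continue  # older than the window: never counted, so no expiry step
        (spam if is_spam else ham).update(tokens)
    return spam, ham


# Hypothetical sample data: two recent messages and one that is 20 days old.
now = datetime(2015, 3, 22)
mail = [
    (now - timedelta(days=1), True, ["cheap", "pills", "helo:bad.example"]),
    (now - timedelta(days=20), True, ["cheap", "pills"]),
    (now - timedelta(days=3), False, ["meeting", "agenda"]),
]
spam, ham = build_bayes_db(mail, now)
```

Running the rebuild on a schedule (twice a day, say) with the current time as `now` moves the window forward; the 20-day-old message above simply drops out of the counts.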