I was writing a message to ask for advice on bayes_ignore_header, since I
was sure something was wrong, when I decided to have a look at the output of
spamassassin -D bayes... and I was shocked by what I saw!

x-spam-relays-external lists all the hops of the message, *including* the
internal servers, so x-spam-relays-internal is empty. I specifically asked
for the antivirus and the other internal MTAs to be added to the internal
list, and now I find the internal server names being used to calculate the
Bayes score.

I really think this is skewing the result.

In the 40 tokens it uses to calculate the score, the internal MTA is
present a couple of times.

I also noticed, to my surprise, that among the 40 tokens used to calculate
the score:
* the address or domain of the sender is not used
* the address of the internal server is used 2 times
* tokens that are meaningless (to me), because they are too generic, are
used several times (10026 is the port the sending server used, 192.168 is
an internal IP range):
dbg: bayes: token 'H*r:amavisd-new' => 0.00933830395446512
dbg: bayes: token 'H*r:port' => 0.0100739915629308
dbg: bayes: token 'H*r:10026' => 0.00656298715300288
dbg: bayes: token 'H*r:ESMTPSA' => 0.0291881040543893
dbg: bayes: token 'H*RU:ESMTPSA' => 0.0299783424700051
dbg: bayes: token 'Hx-spam-relays-external:ESMTPSA' => 0.0299783424700051
dbg: bayes: token 'H*r:192.168.1' => 0.0332916024497639
dbg: bayes: token 'H*R:U*noreply' => 0.0884273751672186
dbg: bayes: token 'H*r:localhost' => 0.095748955695973
* the address/domain of the recipient is present in various combinations 6
times. Why is the recipient address so important?
dbg: bayes: token 'H*r:sk:<localpart>' => 0.00474205399064878
dbg: bayes: token 'HTo:U*<localpart>' => 0.00573965631120421
dbg: bayes: token '<localpart>@<domain>.it' => 0.0252948951857414
dbg: bayes: token 'U*<localpart>' => 0.0252948951857414
dbg: bayes: token 'sk:<localpart>' => 0.0252948951857414
dbg: bayes: token '<localpart><domain>' => 0.0252948951857414
* the 2 words of the subject are listed, although according to the sources
Subject: is not tokenized
dbg: bayes: token 'INFORMAZIONI' => 0.0198930234212028
dbg: bayes: token 'importanti' => 0.0186572280369034
* the tokens with the highest scores are these (note the drop from 0.97 to
0.12):
dbg: bayes: token 'assicurarti' => 0.97797086079613
dbg: bayes: token 'caro' => 0.125457833816543
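
For anyone who wants to reproduce the tally above, here is a minimal sketch
in Python that groups the tokens by their header prefix. It assumes the debug
lines have exactly the format shown above; the names parse_tokens,
header_prefix and tally are made up for this illustration, and the grouping is
a rough heuristic of mine, not something SpamAssassin itself reports.

```python
import re
from collections import Counter

# Matches lines such as:
#   dbg: bayes: token 'H*r:port' => 0.0100739915629308
TOKEN_RE = re.compile(r"dbg: bayes: token '(.*)' => (\S+)")


def parse_tokens(lines):
    """Return (token, probability) pairs found in the debug output lines."""
    pairs = []
    for line in lines:
        match = TOKEN_RE.search(line)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def header_prefix(token):
    """Heuristic: a token like 'H*r:port' is grouped under 'H*r'.

    Tokens that do not start with 'H...:' are grouped together.
    """
    match = re.match(r"(H[^:\s]*):", token)
    return match.group(1) if match else "(no header prefix)"


def tally(lines):
    """Count how many of the listed tokens fall under each prefix."""
    return Counter(header_prefix(token) for token, _ in parse_tokens(lines))
```

Calling tally() on the lines of the saved debug output and printing the
result of .most_common() shows at a glance how many of the tokens come from
each header and how many have no header prefix at all.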

Can you please tell me if my bayes engine is working as it should?
