You sent the message to the list:

Received: from [202.154.34.135] (HELO v6.i6x.org) (202.154.34.135)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Aug 2005 22:59:21 -0700

The spam message header you showed:

> Date: Wed, 31 Aug 2005 08:59:56 -0700

The Date header on that mail is about 10 hours after the time you posted your question. Hence:

> * 1.3 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after
> Received: date

Assuming you sent the mail to the list not long after you received it, the Date header on that mail still puts it between 6 and 12 hours in the future from when you received it.

> 3. I have train hundreds (or thousands) spam/ham mail to sa-learn but it
> seems it still not quite good detecting non-english mail.

> * 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
> * [score: 1.0000]

This tells me that Bayes is 100% sure that the message is spam. That sounds pretty good to me, unless this isn't spam. However, the mucked-up Date header, plus the Date header and the first Received header showing timezones that are 12 hours apart, makes me think this is spam.

SA is written primarily by English speakers, and the rules are primarily aimed at detecting English-language spam. There are some add-on rulesets to detect spam in other languages, but they generally aren't that well maintained. They would have to be maintained by contributions from people who can write rules for spam in other languages, and few of the people who could write such rules seem to contribute them.

Bayes should be pretty good at detecting spam in most languages that do not require double-byte characters. The current release of SA may have some problems with double-byte characters that could make Bayes less effective than it could be.

Loren
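
P.S. In case it helps to see the date arithmetic spelled out, here is a minimal Python sketch that compares the two timestamps quoted above. It is only an illustration, not how SpamAssassin implements the rule: the function name hours_ahead and the simple 6-to-12-hour check are my own assumptions, and it uses the Received time of your list post as a stand-in for the Received time on the spam itself.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the headers quoted in this thread.
RECEIVED = "Tue, 30 Aug 2005 22:59:21 -0700"  # when the list received your post
DATE = "Wed, 31 Aug 2005 08:59:56 -0700"      # Date: header on the spam


def hours_ahead(date_hdr: str, received_hdr: str) -> float:
    """Hypothetical helper: hours by which Date: is later than Received:."""
    date = parsedate_to_datetime(date_hdr)
    received = parsedate_to_datetime(received_hdr)
    # Both values are timezone-aware, so the subtraction accounts for offsets.
    return (date - received).total_seconds() / 3600


gap = hours_ahead(DATE, RECEIVED)
# Rough stand-in for the window that DATE_IN_FUTURE_06_12 describes.
in_window = 6 <= gap < 12
print(f"Date is {gap:.2f} hours after Received; in 6-12 hour window: {in_window}")
```

Both headers carry the same -0700 offset here, but parsing them as timezone-aware values means the comparison would still hold if they did not.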