Hello John,

Friday, August 26, 2005, 6:25:14 AM, you wrote:

JH> Hello,

JH> We have had a complaint from a user that some of his Japanese mail
JH> (being received by us) is always marked by SA as spam. As a University
JH> it is natural for us to receive foreign mail messages.

Understood.

JH> X-Spam-Status: Yes, score=13.7 required=8.0 tests=BAYES_99,HTML_20_30,
JH> HTML_MESSAGE,MANGLED_LOOK,SARE_HTML_P_MANY3,SARE_RAND_2,
JH> SARE_RECV_IP_218216,SARE_SUB_ENC_ISO2022JP,SARE_SUB_PCT_LETTER,
JH> SUBJ_ALL_CAPS autolearn=unavailable version=3.0.4

JH> Unfortunately at the time I had left included in our site-wide
JH> configuration some of the specific 'ENG' SARE rules, so that explains
JH> the SARE_SUB_ENC_ISO2022JP matching and bumping the score up a bit. The
JH> SARE_RECV_IP_218216 is also a bit worrying (the message may have passed
JH> through a known spam relay).

If you're using the latest SARE version, SARE_RECV_IP_218216 should be scoring only 0.964, because we have detected ham coming through that range of servers (though spam:ham > 100:1). If you can send me some confirmed ham (full emails, headers and all), I can add those to my corpus and that will help drive the score down.

MANGLED_LOOK is the larger concern, with a score of 2.3. Like the ENG rules, the MANGLED rules file should not be used if you expect any significant non-English ham. I would remove that file from your collection. The 70_sare_obfu*.cf file set is slowly replacing MANGLED, and seems to be successful in avoiding most language problems.

SARE_RAND_2 also scores 2.5 -- that tests for a specific string suggesting that a broken ratware configuration inserted something like %RND into the email. I suppose it's possible, but it seems unlikely that the Japanese email would match that pattern. If you can send me the exact email which does so, maybe I can track that down.

SARE_HTML_P_MANY3 scores only 0.217, so that's not much of a concern.

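To put those numbers in perspective, here is a rough Python sketch of the arithmetic. It takes the total of 13.7 and the threshold of 8.0 from the X-Spam-Status header, and it assumes the installed rule files carry the same scores quoted in this message (older copies may differ). The function name `total_without` is made up for illustration.

```python
from decimal import Decimal

REPORTED_TOTAL = Decimal("13.7")  # score= from the X-Spam-Status header
REQUIRED = Decimal("8.0")         # required= from the same header

# Scores quoted in this message; assumed to match the installed rule files.
QUOTED_SCORES = {
    "SARE_RECV_IP_218216": Decimal("0.964"),
    "MANGLED_LOOK": Decimal("2.3"),
    "SARE_RAND_2": Decimal("2.5"),
    "SARE_HTML_P_MANY3": Decimal("0.217"),
}

def total_without(rules):
    """Return the reported total minus the quoted scores of the given rules."""
    return REPORTED_TOTAL - sum(QUOTED_SCORES[r] for r in rules)

remaining = total_without(["MANGLED_LOOK"])
# SpamAssassin tags a message as spam when its score reaches the threshold.
print(remaining, "still spam" if remaining >= REQUIRED else "below threshold")
```

Under those assumptions, removing MANGLED_LOOK alone leaves the message at 11.4, still well above 8.0, so no single rule removal fixes this by itself.
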
SARE_SUB_PCT_LETTER, with a score of 1.152, is also a significant contributor. It matches a percent sign followed by a single letter, then a word break. There is no percent sign in the raw subject you posted, so I assume it's in the code after translation. Seems strange. Again, a copy of that exact email would help me analyze this.

The biggest concern, as Matt pointed out, is your BAYES_99. If this is indeed ham, then you need to train Bayes on these messages as ham, because your Bayes system believes firmly that these are spam.

Bob Menschel
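
One possible source of the percent sign, offered as a guess and not confirmed against the actual message: in ISO-2022-JP, katakana characters are encoded as two 7-bit bytes whose first byte is 0x25, which is the ASCII '%'. If the rule sees the subject in that form, a katakana character can look like "%" plus a letter. The Python sketch below uses a stand-in regex built from the description of the rule (percent sign, single letter, word break); the real SARE_SUB_PCT_LETTER pattern may differ, and `raw_subject_matches` is a hypothetical helper name.

```python
import re

# Stand-in for the rule as described: '%', one letter, then a word break.
# This is an approximation, not the actual SARE pattern.
PCT_LETTER = re.compile(r"%[A-Za-z]\b")

def raw_subject_matches(subject: str) -> bool:
    """Encode the subject as ISO-2022-JP and test the resulting 7-bit text."""
    # ISO-2022-JP output is pure 7-bit, so it decodes cleanly as ASCII.
    raw = subject.encode("iso2022_jp").decode("ascii")
    return PCT_LETTER.search(raw) is not None

print(raw_subject_matches("チケット"))  # katakana subject
print(raw_subject_matches("Hello"))     # plain ASCII subject
```

The katakana subject matches the stand-in pattern while the ASCII one does not, which would be consistent with the rule firing on legitimate Japanese subjects. Checking the exact email is still the only way to confirm it.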