On 07/29/2010 12:09 PM, RW wrote:
>
> I wrote a patch last week (which I've attached) to add country pairs as
> separate token metadata  e.g.
>
> X-Spam-Relay-Countries: US US CA NG
> X-Spam-Relay-Country-Tokens: Trusted_US USCA CANG
>
> It's not a straight fix, but I'll submit it if no-one has a better idea.
If you are trying to get Bayes to "learn" that those particular *words*
are associated with ham/spam, shouldn't you choose more distinctive
strings (since you can)? Otherwise couldn't Bayes misclassify when such
words show up in the body of an email message?

e.g. "SA_RELAYCOUNTRY_US" for "US" would basically ensure that hits on
"us" never get merged in with the counters/whatever for
"X-Spam-Relay-Countries: US".
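
To illustrate the collision I mean, here is a toy Python sketch. It is not
SpamAssassin's tokenizer: the tokenize() function, the case-folding, and the
"SA_RELAYCOUNTRY_" prefix are all assumptions made up for this example.

```python
from collections import Counter

def tokenize(body, relay_countries, prefix=""):
    """Toy tokenizer (hypothetical, not SpamAssassin's real one).

    Lowercases body words and adds one token per relay country,
    optionally namespaced with `prefix`.
    """
    body_tokens = [w.strip('.,!?"').lower() for w in body.split()]
    country_tokens = [(prefix + c).lower() for c in relay_countries.split()]
    return Counter(body_tokens + country_tokens)

body = "Please send us the invoice"

# Bare country codes: the body word "us" and the relay country "US"
# end up sharing a single counter.
bare = tokenize(body, "US US CA NG")

# Namespaced country codes: the two kinds of token stay separate.
prefixed = tokenize(body, "US US CA NG", prefix="SA_RELAYCOUNTRY_")

print(bare["us"])
print(prefixed["us"], prefixed["sa_relaycountry_us"])
```

With bare codes the count for "us" mixes body text and relay metadata; with
the prefix, each is counted on its own.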

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
