I looked through the source, and it looks like the NaN value in the header is calculated as p / (p + np), where p is the running product of the token ratings (p *= token rating) and np is the running product of their complements (np *= 1.0 - token rating),
which, to me, indicates that a token rating outside the 0.0 - 1.0 range was produced during the training period. It sounds odd, so you may want to enable debugging, rebuild James, and re-process some of your spam emails. For debugging, I would capture the value put into the map in addTokenOccurances() around line 400:

target.put(token, value);

It is also interesting that you said all of your headers contain the same values:

> X-Spam-Score: -2.6
> X-Spam-Report: -2.6 BAYES_00 BODY: Bayesian spam probability
> is 0 to 1%
> [score: 0.0000]

If these headers were present during spam training, i.e. 100% of your example spam emails contain these tokens, perhaps they are tainting the token ratings?

Idea + wild guess = HTH

Kent

-----Original Message-----
From: David Legg [mailto:david.l...@searchevent.co.uk]
Sent: Tuesday, February 10, 2009 1:57 PM
To: James Users List
Subject: Re: How do I reduce SPAM

> Here is what I see as well (on ALL messages):
>
> X-MessageIsSpamProbability: NaN
> X-MessageIsSpam: true
>

Mmm... OK. Well, as you may know, 'NaN' is short for 'Not a Number' in floating-point speak. So something has caused the spam probability calculation to produce a value that Java can't represent as a number.

I've seen one or two of my own messages with this value... but not all of them.

Tell me... do your emails contain lots of images? I've noticed in the past that the Bayesian filter will quite happily chomp its way through all the image data and treat it as if it were text. If you had lots of this type of email, I could believe it might effectively poison the corpus.

I'm beginning to clutch at straws now, as I don't know what else to suggest... Anybody else got any ideas?

Regards,
David Legg

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
For additional commands, e-mail: server-user-h...@james.apache.org
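For anyone who wants to test Kent's theory, here is a minimal Java sketch of the combining step as described in the thread (p is the product of the ratings, np is the product of 1.0 - rating, and the result is p / (p + np)). It is not the actual James code: the class CombineRatings and the isValidRating() check are hypothetical names for illustration only. It shows that in-range ratings give an ordinary probability, while a single infinite or NaN rating is enough to turn the whole result into NaN.

```java
import java.util.List;

public class CombineRatings {

    // Hypothetical re-creation of the combining step described in the thread:
    // p is the product of the ratings, np is the product of (1.0 - rating),
    // and the overall probability is p / (p + np).
    static double combine(List<Double> ratings) {
        double p = 1.0;
        double np = 1.0;
        for (double r : ratings) {
            p *= r;
            np *= (1.0 - r);
        }
        return p / (p + np);
    }

    // Hypothetical debug check, in the spirit of logging the value passed to
    // target.put(token, value): rejects NaN and anything outside 0.0 - 1.0.
    static boolean isValidRating(double r) {
        return !Double.isNaN(r) && r >= 0.0 && r <= 1.0;
    }

    public static void main(String[] args) {
        // Ordinary ratings inside 0.0 - 1.0 give a normal probability (about 0.99).
        System.out.println(combine(List.of(0.99, 0.99, 0.01)));

        // One infinite rating makes p = Infinity and np = -Infinity,
        // so p + np is NaN and the division yields NaN.
        System.out.println(combine(List.of(0.99, Double.POSITIVE_INFINITY)));

        // A rating that is already NaN poisons the product p, so the result is NaN.
        System.out.println(combine(List.of(0.5, Double.NaN)));

        // The debug check flags the infinite rating.
        System.out.println(isValidRating(Double.POSITIVE_INFINITY));
    }
}
```

Logging the result of an isValidRating()-style check next to target.put(token, value) would be one way to spot a bad rating during training. Note that a finite rating above 1.0 does not by itself force NaN; it makes np negative, so the combined result can land outside the 0.0 - 1.0 range instead.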