On Mon, 2 Aug 2010 05:51:25 -0700 (PDT)
andrij wrote:
> How many tokens does SA's Bayes classifier use to
> calculate the probability that a mail is spam or ham?
It varies. It uses all the tokens above a minimum token strength, up to
a maximum of 150.
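
A rough sketch of that selection step, in Python rather than SpamAssassin's own Perl. It assumes "token strength" means the distance of a token's spam probability from the neutral value 0.5. The 0.346 threshold is recalled from the Bayes plugin source and should be treated as an assumption. The function name and the input dictionary are hypothetical.

```python
def select_significant_tokens(token_probs, min_strength=0.346, max_tokens=150):
    """Pick the tokens that would take part in the final probability.

    token_probs maps token -> estimated spam probability in [0, 1].
    min_strength is an assumed threshold; max_tokens is the cap of 150
    mentioned above.
    """
    candidates = []
    for token, prob in token_probs.items():
        strength = abs(prob - 0.5)  # assumed definition of token strength
        if strength >= min_strength:
            candidates.append((strength, token, prob))

    # Strongest tokens first; ties are broken by token text for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1]))

    return [(token, prob) for _, token, prob in candidates[:max_tokens]]


if __name__ == "__main__":
    sample = {"viagra": 0.99, "hello": 0.50, "meeting": 0.05, "the": 0.60}
    for token, prob in select_significant_tokens(sample):
        print(token, prob)
```

In this sketch a short message may contribute only a handful of tokens, while a long one is capped at 150, which is why the count varies from mail to mail.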
On Mon, 2 Aug 2010 05:29:32 -0700 (PDT)
andrij wrote:
>
> Hi all,
>
> After training the Bayes classifier on several thousand
> e-mails, I ran "sa-learn --dump magic" and got the following:
>
> Why was the number of ntokens not reduced to 15?
The expiry algorithm isn't very good.
On 8/2/10 7:53 AM, "Daniel Lemke" wrote:
>
>
> Yet Another Ninja wrote:
>>
>> Compiled rules only affect body & rawbody rules.
>> Network tests won't be affected and are probably the reason for the lack
>> of a massive difference.
>>
>
> Good advice, I disabled all the other plugins and ran
On Mon, 2010-08-02 at 05:53 -0700, Daniel Lemke wrote:
> Yet Another Ninja wrote:
> > Compiled rules only affect body & rawbody rules.
> > Network tests won't be affected and are probably the reason for the lack
> > of a massive difference.
>
> Good advice, I disabled all the other plugins and r
Daniel Lemke wrote:
>
>
> andrij wrote:
>>
>> I ran the Bayes classifier on more than 4500 e-mails. All but about
>> 100 of them contained test=BAYES_*. Does anybody have any idea why these
>> 100 e-mails were not scored by the Bayes classifier?
>>
>
> Do you have any shortcircuit enab
andrij wrote:
>
> I ran the Bayes classifier on more than 4500 e-mails. All but about
> 100 of them contained test=BAYES_*. Does anybody have any idea why these
> 100 e-mails were not scored by the Bayes classifier?
>
Do you have any shortcircuit enabled?
Could you post a raw example of
Yet Another Ninja wrote:
>
> Compiled rules only affect body & rawbody rules.
> Network tests won't be affected and are probably the reason for the lack
> of a massive difference.
>
Good advice, I disabled all the other plugins and ran spamassassin in local
test mode, processing a huge text
Hi all,
I ran the Bayes classifier on more than 4500 e-mails. All but about 100
of them contained test=BAYES_*. Does anybody have any idea why these 100
e-mails were not scored by the Bayes classifier?
At http://www.paulgraham.com/spam.html, it is written that "When new mail
arrives, it is
Hi all,
After training the Bayes classifier on several thousand e-mails, I ran
"sa-learn --dump magic" and got the following:
0.000 0 3 0 non-token data: bayes db version
0.000 0 5367 0 non-token data: nspam
0.000 0 3792