On Sun, 2009-08-02 at 04:36 -0700, an anonymous Nabble user wrote:
> I changed the value on "1"(I use this for testing and my self-learning its
> my homework). According to me - spam bayes learning was activated. When I
> use sa-learning so bayes learn that the mail is spam. And bayes learn the
> signatures...
> 
> Therefore is for me strange when I send the same mail again so bayes dont
> mark this mail like spam? I dont understand this. I realize all conditions -
> sa-learn  --spam --file  mail. "bayes_min_spam_num 1". The date the databaze
> was too changed(but the size stay the same). nspam was increased... I really
> dont understand what use is SA-LEARN! I have feel that the bayes dont work
> correctly- bayes ignore sa-learn. I am perhaps silly but I dont understand
> how it works:(( I am interesred how tell to bayes THIS MAIL IS SPAM(by using
> sa-learn), WHEN THIS SAME MAIL COME AGAIN SO YOU HAVE TO MARK LIKE SPAM! I
> know that bayes find similar element between mail and according to decide.
> But when I mark mail like spam a next mail have 100% similarity so bayes
> HAVE TO mark it like SPAM. It is logical acording to me.

Nope.  This is wrong. Bayes has no concept of a message, or of two
messages being equal. It only knows tokens.

Consider the following. You have 100 ham messages that contain the word
'foo' somewhere in the body, and you learn these messages as ham. You
then learn a spam message that contains the word 'foo' as its only Bayes
token. (Won't happen in reality, this is a stripped-down example. ;)

So Bayes, a statistical analyzer, knows that 'foo' is a rather hammy
token with 100 sightings in ham, and only rarely observed in spam: a
single time.

If you then ask Bayes for its opinion about the very same, just-learned
spam message containing 'foo' as its only Bayes token, it will tell you
that it's *ham* with a very high confidence.
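
To make the 'foo' example concrete, here is a toy sketch in Python. It
is NOT how SpamAssassin computes Bayes scores (the real thing normalizes
by corpus size and combines many tokens); it only compares raw sighting
counts, as in the example above. The class ToyTokenBayes and its methods
are made up for illustration.

```python
from collections import Counter


class ToyTokenBayes:
    """Hypothetical toy classifier: it stores token sightings, not messages."""

    def __init__(self):
        self.spam_sightings = Counter()
        self.ham_sightings = Counter()

    def learn(self, tokens, is_spam):
        # Count each token at most once per learned message.
        target = self.spam_sightings if is_spam else self.ham_sightings
        target.update(set(tokens))

    def spam_probability(self, token):
        # Assumption: plain ratio of raw sightings, no smoothing or
        # normalization. Real implementations are more careful.
        spam = self.spam_sightings[token]
        ham = self.ham_sightings[token]
        if spam + ham == 0:
            return 0.5  # never seen: no opinion either way
        return spam / (spam + ham)


classifier = ToyTokenBayes()

# 100 ham messages that contain 'foo'.
for _ in range(100):
    classifier.learn(["foo"], is_spam=False)

# One spam message whose only token is 'foo'.
classifier.learn(["foo"], is_spam=True)

# Ask about the very same spam message again: only the token matters.
foo_probability = classifier.spam_probability("foo")
print(f"P(spam | 'foo') = {foo_probability:.4f}")
```

With 1 spam sighting against 100 ham sightings, the toy model rates
'foo' at roughly 1%, so the re-submitted message still looks like ham.
Nothing in the data structure remembers that this exact message was
learned as spam.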


Please show us the output of 'sa-learn --dump magic'.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
