On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
> > In other words is there something like a gaussian distribution
> > graphic visualisation?
> 
> That would be different on every server depending on what type of spam 
> and ham you see and which rule sets you are running.  I graphed mine out 
> of curiosity and it forms a reasonable bell curve from -14 to 40 peaking 
> at about 9.  Although there is an odd spike sticking up from -3 to 1 for 
> some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data set combines spam and ham, two peaks are to be expected:
one for ham around 0 and one for spam. A single bell curve would mean
too many messages in the gray area, no clear distinction between ham
and spam, and consequently lots of false positives and negatives.
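If you want to graph your own score distribution the same way, here is a
minimal Python sketch. It assumes you already have a list of per-message
scores, however you extract them from your logs. The sample scores are
invented for illustration only, and the bin width and the helper names
bin_counts, local_peaks and render are hypothetical. With mixed ham and
spam, you would expect two local maxima: the ham peak near 0 and the spam
peak further up.

```python
import math
from collections import Counter

# Invented example scores (NOT real data): a ham cluster near 0, a spam cluster near 9.
SCORES = [-1.2, -0.4, 0.0, 0.0, 0.0, 0.1, 0.6, 1.3,
          4.2, 7.5, 8.3, 8.8, 9.1, 9.6, 11.0]


def bin_counts(scores, width=1.0):
    """Count scores per bin; bin index i covers [i*width, (i+1)*width)."""
    return Counter(math.floor(s / width) for s in scores)


def local_peaks(counts):
    """Bin indices whose count is strictly greater than both neighbours.

    A missing neighbour counts as 0; Counter returns 0 for missing keys
    without inserting them, so reading it while iterating is safe.
    """
    return sorted(i for i, n in counts.items()
                  if n > counts[i - 1] and n > counts[i + 1])


def render(counts, width=1.0):
    """Text histogram, one line per bin, including empty bins."""
    lo, hi = min(counts), max(counts)
    return "\n".join(f"{i * width:6.1f} | {'#' * counts[i]}"
                     for i in range(lo, hi + 1))


if __name__ == "__main__":
    counts = bin_counts(SCORES, width=2.0)
    print(render(counts, width=2.0))
    print("peaks start at:", [i * 2.0 for i in local_peaks(counts)])
```

A strict neighbour comparison is noisy on sparse data, so a wider bin
(here 2 points) helps the two humps stand out. With a real corpus you
would likely get a smoother picture either way.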


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
