On Wed, 2013-05-08 at 14:09 -0400, Andrew Talbot wrote:
> Well, I certainly hope someone offers to help! 

Heh! I am quite confident Alex didn't mean to be rude, nor does he
actually hope no one will help you. Quite the contrary...

He DID try to help you by explaining why a "default Bayes database" is a
bad idea in the first place. And that was his way of telling you...

> If only to say "there is no default database." 

That. :)  There is none, and there never has been.


> As we've spoken about off-list, my boss is being very particular about the
> deployment of Bayes, and it sounds like one of his caveats is that we don't
> start from a blank database. 

I can see how the idea of starting from some "known to be classified"
tokens sounds tempting. However, there is no such token. None. Just try
to imagine working in an industry where e.g. Viagra and Cialis are
totally legit terms to use...
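
To make that concrete, here is a toy sketch in Python. It is NOT
SpamAssassin's actual Bayes math, just a naive per-token ratio, and the
function name and the sample messages are made up for illustration. It
shows the same token scoring very differently depending on whose mail
the database was trained on.

```python
def token_spam_probability(token, ham_msgs, spam_msgs):
    """Naive estimate of P(spam | token), assuming equal priors.

    Hypothetical helper for illustration only; real Bayes filters use
    more elaborate tokenizing and probability combining.
    """
    # Count messages (not occurrences) that contain the token.
    ham_hits = sum(token in msg.lower().split() for msg in ham_msgs)
    spam_hits = sum(token in msg.lower().split() for msg in spam_msgs)
    ham_rate = ham_hits / len(ham_msgs)
    spam_rate = spam_hits / len(spam_msgs)
    if ham_rate + spam_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (ham_rate + spam_rate)


# Made-up corpus for a pharmacy, where the token is everyday vocabulary.
pharmacy_ham = [
    "refill order for viagra approved",
    "cialis stock arrives monday",
    "lunch at noon",
]
pharmacy_spam = [
    "cheap watches click here",
    "you won a lottery prize",
    "cheap viagra click here",
]

# Made-up corpus for a site where the token only ever shows up in spam.
generic_ham = [
    "lunch at noon",
    "meeting moved to friday",
    "quarterly report attached",
]
generic_spam = [
    "cheap viagra click here",
    "viagra discount today",
    "you won a lottery prize",
]

pharmacy_score = token_spam_probability("viagra", pharmacy_ham, pharmacy_spam)
generic_score = token_spam_probability("viagra", generic_ham, generic_spam)
print("pharmacy:", pharmacy_score)
print("generic: ", generic_score)
```

With these sample messages the token is neutral for the pharmacy and a
strong spam sign for the other site, which is why tokens learned from
someone else's mail do not transfer.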

Feel free to direct your boss here. If he insists on starting with a
pre-populated Bayes database, he surely knows why, and for a reason
other than "I'm the boss, and I want it."


Anyway, Andrew, your idea of that whole "blank slate" is inaccurate. If
you import someone else's data, your database was empty before the
import.

If you collect some ham and spam for initial training, your database
was empty before the training.

You do NOT even have to deploy SA prior to that. I don't know the size
of your user base, but it seems it shouldn't be hard to have a few of
the users chip in. Get a few of them to collect hand-classified ham and
spam for you. Train Bayes with that. After that, deploy SA to your mail
processing chain.

There you go! A pre-populated Bayes database, based on YOUR particular
ham and spam tokens, before deploying SA in production.
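
A minimal sketch of that training step, assuming the hand-classified
mail has been saved into two directories. The paths and the helper
function name are placeholders, not anything SA prescribes. The script
only prints the sa-learn invocations so they can be reviewed first; to
actually run them on the SA host, pass each list to
subprocess.run(cmd, check=True) instead.

```python
import shlex


def build_training_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for an initial Bayes training run.

    Hypothetical helper; the directories are whatever location the
    hand-classified messages were collected in.
    """
    return [
        ["sa-learn", "--ham", ham_dir],    # learn the collected ham
        ["sa-learn", "--spam", spam_dir],  # learn the collected spam
        ["sa-learn", "--dump", "magic"],   # show ham/spam/token counters
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with the real corpus locations.
    commands = build_training_commands("/path/to/corpus/ham",
                                        "/path/to/corpus/spam")
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The final command is just a sanity check that the database holds
learned ham and spam before SA goes into the mail processing chain.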


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
