I'm looking to modify spambayes to tokenize on 5-grams rather than
splitting on whitespace.  We have a few Asian customers, and the default
spambayes setup has not been very effective for them.  So we want to
test with 5-grams and see whether they improve the effectiveness.
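For text written without spaces between words (Chinese or Japanese, for example), str.split() returns a whole run of characters as a single token. Overlapping character n-grams avoid this because they need no word segmentation. Below is a minimal sketch of a 5-gram generator, assuming Python 3 strings and character (not word) n-grams. The name char_ngrams is hypothetical and is not part of the spambayes API. Wiring it into spambayes's tokenizer is not shown.

```python
def char_ngrams(text, n=5):
    """Yield overlapping character n-grams of text (hypothetical helper)."""
    # Collapse runs of whitespace to one space so that line breaks and
    # indentation do not produce otherwise-identical distinct grams.
    text = " ".join(text.split())
    if len(text) < n:
        # Too short for a full n-gram: emit the whole string, if non-empty.
        if text:
            yield text
        return
    # Slide a window of width n across the string, one character at a time.
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


# Whitespace splitting keeps the unsegmented Chinese run as one token:
print("垃圾邮件过滤器".split())          # ['垃圾邮件过滤器']
print(list(char_ngrams("垃圾邮件过滤器")))  # ['垃圾邮件过', '圾邮件过滤', '邮件过滤器']
```

A string of length L yields L - n + 1 grams, so the token count grows with message length. It may be worth capping or deduplicating the grams per message before training.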

I know that n-grams have been tested several times before.  So, if 
anyone has an n-gram tokenizer that they can share, I would appreciate a
copy.  Otherwise, I'll dive in and write it myself.

Thanks.

Richard Coleman
_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
