>>> The word lengths in Dutch are somewhere between those of
>>> English and German.  Is this a "configurable"?
>>
>> Not trivially, but it's not too hard either.  Look toward the bottom of
>> spambayes/tokenizer.py where there are a couple comparisons of n to 3.  I
>> can't quote you the correct chapter and verse because I'm using a version
>> of tokenizer.py modified in just that region and SourceForge appears to be
>> on-the-blink at the moment.  It should be fairly easy to understand.
>
> OK, I'll unleash my vi-fu and give it a try.

Please let us know if it does appear to help.  It would be trivial to make
the minimum word length an option (the opposite end, skip_max_word_size,
already is one) if that would help users for whom English isn't their main
email language.

=Tony.Meyer
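
For anyone wanting to experiment before touching tokenizer.py itself, here is a minimal sketch of the idea discussed in this thread: the hard-coded lower bound of 3 becomes a parameter alongside the existing upper bound. This is not the actual SpamBayes code. The function names, the min_size parameter, and the default of 12 for the upper bound are assumptions made for illustration, and the "skip:" summary token only approximates what the real tokenizer emits for long words.

```python
def tokenize_word(word, min_size=3, max_size=12):
    """Yield tokens for a single word.

    min_size stands in for the hard-coded 3 mentioned in the thread
    (hypothetical parameter); max_size plays the role of the existing
    skip_max_word_size option (default of 12 assumed here).
    """
    n = len(word)
    if min_size <= n <= max_size:
        # Word length is within bounds: keep the word as a token.
        yield word
    elif n > max_size:
        # Too long: emit a summary token with the first character and
        # the length rounded down to a multiple of 10.
        yield "skip:%c %d" % (word[0], n // 10 * 10)
    # Words shorter than min_size produce no token at all.


def tokenize_text(text, min_size=3, max_size=12):
    """Split text on whitespace and tokenize each lowercased word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word, min_size, max_size))
    return tokens


if __name__ == "__main__":
    sample = "De kat zit op het dak"
    print(tokenize_text(sample))              # default lower bound of 3
    print(tokenize_text(sample, min_size=2))  # keeps two-letter words too
```

Lowering min_size lets short words such as "de" and "op" through as tokens. Whether that helps or hurts classification for Dutch mail is exactly what would need testing against a real corpus.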
