Hi, I'm using the train-to-exhaustion script, and it seems to be taking a seriously long time to process some of the messages.
It turns out that using -o Tokenizer:x-lookup_ip:True causes a serious hit to training speed. I do have Tokenizer:lookup_ip_cache set. Not only that, but it goes slowly even on the second and subsequent training passes, by which time I'd expect the cache to be full. So I'm wondering whether the cache is really working, whether it's size-limited so that my large training set blows it out, or whether there's some other issue.

I notice that the cache, as integrated into SpamBayes, doesn't support selecting a timeout other than "10", nor does it support choosing the DNS server, even though the cache class itself allows both to be tuned. I don't have any good reason to think either of these is the problem, though.

Any insight you can offer would be very much appreciated.

Thanks,

-- 
Dave Abrahams
Meet me at BoostCon: http://www.boostcon.com
BoostPro Computing
http://www.boostpro.com
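P.S. To make my guesses concrete, here's a minimal sketch of the kind of cache I'm imagining: pickle-backed and size-capped. The class name, the cap, and the eviction policy are all my invention, not the actual SpamBayes dnscache code. If the real cache is either capped like this or never written back to disk, that would explain why the second pass is just as slow:

    # Hypothetical sketch, not the real SpamBayes dnscache API.
    import os
    import pickle
    import socket

    class ReverseDNSCache:
        def __init__(self, path, max_entries=10000):
            self.path = path
            # If the real cache has a cap like this, a large training
            # set could evict entries faster than they get reused.
            self.max_entries = max_entries
            if os.path.exists(path):
                with open(path, "rb") as f:
                    self.entries = pickle.load(f)
            else:
                self.entries = {}

        def lookup(self, ip):
            if ip in self.entries:      # hit: no network traffic at all
                return self.entries[ip]
            try:
                name = socket.gethostbyaddr(ip)[0]
            except (socket.herror, socket.gaierror):
                name = None             # cache negative results too
            if len(self.entries) >= self.max_entries:
                # Crude eviction: drop the oldest-inserted entry.
                self.entries.pop(next(iter(self.entries)))
            self.entries[ip] = name
            return name

        def save(self):
            # If this never runs, every training pass starts cold.
            with open(self.path, "wb") as f:
                pickle.dump(self.entries, f)

If the integrated code never calls the equivalent of save() at the end of a run, that alone would match what I'm seeing on the second pass.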
