Luigi> Once both are enabled it seems to work but the mail processing is
Luigi> very very slow.
>> First time through, yes. After that, it should (in theory) rely on
>> its cache of IP address information. I may have some pending
>> checkins for that though (*). Note also that a fairly small training
>> database works for me (fewer than 100 hams, 250-300 spams). If you
>> have a massive training database, then, yes, this will slow things
>> down dramatically. The IP lookup and image OCR stuff changes the
>> properties of your database enough that I think it's worth retraining
>> from scratch.
Luigi> I tried it on a sample of 5000 emails but stopped it because it
Luigi> still hadn't finished after more than half an hour. From tcpdump
Luigi> I could see a request every 1,2 seconds (or something like that).
Luigi> Even considering that not every mail contains a URL, it was very
Luigi> slow. As a note, I tried it on Windows XP with OCR scanning
Luigi> enabled, but OCR alone was much faster.
I can't imagine a scenario where I would need 5000 emails to get decent
results with SpamBayes. If that were the common case, everyone would give up
on it long before it was of any use. I still suggest you try starting from
scratch.
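To illustrate why only the first pass should be slow, here is a rough
sketch of the kind of lookup cache I mean. This is not the actual SpamBayes
code: the class name CachingResolver, its lookup parameter and the misses
counter are all made up for the example, and it assumes a plain in-memory
dict is good enough.

```python
import socket


class CachingResolver:
    """Hypothetical helper (not SpamBayes code): remember host -> IP lookups."""

    def __init__(self, lookup=None):
        # `lookup` is injectable so the cache can be exercised without a network.
        self._lookup = lookup or self._dns_lookup
        self._cache = {}
        self.misses = 0  # number of lookups that actually went to the resolver

    @staticmethod
    def _dns_lookup(host):
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        try:
            return tuple(socket.gethostbyname_ex(host)[2])
        except OSError:
            # Resolution failed; an empty result is cached as well, so a dead
            # host costs one lookup rather than one per message.
            return ()

    def resolve(self, host):
        host = host.lower()  # hostnames are case-insensitive
        if host not in self._cache:
            self.misses += 1
            self._cache[host] = self._lookup(host)
        return self._cache[host]
```

With something like this, a training run pays for each distinct host once,
and repeats of the same host are answered from the dict. The dict would
still have to be saved between runs for a second pass to benefit.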
Skip