    Luigi> Once both are enabled it seems to work, but the mail processing is
    Luigi> very, very slow.

    >> First time through, yes.  After that, it should (in theory) rely on
    >> its cache of IP address information.  I may have some pending
    >> checkins for that though (*).  Note also that a fairly small training
    >> database works for me (fewer than 100 hams, 250-300 spams).  If you
    >> have a massive training database, then, yes, this will slow things
    >> down dramatically.  The IP lookup and image OCR stuff change the
    >> properties of your database enough that I think it's worth retraining
    >> from scratch.

    Luigi> I tried it on a sample of 5000 emails, but I stopped it because
    Luigi> it still hadn't finished after more than half an hour. From
    Luigi> tcpdump I could see a request every 1-2 seconds or so. Even
    Luigi> considering that not every mail contains a URL, it was very
    Luigi> slow.  As a note, I tried it on Windows XP with OCR scanning
    Luigi> enabled, but OCR alone was much faster.

I can't imagine a scenario where I would need 5000 emails to get decent
results with SpamBayes.  If that were the common case, everyone would give up
on it long before it was of any use.  I still suggest you try retraining from
scratch.

Skip

_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html