At 01:24 AM 6/22/04 +0200, Matthias Keller wrote:
As I'm pretty much hand-feeding every spam my bayes will ever see, I noticed (and I think that's worse now in 2.63 than it was in 2.55) that sa-learn always takes ages just to perform a
cat spam.txt | sa-learn --spam

Why are you abusing pipes like that? sa-learn can accept filenames, no need to pipe things to it.


sa-learn --spam spam.txt

It's possible for sa-learn to be more efficient that way too, since it will be able to seek within the input file any way it wants without having to buffer the whole thing. (It may or may not do this, but it does have the extra flexibility at its disposal.)
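
To illustrate the pipe vs. file difference (nothing here is specific to sa-learn), here is a rough sketch. stdin_kind is a made-up helper name, and it assumes /dev/stdin exists, as it does on Linux. A redirect stands in for "the program opened the file itself"; either way the program ends up holding a regular file it can seek in, whereas a pipe can only be read front to back.

```shell
#!/bin/sh
# stdin_kind (hypothetical helper): report what standard input is attached to.
# A pipe cannot be seeked; a regular file can.
# Assumes /dev/stdin is available (true on Linux).
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "file"
    else
        echo "other"
    fi
}

# Scratch file standing in for spam.txt.
tmp=$(mktemp)
printf 'dummy message\n' > "$tmp"

cat "$tmp" | stdin_kind    # input arrives through a pipe
stdin_kind < "$tmp"        # input is the regular file itself

rm -f "$tmp"
```

Whether sa-learn actually takes advantage of a seekable input is a separate question; the sketch only shows that the option exists in one case and not the other.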


So, does sa-learn really have to load all the rules, as I suspect, just for training its bayes, or why does it grow to such an enormous size? (And yes, I DO have a lot of rules loaded, including blacklist.cf et al., but I don't see why sa-learn would NEED those...)

I can't see why it would need them, although it probably does load them as it MUST parse all of your config files anyway (otherwise it might miss a bayes_path directive). It'd take extra code to have some kind of flag to make the parser discard rules and only load bayes directives.


(Note: sa-learn might ignore rules, I've not checked, but I can definitely understand it if it does parse them, from a code simplicity and bug reduction standpoint.)
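
As a rough way to see why every config file has to be read, the sketch below lists any line whose first word starts with bayes_ across whatever files you hand it. bayes_directives is a made-up helper, not part of SpamAssassin, and it is only a plain text filter: it does not follow include lines or evaluate conditionals the way the real parser does.

```shell
#!/bin/sh
# bayes_directives (hypothetical helper): print every config line whose
# first word begins with "bayes_". Commented-out lines are skipped because
# their first word starts with "#".
# Always pass at least one file; with no arguments awk would read stdin.
bayes_directives() {
    awk '$1 ~ /^bayes_/' "$@"
}

# Example call. The directory is an assumption (a common default location);
# adjust it to wherever your .cf files live:
#   bayes_directives /etc/mail/spamassassin/*.cf
```

A bayes setting can sit in any of the files, so a filter like this still has to open all of them. Skipping the rules saves the memory needed to hold them, not the work of reading the files.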


