On 07.07.23 at 17:04, Richard wrote:
 I've FINALLY built up a "corpus" of ham vs spam and also FINALLY had some
 time to spend on this and just ran sa-learn on, oh, IDK, some 10k email
 messages or so, I'd guess. And along the way, I NEVER ONCE got the kind of
 output response back from sa-learn that I expected.

 For example, here I run it against a file containing just over 2100 spam:

 $ sa-learn -u richard --spam spam
 Learned tokens from 0 message(s) (0 message(s) examined)

 (I was running it as root - which the docs don't mention but I figure is
 what I'm supposed to do!)

Why do you suppose that?

...Uh... Because otherwise why the -u flag and comments about running it for virtual users?

You NEVER run anything as root that isn't a root task, no matter what.

You run it as the same user that spamd runs as.

Good to know! ...I'd recommend an update to the doc / web page to point out it should be run as the user ID of whatever spamd is using!

Now, I'd guess I should, as root:

sa-learn --clear

Since I hadn't run sa-learn before, EVER, that I was aware of!

...And THEN run as I've just learned. And, BTW, this makes me happy I scripted calling sa-learn, so re-doing this will be easy!
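
For anyone following along, the reset-and-relearn sequence described above might be scripted roughly like this. It is only a sketch: the "spamd" account name and the ./ham and ./spam paths are placeholders I made up, and by default it only prints the commands instead of running them. The last step uses sa-learn --dump magic, which reports the nspam and nham counters, as a quick check that messages were actually learned this time.

```shell
#!/bin/sh
# Assumptions: spamd runs as a hypothetical "spamd" account, and the corpus
# lives in hypothetical ./ham and ./spam paths. Adjust all three.
SPAMD_USER="${SPAMD_USER:-spamd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = ham path, $2 = spam path
relearn() {
    # As root: clear the database the earlier root run may have created.
    run sa-learn --clear
    # As the spamd user: learn both classes, then show the counters.
    run sudo -u "$SPAMD_USER" sa-learn --ham "$1"
    run sudo -u "$SPAMD_USER" sa-learn --spam "$2"
    run sudo -u "$SPAMD_USER" sa-learn --dump magic
}

relearn ./ham ./spam
```

Set DRY_RUN=0 once the printed commands look right for the local setup.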

As an aside, "curating" modern ham from my inboxes is time-consuming, so a lot of the ham I used is older, from saved folders. I saw the warning about old vs. new mail and the potential effects of that. My inboxes typically have around 2k messages in them, and going through to make sure NONE are spam takes a while. Is it worth tossing in a few recent messages at a time, say a day's worth?

...My guess is that nobody can really say what the Bayesian system is going to pick up on exactly, so YES, it can't hurt?!
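
If the day-at-a-time approach is worth trying, a small helper could pick out just the recent messages. This is a sketch under assumptions: it expects one message per file (Maildir-style), the folder name in the example is made up, and the messages in it are assumed to have been hand-checked already.

```shell
#!/bin/sh
# List message files under a directory that were modified within the
# last N days. Assumes one message per file (Maildir-style).
recent_messages() {
    # $1 = directory, $2 = age limit in days (default 1)
    find "$1" -type f -mtime "-${2:-1}"
}

# Example (not run here): feed the last day's hand-checked ham to sa-learn,
# running as whichever user spamd runs as. The folder path is hypothetical.
#   recent_messages "$HOME/Maildir/.Checked-Ham/cur" 1 |
#   while IFS= read -r msg; do
#       sa-learn --ham "$msg"
#   done
```

Calling sa-learn once per message is slow for large batches, but for a day's worth of mail it keeps the loop easy to read.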

Thanks,
Richard
