On Apr 16, 2007, at 9:34 PM, Matt Kettler wrote:
Try to learn it, if it comes back with something to the effect of:
"learned from 0 messages, processed 1.." then it's already been learned.

this seems to be the common suggestion.

it has a couple drawbacks, as i see it:

1. it's relatively cpu-intensive if i want to do it all the time (e.g. scan my spam folder to learn only the messages which haven't already been learned)

2. which way do i learn it, as ham or as spam?
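
for reference, the try-to-learn check quoted above can be sketched roughly as follows. this is only a sketch: it assumes sa-learn prints a summary line of the form "Learned tokens from N message(s) (M message(s) examined)", and the helper names (parse_summary, already_learned) are made up for illustration. it does nothing about drawback 2, since the caller still has to pick --spam or --ham, and it still costs one sa-learn run per message.

```python
import re
import subprocess

# Assumed summary format, e.g.
# "Learned tokens from 0 message(s) (1 message(s) examined)"
SUMMARY = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def parse_summary(output):
    """Return (learned, examined) from sa-learn output, or None if no summary line."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def already_learned(path, as_spam=True):
    """Try to learn one message file; True if sa-learn reported nothing new.

    Hypothetical helper: the caller must still decide between spam and ham.
    """
    flag = "--spam" if as_spam else "--ham"
    result = subprocess.run(
        ["sa-learn", flag, path],
        capture_output=True,
        text=True,
        check=True,
    )
    counts = parse_summary(result.stdout)
    return counts is not None and counts[0] == 0
```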

to step back a bit, my final goal is to be able to figure out which messages in a folder haven't been learned, and learn only those. in the ideal situation i can also figure out (ahead of time), whether a learned message was learned as ham or spam.

this may be semi-impossible.

on the other hand, what can i learn from the headers?

e.g. it looks like autolearn=[something] will tell me about the autolearner, but is there anything for manual learns?
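
a minimal sketch of pulling that value out of a message, assuming the autolearn=... token lives in the X-Spam-Status header (the usual place, but local configs can rewrite it). the function name autolearn_value is hypothetical. note it only reports what the autolearner did, and says nothing about manual learns.

```python
import re
from email import message_from_string


def autolearn_value(raw_message):
    """Return the autolearn=... value from X-Spam-Status, or None if missing."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    # The header is often folded across lines; search() copes with that.
    match = re.search(r"autolearn=(\w+)", status)
    return match.group(1) if match else None
```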

where i'm going with all this:

i can run a cron job to learn the contents of different mailboxes on a regular basis. what i do now is have a TrainSpam and TrainHam mailbox, and when something gets misfiled (in Spam or any ham folder) i just move it in there. every 5 minutes a cron job goes through and scans things appropriately. <http://www.faisal.com/software/sa-harvest/quicktrain.html>
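
the two-mailbox setup described above might look something like this. it is a sketch and not the linked quicktrain script: the ~/Maildir location, the Maildir++ folder names (.TrainSpam, .TrainHam) and the function names are all assumptions. it relies only on sa-learn's --spam and --ham options and on sa-learn accepting a directory as a target.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: Maildir++ folders under ~/Maildir.
MAILDIR = Path.home() / "Maildir"
TRAINING = {"--spam": ".TrainSpam", "--ham": ".TrainHam"}


def training_commands(maildir=MAILDIR):
    """Build one sa-learn command per training folder."""
    commands = []
    for flag, folder in TRAINING.items():
        cur = Path(maildir) / folder / "cur"
        commands.append(["sa-learn", flag, str(cur)])
    return commands


def run_training(maildir=MAILDIR):
    """Run sa-learn over each training folder that actually exists."""
    for command in training_commands(maildir):
        if Path(command[-1]).is_dir():
            subprocess.run(command, check=True)
```

a script that calls run_training() can then be run from cron every 5 minutes. messages sa-learn has already seen are reported as not newly learned, so rescanning is safe, just wasteful.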

first, i'd like to be able to do that within the mailboxes rather than using special mailboxes.

second, i'd like to be able to key off junk mail flags set by the client (thunderbird, apple mail). i'm using dovecot, so it's a fairly simple matter of parsing Maildir filenames, but to do it right i need to combine the knowledge with what spamassassin thinks.
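
the filename parsing could be sketched as below. assumptions, all worth checking against a real mailbox: dovecot stores keywords as lowercase letters after ":2," in the filename, with letter a = index 0, b = index 1, and so on, mapped to names by a dovecot-keywords file in the folder; thunderbird uses the keywords Junk / NonJunk and apple mail uses $Junk / $NotJunk. the function names are hypothetical.

```python
# Assumed keyword names set by mail clients (compared case-insensitively).
JUNK = {"junk", "$junk"}
NOT_JUNK = {"nonjunk", "$notjunk"}


def parse_keywords(text):
    """Map keyword letters to names from the contents of a dovecot-keywords file."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            index = int(parts[0])
            if 0 <= index < 26:
                mapping[chr(ord("a") + index)] = parts[1].strip()
    return mapping


def flags_of(filename):
    """Return the flag characters after ':2,' in a Maildir filename."""
    _, sep, info = filename.rpartition(":2,")
    return info if sep else ""


def client_verdict(filename, mapping):
    """Return 'spam', 'ham', or None based on client-set keywords."""
    names = {
        mapping.get(char, "").lower()
        for char in flags_of(filename)
        if char.islower()
    }
    if names & JUNK:
        return "spam"
    if names & NOT_JUNK:
        return "ham"
    return None
```

this only gives the client's opinion of each message. combining it with what spamassassin has already learned is still the open part.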

i might just go write a dovecot plugin to do this in real-time, but i'm not feeling the motivation to break the mail server with a misplaced pointer.

-faisal
