"Matt Yackley" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]

>> What I've resorted to doing is to archive all incoming e-mail in my
>> front-end MTA (exim on linux). I have a script that extracts the
>> message-ids of the spam in the Exchange public mail box and then
>> extracts the original e-mails from the archive to feed through sa-learn.

> That was one of the ideas we came up with as a work-around to the problem,
> but had not really looked into it yet.  Can you share the script that you
> are using to pull the message-ids out of Exchange?

Sadly, 'script' is rather a grand term. I haven't got around to automating
things properly, so roughly once a week I do the following by hand:

# get a list of message-ids from the imap folder
$ grep -i "^message-id: " imap-folder | sed -e 's/.*<\(.*\)>.*/\1/' > msgids

# get e-mails from the current archive folder
$ cat /var/local/archive/mail | spamlearn msgids > mbox

# get e-mails from the last 3 weeks of archives
$ zcat /var/local/archive/mail.[123].gz | spamlearn msgids >> mbox

# feed to SA
$ sa-learn --spam --mbox < mbox
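The first step can be tried on its own to see what ends up in 'msgids'. This is only an illustration: the grep/sed pipeline is the one above, but the function name `extract_msgids` is made up and the sample Message-IDs are invented.

```shell
#!/bin/sh
# extract_msgids (hypothetical helper): reads mail on stdin and prints each
# Message-ID without its angle brackets, one per line, using the same
# grep/sed pipeline as the first step above.
extract_msgids() {
    grep -i '^message-id: ' | sed -e 's/.*<\(.*\)>.*/\1/'
}

# Sample headers with invented IDs; grep -i makes the header match
# case-insensitive, so "Message-Id:" is picked up too.
printf 'Message-ID: <1234.abcd@example.com>\nSubject: test\nMessage-Id: <5678.efgh@example.org>\n' \
    | extract_msgids
```

Running it prints 1234.abcd@example.com and 5678.efgh@example.org on separate lines, which is the one-ID-per-line format spamlearn reads from its argument file.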


The archive I keep is rotated once a week and gzipped. The Perl script,
spamlearn, is attached. It reads an mbox on stdin and writes to stdout every
message whose Message-ID appears in the supplied file (one ID per line),
removing each ID it finds from that file. Hopefully, by the end, the file
'msgids' is empty.
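Because spamlearn deletes every ID it finds, anything left in 'msgids' after the last run was not in the archives searched. A minimal sketch of that check, assuming the list is still in a file called 'msgids'; `report_leftovers` is a hypothetical helper name, not part of spamlearn:

```shell
#!/bin/sh
# report_leftovers (hypothetical helper): takes the message-id list file.
# If the file is non-empty, prints a count and the leftover IDs to stderr
# and returns 1; returns 0 if the file is empty or missing.
report_leftovers() {
    if [ -s "$1" ]; then
        echo "$(wc -l < "$1") message-id(s) not found in the archives:" >&2
        cat "$1" >&2
        return 1
    fi
    return 0
}

report_leftovers msgids || echo "some spam was not fed to sa-learn" >&2
```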

This could obviously be vastly improved and automated - I just haven't got 
round to it yet.

regards,
John 


--- spamlearn ---
#!/usr/bin/perl

use strict;

my %msgids;
my $state;

my $STATE_START = 0;
my $STATE_HEADERS = 1;
my $STATE_OUTPUTTING = 2;
my $STATE_THROWING_AWAY = 3;

# look at arguments
if (scalar @ARGV != 1) {
	print "Usage: spamlearn msgids\n";
	exit 1;
}

my $msgids_file = shift @ARGV;
if (! -e $msgids_file) {
	print "$msgids_file\: file not found\n";
	exit 1;
}

# get list of msgids
open MSGIDS, "< $msgids_file" or die "Unable to open $msgids_file\: $!\n";
while (<MSGIDS>) {
	chomp;
	$msgids{$_} = 1;
}
close MSGIDS;

my @current_msg;
$state = $STATE_START;

while ($_ = <>) {
	if ($state == $STATE_START) {
		if (/^From /) {
			$state = $STATE_HEADERS;
			@current_msg = ($_);
		}
	}
	elsif ($state == $STATE_HEADERS) {
		if (/^Message-ID:\s*<(.*)>/i) {
			push @current_msg, $_;
			my $msgid = $1;
			# print STDERR "Message: $1 ";
			if ($msgids{$msgid}) {
				print "$_" foreach (@current_msg);
				$state = $STATE_OUTPUTTING;
				delete $msgids{$msgid};
				# print STDERR "Spam\n";
			}
			else {
				$state = $STATE_THROWING_AWAY;
				# print STDERR "OK\n";
			}
		}
		elsif (/^\s*$/) {
			$state = $STATE_THROWING_AWAY;
		}
		else {
			push @current_msg, $_;
		}
	}
	elsif ($state == $STATE_OUTPUTTING) {
		if (/^From /) {
			$state = $STATE_HEADERS;
			@current_msg = ($_);
		}
		else {
			print;
		}
	}
	elsif ($state == $STATE_THROWING_AWAY) {
		if (/^From /) {
			$state = $STATE_HEADERS;
			@current_msg = ($_);
		}
	}
}

# rewrite msgids
open MSGIDS, "> $msgids_file" or die "Unable to open $msgids_file\: $!\n";
foreach (keys %msgids) {
	print MSGIDS "$_\n";
}
close MSGIDS;

exit 0;
--- end of spamlearn ---

