Re: Gateways, analyze first, insert into bayes later ?

Matt Yackley 12 Apr 2005 02:43:45 -0000

Hi Herold,

Are you using a sitewide bayes DB?  This may affect your choice of solutions.
I'm running sitewide, so my method may not work if you are using separate DBs
for all your users....


Herold Heiko said:
> Newbie Alert - New to Spamassassin. Pondering enhancement to my current
> basic setup, which is a filter gateway in front of MS exchange.
> Filter gw is amavisd-new + dual-sendmail-setup + clamav+spamassassin 3.02.
>
> I'm looking how to feed back sorted spam/ham info into the spamassassin
> bayes database, skimming through the list archives I basically found people
> talking about some different possibilities I basically was thinking about,
> too:
>
> - feed msgs back the spam/ham with a "forward".

If you have to go with a "forward" option it would be best to "forward as
attachment", which would preserve the headers, but that then creates the issue
of "unwrapping" the attached message. I've seen this mentioned many times, but
have never seen a script to do this :(
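
For what it's worth, the unwrapping step itself could be sketched roughly like
this in Python, using only the standard library. This is an untested-in-production
sketch, not an existing tool: the function name extract_attached_messages is
made up for illustration, and it assumes the mail client really forwards the
original as a message/rfc822 attachment.

```python
import email
from email import policy


def extract_attached_messages(raw_bytes):
    """Return the raw bytes of every message/rfc822 attachment in a forwarded mail.

    Hypothetical helper: assumes the forward carries the original message
    as a message/rfc822 part, which is what "forward as attachment" produces
    in most clients.
    """
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message, original headers included.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each returned item could then be written to a file and handed to
sa-learn --spam or sa-learn --ham, depending on which address or folder the
user submitted it to.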

>
> - Have users sort Spam (and wrongly marked Ham) in different folder, attach
> with CDO or OLE automation of outlook. Users are happy, but the whole
> message would need reconstruction based on original headers, body and
> attachments, losing valuable information.

I use public folders for message submission: users can see the folders and
create messages in them, but can't view or change the contents.  At first we
had the users drag and drop messages into these folders, but navigation is a
bit of a pain.  Instead I worked with a dev here at work and he wrote a small
plugin for Outlook that adds a "Learn as Spam" and a "Learn as Ham" button to
the main toolbar in Outlook.  The spam button "moves" a message to the spam
folder and the "ham" button copies the message.  It's quick and easy for the
users and has been working well for us; now I just need the time to document
it a bit and release it for others to use.
Now on to the other issues... :)

> - Have users sort Spam and Ham in different folder, extract with IMAP. Users
> are happy, headers should be fine, but still I think the original encoding
> used for body and attachments are lost, what we feed back to sa-learn is a
> freshly reencoded (by exchange) mail.

Are you thinking of having the users "push" the messages to the relay server,
or of pulling the messages out of Exchange from the relay server?

Extracting messages from public folders via IMAP is somewhat broken in Ex 2000 &
2003; not sure about 5.5.  It tends to drop all headers except for Received,
Date and Subject, and inserts some of its own.  This isn't good, but my bayes
still works pretty darn well.  (I have a ticket open with MS about this.)

> Anybody with more knowledge of the working of Spamassassin can tell me if
> the loss of the original encoding of body and attachments is a VERY BAD
> THING ?

I don't believe that bayes will process attachments in 3.x and above. The
encoding may change somewhat, but hopefully the majority of messages will be
ok.  So overall I would say it's a bad or a not-so-good thing, but not a very
bad thing...

> If it is, I was thinking, Spamassassin did already analyse all those
> (inbound) messages the first time when delivered.
snip
>
> So we could save that information (for some time... say a couple of weeks,
> depends on size and so on) using the message-id as a key.
> Later then instead of sa-learn -spam <path_to_spam_msg we could retrieve
> that info (extract the msg-id from the headers, retrieve analyze data from
> db) and feed it back.

This is something that I have talked about with the dev at work: perhaps use
amavis or postfix (in my case) to save a copy of all messages, then write
something to pull the msg ID out of submitted messages and use it to pull the
"original" out of the "raw message store" on the relay server.  If MS can't fix
my IMAP header issue, then we may look at trying to write something.
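
To make the idea a bit more concrete, here is a minimal Python sketch of such a
"raw message store" keyed by Message-ID. Everything in it is an assumption for
illustration: the function names, the one-file-per-message layout and the use
of a SHA-256 digest as the file name are all hypothetical, and it only works if
the Message-ID header survives the trip through Exchange. Expiring old entries
after a couple of weeks is left out.

```python
import email
import hashlib
from email import policy
from pathlib import Path


def message_id_of(raw_bytes):
    """Return the Message-ID header of a raw message, or None if it has none."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    mid = msg["Message-ID"]
    return str(mid).strip() if mid else None


def store_path(store_dir, message_id):
    """Map a Message-ID to a file name that is safe on any filesystem."""
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    return Path(store_dir) / digest


def save_original(store_dir, raw_bytes):
    """Called on the relay at delivery time: keep an untouched copy."""
    mid = message_id_of(raw_bytes)
    if mid is None:
        return None
    path = store_path(store_dir, mid)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw_bytes)
    return path


def find_original(store_dir, submitted_bytes):
    """Called at learn time: look up the untouched copy of a submitted message."""
    mid = message_id_of(submitted_bytes)
    if mid is None:
        return None
    path = store_path(store_dir, mid)
    return path.read_bytes() if path.exists() else None
```

The bytes returned by find_original would be what gets fed to sa-learn,
instead of the copy that Exchange re-encoded.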

> Anybody with better knowledge of the internal workings of SpamAssassin could
> tell me
> - if this is even necessary / useful ? After all I AM a newbie in this area,
> maybe there is some other easy way I didn't spot yet, OR the loss of the
> original encoding is not so important

I'll have to let someone else who knows more answer that one.


> Thanks
>
> Heiko Herold

If you want to go the public folder route, be sure to check out Nick Burch's
power-imap-sa-learn script. http://tirian.magd.ox.ac.uk/~nick/code/

Cheers,
matt
