-----BEGIN PGP SIGNED MESSAGE-----
>>>>> "ms" == Michael Shiloh <[EMAIL PROTECTED]> writes:
ms> Yes, please post your perl script, along with the invocation
The script is attached. It takes no args and assumes that the message
arrives as a message/rfc822 attachment which is either
(1) already labelled incorrectly as spam in which case there is a
Content-description header which says "original message before
SpamAssassin", or
(2) incorrectly tagged as ham in which case the Content-description
doesn't match.
For the first case, the inner message is extracted to stdout. For the
second case, the outermost message is extracted to stdout. It will
also work if you feed the message in directly, e.g., in mbox format.
I haven't yet installed it because I am still trying to understand how
to bypass the extra SA processing when I forward the mail internally
to the learning account. All my mail goes through a procmail router,
so adding the script should be as simple as adding a procmail rule
something like this:
:0
* ^TO_.*notspam
| $SOMEPATH/sa-extract.pl | sa-learn --ham
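
One possible arrangement, sketched under assumptions rather than tested: if the router calls SpamAssassin from a filtering recipe like the one in the stock procmailrc example (the second recipe below is that assumption, not something taken from the setup described here), then placing the learning recipe above it may be enough. A recipe without the f flag is a delivering recipe, so once its pipeline exits successfully procmail stops processing that message and never reaches the SpamAssassin recipe. The sketch writes the option as --ham, the long form that sa-learn documents.

```procmail
# Learning recipe first: a delivering recipe (no 'f' flag), so a
# successful exit ends processing for this message.
:0
* ^TO_.*notspam
| $SOMEPATH/sa-extract.pl | sa-learn --ham

# ASSUMPTION: SpamAssassin is invoked as a filter further down, as in
# the example procmailrc shipped with SpamAssassin. Mail handled by the
# recipe above never gets this far.
:0fw: spamassassin.lock
* < 256000
| spamassassin
```

If the learning pipeline fails (non-zero exit), procmail treats the message as undelivered and continues with the recipes below, so a failed learn would still pass through SpamAssassin.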
regards,
roland
- --
PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD RL Enterprises
[EMAIL PROTECTED] 6818 Madeline Court
[EMAIL PROTECTED] Brooklyn, NY 11220
#! /usr/bin/perl -T
use File::Path;
use MIME::Parser;
use warnings;
# Create a temporary directory for our files. Make it private.
my $username = (getpwuid $<)[0];
my $dirname = "/var/tmp/$username.$$";
mkpath($dirname, 0, 0700) or die $!;
# Parse the message and dump the parts.
my $parser = MIME::Parser->new;
$parser->output_under($dirname);
my $entity = $parser->parse(\*STDIN);
# The message may have more than one part; one way this can happen is
# if the sender has a signature automatically appended which will add
# a text/plain part. So we allow up to two parts.
if ($entity->parts > 2) {
    die "Too many parts: " . $entity->parts;
}
# Even if there are two parts, we expect the first one to be the
# attached message/rfc822 to be scanned.
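# A non-multipart message has no parts at all, so guard against undef.
my $first = $entity->parts(0);
if (!$first or $first->mime_type ne 'message/rfc822') {
    warn "part type not message/rfc822: "
        . ($first ? $first->mime_type : $entity->mime_type);
}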
# Heh, heh. I lied. It is possible that the first part is actually a
# spamassassin multipart/mixed that contains the message as a
# message/rfc822 attachment due to enabling report_safe. So look
# inside the part to see if it has a message/rfc822 part. The marker
# for this will be a message whose Content-description is 'original
# message before SpamAssassin'
my $p = &walk_parts($entity) or die "no embedded message";
$p = &walk_parts($entity, 1) || $p;
# Print the message
foreach my $L (@{$p->body}) {
    print $L;
}
# Clean up temporary directory tree.
rmtree($dirname, 0, 0);
exit (0);
#
sub walk_parts {
    my $mime_part = shift;
    my $look_at_content_description = shift;
    my $num_parts = $mime_part->parts;
    foreach my $i (0..$num_parts-1) {
        if ($mime_part->parts($i)->mime_type eq 'message/rfc822') {
            my ($head, $descr);
            if (!$look_at_content_description) {
                return $mime_part->parts($i);
            } elsif (( ($head = $mime_part->parts($i)->head))
                     and ($descr = $head->get('Content-description'))
                     and ($descr =~ m/original message before SpamAssassin/)) {
                return $mime_part->parts($i);
            }
        }
        # Recurse on sub-parts returning the first hit
        my $q = &walk_parts ($mime_part->parts($i),
                             $look_at_content_description);
        if (defined ($q)) {
            return $q;
        }
    }
    return undef;
}
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface
iQCVAwUBQSIOr+oW38lmvDvNAQFObgQAvrRhIV+/cNwRN9EIo4BTLDxf4Ei14iDH
AIQkCGMRqwPGB/O1yXMpmjW46P59rNS3kmEBmm/rkgi5r6ZklPK/w3EVst0k5ejU
APpZuW9FdodlYEvQ8VjUtkDkV4J6E4FOEmrKeFOhiTDEzgCI4/vFqThY7xYbQSHA
b7lUKm4gfS0=
=jZyl
-----END PGP SIGNATURE-----