ceph...@3phase.com wrote on 2013-06-19 22:11:
Hi John,
See the following example:
http://pastebin.com/DAYJ7NnJ
Lots of style gibberish for sure, but it failed to hit your rule
(sa-update ran at 4am today so it should have picked up anything
published). I'm guessing it's the parentheses.
On 06/20/2013 01:34 AM, Amir 'CG' Caspi wrote:
On Wed, June 19, 2013 3:47 pm, Axb wrote:
SA's URIBL plugin doesn't and shouldn't look in the alt attribute.
Why not, exactly? I wouldn't look at it for _all_ img tags, only for ones
that are clearly MailScanner-munged. That is, one would look
At 9:47 AM +0200 06/20/2013, Tom Hendrikx wrote:
Since mailscanner already has support for integrating spamassassin [1]
(As I mentioned explicitly in a previous email...)
why would you ever want to put work in reversing some of mailscanners
'protection'?
Because, given the particulars of
Amir 'CG' Caspi wrote on 2013-06-20 11:13:
BTW, I'm not talking about _actually_ reversing MailScanner's
protection. I'm talking about SA understanding enough to unmunge
the URI **for SA processing only**. The actual mail delivered to the
end-user would remain munged. SA would not be
Hi John,
See the following example:
http://pastebin.com/DAYJ7NnJ
Lots of style gibberish for sure, but it failed to hit your rule
(sa-update ran at 4am today so it should have picked up anything
published). I'm guessing it's the parentheses.
Whack the mole! =)
On 06/19/2013 10:11 PM, ceph...@3phase.com wrote:
Hi John,
See the following example:
http://pastebin.com/DAYJ7NnJ
Lots of style gibberish for sure, but it failed to hit your rule
(sa-update ran at 4am today so it should have picked up anything
published). I'm guessing it's the parentheses.
Another, nearly identical example I saw today, but which used trailing
slashes (/ or //) instead of parentheses.
http://pastebin.com/6XRwcjm3
Enjoy. =)
--- Amir
On Wed, June 19, 2013 2:11 pm, ceph...@3phase.com wrote:
Hi John,
See the
On Wed, June 19, 2013 2:33 pm, Axb wrote:
imo, it makes little sense to write rules to catch these hashbusters. As
If the rule is sufficiently broad, it will catch them. If the rule is so
strict that it catches only one trailing slash or something, then yes, it
makes little sense... but I think
On 06/19/2013 10:54 PM, Amir Caspi wrote:
Perhaps SA should include a module/plugin to unmunge MailScanner
munging? Has anyone written one, or if not, would anyone like to? ;-)
(Since MailScanner is open-source perl, I imagine it should be relatively
straightforward to find the munging code,
On Wed, June 19, 2013 3:14 pm, Axb wrote:
iirc, MailScanner munges the URL before SA sees it so unless your plugin
idea involves a crystal ball, it's not possible.
Yes, MailScanner gets to it before SA does, unless SA is called from
within MailScanner (which it isn't, on my setup, but that is a
On 06/19/2013 11:30 PM, Amir 'CG' Caspi wrote:
Yes, MailScanner gets to it before SA does, unless SA is called from
within MailScanner (which it isn't, on my setup, but that is a possible
setup). However, the complete original URL is still contained within the
munged one. It's in the alt
On Wed, June 19, 2013 3:47 pm, Axb wrote:
SA's URIBL plugin doesn't and shouldn't look in the alt attribute.
Why not, exactly? I wouldn't look at it for _all_ img tags, only for ones
that are clearly MailScanner-munged. That is, one would look for the
patterns that MailScanner uses for
At 4:37 PM -0400 06/14/2013, Alex wrote:
On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote:
I wonder if there's some
difference between running spamassassin manually on the message versus
running spamd.
I think the only difference would be if spamd somehow didn't
On 6/18/2013 5:31 AM, Amir 'CG' Caspi wrote:
At 4:37 PM -0400 06/14/2013, Alex wrote:
On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com
wrote:
I wonder if there's some
difference between running spamassassin manually on the message versus
running spamd.
I think
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
At 10:48 AM -0700 06/17/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day
or so, since the new rules hit the distribution. So far, all TPs, no
FPs.
At 10:13 AM -0700 06/18/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
Any idea why it failed to hit, and does this need another rule revision?
Yep, and yep. Revision committed. Initial comment gibberish rule committed.
Thanks for the revision. Do you want to explain
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
a.) You are copying/pasting the body of the email, but not the headers.
No, I am copying the headers... however, I am using Eudora (ancient,
I know) as a mail client, and it's possible the headers are not
properly formatted. For example, for
On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote:
At 10:13 AM -0700 06/18/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
Any idea why it failed to hit, and does this need another rule revision?
Yep, and yep. Revision committed. Initial comment gibberish rule committed.
At 10:24 AM -0700 06/18/2013, John Hardin wrote:
The earlier version wasn't allowing for some punctuation in the
gibberish. There may be a period of whack-a-mole here, I was
conservative in the change I made.
Makes sense. Both of those examples are good for creating an
On 06/18/2013 07:24 PM, John Hardin wrote:
On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote:
At 10:13 AM -0700 06/18/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
Any idea why it failed to hit, and does this need another rule
revision?
Yep, and yep. Revision committed.
Amir 'CG' Caspi wrote:
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
a.) You are copying/pasting the body of the email, but not the headers.
No, I am copying the headers... however, I am using Eudora (ancient, I
know) as a mail client, and it's possible the headers are not properly
On 06/18/2013 07:18 PM, Amir 'CG' Caspi wrote:
Either way, I am _trying_ to copy the entire message. Not sure what is
misformatted there. If you take a look at my two pasted examples (links
below for convenience), those are direct copy/paste from Eudora's raw
source view. Any idea what is
On Tue, 18 Jun 2013, Axb wrote:
On 06/18/2013 07:24 PM, John Hardin wrote:
On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote:
At 10:13 AM -0700 06/18/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
Any idea why it failed to hit, and does this need another rule
On 6/18/2013 1:18 PM, Amir 'CG' Caspi wrote:
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
a.) You are copying/pasting the body of the email, but not the headers.
No, I am copying the headers... however, I am using Eudora (ancient, I
know) as a mail client, and it's possible the headers
Replies to multiple folks below...
At 1:42 PM -0400 06/18/2013, Kris Deugau wrote:
Try opening the on-disk file with Notepad (or your favourite text editor
on *nix). If you see the same thing you see when you hit the blah blah
blah button in Eudora, you should be OK. If not...
I've done
On Tue, 2013-06-18 at 11:18 -0600, Amir 'CG' Caspi wrote:
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
a.) You are copying/pasting the body of the email, but not the headers.
No, I am copying the headers... however, I am using Eudora (ancient,
I know) as a mail client, and it's possible
On Tue, June 18, 2013 1:01 pm, Martin Gregorie wrote:
The main thing I notice is that there are only two Received: headers,
and no envelope-From so IMO you're hoping for too much from the
header-related SA rules simply because there's very little for SA to get
its teeth into.
Well, I'm not
On Tue, 2013-06-18 at 20:01 +0100, Martin Gregorie wrote:
BTW, I just ran through 848 messages on this fairly average host (Lenovo
R61i [Intel Core Duo at 1.6GHz, 3GB RAM]) running Fedora 18. The first
run averaged 1095 ms/message and the second averaged 96 ms/message, so I
don't think John's
Now I just have to figure out my Bayes problem...
Amir, When you do work that out, please let us know. We get LOTS of Spam
getting through and John said that it is the BAYES_00 which is causing the
problem. Restarting training seems a bit extreme. We cannot monitor every
hosted user,
On Tue, 18 Jun 2013 13:13:56 -0600 (MDT)
Amir Caspi wrote:
Well, I'm not really concerned about getting any header-related SA
rules to hit, for these tests. As I mentioned previously, my primary
concern right now is the disconnect between the Bayes score during
the automatic MTA delivery and
On Tue, June 18, 2013 4:36 pm, RW wrote:
One thing to watch out for is that a mailbox may contain hidden deleted
mail that remains there until the mail client compacts/expunges the
mailbox. For that reason I prefer explicit training folders rather than
folders where misclassified mails have
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around
trivial gibberish detection by putting a valid CSS property at the
very beginning of the style tag.
Revising the rules...
I am now seeing STYLE_GIBBERISH hitting on a lot of spam
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around trivial
gibberish detection by putting a valid CSS property at the very beginning
of the style tag.
Revising the rules...
I am now
On Mon, June 17, 2013 11:48 am, John Hardin wrote:
Well, that's a much harder problem. STYLE tags have a specified format,
and content not matching that format is (fairly) easy to detect. Comments
are freeform text - gibberish has the same meaning there that it does in
regular body text.
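A rough illustration of the point above, that STYLE content has a checkable format while comments do not. This is only a Python sketch under stated assumptions, not the actual STYLE_GIBBERISH rule: the function name style_residue and the regexes are made up for illustration, and real CSS (for example nested @media blocks) is more permissive than this check.

```python
import re

# Hypothetical helper, NOT SpamAssassin code: reports text inside <style>
# blocks that does not look like "selector { prop: value; ... }".
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
RULE = re.compile(r"[^{}]+\{([^{}]*)\}")               # selector { body }
DECL = re.compile(r"[-\w]+\s*:\s*\S.*", re.DOTALL)     # prop: value


def _is_decl_list(body):
    """True if every non-empty ';'-separated piece looks like 'prop: value'."""
    parts = [p.strip() for p in body.split(";") if p.strip()]
    return all(DECL.fullmatch(p) for p in parts)


def style_residue(html):
    """Return what is left of each <style> block after removing CSS
    comments and well-formed rules. Non-empty leftovers are suspicious."""
    leftovers = []
    for block in STYLE_BLOCK.findall(html):
        text = CSS_COMMENT.sub("", block)
        # Drop a rule only when its body is a plausible declaration list.
        text = RULE.sub(
            lambda m: "" if _is_decl_list(m.group(1)) else m.group(0), text
        )
        text = text.strip()
        if text:
            leftovers.append(text)
    return leftovers
```

A block that opens with a valid-looking property and then trails off into random words, like the spam discussed in this thread, leaves a non-empty residue, while an ordinary rule such as "p { color: red; margin: 0 }" leaves nothing. Sloppy but legitimate CSS would also be flagged, so a check like this can produce false positives of the kind reported for the groupon mail.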
Hi,
On Mon, Jun 17, 2013 at 1:48 PM, John Hardin jhar...@impsec.org wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around trivial
gibberish detection by putting a valid CSS property
On Mon, 17 Jun 2013, Alex wrote:
Hi,
On Mon, Jun 17, 2013 at 1:48 PM, John Hardin jhar...@impsec.org wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around trivial
gibberish
Hi,
I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day
or so, since the new rules hit the distribution. So far, all TPs, no
FPs.
Yay!
I've also noticed the latest iteration hitting now quite a bit, but
also found an FP from groupon:
http://pastebin.com/qwdtSqJd
John Hardin wrote on 2013-06-17 20:52:
http://pastebin.com/qwdtSqJd
Well, that *is* gibberish in a STYLE tag. Bad coder, no biscuit.
If it persists I can add an exclusion for mail from groupon.com
Content analysis details: (-2.4 points, 5.0 required)
pts rule name
Hi,
On Mon, Jun 17, 2013 at 10:39 PM, Benny Pedersen m...@junc.eu wrote:
John Hardin wrote on 2013-06-17 20:52:
http://pastebin.com/qwdtSqJd
Well, that *is* gibberish in a STYLE tag. Bad coder, no biscuit.
If it persists I can add an exclusion for mail from groupon.com
Content analysis
At 10:48 AM -0700 06/17/2013, John Hardin wrote:
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the
past day or so, since the new rules hit the distribution. So far,
all TPs, no FPs.
Yay!
But, I found one today that should have hit
On Fri, 14 Jun 2013, Alex wrote:
http://ruleqa.spamassassin.org/20130613-r1492572-n/STYLE_GIBBERISH/detail
John, I've just tried with your latest, and his sample doesn't hit
STYLE_GIBBERISH. Any suggestions?
Hmm. I created an HTML message with a series of words in the style tag and
it did
On Thu, 13 Jun 2013, Alex wrote:
Hi,
On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote:
On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote:
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Hi,
On Fri, Jun 14, 2013 at 9:51 AM, John Hardin jhar...@impsec.org wrote:
On Thu, 13 Jun 2013, Alex wrote:
Hi,
On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote:
On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote:
Lately, I've been getting hit with a LOT of this type of
At 9:43 PM -0400 06/13/2013, Alex wrote:
I'd say if you have any that are hitting bayes20 or lower, your
database is not working properly and you should probably start over.
Not quite sure I want to do that... I don't really have a sufficient
corpus of mail for good training. It's working
Hi,
On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote:
At 9:43 PM -0400 06/13/2013, Alex wrote:
I'd say if you have any that are hitting bayes20 or lower, your
database is not working properly and you should probably start over.
Not quite sure I want to do that... I
At 4:37 PM -0400 06/14/2013, Alex wrote:
Yeah, but not bayes20. That's bad for sure. You should start
collecting now, or pull a few hundred from your recent quarantine and
use those, along with people's mail folders.
Well, I got bayes99 when I ran spamassassin manually just now. So, I
really
On Fri, 2013-06-14 at 16:37 -0400, Alex wrote:
The rules definitely exist on my system. I wonder if there's some
difference between running spamassassin manually on the message versus
running spamd. The message I pasted was run through spamc/spamd. Is there
something that I've
At 4:37 PM -0400 06/14/2013, Alex wrote:
I think the only difference would be if spamd somehow didn't recognize
all the locations for your rules. Perhaps create a rule that you know
will hit with a very low score in each directory that contains rules.
Maybe there's a way to run spamd in the
On Fri, 2013-06-14 at 15:47 -0600, Amir 'CG' Caspi wrote:
The only thing I can _possibly_ think of is that sa-update is run
nightly, but spamd doesn't get rebooted nightly...
Are you sure? Take a look at how sa-update is getting run to make sure
that it is doing what you expect.
sa-update
Alex wrote on 2013-06-14 19:57:
http://pastebin.com/P3mQbwmH
ripmime -i msg -d /tmp
tidy -o html -f error textfile0
gives me this error file content:
line 7 column 1 - Warning: inserting implicit body
line 8 column 1 - Warning: discarding unexpected body
line 12 column 9 - Warning: style
At 11:43 PM +0100 06/14/2013, Martin Gregorie wrote:
Are you sure? Take a look at how sa-update is getting run to make sure
that it is doing what you expect.
Yes, I'm sure. I looked at the update script (in my case, it's
called update_spamassassin, due to the way Parallels Pro configures
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Not all of it is identical in format, but there seems to be one thing
in common: they include lots of random garbage inside either CSS or
in HTML comments. All of this gets ignored by the HTML parser
Hi,
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
I think people will start by telling you to block the pw domain
From: Hoveround m...@xanti.shahphiler.pw
More in this thread:
At 7:25 PM -0400 06/13/2013, Alex wrote:
I think people will start by telling you to block the pw domain
Sure, but not all of the comment-laden spam is from the pw domain.
It comes in from .net, .com, .us, and a bunch of other places as
well. This is just the one example I happened to pick
In an older episode, on 2013-06-14 01:36, Amir 'CG' Caspi wrote:
(I am relatively new to SA's internal workings and don't know how to
make such a rule, however.)
For basics of writing SA rules, maybe look at
http://wiki.apache.org/spamassassin/WritingRules
Hope this helps,
wolfgang
Hi,
On Thu, Jun 13, 2013 at 7:36 PM, Amir 'CG' Caspi ceph...@3phase.com wrote:
At 7:25 PM -0400 06/13/2013, Alex wrote:
I think people will start by telling you to block the pw domain
Sure, but not all of the comment-laden spam is from the pw domain. It comes
in from .net, .com, .us, and a
At 8:04 PM -0400 06/13/2013, Alex wrote:
After looking at it more closely, it's also only hitting bayes20 for
you. Do the others also score so low? This hits bayes99 on my system.
The ones that SA doesn't catch, yes, they are typically low. I have
some that are bayes50, some bayes20, some
Hi,
After looking at it more closely, it's also only hitting bayes20 for
you. Do the others also score so low? This hits bayes99 on my system.
The ones that SA doesn't catch, yes, they are typically low. I have some
that are bayes50, some bayes20, some bayes00. Any that are bayes99 are
On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote:
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Not all of it is identical in format, but there seems to be one thing in
common: they include lots of random garbage inside either CSS or in HTML
comments.
Hi,
On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote:
On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote:
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Not all of it is identical in format, but there seems to be one thing in
Amir 'CG' Caspi wrote on 2013-06-14 01:05:
Lately, I've been getting hit with a LOT of this type of spam:
http://pastebin.com/HD0rNdxU
Not all of it is identical in format, but there seems to be one thing
in common: they include lots of random garbage inside either CSS or
in HTML comments.
61 matches