Re: New rule for HTML spam, using comments?

2013-06-20 Thread Benny Pedersen
ceph...@3phase.com skrev den 2013-06-19 22:11: Hi John, See the following example: http://pastebin.com/DAYJ7NnJ Lots of style gibberish for sure, but it failed to hit your rule (sa-update ran at 4am today so it should have picked up anything published). I'm guessing it's the parentheses.

Re: New rule for HTML spam, using comments?

2013-06-20 Thread Tom Hendrikx
On 06/20/2013 01:34 AM, Amir 'CG' Caspi wrote: On Wed, June 19, 2013 3:47 pm, Axb wrote: SA's URIBL plugin doesn't and shouldn't look in the alt attribute. Why not, exactly? I wouldn't look at it for _all_ img tags, only for ones that are clearly MailScanner-munged. That is, one would look

Re: New rule for HTML spam, using comments?

2013-06-20 Thread Amir 'CG' Caspi
At 9:47 AM +0200 06/20/2013, Tom Hendrikx wrote: Since MailScanner already has support for integrating SpamAssassin [1] (As I mentioned explicitly in a previous email...) why would you ever want to put work into reversing some of MailScanner's 'protection'? Because, given the particulars of

Re: New rule for HTML spam, using comments?

2013-06-20 Thread Benny Pedersen
Amir 'CG' Caspi skrev den 2013-06-20 11:13: BTW, I'm not talking about _actually_ reversing MailScanner's protection. I'm talking about SA understanding enough to unmunge the URI **for SA processing only**. The actual mail delivered to the end-user would remain munged. SA would not be

Re: New rule for HTML spam, using comments?

2013-06-19 Thread cepheid
Hi John, See the following example: http://pastebin.com/DAYJ7NnJ Lots of style gibberish for sure, but it failed to hit your rule (sa-update ran at 4am today so it should have picked up anything published). I'm guessing it's the parentheses. Whack the mole! =)

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Axb
On 06/19/2013 10:11 PM, ceph...@3phase.com wrote: Hi John, See the following example: http://pastebin.com/DAYJ7NnJ Lots of style gibberish for sure, but it failed to hit your rule (sa-update ran at 4am today so it should have picked up anything published). I'm guessing it's the parentheses.

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Amir Caspi
Another, nearly identical example I saw today, but which used trailing slashes (/ or //) instead of parentheses. http://pastebin.com/6XRwcjm3 Enjoy. =) --- Amir On Wed, June 19, 2013 2:11 pm, ceph...@3phase.com wrote: Hi John, See the

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Amir Caspi
On Wed, June 19, 2013 2:33 pm, Axb wrote: imo, it makes little sense to write rules to catch these hashbusters. As If the rule is sufficiently broad, it will catch them. If the rule is so strict that it catches only one trailing slash or something, then yes, it makes little sense... but I think

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Axb
On 06/19/2013 10:54 PM, Amir Caspi wrote: Perhaps SA should include a module/plugin to unmunge MailScanner munging? Has anyone written one, or if not, would anyone like to? ;-) (Since MailScanner is open-source perl, I imagine it should be relatively straightforward to find the munging code,

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Amir 'CG' Caspi
On Wed, June 19, 2013 3:14 pm, Axb wrote: iirc, MailScanner munges the URL before SA sees it so unless your plugin idea involves a crystal ball, it's not possible. Yes, MailScanner gets to it before SA does, unless SA is called from within MailScanner (which it isn't, on my setup, but that is a

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Axb
On 06/19/2013 11:30 PM, Amir 'CG' Caspi wrote: Yes, MailScanner gets to it before SA does, unless SA is called from within MailScanner (which it isn't, on my setup, but that is a possible setup). However, the complete original URL is still contained within the munged one. It's in the alt

Re: New rule for HTML spam, using comments?

2013-06-19 Thread Amir 'CG' Caspi
On Wed, June 19, 2013 3:47 pm, Axb wrote: SA's URIBL plugin doesn't and shouldn't look in the alt attribute. Why not, exactly? I wouldn't look at it for _all_ img tags, only for ones that are clearly MailScanner-munged. That is, one would look for the patterns that MailScanner uses for
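
The idea in the message above (only inspect the alt attribute of img tags that are clearly MailScanner-munged) can be sketched roughly as follows. This is a standalone Python illustration, not a SpamAssassin plugin and not MailScanner's actual code. The marker string `MailScannerWebBug` and the sample alt text are assumptions made for the example; the thread only states that the original URL survives in the alt attribute, so the real pattern would have to be taken from MailScanner's source.

```python
import re
from html.parser import HTMLParser

# Assumption: a string that identifies a munged <img> tag. The real marker
# must be checked against MailScanner's own munging code.
ASSUMED_MARKER = "MailScannerWebBug"

# Loose URL matcher, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


class MungedImgParser(HTMLParser):
    """Collect URLs from the alt attribute of <img> tags carrying the marker."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker.lower()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # HTMLParser lowercases attribute names; valueless attributes are None.
        attr_map = {name: (value or "") for name, value in attrs}
        munged = any(
            self.marker in name or self.marker in value.lower()
            for name, value in attr_map.items()
        )
        if not munged:
            return  # ordinary image: leave its alt text alone
        self.urls.extend(URL_RE.findall(attr_map.get("alt", "")))


def extract_munged_urls(html, marker=ASSUMED_MARKER):
    """Return URLs recovered from munged <img> tags only."""
    parser = MungedImgParser(marker)
    parser.feed(html)
    return parser.urls
```

Restricting the lookup to tags that carry the marker keeps ordinary alt text out of URI checks, which is the distinction the message draws. The recovered URLs would be used for scanning only; the delivered mail stays munged.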

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir 'CG' Caspi
At 4:37 PM -0400 06/14/2013, Alex wrote: On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote: I wonder if there's some difference between running spamassassin manually on the message versus running spamd. I think the only difference would be if spamd somehow didn't

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Ben Johnson
On 6/18/2013 5:31 AM, Amir 'CG' Caspi wrote: At 4:37 PM -0400 06/14/2013, Alex wrote: On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote: I wonder if there's some difference between running spamassassin manually on the message versus running spamd. I think

Re: New rule for HTML spam, using comments?

2013-06-18 Thread John Hardin
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: At 10:48 AM -0700 06/17/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day or so, since the new rules hit the distribution. So far, all TPs, no FPs.

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir 'CG' Caspi
At 10:13 AM -0700 06/18/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: Any idea why it failed to hit, and does this need another rule revision? Yep, and yep. Revision committed. Initial comment gibberish rule committed. Thanks for the revision. Do you want to explain

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir 'CG' Caspi
At 8:58 AM -0400 06/18/2013, Ben Johnson wrote: a.) You are copying/pasting the body of the email, but not the headers. No, I am copying the headers... however, I am using Eudora (ancient, I know) as a mail client, and it's possible the headers are not properly formatted. For example, for

Re: New rule for HTML spam, using comments?

2013-06-18 Thread John Hardin
On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote: At 10:13 AM -0700 06/18/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: Any idea why it failed to hit, and does this need another rule revision? Yep, and yep. Revision committed. Initial comment gibberish rule committed.

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir 'CG' Caspi
At 10:24 AM -0700 06/18/2013, John Hardin wrote: The earlier version wasn't allowing for some punctuation in the gibberish. There may be a period of whack-a-mole here, I was conservative in the change I made. Makes sense. Both of those examples are good for creating an

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Axb
On 06/18/2013 07:24 PM, John Hardin wrote: On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote: At 10:13 AM -0700 06/18/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: Any idea why it failed to hit, and does this need another rule revision? Yep, and yep. Revision committed.

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Kris Deugau
Amir 'CG' Caspi wrote: At 8:58 AM -0400 06/18/2013, Ben Johnson wrote: a.) You are copying/pasting the body of the email, but not the headers. No, I am copying the headers... however, I am using Eudora (ancient, I know) as a mail client, and it's possible the headers are not properly

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Axb
On 06/18/2013 07:18 PM, Amir 'CG' Caspi wrote: Either way, I am _trying_ to copy the entire message. Not sure what is misformatted there. If you take a look at my two pasted examples (links below for convenience), those are direct copy/paste from Eudora's raw source view. Any idea what is

Re: New rule for HTML spam, using comments?

2013-06-18 Thread John Hardin
On Tue, 18 Jun 2013, Axb wrote: On 06/18/2013 07:24 PM, John Hardin wrote: On Tue, 18 Jun 2013, Amir 'CG' Caspi wrote: At 10:13 AM -0700 06/18/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: Any idea why it failed to hit, and does this need another rule

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Ben Johnson
On 6/18/2013 1:18 PM, Amir 'CG' Caspi wrote: At 8:58 AM -0400 06/18/2013, Ben Johnson wrote: a.) You are copying/pasting the body of the email, but not the headers. No, I am copying the headers... however, I am using Eudora (ancient, I know) as a mail client, and it's possible the headers

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir 'CG' Caspi
Replies to multiple folks below... At 1:42 PM -0400 06/18/2013, Kris Deugau wrote: Try opening the on-disk file with Notepad (or your favourite text editor on *nix). If you see the same thing you see when you hit the blah blah blah button in Eudora, you should be OK. If not... I've done

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Martin Gregorie
On Tue, 2013-06-18 at 11:18 -0600, Amir 'CG' Caspi wrote: At 8:58 AM -0400 06/18/2013, Ben Johnson wrote: a.) You are copying/pasting the body of the email, but not the headers. No, I am copying the headers... however, I am using Eudora (ancient, I know) as a mail client, and it's possible

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir Caspi
On Tue, June 18, 2013 1:01 pm, Martin Gregorie wrote: The main thing I notice is that there are only two Received: headers, and no envelope-From so IMO you're hoping for too much from the header-related SA rules simply because there's very little for SA to get its teeth into. Well, I'm not

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Martin Gregorie
On Tue, 2013-06-18 at 20:01 +0100, Martin Gregorie wrote: BTW, I just ran through 848 messages on this fairly average host (Lenovo R61i [Intel Core Duo at 1.6GHz, 3GB RAM]) running Fedora 18. The first run averaged 1095 ms/message and the second averaged 96 ms/message, so I don't think John's

RE: New rule for HTML spam, using comments?

2013-06-18 Thread emailitis.com
Now I just have to figure out my Bayes problem... Amir, When you do work that out, please let us know. We get LOTS of Spam getting through and John said that it is the BAYES_00 which is causing the problem. Restarting training seems a bit extreme. We cannot monitor every hosted user,

Re: New rule for HTML spam, using comments?

2013-06-18 Thread RW
On Tue, 18 Jun 2013 13:13:56 -0600 (MDT) Amir Caspi wrote: Well, I'm not really concerned about getting any header-related SA rules to hit, for these tests. As I mentioned previously, my primary concern right now is the disconnect between the Bayes score during the automatic MTA delivery and

Re: New rule for HTML spam, using comments?

2013-06-18 Thread Amir Caspi
On Tue, June 18, 2013 4:36 pm, RW wrote: One thing to watch out for is that a mailbox may contain hidden deleted mail that remains there until the mail client compacts/expunges the mailbox. For that reason I prefer explicit training folders rather than folders where misclassified mails have

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Amir 'CG' Caspi
At 7:20 PM -0700 06/15/2013, John Hardin wrote: I took a closer look at this and it seems they're working around trivial gibberish detection by putting a valid CSS property at the very beginning of the style tag. Revising the rules... I am now seeing STYLE_GIBBERISH hitting on a lot of spam
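
As a rough illustration of the kind of check discussed here, the sketch below flags a STYLE block that contains a long run of plain words, even when a valid CSS rule comes first. It is written in Python for readability and is not the actual STYLE_GIBBERISH rule, which this listing does not show. The function name, the threshold of eight words, and the separator set (including the parentheses and slashes mentioned elsewhere in the thread) are all assumptions.

```python
import re

# Contents of each <style>...</style> block.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)

# CSS comments are dropped first; gibberish in comments is a separate problem.
CSS_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def has_style_gibberish(html, min_words=8):
    """Return True if any STYLE block holds min_words or more consecutive words.

    A "word" is two or more letters followed only by whitespace or light
    punctuation. Real CSS breaks such runs quickly with ':', ';', '{', '}'
    or '-', so a long run suggests free text rather than style rules.
    """
    word_run = re.compile(r"(?:[A-Za-z]{2,}[\s.,!?()/]+){%d,}" % min_words)
    for match in STYLE_RE.finditer(html):
        css = CSS_COMMENT_RE.sub(" ", match.group(1))
        if word_run.search(css):
            return True
    return False
```

Because the whole block is scanned rather than just its start, a leading valid property does not hide the text that follows it. A threshold this simple will still need tuning against real mail, as the false positive reported later in the thread suggests.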

Re: New rule for HTML spam, using comments?

2013-06-17 Thread John Hardin
On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: At 7:20 PM -0700 06/15/2013, John Hardin wrote: I took a closer look at this and it seems they're working around trivial gibberish detection by putting a valid CSS property at the very beginning of the style tag. Revising the rules... I am now

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Amir Caspi
On Mon, June 17, 2013 11:48 am, John Hardin wrote: Well, that's a much harder problem. STYLE tags have a specified format, and content not matching that format is (fairly) easy to detect. Comments are freeform text - gibberish has the same meaning there that it does in regular body text.

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Alex
Hi, On Mon, Jun 17, 2013 at 1:48 PM, John Hardin jhar...@impsec.org wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: At 7:20 PM -0700 06/15/2013, John Hardin wrote: I took a closer look at this and it seems they're working around trivial gibberish detection by putting a valid CSS property

Re: New rule for HTML spam, using comments?

2013-06-17 Thread John Hardin
On Mon, 17 Jun 2013, Alex wrote: Hi, On Mon, Jun 17, 2013 at 1:48 PM, John Hardin jhar...@impsec.org wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: At 7:20 PM -0700 06/15/2013, John Hardin wrote: I took a closer look at this and it seems they're working around trivial gibberish

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Alex
Hi, I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day or so, since the new rules hit the distribution. So far, all TPs, no FPs. Yay! I've also noticed the latest iteration hitting now quite a bit, but also found an FP from groupon: http://pastebin.com/qwdtSqJd

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Benny Pedersen
John Hardin skrev den 2013-06-17 20:52: http://pastebin.com/qwdtSqJd Well, that *is* gibberish in a STYLE tag. Bad coder, no biscuit. If it persists I can add an exclusion for mail from groupon.com Content analysis details: (-2.4 points, 5.0 required) pts rule name

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Alex
Hi, On Mon, Jun 17, 2013 at 10:39 PM, Benny Pedersen m...@junc.eu wrote: John Hardin skrev den 2013-06-17 20:52: http://pastebin.com/qwdtSqJd Well, that *is* gibberish in a STYLE tag. Bad coder, no biscuit. If it persists I can add an exclusion for mail from groupon.com Content analysis

Re: New rule for HTML spam, using comments?

2013-06-17 Thread Amir 'CG' Caspi
At 10:48 AM -0700 06/17/2013, John Hardin wrote: On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote: I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day or so, since the new rules hit the distribution. So far, all TPs, no FPs. Yay! But, I found one today that should have hit

Re: New rule for HTML spam, using comments?

2013-06-15 Thread John Hardin
On Fri, 14 Jun 2013, Alex wrote: http://ruleqa.spamassassin.org/20130613-r1492572-n/STYLE_GIBBERISH/detail John, I've just tried with your latest, and his sample doesn't hit STYLE_GIBBERISH. Any suggestions? Hmm. I created an HTML message with a series of words in the style tag and it did

Re: New rule for HTML spam, using comments?

2013-06-14 Thread John Hardin
On Thu, 13 Jun 2013, Alex wrote: Hi, On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote: On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote: Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Alex
Hi, On Fri, Jun 14, 2013 at 9:51 AM, John Hardin jhar...@impsec.org wrote: On Thu, 13 Jun 2013, Alex wrote: Hi, On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote: On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote: Lately, I've been getting hit with a LOT of this type of

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Amir 'CG' Caspi
At 9:43 PM -0400 06/13/2013, Alex wrote: I'd say if you have any that are hitting bayes20 or lower, your database is not working properly and you should probably start over. Not quite sure I want to do that... I don't really have a sufficient corpus of mail for good training. It's working

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Alex
Hi, On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi ceph...@3phase.com wrote: At 9:43 PM -0400 06/13/2013, Alex wrote: I'd say if you have any that are hitting bayes20 or lower, your database is not working properly and you should probably start over. Not quite sure I want to do that... I

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Amir 'CG' Caspi
At 4:37 PM -0400 06/14/2013, Alex wrote: Yeah, but not bayes20. That's bad for sure. You should start collecting now, or pull a few hundred from your recent quarantine and use those, along with people's mail folders. Well, I got bayes99 when I ran spamassassin manually just now. So, I really

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Martin Gregorie
On Fri, 2013-06-14 at 16:37 -0400, Alex wrote: The rules definitely exist on my system. I wonder if there's some difference between running spamassassin manually on the message versus running spamd. The message I pasted was run through spamc/spamd. Is there something that I've

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Amir 'CG' Caspi
At 4:37 PM -0400 06/14/2013, Alex wrote: I think the only difference would be if spamd somehow didn't recognize all the locations for your rules. Perhaps create a rule that you know will hit with a very low score in each directory that contains rules. Maybe there's a way to run spamd in the

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Martin Gregorie
On Fri, 2013-06-14 at 15:47 -0600, Amir 'CG' Caspi wrote: The only thing I can _possibly_ think of is that sa-update is run nightly, but spamd doesn't get rebooted nightly... Are you sure? Take a look at how sa-update is getting run to make sure that it is doing what you expect. sa-update

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Benny Pedersen
Alex skrev den 2013-06-14 19:57: http://pastebin.com/P3mQbwmH ripmime -i msg -d /tmp tidy -o html -f error textfile0 gives me this error file content: line 7 column 1 - Warning: inserting implicit body line 8 column 1 - Warning: discarding unexpected body line 12 column 9 - Warning: style

Re: New rule for HTML spam, using comments?

2013-06-14 Thread Amir 'CG' Caspi
At 11:43 PM +0100 06/14/2013, Martin Gregorie wrote: Are you sure? Take a look at how sa-update is getting run to make sure that it is doing what you expect. Yes, I'm sure. I looked at the update script (in my case, it's called update_spamassassin, due to the way Parallels Pro configures

New rule for HTML spam, using comments?

2013-06-13 Thread Amir 'CG' Caspi
Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU Not all of it is identical in format, but there seems to be one thing in common: they include lots of random garbage inside either CSS or in HTML comments. All of this gets ignored by the HTML parser

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Alex
Hi, Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU I think people will start by telling you to block the pw domain From: Hoveround m...@xanti.shahphiler.pw More in this thread:

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Amir 'CG' Caspi
At 7:25 PM -0400 06/13/2013, Alex wrote: I think people will start by telling you to block the pw domain Sure, but not all of the comment-laden spam is from the pw domain. It comes in from .net, .com, .us, and a bunch of other places as well. This is just the one example I happened to pick

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Wolfgang Zeikat
In an older episode, on 2013-06-14 01:36, Amir 'CG' Caspi wrote: (I am relatively new to SA's internal workings and don't know how to make such a rule, however.) For basics of writing SA rules, maybe look at http://wiki.apache.org/spamassassin/WritingRules Hope this helps, wolfgang

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Alex
Hi, On Thu, Jun 13, 2013 at 7:36 PM, Amir 'CG' Caspi ceph...@3phase.com wrote: At 7:25 PM -0400 06/13/2013, Alex wrote: I think people will start by telling you to block the pw domain Sure, but not all of the comment-laden spam is from the pw domain. It comes in from .net, .com, .us, and a

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Amir 'CG' Caspi
At 8:04 PM -0400 06/13/2013, Alex wrote: After looking at it more closely, it's also only hitting bayes20 for you. Do the others also score so low? This hits bayes99 on my system. The ones that SA doesn't catch, yes, they are typically low. I have some that are bayes50, some bayes20, some

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Alex
Hi, After looking at it more closely, it's also only hitting bayes20 for you. Do the others also score so low? This hits bayes99 on my system. The ones that SA doesn't catch, yes, they are typically low. I have some that are bayes50, some bayes20, some bayes00. Any that are bayes99 are

Re: New rule for HTML spam, using comments?

2013-06-13 Thread John Hardin
On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote: Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU Not all of it is identical in format, but there seems to be one thing in common: they include lots of random garbage inside either CSS or in HTML comments.

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Alex
Hi, On Thu, Jun 13, 2013 at 9:55 PM, John Hardin jhar...@impsec.org wrote: On Thu, 13 Jun 2013, Amir 'CG' Caspi wrote: Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU Not all of it is identical in format, but there seems to be one thing in

Re: New rule for HTML spam, using comments?

2013-06-13 Thread Benny Pedersen
Amir 'CG' Caspi skrev den 2013-06-14 01:05: Lately, I've been getting hit with a LOT of this type of spam: http://pastebin.com/HD0rNdxU Not all of it is identical in format, but there seems to be one thing in common: they include lots of random garbage inside either CSS or in HTML comments.