On 12/5/2014 1:19 PM, Gibbs, David wrote:
On 12/5/2014 11:25 AM, John Hardin wrote:
FWIW: here's the rule I came up with ... seems to work adequately.

header __COUNT_SUBJ Subject =~ /.*/

You might want to be a little bit more paranoid and explicitly anchor that:

   header __COUNT_SUBJ Subject =~ /^.*$/

I know .* is greedy and shouldn't overlap on multiple matches, but this helps make sure.

I tried that originally, but it didn't end up matching.

Oddly, when I put the original rule "/.*/" in place, and ran a message with multiple subject lines through in debug ... I got the following relevant output:

Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: " The Hottest Smartphones - Details Inside "
Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: "negative match"
Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: "negative match"
Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: "negative match"
Dec 5 12:09:52.033 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: "The Hottest Smartphones - Details Inside"
Dec 5 12:09:52.033 [2459] dbg: rules: ran header rule __COUNT_SUBJ ======> got hit: "negative match"

I'm assuming "negative match" means that the rule didn't match.

The message in question has 4 subject lines: the first appears to be encoded, the next 2 are blank, and the 4th is plain text.

Example: http://code.midrange.com/4c731ced97.html

Not sure why the rule is being applied 6 times.

david

It most likely has to do with .* accepting a zero-width match. The regex engine effectively considers every string to have a zero-width /thing/ before the first character, between each pair of characters, and after the last one. The first match consumes the whole string, leaving the cursor at the end. The next match is that zero-width magic at the end of the text. To see this in action, compare these two perl lines:

perl -e '$x = "abc"; while ($x =~ //cg) {print "match\n";}'
# matches the empty spaces before 'a', and after 'a', 'b', and 'c' - 4 matches total

perl -e '$x = "abc"; while ($x =~ /./cg) {print "match\n";}'
# matches the characters 'a', 'b', and 'c' as you would expect
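
For anyone who wants to poke at this outside of SpamAssassin, here is a rough sketch of the same effect in Python. Python's re module is not the Perl engine SpamAssassin uses, but it treats empty matches the same way in these cases. The subjects list is a hypothetical stand-in for the four Subject headers described earlier in the thread, and reading "negative match" as "the matched text was empty" is an assumption, not something the debug output confirms.

```python
import re

def count_matches(pattern, text):
    """Count successive non-overlapping matches, roughly like while (/.../g) in Perl."""
    return sum(1 for _ in re.finditer(pattern, text))

# Analogues of the two perl one-liners
empty_hits = count_matches(r"", "abc")  # empty match at positions 0, 1, 2, 3
dot_hits = count_matches(r".", "abc")   # one match per character

# Hypothetical stand-ins for the four Subject headers (not the real message)
subjects = [
    " The Hottest Smartphones - Details Inside ",
    "",
    "",
    "The Hottest Smartphones - Details Inside",
]

# For each header, collect every match of .* in order
per_header = [[m.group(0) for m in re.finditer(r".*", s)] for s in subjects]
total = sum(len(hits) for hits in per_header)

print(empty_hits, dot_hits)
for hits in per_header:
    print(hits)
print(total)
```

Under those assumptions, each non-empty subject produces two matches (the full text, then an empty match at the end) and each blank subject produces one empty match. That gives 2 + 1 + 1 + 2 = 6 hits, in the same order as the six debug lines.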
