On 12/5/2014 1:19 PM, Gibbs, David wrote:
> On 12/5/2014 11:25 AM, John Hardin wrote:
>>> FWIW: here's the rule I came up with ... seems to work adequately.
>>>
>>> header __COUNT_SUBJ Subject =~ /.*/
>>
>> You might want to be a little bit more paranoid and explicitly anchor
>> that:
>>
>> header __COUNT_SUBJ Subject =~ /^.*$/
>>
>> I know .* is greedy and shouldn't overlap on multiple matches, but
>> this helps make sure.
>
> I tried that originally, but it didn't end up matching.
>
> Oddly, when I put the original rule "/.*/" in place, and ran a message
> with multiple subject lines through in debug ... I got the following
> relevant output:
>
> Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: " The Hottest Smartphones - Details Inside "
> Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: "negative match"
> Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: "negative match"
> Dec 5 12:09:52.032 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: "negative match"
> Dec 5 12:09:52.033 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: "The Hottest Smartphones - Details Inside"
> Dec 5 12:09:52.033 [2459] dbg: rules: ran header rule __COUNT_SUBJ
> ======> got hit: "negative match"
>
> I'm assuming "negative match" means that the rule didn't match.
>
> The message in question has 4 subject lines: the first appears to be
> encoded, the next 2 are blank, and the 4th is plain text.
>
> Example: http://code.midrange.com/4c731ced97.html
>
> Not sure why the rule is being applied 6 times.
>
> david

This likely has to do with .* accepting a zero-width match.
The regex engine effectively considers every string to have a zero-width
/thing/ before, between, and after its characters. The first match
consumes the whole string, leaving the cursor at the end. The next match
is that zero-width magic at the end of the text. To see zero-width
matching in action, compare these two perl lines:
# matches the empty spaces before 'a', and after 'a', 'b', and 'c' - 4 matches total
perl -e '$x = "abc"; while ($x =~ //cg) {print "match\n";}'

# matches the characters 'a', 'b', and 'c' as you would expect - 3 matches
perl -e '$x = "abc"; while ($x =~ /./cg) {print "match\n";}'