Ha!  I checked my mail before sending this; we're on the same wavelength,
but our emails are out of sync.  You just suggested the same thing I was
leaning toward.

On 02/14/2014 10:53 AM, John Hardin wrote:
> S/O is a little surprising:
>
> http://ruleqa.spamassassin.org/?daterev=20140213-r1567864-n&rule=%2FHEXHASH
>
>
> I'm curious as to what hits that in ham...
>
> Perhaps more repetitions would improve that?

I'm actually thinking of replacing the leading \b with a \s to avoid
matching paths and extensions, and maybe requiring two preceding words
to avoid matching lists of file/md5 pairings.  We can experiment with
different hit thresholds as well.

body      __HEXHASHWORD   /(?:\s[a-z]{1,10}){2}\s[0-9a-f]{30}/
tflags    __HEXHASHWORD   multiple maxhits=8
meta      HEXHASH_WORD_5  __HEXHASHWORD >= 5
describe  HEXHASH_WORD_5  5 hexadecimal hashes, each following two words
meta      HEXHASH_WORD_6  __HEXHASHWORD >= 6
describe  HEXHASH_WORD_6  6 hexadecimal hashes, each following two words
meta      HEXHASH_WORD_7  __HEXHASHWORD >= 7
describe  HEXHASH_WORD_7  7 hexadecimal hashes, each following two words
meta      HEXHASH_WORD_8  __HEXHASHWORD >= 8
describe  HEXHASH_WORD_8  8 hexadecimal hashes, each following two words
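
For anyone who wants to poke at the pattern outside of SpamAssassin, here
is a rough Python sketch.  It only exercises the regex from __HEXHASHWORD
on raw strings; the helper name count_hits, the sample hash, and the
sample texts are made up for illustration, and the cap is only an
approximation of what "tflags multiple maxhits=8" does.  SpamAssassin's
real body rendering (whitespace normalization, paragraph splitting) is
not modeled here.

```python
import re

# Pattern copied from the proposed __HEXHASHWORD rule.
# There is no /i flag on the rule, so only lowercase text matches.
HEXHASHWORD = re.compile(r"(?:\s[a-z]{1,10}){2}\s[0-9a-f]{30}")

def count_hits(text, maxhits=8):
    """Count non-overlapping matches, stopping at maxhits.

    Hypothetical helper: a stand-in for 'tflags multiple maxhits=8',
    not SpamAssassin's actual implementation.
    """
    hits = 0
    for _ in HEXHASHWORD.finditer(text):
        hits += 1
        if hits >= maxhits:
            break
    return hits

# Made-up 32-character hex string (md5-sized) used in every sample.
HASH = "0123456789abcdef0123456789abcdef"

# Spam-like text: each hash follows two ordinary lowercase words.
spammy = "intro " + " ".join(f"lorem ipsum {HASH}" for _ in range(6))

# File/md5 listing: the hash follows a file name, not two bare words.
listing = "\n".join(f"file{i}.txt {HASH}" for i in range(6))

# Path with an extension: the hash is preceded by '/', not whitespace.
path = f"see /tmp/cache/{HASH}.jpg for details"

for name, text in (("spammy", spammy), ("listing", listing), ("path", path)):
    hits = count_hits(text)
    # Which of the HEXHASH_WORD_n meta rules would fire at this count.
    fired = [n for n in (5, 6, 7, 8) if hits >= n]
    print(name, hits, fired)
```

With these samples only the spam-like text should reach any of the
thresholds; the listing and the path should not hit at all.  Real ham
will be messier, which is what the Rule QA run is for.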


Users:  Do /not/ implement all of these at once.  This is for Rule QA
testing only.  Once we have results, we can figure out which threshold
is best and then come up with a suggestion or published rule.  (Maybe
tflags nopublish is wise here.)


Reply via email to