Ha! I checked my mail before sending this; we're on the same wavelength, yet our emails are out of sync. You just suggested the same thing I was leaning toward.
On 02/14/2014 10:53 AM, John Hardin wrote:
> S/O is a little surprising:
>
> http://ruleqa.spamassassin.org/?daterev=20140213-r1567864-n&rule=%2FHEXHASH
>
> I'm curious as to what hits that in ham...
>
> Perhaps more repetitions would improve that?

I'm actually thinking of replacing the leading \b with a \s to avoid matching paths and extensions and maybe requiring two preceding words to avoid a list of file/md5 pairings. We can experiment with different hit thresholds as well.

body      __HEXHASHWORD   /(?:\s[a-z]{1,10}){2}\s[0-9a-f]{30}/
tflags    __HEXHASHWORD   multiple maxhits=8

meta      HEXHASH_WORD_5  __HEXHASHWORD >= 5
describe  HEXHASH_WORD_5  5 hexadecimal hashes, each following two words

meta      HEXHASH_WORD_6  __HEXHASHWORD >= 6
describe  HEXHASH_WORD_6  6 hexadecimal hashes, each following two words

meta      HEXHASH_WORD_7  __HEXHASHWORD >= 7
describe  HEXHASH_WORD_7  7 hexadecimal hashes, each following two words

meta      HEXHASH_WORD_8  __HEXHASHWORD >= 8
describe  HEXHASH_WORD_8  8 hexadecimal hashes, each following two words

Users: Do /not/ implement all of these at once. This is for Rule QA testing only. Once we have results, we can figure out which threshold is best and then come up with a suggestion or published rule. (Maybe tflags nopublish is wise here.)
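
For anyone who wants to poke at the regex before Rule QA results come in, here is a rough Python sketch of the counting logic. It is not SpamAssassin: it assumes Python's re module handles this particular pattern the same way Perl does, it skips SpamAssassin's body rendering entirely, and the names count_hits, fired_metas and sample_body plus all sample text are made up for illustration.

```python
import re

# Same pattern as the proposed __HEXHASHWORD body rule.
HEXHASHWORD = re.compile(r"(?:\s[a-z]{1,10}){2}\s[0-9a-f]{30}")
MAXHITS = 8  # mirrors "tflags __HEXHASHWORD multiple maxhits=8"

HASH = "d41d8cd98f00b204e9800998ecf8427e"  # 32 hex chars (MD5 of empty input)


def count_hits(body: str) -> int:
    """Count non-overlapping matches, stopping at MAXHITS."""
    hits = 0
    for _ in HEXHASHWORD.finditer(body):
        hits += 1
        if hits >= MAXHITS:
            break
    return hits


def fired_metas(hits: int) -> list:
    """Names of the HEXHASH_WORD_n metas that a given hit count would trigger."""
    return [f"HEXHASH_WORD_{n}" for n in (5, 6, 7, 8) if hits >= n]


def sample_body(n: int) -> str:
    """Hypothetical spammy text: n hashes, each preceded by two words."""
    return " " + " ".join(f"lorem ipsum {HASH}" for _ in range(n))


# Hypothetical ham: an md5sum-style listing and a hash embedded in a path.
MD5_LISTING = "\n".join(f"{HASH}  file.txt" for _ in range(10))
PATH_BODY = "see /var/tmp/" + HASH + ".log"

if __name__ == "__main__":
    for label, body in [("spammy x5", sample_body(5)),
                        ("spammy x10", sample_body(10)),
                        ("md5 listing", MD5_LISTING),
                        ("path", PATH_BODY)]:
        hits = count_hits(body)
        print(label, hits, fired_metas(hits))
```

In this sketch the md5sum-style listing and the path get no hits because the hash is not preceded by whitespace plus two short lowercase words, which is the point of the \s and two-word changes. Real ham may of course look different, so Rule QA is still the real test.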