On Fri, 23 Jan 2009, Dennis Hardy wrote:
> Here is what I have been using (from previous help from this mailing list!):
>
> uri SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
> score SSS_URI30 1.5
>
> This uri rule does work very well, but they change the length
> sometimes, so I have a few rules that handle different lengths. Maybe I
> should use \w{29,31} instead of just \w{30}, for example?
>
> Am I being too conservative? Should I consider bumping the score of
> this up more? And my meta up more perhaps?
Again, I'd have to see more examples to comment meaningfully. I would be
especially interested in whether or not the part after the domain name is
indeed free from punctuation.
A long string of unpunctuated letters is less likely to FP than a long
string of letters, digits and underscores (which is everything \w matches).
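If the gibberish does turn out to be letters only, something like this
would tighten the rule and cover the 29-31 range in one go. It's only a
sketch: SSS_URI_ALPHA is a made-up name, the score is a placeholder, and
it hasn't been run against a corpus.

# letters only, 29 to 31 of them, followed by a word boundary
uri      SSS_URI_ALPHA  /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/[a-zA-Z]{29,31}\b/
describe SSS_URI_ALPHA  URI path starts with 29-31 unpunctuated letters
score    SSS_URI_ALPHA  1.5

[a-zA-Z]{29,31}\b won't match a 32-letter path, or a 30-letter path with
a digit or underscore tacked on, because \b needs a non-word character
(or the end of the URI) right after the letters.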
You might want to anchor the end of your rule with a $, as it may FP if
there is anything in the URI following the string of gibberish: \b only
needs a non-word character after the 30th one, so a "." or "?" is enough.
Try it against this very legitimate-looking (if overly verbose) URI:
http://fnord.com/retrieve_document_as_pdf3_file.php?123456
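Note that retrieve_document_as_pdf3_file is exactly 30 word characters,
and the "." of ".php" satisfies the trailing \b, so the rule as written
matches that URI. An end-anchored version might look something like this
(again a sketch; SSS_URI30_EOL is a made-up name, and it relies on uri
rules being tested against each URI on its own):

# anchored: the 30 word characters must be the end of the URI
uri      SSS_URI30_EOL  /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}$/
describe SSS_URI30_EOL  URI path is exactly 30 word characters, nothing after
score    SSS_URI30_EOL  1.5

This one does not match the fnord.com URI, since ".php?123456" follows
the 30 characters. If some of the spam URIs end with a slash, \/?$ would
allow for that.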
And the rule I suggested attempts to detect gibberish by looking for a
"q" that is not followed by a "u", a combination that is rare in English
words.
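A rough sketch of that idea, folded together with the letters-only and
$-anchoring suggestions above. It is not necessarily the rule referred to
above; SSS_URI_QNOTU and the 20-letter minimum are made up for
illustration:

# path is 20+ letters to the end of the URI and contains a q with no u after it
uri      SSS_URI_QNOTU  /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/(?=[a-z]{20,}$)[a-z]*q(?!u)/i
describe SSS_URI_QNOTU  Long all-letter URI path with a q not followed by u
score    SSS_URI_QNOTU  1.0

The lookahead requires the whole path to be 20 or more letters;
[a-z]*q(?!u) then succeeds if any q in it lacks a following u, and the /i
modifier makes Q and U count as well.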
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Vista: because the audio experience is *far* more important than
network throughput.
-----------------------------------------------------------------------
4 days until Wolfgang Amadeus Mozart's 253rd Birthday