On Wed, 12 Nov 2014, Joe Quinn wrote:

> On 11/9/2014 11:07 AM, David B Funk wrote:
>> On Sun, 9 Nov 2014, David B Funk wrote:
>>
>>> For NUMERIC_HTTP_ADDR the rule is: /^https?\:\/\/\d{7}/is
>>> If that pattern were terminated like:
>>>  /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
>>> it should prevent the FPs (hopefully without destroying its effectiveness)
>>
>> Oops, for that new formulation it would actually need to be:
>>   /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is
>
> This rule is currently scored very low:
> score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242
>
> Can you give an example of a real, legitimate domain that is being falsely marked by this? Otherwise I wouldn't be too worried, because this rule would have to fire in conjunction with a good bit more for a message to be flagged as an FP. Your proposed change is very minimal in terms of the strings I would expect it to hit on.

In my first message I included an example URL from an Amtrak ticket notification
that it fired on.

My proposed change makes the difference between the rule FPing on any URL whose
host part starts with a long number and it firing only on URLs with a
decimal-encoded IP address (which I assume was the intention of the rule's
original author).
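To illustrate the difference, here is a rough Python approximation of the two patterns. SpamAssassin itself evaluates them as Perl regexes against the message's URIs, so treat this as a sketch and not the rule engine. The sample URIs are hypothetical and are not the Amtrak URL from the original report. `decode_decimal_host` is a hypothetical helper that shows what a decimal-encoded IP address decodes to.

```python
import ipaddress
import re
from typing import Optional

# Python approximations of the two Perl-style patterns quoted in this thread;
# re.I | re.S stand in for the /is modifiers.
OLD_RULE = re.compile(r"^https?://\d{7}", re.I | re.S)
NEW_RULE = re.compile(r"^https?://\d{7,10}(?::\d+)?(?:/|$)", re.I | re.S)


def decode_decimal_host(host: str) -> Optional[str]:
    """Return the dotted-quad form of a decimal-encoded IPv4 host, else None.

    Hypothetical helper for illustration only; not part of SpamAssassin.
    """
    if not re.fullmatch(r"[0-9]{1,10}", host):
        return None  # not a plain decimal number of at most 10 digits
    value = int(host)
    if value > 0xFFFFFFFF:  # above 4294967295, i.e. 255.255.255.255
        return None
    return str(ipaddress.IPv4Address(value))


# Hypothetical sample URIs (not taken from the reported message).
samples = [
    "http://3232235777/",               # decimal form of 192.168.1.1
    "http://3232235777:8080/login",     # same host with an explicit port
    "http://1234567.example.com/track", # ordinary host name starting with 7 digits
]

for uri in samples:
    print(uri, "old:", bool(OLD_RULE.search(uri)), "new:", bool(NEW_RULE.search(uri)))

print(decode_decimal_host("3232235777"))  # 192.168.1.1
```

When run, the first two URIs match both patterns, while the third matches only the old one. The 10-digit upper bound in the revised pattern lines up with the largest possible decimal-encoded IPv4 address, 4294967295 (255.255.255.255).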

It's misfiring enough that its S/O ratio on my mail flows is down to 0.3244
(i.e. it's firing on roughly twice as much ham as spam).
That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and
MPART_ALT_DIFF, which together were enough to overcome BAYES_00 and cause an FP.
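The "twice as much ham" figure follows from the S/O value of 0.3244 if S/O is read as the fraction of the rule's hits that were spam, i.e. S/O = spam / (spam + ham). That definition is an assumption here, and `ham_per_spam` is a hypothetical helper. A quick check:

```python
def ham_per_spam(so_ratio: float) -> float:
    """Ham hits per spam hit, assuming S/O = spam / (spam + ham)."""
    if not 0.0 < so_ratio <= 1.0:
        raise ValueError("S/O must be in the range (0, 1]")
    # spam / (spam + ham) = s  =>  ham / spam = (1 - s) / s
    return (1.0 - so_ratio) / so_ratio


print(round(ham_per_spam(0.3244), 2))  # 2.08
```

So an S/O of 0.3244 works out to about 2.08 ham hits for every spam hit.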

I can put the message up on pastebin, but I'll need to anonymize it first,
as it contains personal ticket info.


--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
