On Wed, 12 Nov 2014, Joe Quinn wrote:
On 11/9/2014 11:07 AM, David B Funk wrote:
On Sun, 9 Nov 2014, David B Funk wrote:
For NUMERIC_HTTP_ADDR the rule is: /^https?\:\/\/\d{7}/is
If that pattern were terminated like:
/^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
it should prevent the FPs (hopefully without destroying its
effectiveness)
Oops, for that new formulation it would actually need to be:
/^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is
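To make the difference between the two formulations concrete, here is a minimal sketch that translates both patterns into Python's re syntax and runs them against a few URLs. The sample URLs are hypothetical (they are not taken from the thread), and Python's regex engine is only a stand-in for the Perl engine SpamAssassin actually uses, so treat this as an illustration rather than a rule test.

```python
import re

# Python translations of the two patterns quoted above.
# The Perl /is flags map to re.IGNORECASE | re.DOTALL.
FLAGS = re.IGNORECASE | re.DOTALL
CURRENT_RULE = re.compile(r"^https?://\d{7}", FLAGS)
PROPOSED_RULE = re.compile(r"^https?://\d{7,10}(?::\d+)?(?:/|$)", FLAGS)

# Hypothetical sample URLs, chosen only for illustration.
SAMPLES = [
    "http://3232235777/",                    # decimal-encoded IP address
    "http://3232235777:8080/path",           # same, with a port
    "http://1234567890.example.com/ticket",  # hostname that starts with a long number
    "http://example.com/",                   # ordinary hostname
]


def hits(pattern, url):
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in SAMPLES:
        print(url, "current:", hits(CURRENT_RULE, url),
              "proposed:", hits(PROPOSED_RULE, url))
```

Under these assumptions, both patterns match the two all-numeric hosts, but only the current rule matches the hostname that merely begins with a run of digits, because the proposed pattern requires the digits to be followed by an optional port and then a slash or the end of the string.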
This rule is currently scored very low:
score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242
Can you give an example of a real legitimate domain that is being falsely
marked by this? Otherwise I wouldn't be too worried because this rule would
have to be in conjunction with a good bit more in order to get flagged as an
FP. Your proposed change is very minimal in terms of the strings I would
expect it to hit on.
In my first message I included an example URL from an Amtrak ticket notification
that it fired on.
My proposed change makes the difference between it FPing on a URL that has a
host part that starts with a long number and it only firing on URLs with
a decimal encoded IP address (which I assume was the intention of the rule's
original author).
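For anyone unfamiliar with decimal-encoded addresses: an IPv4 address is a 32-bit number, so it can be written as a single decimal integer of at most 10 digits (the maximum is 4294967295), which is why the revised pattern uses \d{7,10}. A small sketch using Python's standard ipaddress module; the helper name decode_decimal_host and the example value are illustrative and not from the thread.

```python
import ipaddress


def decode_decimal_host(host):
    """Interpret an all-digit URL host as a 32-bit IPv4 address.

    Returns the dotted-quad string, or None if the host is not purely
    numeric or does not fit in 32 bits.
    """
    if not host.isdigit():
        return None
    value = int(host)
    if value > 0xFFFFFFFF:
        return None
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(decode_decimal_host("3232235777"))  # 192.168.1.1
```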
It's mis-firing enough that its S/O ratio is down to 0.3244 based upon my mail
flows (i.e., it's firing on about twice as much ham as spam).
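The "twice as much ham" figure follows from the ratio itself, assuming S/O is read in the usual way as spam hits divided by total hits (spam plus ham). A quick sketch of the arithmetic; the function name is made up for illustration.

```python
def ham_hits_per_spam_hit(s_o):
    """Given S/O = spam / (spam + ham), return the ham-to-spam hit ratio."""
    if not 0 < s_o <= 1:
        raise ValueError("S/O must be in the range (0, 1]")
    return (1 - s_o) / s_o


if __name__ == "__main__":
    # S/O value quoted above.
    print(round(ham_hits_per_spam_hit(0.3244), 2))  # 2.08
```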
That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and
MPART_ALT_DIFF which were enough to overcome the BAYES_00 and cause an FP.
I can put the message up on pastebin, but I'll need to anonymize it first
as it does have personal ticket info in it.
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{