http://bugzilla.spamassassin.org/show_bug.cgi?id=2954

           Summary: check_for_to_in_subject() EVAL modifications
           Product: Spamassassin
           Version: 2.63
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: Rules (Eval Tests)
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


check_for_to_in_subject() currently matches only subjects of this form:

dallase,how are you?
dallase,blah blah blah

This modified return expression is meant to detect the variations listed after it:

return (($subject =~ /^\s*\Q$to\E[\.\,\-]+\s*\S/i) ||
       ($subject =~ /\S\s*[\.\,\-]+\Q$to\E(?:[\!\?\.]+)$/i));

dallase,how are you?
dallase, how are you?
DALLASE, how are you?
dallase... how are you?
dallase - how are you
how are you, DALLASE
how are you, dallase?
how are you, dallase!!!
how are you, dallase.
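
Read literally, the two patterns above need the punctuation to touch the
username and, in the end-of-subject case, need at least one trailing !, ? or .,
so subjects such as "dallase - how are you" or "how are you, DALLASE" would slip
through. Below is a minimal sketch of a looser variant, assuming the intent is
to allow whitespace on either side of the punctuation and to make the trailing
punctuation optional. The helper name user_part_in_subject is hypothetical, and
the corpus figures in this report were not measured with this variant.

```perl
use strict;
use warnings;

# Hypothetical helper (not a SpamAssassin function): true when $to appears at
# the start or end of $subject, set off from the rest by '.', ',' or '-'.
sub user_part_in_subject {
    my ($subject, $to) = @_;

    # Username first: "dallase, how ...", "dallase - how ...", "dallase... how ..."
    my $at_start = $subject =~ /^\s*\Q$to\E\s*[.,\-]+\s*\S/i;

    # Username last: "how are you, dallase", optionally followed by !, ? or .
    my $at_end = $subject =~ /\S\s*[.,\-]+\s*\Q$to\E[!?.]*\s*$/i;

    return 1 if $at_start || $at_end;
    return 0;
}

my @subjects = (
    'dallase,how are you?',
    'dallase, how are you?',
    'DALLASE, how are you?',
    'dallase... how are you?',
    'dallase - how are you',
    'how are you, DALLASE',
    'how are you, dallase?',
    'how are you, dallase!!!',
    'how are you, dallase.',
);

for my $subject (@subjects) {
    printf "%-26s %s\n", $subject,
        user_part_in_subject($subject, 'dallase') ? 'match' : 'no match';
}
```

Each of the nine subjects from the list above should be reported as a match,
while a subject with no separating punctuation (for example "how are you
dallase") should not.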

It is also case-insensitive, which caused the biggest jump in detection. My
results show a roughly 4-fold improvement (90 vs. 22 spam hits) in detecting the
user part in the subject.

Here are the results on my corpus. The first test is the original rule, the
second is my modified regex run case-sensitively, and the third is the modified
regex run case-insensitively.

# Tue Jan 20 13:44:12 CST 2004
# beginning test of testrule.USERPART.txt:
# original eval
header   USERNAME_IN_SUBJECT    eval:check_for_to_in_subject()
describe USERNAME_IN_SUBJECT    To: username at front of subject
score    USERNAME_IN_SUBJECT    2.900 2.800 2.800 2.700

# new eval 1
header   USERPART_IN_SUBJECT_1  eval:check_user_part_in_subject()
describe USERPART_IN_SUBJECT_1  subject contains case-sensitive username at beginning or end
score    USERPART_IN_SUBJECT_1  2.900 2.800 2.800 2.700

# new eval 2
header   USERPART_IN_SUBJECT_2  eval:check_user_part_in_subject_nocase()
describe USERPART_IN_SUBJECT_2  subject contains username at beginning or end
score    USERPART_IN_SUBJECT_2  2.900 2.800 2.800 2.700

############################################################
# USERNAME_IN_SUBJECT -- 22s/0h of 10963 corpus (6083s/4880h), 2004-01-20 
############################################################

############################################################
# USERPART_IN_SUBJECT_1 -- 38s/0h of 10963 corpus (6083s/4880h), 2004-01-20 
############################################################

############################################################
# USERPART_IN_SUBJECT_2 -- 90s/0h of 10963 corpus (6083s/4880h), 2004-01-20 
############################################################

OVERALL     SPAM      HAM     S/O    RANK   SCORE  NAME
  10963     6083     4880    0.555   0.00    0.00  (all messages)
     90       90        0    1.000   1.00   2.90  USERPART_IN_SUBJECT_2
     38       38        0    1.000   0.24   2.90  USERPART_IN_SUBJECT_1
     22       22        0    1.000   0.00   2.90  USERNAME_IN_SUBJECT

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  10963     6083     4880    0.555   0.00    0.00  (all messages)
100.000  55.4866  44.5134    0.555   0.00    0.00  (all messages as %)
  0.821   1.4795   0.0000    1.000   1.00    2.90  USERPART_IN_SUBJECT_2
  0.347   0.6247   0.0000    1.000   0.24    2.90  USERPART_IN_SUBJECT_1
  0.201   0.3617   0.0000    1.000   0.00    2.90  USERNAME_IN_SUBJECT
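
The percentages and the "4-fold" figure follow directly from the hit counts. A
short sketch of that arithmetic, using only the counts reported above (the
variable names are illustrative):

```perl
use strict;
use warnings;

my $spam_total = 6083;     # spam messages in the corpus
my $all_total  = 10963;    # spam + ham messages in the corpus

# Spam hits per rule, taken from the results above (no rule hit any ham).
my @rules = (
    [ USERNAME_IN_SUBJECT   => 22 ],
    [ USERPART_IN_SUBJECT_1 => 38 ],
    [ USERPART_IN_SUBJECT_2 => 90 ],
);

# The original rule is the baseline for the improvement ratio.
my $baseline = $rules[0][1];

for my $rule (@rules) {
    my ($name, $hits) = @$rule;
    printf "%-22s spam%%=%.4f overall%%=%.3f ratio=%.2f\n",
        $name,
        100 * $hits / $spam_total,
        100 * $hits / $all_total,
        $hits / $baseline;
}
```

The ratios come out at about 1.73 for the case-sensitive variant and 4.09 for
the case-insensitive one, relative to the original rule.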

dallas


