http://bugzilla.spamassassin.org/show_bug.cgi?id=2954
Summary: check_for_to_in_subject() EVAL modifications
Product: Spamassassin
Version: 2.63
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P1
Component: Rules (Eval Tests)
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
check_for_to_in_subject() only matches subjects like:
dallase,how are you?
dallase,blah blah blah
This modified return regex also detects the variations listed after it:
return (($subject =~ /^\s*\Q$to\E\s*[\.\,\-]+\s*\S/i) ||
($subject =~ /\S\s*[\.\,\-]+\s*\Q$to\E[\!\?\.]*\s*$/i));
dallase,how are you?
dallase, how are you?
DALLASE, how are you?
dallase... how are you?
dallase - how are you
how are you, DALLASE
how are you, dallase?
how are you, dallase!!!
how are you, dallase.
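For quick experimentation outside SpamAssassin, the matching behaviour described above can be sketched in Python. This is a port, not the eval itself: the function name to_in_subject and its ignore_case flag are made up for illustration, and the sketch assumes that whitespace is optional around the punctuation and that trailing punctuation after the username is optional, so that all nine listed subjects match.

```python
import re

def to_in_subject(subject: str, to: str, ignore_case: bool = True) -> bool:
    """Hypothetical helper: True if the username opens or closes the subject,
    set off from the rest of the text by '.', ',' or '-'."""
    flags = re.IGNORECASE if ignore_case else 0
    user = re.escape(to)  # Python counterpart of Perl's \Q...\E quoting
    # Username at the start, then punctuation, then more text.
    leading = re.compile(r"^\s*" + user + r"\s*[.,-]+\s*\S", flags)
    # Text, punctuation, then the username at the end, optionally followed by ! ? .
    trailing = re.compile(r"\S\s*[.,-]+\s*" + user + r"[!?.]*\s*$", flags)
    return bool(leading.search(subject) or trailing.search(subject))

# The nine example subjects from this report.
subjects = [
    "dallase,how are you?",
    "dallase, how are you?",
    "DALLASE, how are you?",
    "dallase... how are you?",
    "dallase - how are you",
    "how are you, DALLASE",
    "how are you, dallase?",
    "how are you, dallase!!!",
    "how are you, dallase.",
]

for s in subjects:
    print(f"{s!r}: {to_in_subject(s, 'dallase')}")
```

Passing ignore_case=False mimics the case-sensitive variant, which no longer matches the two upper-case DALLASE subjects.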
The modified regex is also case-insensitive, which caused the biggest jump in
detection. My results show a roughly 4-fold improvement in detecting the To:
userpart in the subject (90 spam hits versus 22).
Here are the results on my corpus. The first test is the original rule, the
second is my modified regex (case-sensitive), and the third is my modified
regex (case-insensitive).
# Tue Jan 20 13:44:12 CST 2004
# beginning test of testrule.USERPART.txt:
# original eval
header USERNAME_IN_SUBJECT eval:check_for_to_in_subject()
describe USERNAME_IN_SUBJECT To: username at front of subject
score USERNAME_IN_SUBJECT 2.900 2.800 2.800 2.700
# new eval 1
header USERPART_IN_SUBJECT_1 eval:check_user_part_in_subject()
describe USERPART_IN_SUBJECT_1 subject contains case-sensitive username at beginning or end
score USERPART_IN_SUBJECT_1 2.900 2.800 2.800 2.700
# new eval 2
header USERPART_IN_SUBJECT_2 eval:check_user_part_in_subject_nocase()
describe USERPART_IN_SUBJECT_2 subject contains username at beginning or end
score USERPART_IN_SUBJECT_2 2.900 2.800 2.800 2.700
############################################################
# USERNAME_IN_SUBJECT -- 22s/0h of 10963 corpus (6083s/4880h), 2004-01-20
############################################################
############################################################
# USERPART_IN_SUBJECT_1 -- 38s/0h of 10963 corpus (6083s/4880h), 2004-01-20
############################################################
############################################################
# USERPART_IN_SUBJECT_2 -- 90s/0h of 10963 corpus (6083s/4880h), 2004-01-20
############################################################
OVERALL   SPAM    HAM    S/O   RANK  SCORE  NAME
  10963   6083   4880  0.555   0.00   0.00  (all messages)
     90     90      0  1.000   1.00   2.90  USERPART_IN_SUBJECT_2
     38     38      0  1.000   0.24   2.90  USERPART_IN_SUBJECT_1
     22     22      0  1.000   0.00   2.90  USERNAME_IN_SUBJECT
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
   10963     6083     4880  0.555   0.00   0.00  (all messages)
 100.000  55.4866  44.5134  0.555   0.00   0.00  (all messages as %)
   0.821   1.4795   0.0000  1.000   1.00   2.90  USERPART_IN_SUBJECT_2
   0.347   0.6247   0.0000  1.000   0.24   2.90  USERPART_IN_SUBJECT_1
   0.201   0.3617   0.0000  1.000   0.00   2.90  USERNAME_IN_SUBJECT
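The SPAM% column and the improvement factor can be re-derived from the raw hit counts. The short Python check below uses only the numbers reported above; the helper names spam_pct and ratio_to_baseline are illustrative and not part of any SpamAssassin tool.

```python
SPAM_TOTAL = 6083  # spam messages in the corpus, from the results above

# Spam hits per rule, copied from the results above (no rule hit any ham).
hits = {
    "USERNAME_IN_SUBJECT": 22,    # original eval
    "USERPART_IN_SUBJECT_1": 38,  # modified regex, case-sensitive
    "USERPART_IN_SUBJECT_2": 90,  # modified regex, case-insensitive
}

def spam_pct(n: int) -> float:
    """Share of all spam messages hit by a rule, in percent."""
    return 100.0 * n / SPAM_TOTAL

def ratio_to_baseline(n: int) -> float:
    """Hit count relative to the original USERNAME_IN_SUBJECT rule."""
    return n / hits["USERNAME_IN_SUBJECT"]

for name, n in hits.items():
    print(f"{name:24s} {n:3d} hits  {spam_pct(n):.4f}% of spam  "
          f"{ratio_to_baseline(n):.2f}x baseline")
```

The case-insensitive rule comes out at about 4.09 times the hits of the original, which is where the 4-fold figure comes from.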
dallas