Re: Wrong functionality of SUBJ_ALL_CAPS in mixed English and Greek subject

mamalos Tue, 19 Jan 2010 05:20:03 -0800

Mike Cardwell-16 wrote:
> 
> On 19/01/2010 10:07, mamalos wrote:
> 
>>> I just pasted that email into spamalyser.com and it gave this:
>>> http://spamalyser.com/v/u32d10ix/mime
>>>
>>> The subject looks fully capitalised to me when decoded? I'm not overly
>>> proficient on my Greek though.
>>>
>>> --
>>> Mike Cardwell    : UK based IT Consultant, LAMP developer, Linux admin
>>> Cardwell IT Ltd. : UK Company - http://cardwellit.com/       #06920226
>>> Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
>>> Spamalyser       : Spam Tool  - http://spamalyser.com/
>>
>> From the link you sent me (Spamalyser), the subject is all in lower case
>> except for the word "TEST", which is written in English.
> 
> Then I don't know the Greek alphabet. The relevant subroutine from 
> SpamAssassin::Plugin::HeaderEval is below:
> 
> ================================================================================
> sub subject_is_all_caps {
>     my ($self, $pms) = @_;
>     my $subject = $pms->get('Subject');
> 
>     $subject =~ s/^\s+//;
>     $subject =~ s/\s+$//;
>     return 0 if $subject !~ /\s/;        # don't match one word subjects
>     return 0 if (length $subject < 10);  # don't match short subjects
>     $subject =~ s/[^a-zA-Z]//g;          # only look at letters
> 
>     # now, check to see if the subject is encoded using a non-ASCII charset.
>     # If so, punt on this test to avoid FPs.  We just list the known charsets
>     # this test will FP on, here.
>     my $subjraw = $pms->get('Subject:raw');
>     my $CLTFAC = Mail::SpamAssassin::Constants::CHARSETS_LIKELY_TO_FP_AS_CAPS;
>     if ($subjraw =~ /=\?${CLTFAC}\?/i) {
>       return 0;
>     }
> 
>     return length($subject) && ($subject eq uc($subject));
> }
> ================================================================================
> 
> I guess another exception needs adding?
> 

My Perl and Perl regexes are very rusty, so I am not sure I follow all of the
code above. The only line that looks suspicious to me is:


 $subject =~ s/[^a-zA-Z]//g;          # only look at letters

which keeps only the Latin letters a-z and A-Z and strips everything else,
Greek letters included. After I saw this, I sent an email with a subject
written entirely in Greek capitals. The rule did not fire: after the
substitution $subject is empty, so length($subject) is 0 and the function
returns false. So the function does not check the Greek part of the string at
all; it only checks the Latin part.
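
To check my reading of that line, here is a minimal Perl sketch of just the
letter filter and the final comparison, leaving out the whitespace, length and
charset checks. latin_only_caps_check is a hypothetical helper name, and the
two subjects are made-up examples, not the real subject from this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # Greek literals below are UTF-8 source
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: replicates only the letter filter and the final
# comparison of subject_is_all_caps() quoted above.
sub latin_only_caps_check {
    my ($subject) = @_;
    (my $letters = $subject) =~ s/[^a-zA-Z]//g;   # same filter as the plugin
    my $fires = length($letters) && ($letters eq uc($letters)) ? 1 : 0;
    return ($letters, $fires);
}

# Made-up example subjects: all-Greek capitals, and lower-case Greek + "TEST".
for my $subject ('ΔΟΚΙΜΗ ΜΗΝΥΜΑΤΟΣ', 'δοκιμή μηνύματος TEST') {
    my ($letters, $fires) = latin_only_caps_check($subject);
    print "$subject => kept '$letters', fires $fires\n";
}
```

Run as-is, the all-Greek subject is reduced to an empty string and does not
fire, while the mixed subject is reduced to "TEST" and fires.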

Since the last line reads:

return length($subject) && ($subject eq uc($subject));

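and $subject no longer contains any Greek characters at that point, the result
depends only on the Latin letters. In my mixed subject the only Latin word is
"TEST", so the stripped string is "TEST", which equals uc("TEST"), and the rule
fires even though the Greek words are in lower case. What I cannot understand
is where non-ASCII characters are handled in the above code snippet, and
whether they are checked against all characters of the subject at all.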
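
For comparison, a hedged sketch of what a Unicode-aware version of the final
check might look like. all_letters_are_caps is a hypothetical name, not
SpamAssassin code, and it assumes the subject has already been decoded from its
MIME encoded-words into a Perl character string (I have not verified whether
$pms->get('Subject') returns that or raw UTF-8 bytes; in the latter case
Encode::decode('UTF-8', $bytes) would be needed first). It strips non-letters
of any script with \P{L} and relies on uc() mapping Greek as well as Latin:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # Greek literals below are UTF-8 source
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical Unicode-aware replacement for the final comparison only.
# Assumes $subject is already a decoded Perl character string.
sub all_letters_are_caps {
    my ($subject) = @_;
    (my $letters = $subject) =~ s/\P{L}//g;   # keep letters from any script
    return 0 unless length $letters;          # no letters at all: don't fire
    return $letters eq uc($letters) ? 1 : 0;  # uc() upper-cases Greek too
}

for my $subject ('ΔΟΚΙΜΗ ΜΗΝΥΜΑΤΟΣ', 'δοκιμή μηνύματος TEST', 'HELLO WORLD') {
    print "$subject => ", all_letters_are_caps($subject), "\n";
}
```

With this check the all-Greek capitals subject and the plain Latin one fire,
while the lower-case Greek plus "TEST" subject does not. It would only replace
the last line of the function; the one-word, length and charset checks from the
original would still run before it.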

Thanks again




-- 
View this message in context: 
http://old.nabble.com/Wrong-functionality-of-SUBJ_ALL_CAPS-in-mixed-English-and-Greek-subject-tp27214418p27225660.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.