https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6225

           Summary: Invalid numerical HTML entity crashes perl
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: PC
        OS/Version: FreeBSD
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: mark.marti...@ijs.si


This was a hard battle, but I finally managed to track down the reason
for the perl crashes which started occurring about a week ago. The trigger
was a new type of spam message with obfuscated text in its HTML section
(using foreign characters, such as Cyrillic, whose glyphs resemble ordinary
ASCII characters).

SpamAssassin (the spamassassin command line, spamd, or amavisd) crashes in
Mail::SpamAssassin::PerMsgStatus::_get_parsed_uri_list (called from
Plugin/URIDNSBL) while trying to match a troublesome decoded HTML line
against the heavyweight regexp $tbirdurire:

      while (/$tbirdurire/igo) {

Apart from a perl bug (perl should not be crashing), the incident also
revealed a bug in HTML::Parser (3.62), which produced an illegal character
with a huge code point by incorrectly parsing one numeric entity in the HTML.

I reported the HTML::Parser bug to Gisle Aas, and I plan to report the
perl bug to the perl bug tracker.

Apart from waiting for new versions of perl and HTML::Parser, I wonder
what can be done in SpamAssassin to work around the problem. Reinventing
HTML decoding ourselves is not appealing, and checking the validity of
every decoded UTF-8 string is likely to be prohibitively slow.
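
As a rough illustration of one narrower workaround than full validation,
the sketch below replaces any character above U+10FFFF with U+FFFD before
the text reaches the URI regexp. It assumes the offending character is a
code point beyond the Unicode range (the report does not give its value).
The helper name scrub_non_unicode is hypothetical and not part of
SpamAssassin, and its cost on large messages has not been measured.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: replace every character
# whose code point lies above U+10FFFF (an assumption about what the
# "illegal character" looks like) with U+FFFD REPLACEMENT CHARACTER.
sub scrub_non_unicode {
    my ($text) = @_;
    no warnings 'utf8';    # perl may warn about code points beyond Unicode
    $text =~ s/[^\x{0}-\x{10FFFF}]/\x{FFFD}/g;
    return $text;
}

# Usage: scrub a decoded line before handing it to a heavyweight regexp.
{
    no warnings 'utf8';
    my $decoded = "visit example.com" . chr(0x110000) . " now";
    my $clean   = scrub_non_unicode($decoded);
    # $clean now contains U+FFFD where the out-of-range character was.
}
```

This only sidesteps the crash trigger. It does not fix the underlying
HTML::Parser or perl bugs.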

