https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6225
Summary: Invalid numerical HTML entity crashes perl
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: PC
OS/Version: FreeBSD
Status: NEW
Severity: normal
Priority: P5
Component: Libraries
AssignedTo: dev@spamassassin.apache.org
ReportedBy: mark.marti...@ijs.si

This was a heavy battle, but I finally managed to track down the reason for the perl crashes which started occurring about a week ago.

The trigger was a new type of spam message with obfuscated text in its HTML section (using foreign characters, such as Cyrillic, whose glyphs resemble ordinary ASCII characters).

spamassassin (command line, spamd, or amavisd) crashes in Mail::SpamAssassin::PerMsgStatus::_get_parsed_uri_list (called from Plugin/URIDNSBL) while trying to match a wicked decoded HTML line against a heavyweight regexp, $tbirdurire:

    while (/$tbirdurire/igo) {

Apart from a perl bug (perl should not crash on any input), the incident also revealed a bug in HTML::Parser (3.62), which produced an illegal character with a huge UTF-8 code by incorrectly parsing one numerical entity in the HTML.

I reported the HTML::Parser bug to Gisle Aas, and I plan to report the perl bug to the perl bug tracker.

Apart from waiting for new versions of perl and HTML::Parser, I wonder what can be done in SpamAssassin to work around the trouble. Reinventing HTML decoding ourselves is not appealing, and checking the validity of the decoded UTF-8 string is likely to be prohibitively slow.
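
One possible workaround, sketched here only as an illustration: replace any character whose code point lies above the Unicode maximum (U+10FFFF) in the decoded text before it reaches the URI regexp. The helper name scrub_out_of_range_chars and the sample string are hypothetical and not part of SpamAssassin. The report does not give the actual code point, so 0x7FFFFFFF merely stands in for it. Whether a single substitution pass like this is cheap enough, and whether it avoids the crash at all, has not been measured.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: replace every character
# whose code point is above U+10FFFF (the Unicode maximum) with U+FFFD,
# the Unicode replacement character.
sub scrub_out_of_range_chars {
    my ($text) = @_;
    # The negated class matches only code points outside 0..0x10FFFF.
    $text =~ s/[^\x{0}-\x{10FFFF}]/\x{FFFD}/g;
    return $text;
}

# Assumed sample input: a decoded line containing one above-Unicode
# character. The value 0x7FFFFFFF is a stand-in, not taken from the report.
my $bogus = do { no warnings 'utf8'; chr(0x7FFFFFFF) };
my $decoded = "http://example.com/ " . $bogus . " text";

# The scrubbed copy is what would be handed to the heavyweight regexp.
my $clean = scrub_out_of_range_chars($decoded);
```

Legitimate non-ASCII text, such as the Cyrillic look-alike characters mentioned above, is left untouched, so the obfuscated content remains visible to other rules.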