--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to have said:

    Worse, enabling charset normalization completely breaks UTF-8 chars
    in the regex. At least in my ad-hoc --cf command line testing.

--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might have an effect - enabling normalization is probably setting the internal UTF-8 flag on incoming text, which could change the semantics of the regex matching.
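To illustrate the kind of change I mean, here is a minimal sketch. It is not SpamAssassin code: the two helper subs and the test character are made up for the example. It assumes a Perl new enough to accept `use feature 'unicode_strings'`, and as far as I know regex matching only honours that feature from Perl 5.14 on. It compares the same character with and without the internal UTF-8 flag:

```perl
use strict;
use warnings;

# Hypothetical helper: \w under Perl's default rules, where the result
# can depend on whether the string carries the internal UTF-8 flag.
sub matches_word_default {
    my ($s) = @_;
    return $s =~ /\w/ ? 1 : 0;
}

# Hypothetical helper: the same match, compiled with 'unicode_strings'
# in scope, so Unicode rules should apply regardless of the flag.
sub matches_word_unicode {
    use feature 'unicode_strings';
    my ($s) = @_;
    return $s =~ /\w/ ? 1 : 0;
}

# Code point 0xE9 (e with acute accent), stored as one byte, flag off.
my $native = "\xE9";

# The same character, stored internally as UTF-8, flag on.
my $upgraded = "\xE9";
utf8::upgrade($upgraded);

printf "strings are equal:        %s\n", ($native eq $upgraded ? "yes" : "no");
printf "default, flag off:        %d\n", matches_word_default($native);
printf "default, flag on:         %d\n", matches_word_default($upgraded);
printf "unicode_strings, flag off: %d\n", matches_word_unicode($native);
printf "unicode_strings, flag on:  %d\n", matches_word_unicode($upgraded);
```

On a recent Perl I would expect the two strings to compare equal, the "default, flag off" case to report 0, and the other three to report 1. If normalization flips that flag on incoming text, a rule's regex could start matching differently even though the text itself has not changed.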

If that's the case, it raises the question of whether we want SpamAssassin to require Perl 5.12 (which includes that feature) - the current base version is 5.8.1. Unicode support has been evolving in Perl; 5.8 supports it generally, but there were bugs. I think 5.12 fixed most of them, but I'm not sure. (And of course 5.12 is not the current version of Perl either.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
