Re: Operations on headers in UTF-8

Daniel Staal Tue, 10 Jun 2014 18:24:06 -0700

--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to have said:

    Worse, enabling charset normalization completely breaks UTF-8 chars
    in the regex. At least in my ad-hoc --cf command line testing.


--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might have an effect. Enabling normalization is probably setting the internal utf8 flag on incoming text, which could change the semantics of the regex matching.
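
A minimal sketch of the kind of change suspected here, not SpamAssassin's actual code path. It assumes that normalization amounts to decoding the header into a character string, and that the rule's pattern was written as raw UTF-8 octets; the header text, the pattern, and all variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical header text: "Bräckelmann" as raw UTF-8 octets (utf8 flag off).
my $octets = "Br\xC3\xA4ckelmann";

# Assumed effect of normalization: decode to characters (utf8 flag on).
my $chars = decode('UTF-8', $octets);

# Hypothetical rule pattern, written as raw UTF-8 octets.
my $rule = qr/Br\xC3\xA4ck/;

# The octet string contains the two bytes the pattern asks for.
my $octets_match = ($octets =~ $rule) ? 1 : 0;

# The decoded string holds a single character U+00E4 instead, so the
# two-code-point sequence in the pattern is no longer present.
my $chars_match = ($chars =~ $rule) ? 1 : 0;

# Separate effect: for a code point in the 0x80-0xFF range, \w depends on
# the utf8 flag when neither 'unicode_strings' nor a locale is in scope.
my $e_acute     = "\xE9";
my $word_before = ($e_acute =~ /\w/) ? 1 : 0;
utf8::upgrade($e_acute);    # same character, flag now on
my $word_after  = ($e_acute =~ /\w/) ? 1 : 0;

printf "octets: %d, chars: %d, \\w before: %d, \\w after: %d\n",
    $octets_match, $chars_match, $word_before, $word_after;
# prints "octets: 1, chars: 0, \w before: 0, \w after: 1"
```

With `use feature 'unicode_strings'` in scope the `\w` result should stop depending on the flag, though as far as I recall that only covers regex matching from Perl 5.14 on. The first mismatch is a separate matter of octets versus characters, and the feature alone would not be expected to change it.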

If that's the case, it raises the question of whether we want SpamAssassin to require Perl 5.12 (which includes that feature); the current base version is 5.8.1. Unicode support has been evolving in Perl: 5.8 supports it generally, but there were bugs. I think 5.12 got most of them, but I'm not sure. (And of course it's not the current version of Perl.)


Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
