--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to
have said:
Worse, enabling charset normalization completely breaks UTF-8 chars
in the regex. At least in my ad-hoc --cf command line testing.
--As for the rest, it is mine.
This sounds like something where `use feature 'unicode_strings'` might have
an effect - enabling normalization is probably setting the internal utf8
flag on incoming text, which could change the semantics of the regex
matching.
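To make the suspicion concrete, here is a minimal sketch of how the internal
utf8 flag alone can change what a regex matches. It is not taken from the
SpamAssassin code; the sample character and all variable names are made up
for illustration. It also assumes Perl 5.14 or later for the second half,
since perlfeature describes regex matching as covered by `unicode_strings`
only from around that release, and it assumes no locale pragma is in effect.

```perl
use strict;
use warnings;

# Hypothetical sample: the single character U+00E9 (e with acute accent).
my $native = "\xE9";            # stored as one native byte, utf8 flag off
my $upgraded = "\xE9";
utf8::upgrade($upgraded);       # same character, but utf8 flag now on

# The two strings compare equal as far as Perl is concerned.
my $same = ($native eq $upgraded) ? 1 : 0;

# Default rules: \w uses ASCII-only semantics on the unflagged string,
# and Unicode semantics on the flagged one.
my $native_plain   = ($native   =~ /\w/) ? 1 : 0;
my $upgraded_plain = ($upgraded =~ /\w/) ? 1 : 0;

# With the feature in scope, regexes compiled here use Unicode rules
# regardless of the flag (assumption: Perl 5.14 or later).
my ($native_feat, $upgraded_feat);
{
    use feature 'unicode_strings';
    $native_feat   = ($native   =~ /\w/) ? 1 : 0;
    $upgraded_feat = ($upgraded =~ /\w/) ? 1 : 0;
}

printf "equal strings:        %d\n", $same;
printf "default, flag off:    %d\n", $native_plain;
printf "default, flag on:     %d\n", $upgraded_plain;
printf "feature on, flag off: %d\n", $native_feat;
printf "feature on, flag on:  %d\n", $upgraded_feat;
```

If normalization really does flip the flag on incoming text, the "default"
rows are where a rule could start matching differently even though the text
itself has not changed.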
If that's the case, it raises the question of whether we want SpamAssassin to
require Perl 5.12 (which includes that feature) - the current base version
is 5.8.1. Unicode support has been evolving in Perl; 5.8 supports it
generally, but there were bugs. I think 5.12 fixed most of them, but I'm not
sure. (And of course it's not the current version of Perl.)
Daniel T. Staal
---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------