Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

SADAHIRO Tomoyuki Sun, 24 Aug 2003 14:15:15 +0000

On Wed, 07 May 2003 08:42:25 +0100
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:


> Bjoern Jacke <[EMAIL PROTECTED]> writes:
> >well, see: from_to claims to convert from encoding1 to encoding2. 
> >encoding1 in this case is utf-8. Also the non-composed UTF-8 is 
> >perfectly valid UTF-8 and there's absolutely no reason, why 
> >from_to($string,"utf8","latin1") should not work just because I used 
> >the NFD form and not the NFC form. Your example is just a way to work 
> >around this bug but from_to should not care if the initial string is 
> >NFC or NFD.
> 
> Most of perl's encodings are octet-sequence/octet-sequence converters.
> Which are easy to code, compact reasonably fast and ... dumb!
> I also probably gave more thought to decode (from some form to Unicode)
> rather than encode step - for decode producing NFC is natural.
> 
> Perhaps it makes sense to add a tweak to the encode side so that if no encoding 
> exists for the code point and the code-point sequence is not normalized, it tries
> to normalize?
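
The tweak suggested above could look roughly like the following. This is only a sketch, not how Encode behaves: the helper name encode_with_nfc_fallback is hypothetical, and it assumes that composing to NFC is the normalization to try when strict encoding fails.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of Encode): try a strict encode first,
# and only if that fails, compose the string to NFC and try once more.
sub encode_with_nfc_fallback {
    my ($encoding, $chars) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

    # First attempt: encode the string exactly as given.
    my $octets = eval { encode($encoding, $chars, $check) };
    return $octets if defined $octets;

    # Second attempt: some code point had no mapping, so normalize
    # and retry. This still dies if the NFC form is unmappable too.
    return encode($encoding, NFC($chars), $check);
}

# "e" followed by U+0301 COMBINING ACUTE ACCENT is the NFD form of U+00E9.
my $latin1 = encode_with_nfc_fallback('iso-8859-1', "e\x{301}");
printf "%vX\n", $latin1;    # prints E9
```

Strings that already encode cleanly take the first path and are never normalized, so the extra cost is paid only in the failing case.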

To do transcoding and normalization in one step, I have written a tiny
module. It is still somewhat broken, though, and I have some open questions:

(1) Module name?
(2) Is '//' a good separator between an encoding name and
    a normalization form name? (At the least, it would be bad
    if there were an encoding name that includes '/'.)
(3) Is the result exactly normalized? (This is probably the
    most important point, and it still needs thorough verification.)
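
To make points (2) and (3) concrete, here is a minimal sketch. It is not the code in the tarball below: the helper names parse_spec, decode_normalized and is_normalized are hypothetical, and the 'UTF-8//NFC' spelling of the name is only my assumption of how the '//' separator would be used.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(normalize);

# Hypothetical helper: split "ENCODING//FORM" into its two parts.
# Only the first '//' is treated as the separator; the form is optional.
sub parse_spec {
    my ($spec) = @_;
    my ($encoding, $form) = split m{//}, $spec, 2;
    return ($encoding, $form);
}

# Hypothetical helper: decode the octets, then normalize the result
# if a normalization form was given after the separator.
sub decode_normalized {
    my ($spec, $octets) = @_;
    my ($encoding, $form) = parse_spec($spec);
    my $chars = decode($encoding, $octets);
    return $chars unless defined $form && length $form;
    return normalize($form, $chars);
}

# Point (3): a string is in a given form if normalizing it again
# to that form leaves it unchanged.
sub is_normalized {
    my ($form, $chars) = @_;
    return normalize($form, $chars) eq $chars;
}

# UTF-8 octets for "e" + U+0301 (NFD) should come out as U+00E9 (NFC).
my $chars = decode_normalized('UTF-8//NFC', "e\xCC\x81");
printf "U+%04X\n", ord $chars;    # prints U+00E9
print is_normalized('NFC', $chars) ? "normalized\n" : "not normalized\n";
```

A check like is_normalized only verifies the decode direction for one input, so it is a starting point for the verification in (3) rather than a proof.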

http://homepage1.nifty.com/nomenclator/perl/Encode-UnicodeNormalization-0.00.tar.gz

HTML (POD)
http://homepage1.nifty.com/nomenclator/perl/Encode-UnicodeNormalization.html


Regards,
SADAHIRO Tomoyuki
