On Wed, 07 May 2003 08:42:25 +0100 Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> Bjoern Jacke <[EMAIL PROTECTED]> writes:
> >well, see: from_to claims to convert from encoding1 to encoding2.
> >encoding1 in this case is utf-8. Also the non-composed UTF-8 is
> >perfectly valid UTF-8 and there's absolutely no reason, why
> >from_to($string,"utf8","latin1") should not work just because I used
> >the NFD form and not the NFC form. Your example is just a way to work
> >around this bug but from_to should not care if the initial string is
> >NFC or NFD.
>
> Most of perl's encodings are octet-sequence/octet-sequence converters.
> Which are easy to code, compact reasonably fast and ... dumb!
> I also probably gave more thought to decode (from some form to Unicode)
> rather than encode step - for decode producing NFC is natural.
>
> Perhaps it makes sense to add a tweak to encode side so that if no encoding
> exists for the code point and code-point sequence is not normalize it tries
> to normalize?

To do transcoding and normalization in one step, I wrote a tiny module.
It is still somewhat broken, and I have some open questions:

(1) What should the module be named?

(2) Is '//' a good separator between an encoding name and a
    normalization form name? (It would be a bad choice if any
    encoding name contained '/'.)

(3) Is the result exactly normalized? (This is probably the most
    important point, and it still needs thorough verification.)

http://homepage1.nifty.com/nomenclator/perl/Encode-UnicodeNormalization-0.00.tar.gz

HTML (POD)
http://homepage1.nifty.com/nomenclator/perl/Encode-UnicodeNormalization.html

Regards,
SADAHIRO Tomoyuki
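
For reference, here is a minimal sketch of the normalize-before-encode idea discussed in this thread. It uses only the standard Encode and Unicode::Normalize modules and is not the code of the module linked above. The helper name utf8_to_encoding_nfc is hypothetical, chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of Encode): transcode UTF-8 octets to
# another encoding, composing to NFC first so that a decomposed sequence
# such as "e" + U+0301 can map to a single precomposed code point.
sub utf8_to_encoding_nfc {
    my ($octets, $to) = @_;
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;  # die on bad input, keep source intact
    my $chars = decode('UTF-8', $octets, $check);      # octets -> Perl characters
    my $nfc   = NFC($chars);                           # compose where possible
    return encode($to, $nfc, $check);                  # characters -> target octets
}

# "e" followed by COMBINING ACUTE ACCENT (the NFD form of U+00E9), as UTF-8 octets.
my $nfd_octets = "e\xCC\x81";
my $latin1     = utf8_to_encoding_nfc($nfd_octets, 'iso-8859-1');

printf "%d octet(s), first is 0x%02X\n", length($latin1), ord($latin1);
# prints "1 octet(s), first is 0xE9"
```

Composing before the encode step only helps when a precomposed character exists in the target encoding. Sequences with no such character would still fail (here, die because of FB_CROAK).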