> > Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
> > UTF16_BigEndian?
> 
> ICU does not do Unicode-signature or other encoding detection 
> as part of a converter. When you get text from some protocol, 
> you need to instantiate a converter according to what you 
> know about the encoding.

So I can't pass it some text with a BOM, say "utf-16", and let the
converter work out the byte order from that. I guess that explains why I
also didn't find converters that write a BOM at the start of the
conversion. Is that something that would be added to ICU in the future? It
would be very nice to have a converter that would pick up the BOM (and
write it back).

And yes, most of the time, when you stay on a given platform, it is very
convenient to use the platform's endianness. My question was rather "why
isn't UTF-16 the one that detects the BOM and defaults to an externalized
form, BE, and then people on a given platform would just use UTF-16PE (of
which UTF-16 is an alias in ICU)?". That would facilitate interchange of
information.
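
To make the behaviour I am asking about concrete, here is a minimal sketch
in Python rather than ICU code. It only illustrates the idea: honour a BOM
if one is present, fall back to big-endian otherwise, and write the BOM
back on output. The helper names decode_utf16 and encode_utf16 are
hypothetical and are not part of ICU or of any existing converter.

```python
import codecs
from typing import Tuple


def decode_utf16(data: bytes) -> Tuple[str, str]:
    """Decode UTF-16 bytes and report the byte order that was used.

    A leading BOM decides the byte order and is stripped from the text.
    Without a BOM the data is assumed to be big-endian (the "externalized"
    default argued for above), not the platform's own byte order.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be"), "BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le"), "LE"
    return data.decode("utf-16-be"), "BE"  # no BOM: assume big-endian


def encode_utf16(text: str, order: str = "BE") -> bytes:
    """Encode text as UTF-16 in the given byte order, prefixed with a BOM."""
    if order == "BE":
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    # Round trip: the byte order found on input is reused on output.
    text, order = decode_utf16(b"\xff\xfeh\x00i\x00")
    print(text, order, encode_utf16(text, order))
```

With something like this, the plain "UTF-16" name would behave the same on
every platform, and a separate platform-endian name would remain for
purely local use.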

YA
