> The reason for ICU's "UTF-16" converter not trying to auto-detect the BOM
> is that this seems to be something that the _application_ has to decide,
> not the _converter_ that the application instantiates.
> This converter name is (currently) only a convenience alias for "use the
> UTF-16 byte serialization that is normally used on this machine".

I agree that the application may know better. It is just unfortunate that the name is not "UTF-16PE", to remind people that it is about platform endianness.

Also, a script that uses, say, uconv has no access to ucnv_detectUnicodeSignature(), so you end up with a file identified as "UTF-16" that the "UTF-16" converter may not be able to read.

If instead "UTF-16PE" were the convenience name for platform-endian UTF-16, and "UTF-16" handled the BOM and the default byte order expectation (conformance clause C3 of TUS), it would be much easier on newcomers.

YA
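
P.S. To make the suggestion concrete, here is a rough sketch in Python (standard library only, not ICU code) of the two behaviors being contrasted. The function names decode_utf16 and decode_utf16_pe are made up for illustration, and the big-endian fallback when no BOM is present is my reading of the standard's default, so treat it as an assumption rather than a statement of what any converter does today.

```python
import codecs
import sys


def decode_utf16(data: bytes) -> str:
    """Sketch of a BOM-aware "UTF-16": honor a BOM if present,
    otherwise assume big-endian (assumed default, see note above)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # BOM says big-endian; strip it, it is not part of the text.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # BOM says little-endian; strip it as well.
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian.
    return data.decode("utf-16-be")


def decode_utf16_pe(data: bytes) -> str:
    """Sketch of the hypothetical "UTF-16PE": always use the platform
    byte order and make no attempt to detect a BOM."""
    name = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return data.decode(name)
```

With this split, a file labeled "UTF-16" decodes the same way on every machine, while the platform-dependent behavior is only reached through a name that says so. Note that decode_utf16_pe leaves any leading U+FEFF in the result, since it does not look for a BOM at all.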