David,
My reading has always been that if you are using
UTF-16, then the BOM requirement applies. The key
sentence for me in the referenced normative section is:
"The terms "UTF-8" and "UTF-16" in this
specification do not apply to character encodings
with any other labels, even if the encodings or
labels are very similar to UTF-8 or UTF-16."
The second reference you provide does allow for no
BOM, but its examples state that the encoding could
then be UTF-16LE (or UTF-16BE) or any number of
other 16-bit encodings, and
"the encoding declaration must be read to
determine which".
So, if you don't use a BOM then you are not using
UTF-16, you must be using the more specific
UTF-16LE or UTF-16BE and thus must have defined
it in the encoding declaration.
So for the parser to recognise the stream as UTF-16
without a more specific encoding declaration, you must
start with a BOM. If you don't start with a BOM
then you cannot use UTF-16 (in a naming sense),
you must be using UTF-16LE or UTF-16BE and this
must be defined in the encoding declaration.
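
To make the distinction concrete, here is a rough sketch of how the
first bytes could be classified along the lines of the auto-detection
appendix. It is only an illustration of my reading, not what Xerces or
Xalan actually do: the function name guessUtf16Family and the strings
it returns are made up for this mail.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces/Xalan API): classify the leading
// bytes of an XML stream, loosely following the auto-detection
// appendix of the XML recommendation.
std::string guessUtf16Family(const unsigned char* b, std::size_t n)
{
    if (n >= 2 && ((b[0] == 0xFE && b[1] == 0xFF) ||
                   (b[0] == 0xFF && b[1] == 0xFE)))
    {
        // A BOM followed by two zero bytes is a UCS-4 signature,
        // not UTF-16.
        if (n >= 4 && b[2] == 0x00 && b[3] == 0x00)
            return "UCS-4, not UTF-16";

        return b[0] == 0xFE ? "UTF-16 with BOM, big-endian"
                            : "UTF-16 with BOM, little-endian";
    }

    // No BOM: "<?" in 16-bit code units tells us the byte order, but
    // the encoding declaration still has to name the encoding.
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x3C &&
                  b[2] == 0x00 && b[3] == 0x3F)
        return "16-bit big-endian, read the encoding declaration";

    if (n >= 4 && b[0] == 0x3C && b[1] == 0x00 &&
                  b[2] == 0x3F && b[3] == 0x00)
        return "16-bit little-endian, read the encoding declaration";

    return "not a UTF-16 family signature";
}
```

Only the BOM cases come back as plain UTF-16. In the "<?" cases the
byte order is known, but the name of the encoding still has to come
from the encoding declaration.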
Cheers,
Berin
>
> From: David N Bertoni/Cambridge/IBM <[EMAIL PROTECTED]>
> Subject: RE: std::istream as XSLTInputSource
> Date: 04/02/2003 11:04:47
> To: [email protected]
>
> Hi Don,
>
> This is all very confusing, so I'm going to ask someone else what their
> opinion is. The second URL points to part of the recommendation that's
> non-normative, but I may be mis-reading the first part.
>
> Dave
>
> "Don McClimans" <[EMAIL PROTECTED]ronics.com>
> To: "David N Bertoni/Cambridge/IBM" <[EMAIL PROTECTED]>
> cc:
> Subject: RE: std::istream as XSLTInputSource
> 01/31/2003 12:06 PM
>
> >>If so, do I have to start the stream with a BOM, for the parser to
> >>recognise it as UTF-16?
> >
> >You have to do what the XML recommendation says:
> >
> > http://www.w3.org/TR/REC-xml#charencoding
> > http://www.w3.org/TR/REC-xml#sec-guessing
> >
> >So the answer is yes.
>
> Dave,
>
> Hmm, as I read that second URL, the answer is no. It says that using a byte
> order mark is fine, but without a byte order mark, the parser should be able
> to tell what encoding is being used by looking at the first four bytes of
> the file, which should be "<?" in UTF-16.
>
> Don
>
>
>
>