Mark Davis [EMAIL PROTECTED] wrote:
You can determine that that particular text is not legal UTF-32*,
since there would be illegal code points in any of the three forms. If you
exclude null code points, again heuristically, that also excludes
UTF-8, and almost all non-Unicode encodings. That
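A rough sketch of the UTF-32 check described above, in Python. The function names are made up for illustration and are not from the original message; "legal" here is assumed to mean only that every 32-bit unit is a code point no higher than U+10FFFF and outside the surrogate range.

```python
def legal_utf32(data: bytes, byteorder: str) -> bool:
    """Return True if data is a whole number of 32-bit units, each a legal code point.

    byteorder is "big" or "little" (hypothetical helper, for illustration only).
    """
    if len(data) % 4 != 0:
        return False
    for i in range(0, len(data), 4):
        cp = int.from_bytes(data[i:i + 4], byteorder)
        # Code points above U+10FFFF and surrogates are not legal in UTF-32.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False
    return True


def could_be_utf32(data: bytes) -> bool:
    """True if data is legal under at least one byte order."""
    return legal_utf32(data, "big") or legal_utf32(data, "little")
```

For example, ordinary UTF-16 text read four bytes at a time tends to produce values far above U+10FFFF in either byte order, so it is rejected.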
[EMAIL PROTECTED]
Sent: Tuesday, April 23, 2002 23:02
Subject: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)
Mark Davis [EMAIL PROTECTED] wrote:
You can determine that that particular text is not legal UTF-32*,
since there would be illegal code points in any of the three forms. If you
Mark Davis [EMAIL PROTECTED] wrote:
I must not *call* the sequence UTF-16, since that term is officially
reserved for BOM-marked text which can be either little- or big-endian,
or BOMless text which must be big-endian.
Yes, assuming the BUT clause applies to (b). That is, the untagged byte
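The labeling rule quoted above (BOM-marked text may be either byte order, BOMless text must be big-endian) can be sketched in Python roughly as follows. The function name is hypothetical, and this covers only text labeled plain "UTF-16", not the UTF-16BE or UTF-16LE labels.

```python
import codecs


def decode_labeled_utf16(data: bytes) -> str:
    """Decode bytes labeled plain "UTF-16" under the rule as quoted in this thread."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: little-endian, BOM stripped
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: big-endian, BOM stripped
        return data[2:].decode("utf-16-be")
    # No BOM: the rule says the text must be big-endian.
    return data.decode("utf-16-be")
```

All three of the byte sequences FF FE 41 00, FE FF 00 41, and 00 41 decode to "A" under this sketch.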
On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote:
The Unix and Linux world is very
opposed to the use of BOM in plain-text files, and if they feel that way
about UTF-8 they probably feel the same about UTF-16.
Why? The problems with a BOM in UTF-8 have to do with it being an
ASCII-compatible encoding.
Err, no. That's not the point, AFAIK. The point is that traditionally
in UNIX there hasn't been any sort of marker or tag in the beginning,
UNIX files being flat streams of bytes. The UNIX toolset
On Wed, Apr 24, 2002 at 01:37:39PM -0400, [EMAIL PROTECTED] wrote:
Err, no. That's not the point, AFAIK. The point is that traditionally
in UNIX there hasn't been any sort of marker or tag in the beginning,
UNIX files being flat streams of bytes. The UNIX toolset has been built
with this
On Wed, 24 Apr 2002, David Starner wrote:
On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote:
The Unix and Linux world is very
opposed to the use of BOM in plain-text files, and if they feel that way
about UTF-8 they probably feel the same about UTF-16.
The reason we're not so
Doug Ewell scripsit:
The Unix and Linux world is very
opposed to the use of BOM in plain-text files, and if they feel that way
about UTF-8 they probably feel the same about UTF-16.
I doubt it. The trouble with BOMizing is that it makes ASCII not a
subset of UTF-8, but ASCII cannot be a
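To illustrate the subset point: a file that is pure ASCII stops being ASCII once a UTF-8 BOM is put in front of it. A small Python sketch, with a made-up helper name and sample text:

```python
import codecs


def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (hypothetical helper)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


plain = "hello\n".encode("utf-8")      # pure ASCII, hence also valid UTF-8
with_bom = codecs.BOM_UTF8 + plain     # EF BB BF followed by the same bytes

print(is_ascii(plain))      # the unmarked file is still ASCII
print(is_ascii(with_bom))   # the BOM bytes are outside the ASCII range
```

A tool that expects plain ASCII sees three unexpected bytes at the start of the marked file, while a BOM-aware decoder such as Python's "utf-8-sig" codec strips them.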