Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread Doug Ewell
Mark Davis [EMAIL PROTECTED] wrote: You can determine that that particular text is not legal UTF-32*, since there be illegal code points in any of the three forms. IF you exclude null code points, again heuristically, that also excludes UTF-8, and almost all non-Unicode encodings. That
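The first step of Mark's heuristic, ruling out UTF-32, can be sketched as follows. This is a minimal sketch. It assumes the "three forms" are UTF-32, UTF-32BE and UTF-32LE, so that text illegal in both byte orders is illegal in all three. It treats any 4-byte unit above U+10FFFF or in the surrogate range D800..DFFF as illegal. The helper names `could_be_utf32` and `ruled_out_as_utf32` are hypothetical and do not come from the thread.

```python
def could_be_utf32(data: bytes, byteorder: str) -> bool:
    """Return True if `data` is legal UTF-32 in the given byte order.

    byteorder is "big" or "little". Each 4-byte code unit must be
    <= 0x10FFFF and must not fall in the surrogate range D800..DFFF.
    """
    if len(data) % 4 != 0:
        return False  # UTF-32 text is always a whole number of 4-byte units
    for i in range(0, len(data), 4):
        cp = int.from_bytes(data[i:i + 4], byteorder)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False
    return True


def ruled_out_as_utf32(data: bytes) -> bool:
    # Unmarked UTF-32 may be either big- or little-endian, so data that is
    # illegal in both byte orders is illegal in all three UTF-32 forms.
    return not could_be_utf32(data, "big") and not could_be_utf32(data, "little")
```

For example, the bytes 00 41 00 42 ("AB" in UTF-16BE), read as a single UTF-32 unit, give 0x00410042 big-endian or 0x42004100 little-endian. Both values are above U+10FFFF, so `ruled_out_as_utf32` returns True.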

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread Mark Davis
PROTECTED] Sent: Tuesday, April 23, 2002 23:02 Subject: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN) Mark Davis [EMAIL PROTECTED] wrote: You can determine that that particular text is not legal UTF-32*, since there be illegal code points in any of the three forms. IF you

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread Doug Ewell
Mark Davis [EMAIL PROTECTED] wrote: I must not *call* the sequence UTF-16, since that term is officially reserved for BOM-marked text which can be either little- or big-endian, or BOMless text which must be big-endian. Yes, assuming the BUT clause applies to (b). That is, the untagged byte
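The labelling rule Doug restates can be sketched as a decoder for data tagged plain "UTF-16". A leading BOM selects the byte order and is not part of the text. With no BOM, the data is read as big-endian. This is a minimal sketch, and the function name `decode_utf16_label` is hypothetical. It checks the BOM explicitly instead of relying on Python's built-in "utf-16" codec, because that codec's no-BOM fallback is not guaranteed to be big-endian.

```python
import codecs


def decode_utf16_label(data: bytes) -> str:
    """Decode bytes tagged with the charset label "UTF-16".

    A leading BOM selects the byte order and is stripped; with no BOM,
    the data is treated as big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")  # untagged byte order: big-endian
```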

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread David Starner
On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote: The Unix and Linux world is very opposed to the use of BOM in plain-text files, and if they feel that way about UTF-8 they probably feel the same about UTF-16. Why? The problems with a BOM in UTF-8 have to do with it being an

RE: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread jarkko . hietaniemi
Why? The problems with a BOM in UTF-8 have to do with it being an ASCII-compatible encoding. Err, no. That's not the point, AFAIK. The point is that traditionally in UNIX there hasn't been any sort of marker or tag in the beginning, UNIX files being flat streams of bytes. The UNIX toolset
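jarkko's point about flat byte streams can be illustrated with two hypothetical helpers, neither of them from the thread. `looks_like_script` mimics the check a Unix kernel makes for a "#!" interpreter line in the first two bytes of a file. `concatenate` joins files the way cat(1) does. The sketch shows how a UTF-8 BOM (EF BB BF) interferes with tools that assume the first bytes of a file are content.

```python
import codecs


def looks_like_script(data: bytes) -> bool:
    # Mimics the kernel's test: an interpreter script must begin with
    # exactly the two bytes "#!".
    return data[:2] == b"#!"


def concatenate(*files: bytes) -> bytes:
    # Like cat(1): a flat byte-level join with no knowledge of signatures.
    return b"".join(files)


plain = b"#!/bin/sh\necho hi\n"
with_bom = codecs.BOM_UTF8 + plain

print(looks_like_script(plain))     # True
print(looks_like_script(with_bom))  # False: the file now starts EF BB BF

# Decoding with "utf-8" (not "utf-8-sig") keeps every U+FEFF in the text.
joined = concatenate(with_bom, with_bom).decode("utf-8")
print(joined.count("\ufeff"))  # 2: the second BOM now sits mid-text
```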

RE: UNICODE BOMBER STRIKES AGAIN

2002-04-24 Thread Yves Arrouye
You can determine that that particular text is not legal UTF-32*, since there be illegal code points in any of the three forms. IF you exclude null code points, again heuristically, that also excludes UTF-8, and almost all non-Unicode encodings. That leaves UTF-16, 16BE, 16LE as the only
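The second step of the heuristic quoted here can be sketched as follows. Assume the text contains no U+0000 and UTF-32 has already been ruled out. A zero byte then cannot come from UTF-8, where 0x00 only ever encodes U+0000, so it points toward one of the UTF-16 forms. The final byte-order guess is an additional assumption that is not part of the quoted message. It relies on mostly Latin-script text, whose high byte in each code unit is zero. The function name `guess_utf16_form` is hypothetical.

```python
from typing import Optional


def guess_utf16_form(data: bytes) -> Optional[str]:
    """Heuristic sketch: guess UTF-16BE vs UTF-16LE for BOMless data.

    Assumes the text contains no U+0000, so any zero byte rules out
    UTF-8. Returns None when there is no zero byte to go on.
    """
    even_zeros = sum(1 for b in data[0::2] if b == 0)
    odd_zeros = sum(1 for b in data[1::2] if b == 0)
    if even_zeros == 0 and odd_zeros == 0:
        return None
    # For mostly-Latin text the zero high byte comes first (even offsets)
    # in big-endian and second (odd offsets) in little-endian. Ties fall
    # back to big-endian, the default for untagged UTF-16.
    return "utf-16-be" if even_zeros >= odd_zeros else "utf-16-le"
```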

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread David Starner
On Wed, Apr 24, 2002 at 01:37:39PM -0400, [EMAIL PROTECTED] wrote: Err, no. That's not the point, AFAIK. The point is that traditionally in UNIX there hasn't been any sort of marker or tag in the beginning, UNIX files being flat streams of bytes. The UNIX toolset has been built with this

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-24 Thread Mark Davis
://www.macchiato.com - Original Message - From: Yves Arrouye [EMAIL PROTECTED] To: 'Mark Davis' [EMAIL PROTECTED]; Doug Ewell [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: Kenneth Whistler [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, April 24, 2002 10:39 Subject: RE: UNICODE BOMBER STRIKES

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread Jungshik Shin
On Wed, 24 Apr 2002, David Starner wrote: On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote: The Unix and Linux world is very opposed to the use of BOM in plain-text files, and if they feel that way about UTF-8 they probably feel the same about UTF-16. The reason we're not so

Re: Variations of UTF-16 (was: Re: UNICODE BOMBER STRIKES AGAIN)

2002-04-24 Thread John Cowan
Doug Ewell scripsit: The Unix and Linux world is very opposed to the use of BOM in plain-text files, and if they feel that way about UTF-8 they probably feel the same about UTF-16. I doubt it. The trouble with BOMizing is that it makes ASCII not a subset of UTF-8, but ASCII cannot be a
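John Cowan's point about subsets can be checked directly. In the small sketch below, the sample string is arbitrary. Without a BOM, an ASCII file is byte-for-byte a valid UTF-8 file. Prepending a BOM breaks that identity. ASCII text encoded as UTF-16 never has the same bytes as the ASCII original, with or without a BOM.

```python
import codecs

ascii_text = "plain old ASCII\n"
ascii_bytes = ascii_text.encode("ascii")

# Without a signature, UTF-8 encoding of ASCII text leaves the bytes unchanged.
print(ascii_text.encode("utf-8") == ascii_bytes)                    # True
# With a BOM prepended, the UTF-8 bytes no longer match the ASCII file.
print(codecs.BOM_UTF8 + ascii_text.encode("utf-8") == ascii_bytes)  # False
# UTF-16 uses two bytes per ASCII character, so the bytes never match.
print(ascii_text.encode("utf-16-be") == ascii_bytes)                # False
```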

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-23 Thread Florian Weimer
[EMAIL PROTECTED] writes: FYI: http://linguistlist.org/issues/13/13-1106.html#3 And I thought the Unicode bomber was %u9090%u6858%ucbd3... guy!

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-23 Thread Mark Davis
http://www.macchiato.com - Original Message - From: Doug Ewell [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Kenneth Whistler [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, April 22, 2002 20:49 Subject: Re: UNICODE BOMBER STRIKES AGAIN Kenneth Whistler [EMAIL PROTECTED

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-22 Thread Doug Ewell
There he sits in wait until you switch on, and BAM, all your data turns to squares and the little beastie is laughing his socks off. That should have been BOM. -Doug Ewell Fullerton, California

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-22 Thread Tex Texin
Doug Ewell pun'ed: There he sits in wait until you switch on, and BAM, all your data turns to squares and the little beastie is laughing his socks off. That should have been BOM. Yes, and turns to squares should have been turns to replacement characters. --

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-22 Thread ろ〇〇〇〇 ろ〇〇〇
From: Tex Texin [EMAIL PROTECTED] To: Doug Ewell [EMAIL PROTECTED] CC: [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: "UNICODE BOMBER STRIKES AGAIN" Date: Mon, 22 Apr 2002 15:21:48 -0400 Doug Ewell pun'ed: There he sits in wait until you switch on, and BAM, all your dat

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-22 Thread Peter_Constable
On 04/22/2002 02:21:48 PM Tex Texin wrote: Doug Ewell pun'ed: There he sits in wait until you switch on, and BAM, all your data turns to squares and the little beastie is laughing his socks off. That should have been BOM. Yes, and turns to squares should have been turns to replacement

Re: UNICODE BOMBER STRIKES AGAIN

2002-04-22 Thread Kenneth Whistler
Doug Ewell pun'ed: There he sits in wait until you switch on, and BAM, all your data turns to squares and the little beastie is laughing his socks off. That should have been BOM. Yes, and turns to squares should have been turns to replacement characters. There he sits in wait