RE: Should we care much about this Unicode-ish criticism?

2001-06-11 Thread Hong Zhang
However, I don't think this actually affects your comments, except that I'd guess that the half digits mentioned by Hong don't have the same term "case" used with them that the letters of various alphabets do. I am not sure if we mean the same thing. The regular ASCII 0123456789 are called

RE: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread NeonEdge
Personally, I'm not sure that Perl 6 even needs to worry about any of this. I think as long as Perl can manipulate the characters without screwing them up in any way, that should be fine. As far as support for locale specifics, maybe it would just be best to do what Perl does now, but allow

Re: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread Nicholas Clark
On Sat, Jun 09, 2001 at 06:24:44AM -0400, NeonEdge wrote: (correct me if I'm wrong): Is this Uppercase? Is this Lowercase (is this 'half-digits', as Hong mentioned?) [upper case and lower case digits are used by old-fashioned typography] [I don't know the exact details] I think upper case is

Re: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread Bryan C . Warnock
On Saturday 09 June 2001 06:24 am, NeonEdge wrote: Is this Uppercase? Is this Lowercase (is this 'half-digits', as Hong mentioned?) (if 'Caseless' needed, just !Upper !Lower?) Titlecase. Is this Punctuation? Is this a digit? Is this a word character? Is this Whitespace? Maps to
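
The property questions in this message map fairly directly onto the Unicode General Category. A minimal Python sketch of answering them per character follows; it assumes General Category is a good enough proxy (the derived Uppercase/Lowercase properties differ in a few edge cases), and both the `char_properties` name and its "word" definition (letters, marks, decimal digits, connector punctuation) are illustrative assumptions, not Perl's own definitions.

```python
import unicodedata


def char_properties(ch: str) -> dict:
    """Answer the per-character questions using the General Category.

    Hypothetical helper: General Category approximates the derived Unicode
    properties, and the "word" rule below is an assumed definition.
    """
    cat = unicodedata.category(ch)  # two-letter code such as "Lu", "Nd", "Po"
    return {
        "upper": cat == "Lu",
        "lower": cat == "Ll",
        "title": cat == "Lt",
        "caseless": cat not in ("Lu", "Ll", "Lt"),  # !Upper && !Lower && !Title
        "punct": cat.startswith("P"),
        "digit": cat == "Nd",
        "word": cat[0] in "LM" or cat in ("Nd", "Pc"),
        "space": ch.isspace(),
    }


# A, titlecase DZ digraph, Arabic waw, Arabic-Indic three, underscore, ideographic space
for ch in ("A", "\u01c5", "\u0648", "\u0663", "_", "\u3000"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), char_properties(ch))
```

Titlecase (U+01C5 here) is the case that a plain !Upper/!Lower test would misfile as caseless, which is why the sketch checks it separately.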

Re: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread Bryan C . Warnock
/. rebuttal of the original article at http://slashdot.org/features/01/06/06/0132203.shtml, for those that haven't seen it yet. -- Bryan C. Warnock [EMAIL PROTECTED]

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Dan Sugalski
At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote: Dan Sugalski [EMAIL PROTECTED] writes: It does bring up a deeper issue, however. Unicode is, at the moment, apparently inadequate to represent at least some part of the Asian languages. Are the encodings currently in use less inadequate?

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Dan Sugalski
At 10:41 PM 6/7/2001 -0400, Buddha Buck wrote: Nick Ing-Simmons [EMAIL PROTECTED] writes: Dan Sugalski [EMAIL PROTECTED] writes: It does bring up a deeper issue, however. Unicode is, at the moment, apparently inadequate to represent at least some part of the Asian languages. Are the

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Russ Allbery
Dan Sugalski [EMAIL PROTECTED] writes: At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote: One reason perl5.7.1+'s Encode does not do asian encodings yet is that the tables I have found so far (Mainly Unicode 3.0 based) are lossy. Joy. Hopefully by the time we're done there'll be a full

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Russ Allbery
Nicholas Clark [EMAIL PROTECTED] writes: What happens if unicode supported uppercase and lowercase numbers? [I had a dig about, and it doesn't seem to mention lowercase or uppercase digits. Are they just a typography distinction, and hence not enough to be worthy of codepoints?] Damned if

RE: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Hong Zhang
What happens if unicode supported uppercase and lowercase numbers? [I had a dig about, and it doesn't seem to mention lowercase or uppercase digits. Are they just a typography distinction, and hence not enough to be worthy of codepoints?] Damned if I know; I didn't know there even

RE: Should we care much about this Unicode-ish criticism?

2001-06-07 Thread Garrett Goebel
From: David L. Nicol [mailto:[EMAIL PROTECTED]] Russ Allbery wrote: a caseless character wouldn't show up in either IsLower or IsUpper. maybe an IsCaseless is warranted -- or Is[Upper|Lower] could return UNKNOWN instead of TRUE|FALSE, if the extended boolean attributes allow

Re: Should we care much about this Unicode-ish criticism?

2001-06-07 Thread Buddha Buck
Nick Ing-Simmons [EMAIL PROTECTED] writes: Dan Sugalski [EMAIL PROTECTED] writes: It does bring up a deeper issue, however. Unicode is, at the moment, apparently inadequate to represent at least some part of the Asian languages. Are the encodings currently in use less inadequate? I've

RE: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread NeonEdge
Before people get their panties in a bunch, I'm not dissing Unicode. The point that I am trying to make is that Unicode will probably never make everyone happy. It WILL likely become widely accepted, and should offer the best solution yet to integrating the major character sets into one. If the

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Simon Cozens
On Wed, Jun 06, 2001 at 07:28:45AM -0400, NeonEdge wrote: If that was the goal, then they failed. Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6. We haven't done that yet. That was our goal, so we failed? -- IT support will, from 1 October 2000, be provided by college and

RE: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread NeonEdge
Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6. We haven't done that yet. That was our goal, so we failed? Don't be ridiculous. With that as our goal, the ONLY way we could fail is to NEVER write Perl 6. Unicode, on the other hand, was originally released for public

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Simon Cozens
On Wed, Jun 06, 2001 at 09:57:58AM -0400, NeonEdge wrote: Perl 6 cannot assume that Unicode is done. Don't tell anyone, but it never did. -- Thus spake the master programmer: After three days without programming, life becomes meaningless. -- Geoffrey James, The Tao of

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Dan Sugalski
At 06:59 PM 6/5/2001 -0700, Larry Wall wrote: Dan Sugalski writes: : At 04:44 PM 6/5/2001 -0700, Larry Wall wrote: : (Perl 5 extends it all the way to 64-bit values, represented in 13 bytes!) : : I know we can, but is it really a good idea? 32 bits is really stretching : it for character

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Larry Wall
Russ Allbery writes: : Yeah, but one of the guarantees of UTF-8 is: : :- The octet values FE and FF never appear. : : I can see that this property may not be that important, but it makes me : feel like things that don't have this property aren't really UTF-8. Which is one of the reasons I

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread David L. Nicol
Russ Allbery wrote: a caseless character wouldn't show up in either IsLower or IsUpper. maybe an IsCaseless is warranted -- or Is[Upper|Lower] could return UNKNOWN instead of TRUE|FALSE, if the extended boolean attributes allow transbinary truth values.
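
The three-valued IsUpper suggested here can be sketched in Python by treating a character as caseless when neither case mapping changes it. That rule is an illustrative assumption (it roughly approximates the negation of Unicode's Changes_When_Casemapped property), and the `is_upper` name and the use of `None` for UNKNOWN are hypothetical.

```python
from typing import Optional


def is_upper(ch: str) -> Optional[bool]:
    """Three-valued IsUpper: True, False, or None for a caseless character.

    Hypothetical sketch: "caseless" here means that neither str.upper()
    nor str.lower() changes the character.
    """
    if ch.upper() == ch and ch.lower() == ch:
        return None  # no case to speak of: UNKNOWN rather than FALSE
    return ch.isupper()


for ch in ("A", "a", "\u0648", "7"):
    print(repr(ch), is_upper(ch))
```

Returning `None` forces callers to decide explicitly what a caseless character means in their context, which is the trade-off under debate.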

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski
At 06:22 PM 6/5/2001 +0100, Simon Cozens wrote: On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote: Is it just me, or does this entire article reduce not to "Unicode doesn't work" but "Unicode should assign more characters"? Yes. And Unicode has assigned more characters; it's factually

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
On Tue, Jun 05, 2001 at 01:31:38PM -0400, Dan Sugalski wrote: The other issue it actively brought up was the complaint about having to share glyphs amongst several languages, which didn't strike me as all that big a deal either, except perhaps as a matter of national pride and/or easy

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
Courtesy of Slashdot, http://www.hastingsresearch.com/net/04-unicode-limitations.shtml I'm not sure if this is an issue for us or not, as we're generally language-neutral, and I don't see any technical issues with any of the UTF-* encodings having headroom problems. I think the author

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
Firstly, the JIS standard defines, along with the ordering and enumeration of its characters, their glyph shape. Unicode, on the other hand does not. This means that as far as Unicode is concerned, there is literally no distinction between two distinct shapes and hence no way to specify

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bart Lateur
On 05 Jun 2001 11:07:11 -0700, Russ Allbery wrote: Particularly since part of his contention is that 16 bits isn't enough, and I think all the widely used national character sets are no more than 16 bits, aren't they? It's not really important. UTF-8 is NOT limited to 16 bits (3 bytes). With 4

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
On Tue, Jun 05, 2001 at 09:16:05PM +0200, Bart Lateur wrote: Unicode text files No such animal. Unicode's a character repertoire, not an encoding. See you at my Unicode tutorial at TPC? :) -- buf[hdr[0]] = 0;/* unbelievably lazy ken (twit) */ - Andrew Hume
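
The distinction between a character repertoire and an encoding is easy to see in Python: one code point has a single number but several byte representations, and bytes decoded with the wrong encoding turn into different characters. The encodings chosen below are just examples.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: one code point, U+00E9

print(f"code point: U+{ord(ch):04X}")
for encoding in ("utf-8", "utf-16-be", "utf-32-be", "latin-1"):
    # Same character, different byte sequences depending on the encoding
    print(f"{encoding:>10}: {ch.encode(encoding).hex(' ')}")

# Bytes alone do not say which encoding produced them: decoding the UTF-8
# bytes as Latin-1 silently yields two different characters.
print(ch.encode("utf-8").decode("latin-1"))
```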

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski
At 11:18 AM 6/5/2001 -0700, Hong Zhang wrote: Firstly, the JIS standard defines, along with the ordering and enumeration of its characters, their glyph shape. Unicode, on the other hand does not. This means that as far as Unicode is concerned, there is literally no distinction between

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski
At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote: Bart Lateur [EMAIL PROTECTED] writes: UTF-8 is NOT limited to 16 bits (3 bytes). That's an odd definition of byte you have there. :) Maybe it's RAD50. :) Still, it may take 3 bytes to represent in UTF-8 a character that takes 2 bytes in
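
The point that a character can take 3 bytes in UTF-8 but 2 in UTF-16 can be checked directly. A small Python sketch comparing encoded lengths; the sample characters are arbitrary picks (ASCII, a BMP CJK ideograph, and U+1D11E MUSICAL SYMBOL G CLEF from a supplementary plane), and the `-le` codec variants are used so that no byte order mark is counted.

```python
def encoded_lengths(ch: str) -> dict:
    """Bytes needed for one character in each encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ("A", "\u4e2d", "\U0001d11e"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

For CJK text in the BMP, UTF-16 is the more compact of the two; for mostly-ASCII text, UTF-8 is.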

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bryan C . Warnock
On Tuesday 05 June 2001 03:24 pm, Dan Sugalski wrote: The second objection is again related to character versus glyph issues: since Chinese, I think this problem =~ locale. For any unicode character, you can not properly tell its lower case or upper case without considering locale.
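
Locale dependence shows up even for Latin letters: Turkish and Azeri uppercase `i` to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE), while the default Unicode mapping gives `I`. Python's `str.upper()` applies only the locale-independent mappings, so a tailoring has to be layered on top. The one-entry table and the `upper_tr` name below are a hypothetical sketch; a real implementation would take its tailorings from SpecialCasing.txt or CLDR data.

```python
# Hypothetical Turkish tailoring: only the entry that differs from the default.
TURKISH_UPPER = {"i": "\u0130"}  # i -> LATIN CAPITAL LETTER I WITH DOT ABOVE


def upper_tr(text: str) -> str:
    """Uppercase with a Turkish override, falling back to Python's default."""
    return "".join(TURKISH_UPPER.get(c, c.upper()) for c in text)


word = "istanbul"
print(word.upper())    # default, locale-independent mapping
print(upper_tr(word))  # tailored mapping
```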

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
On Tue, Jun 05, 2001 at 05:39:36PM -0400, Bryan C . Warnock wrote: Some languages don't have upper or lower case. Are tests and translations on caseless characters true or false? (Or undefined?) I'd say undefined. Should the same Unicode character, when used in two different languages,

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Dan Sugalski [EMAIL PROTECTED] writes: At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote: (As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a variable byte encoding, with each character taking up anywhere from one to six bytes in the encoded form depending on where in Unicode the
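
The one-to-six-byte description matches the original UTF-8 definition (RFC 2279), which covered code points up to 31 bits. A Python sketch of that encoder follows; the function name is hypothetical, and because Python's own `utf-8` codec stops at U+10FFFF, the sketch builds the bytes by hand. For non-surrogate code points up to U+10FFFF it should produce the same bytes as the built-in codec.

```python
# (exclusive upper bound, sequence length) pairs for RFC 2279 UTF-8
LIMITS = ((0x80, 1), (0x800, 2), (0x10000, 3),
          (0x200000, 4), (0x4000000, 5), (0x80000000, 6))


def encode_utf8_31bit(cp: int) -> bytes:
    """Encode a code point of up to 31 bits as RFC 2279 UTF-8 (1 to 6 bytes)."""
    if not 0 <= cp < 0x80000000:
        raise ValueError("RFC 2279 UTF-8 covers 0 .. 0x7FFFFFFF only")
    n = next(length for limit, length in LIMITS if cp < limit)
    if n == 1:
        return bytes([cp])  # ASCII is stored as-is
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n leading one-bits, then zeros
    out.append(lead_prefix | cp)  # remaining high bits fill the lead byte
    return bytes(reversed(out))


for cp in (0x41, 0xE9, 0x4E2D, 0x1D11E, 0x7FFFFFFF):
    encoded = encode_utf8_31bit(cp)
    print(f"U+{cp:X}", len(encoded), encoded.hex(" "))
```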

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Bryan C Warnock [EMAIL PROTECTED] writes: Some additional stuff to ponder over, and maybe Unicode addresses these - I haven't been able to read *all* the Unicode stuff yet. (And, yes, Simon, you will see me in class.) Some languages don't have upper or lower case. Are tests and

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bryan C . Warnock
On Tuesday 05 June 2001 05:49 pm, Simon Cozens wrote: YES. Definitely. Same Unicode character, same thing. You wanted something else, use a different Unicode character. I don't understand. There *is* only one character. I can't choose another. Take 0x0648, for instance. It's both waw, the

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Simon Cozens [EMAIL PROTECTED] writes: On Tue, Jun 05, 2001 at 03:27:03PM -0700, Russ Allbery wrote: Caseless characters should be guaranteed unchanged by conversion to upper or lower case, IMO. I think Bryan's asking more about \p{IsUpper} than uc(). Ahh... well, Unicode classifies them

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread NeonEdge
The problem as I see it, is not that the mechanism can't handle the languages, it is that the Latin/Gothic countries chose first, and gave what's left to the Oriental countries. This is evident in the Musical Symbols and even Byzantine Musical Symbols. Are these character sets more important

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
NeonEdge [EMAIL PROTECTED] writes: This is evident in the Musical Symbols and even Byzantine Musical Symbols. Are these character sets more important than the actual language character sets being denied to the other countries? Are musical and mathematical symbols even a language at all? At

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall
Dan Sugalski writes: : Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, : but that's in the Unicode 3.0 standard. Doesn't really matter where they install the artificial cap, because for philosophical reasons Perl is gonna support larger values anyway. It's just that 4

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Larry Wall [EMAIL PROTECTED] writes: Doesn't really matter where they install the artificial cap, because for philosophical reasons Perl is gonna support larger values anyway. It's just that 4 bytes of UTF-8 happens to be large enough to represent anything UTF-16 can represent with
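
The observation that 4 bytes of UTF-8 cover everything UTF-16 can represent comes down to arithmetic: UTF-16 surrogate pairs reach U+10FFFF, which needs 21 bits, and a 4-byte UTF-8 sequence carries exactly 21 payload bits (3 in the lead byte, 6 in each of three continuation bytes). A Python sketch; the `surrogate_pair` helper is a hypothetical illustration of the UTF-16 arithmetic, checked against the built-in codecs.

```python
MAX_UTF16 = 0x10FFFF  # highest code point reachable with a surrogate pair


def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    offset = cp - 0x10000  # a 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


payload_bits_4byte_utf8 = 3 + 3 * 6  # lead byte 11110xxx + three 10xxxxxx bytes
print(MAX_UTF16.bit_length(), payload_bits_4byte_utf8)
print([hex(unit) for unit in surrogate_pair(MAX_UTF16)])
print(chr(MAX_UTF16).encode("utf-8").hex(" "))
```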

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Russ Allbery [EMAIL PROTECTED] writes: That's probably unnecessary; I really don't expect them to ever use all 31 bytes that the IETF-standardized version of UTF-8 supports. 31 bits, rather. *sigh* But given that, modulo some debate over CJKV, we're getting into *really* obscure stuff

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote: In the meantime, the normally-encountered working character set of modern Asian languages has been in Unicode from the beginning, and currently the older and rarer characters and the characters used these days only in proper names

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Jarkko Hietaniemi
On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote: NeonEdge [EMAIL PROTECTED] writes: This is evident in the Musical Symbols and even Byzantine Musical Symbols. Are these character sets more important than the actual language character sets being denied to the other

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski
At 04:44 PM 6/5/2001 -0700, Larry Wall wrote: Dan Sugalski writes: : Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, : but that's in the Unicode 3.0 standard. Doesn't really matter where they install the artificial cap, because for philosophical reasons Perl is gonna

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall
Russ Allbery writes: : Particularly since extending UTF-8 to more : than 31 bits requires breaking some of the guarantees that UTF-8 makes, : unless I'm missing how you're encoding the first byte so as not to give it : a value of 0xFE. The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be
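
Why going past 31 bits collides with the "FE and FF never appear" guarantee can be seen from the lead-byte pattern: an n-byte sequence starts with n one-bits, so a 7-byte sequence would need lead byte 0xFE and an 8-byte one 0xFF. The Python sketch below tabulates this; the 7-byte row is an extension beyond RFC 2279, included only to show the collision, and the helper names are hypothetical.

```python
def lead_byte_prefix(n: int) -> int:
    """Lead byte pattern for an n-byte sequence: n one-bits, then zeros."""
    return 0x00 if n == 1 else (0xFF << (8 - n)) & 0xFF


def payload_bits(n: int) -> int:
    """Bits of code point an n-byte sequence can carry."""
    if n == 1:
        return 7  # 0xxxxxxx
    return (7 - n) + 6 * (n - 1)  # free bits in the lead byte + continuations


for n in range(1, 8):
    prefix = lead_byte_prefix(n)
    print(n, f"{prefix:08b}", f"0x{prefix:02X}", payload_bits(n))
```

With a 0xFF lead byte followed by twelve continuation bytes, 12 × 6 = 72 payload bits would cover 64-bit values, which is consistent with the 13-byte figure quoted elsewhere in the thread; that reading is an inference, not something the sketch models.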

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall
Dan Sugalski writes: : At 04:44 PM 6/5/2001 -0700, Larry Wall wrote: : (Perl 5 extends it all the way to 64-bit values, represented in 13 bytes!) : : I know we can, but is it really a good idea? 32 bits is really stretching : it for character encoding, and 64 seems rather excessive. Such large

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery
Larry Wall [EMAIL PROTECTED] writes: Russ Allbery writes: Particularly since extending UTF-8 to more than 31 bits requires breaking some of the guarantees that UTF-8 makes, unless I'm missing how you're encoding the first byte so as not to give it a value of 0xFE. The UTF-16 BOMs, 0xFEFF