Re: Unicode sorting...

2001-06-08 Thread Bryan C. Warnock
On Friday 08 June 2001 02:17 pm, NeonEdge wrote: > > Another example is that Chinese has no definite sorting order, period. The commonly used schemes are phonetic-based or stroke-based. Since many characters have more than one pronunciation (context sensitive) and more than one for

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
> The A-Z syntax is really a shorthand for "All the uppercase letters". (Originally at least) I won't argue the problems with sorting various sets of characters in various locales, but for regexes at least it's not an issue, because the point isn't sorting or ordering, it's identifying
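
To make the point above concrete, here is a minimal sketch in Python (not Perl, and not anything proposed in this thread) contrasting a literal A-Z range with a class that identifies letters regardless of ordering. The function names are hypothetical, and the second pattern is only an approximation of "any letter".

```python
import re

# ASCII-only ranges: match only code points U+0041..U+005A and U+0061..U+007A.
ASCII_LETTER = re.compile(r"^[a-zA-Z]")

# A word character that is neither a decimal digit nor an underscore.
# This is close to, but not exactly, "any Unicode letter".
UNICODE_LETTER = re.compile(r"^[^\W\d_]")


def starts_with_ascii_letter(text: str) -> bool:
    """True if the first character falls inside the literal a-z or A-Z ranges."""
    return ASCII_LETTER.match(text) is not None


def starts_with_letter(text: str) -> bool:
    """True if the first character is (roughly) a letter in any script."""
    return UNICODE_LETTER.match(text) is not None
```

With these definitions, a word starting with U+00C4 (A with diaeresis) is rejected by the range-based check but accepted by the identification-based one, which is the distinction being drawn: the range is defined by code point order, the class by what the character is.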

RE: Unicode sorting...

2001-06-08 Thread Dan Sugalski
At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote: > > If this is the case, how would a regex like "^[a-zA-Z]" work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, th

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
> > If this is the case, how would a regex like "^[a-zA-Z]" work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will regex engine make the distinctio

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
> If this is the case, how would a regex like "^[a-zA-Z]" work (or other, more > sensitive characters)? If just about anything can come between A and Z, and > letters that might be there in a particular locale aren't in another locale, > then how will regex engine make the distinction? This synt

RE: Unicode sorting...

2001-06-08 Thread NeonEdge
> Another example is that Chinese has no definite sorting order, period. The commonly used schemes are phonetic-based or stroke-based. Since many characters have more than one pronunciation (context sensitive) and more than one form (simplified and traditional). So if we have a mix cont

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
ny languages have special combinations like > ch, ss, ij that require special attention. My understanding is there is NO general Unicode sorting, period. The most useful one must be locale-sensitive, as defined by Unicode collation. In practice, the story is even worse. For example, how do you so

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
> I can't really believe that this would be a problem, but if they're > integrated alphabets from different locales, will there be issues > with sorting (if we're not planning to use the locale)? Are there > instances where like characters were combined that will affect the > sort orders? Yes, it

Unicode sorting...

2001-06-08 Thread NeonEdge
I can't really believe that this would be a problem, but if they're integrated alphabets from different locales, will there be issues with sorting (if we're not planning to use the locale)? Are there instances where like characters were combined that will affect the sort orders? Grant M.
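
As a rough illustration of the question, the Python sketch below compares sorting by raw code point (no locale at all) with a naive accent-insensitive key. The key function is a hypothetical stand-in written for this example. It is not the Unicode Collation Algorithm and not a locale-aware sort, and the function names are assumptions.

```python
import unicodedata


def codepoint_sorted(words):
    """Sort by raw code point order, with no locale involved."""
    return sorted(words)


def strip_marks_key(word):
    """Hypothetical sort key: decompose, drop combining marks, casefold.

    The original word is kept as a tie-breaker so the ordering is total.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


def naive_sorted(words):
    """Sort with the accent- and case-insensitive key above."""
    return sorted(words, key=strip_marks_key)


words = ["zebra", "\u00e4rger", "apple", "Zoo"]
print(codepoint_sorted(words))  # ['Zoo', 'apple', 'zebra', 'ärger']
print(naive_sorted(words))      # ['apple', 'ärger', 'zebra', 'Zoo']
```

Neither result is right everywhere. The second ordering resembles German dictionary order, where a-umlaut sorts with a, while Swedish places the same letter after z, which is why a locale-independent sort cannot satisfy both.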