Because I have not received a copy of the following via the Unicode List, I have assumed the sender (who is probably well known to at least some as editor of the SQL standard) may not currently be a member of the list. Since he clearly intended this message to go to the list, and because it is relevant to a question I posted earlier, I hope to be forgiven for taking the liberty of forwarding it.

Mike.

----- Original Message -----
From: "Jim Melton" <[EMAIL PROTECTED]>
To: "J M Sykes" <[EMAIL PROTECTED]>; "Unicode List" <[EMAIL PROTECTED]>
Cc: "Fred Zemke" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, February 10, 2001 12:34 AM
Subject: [Fwd: Unicode collation algorithm - interpretation]

Mike,

In a message that you sent to the Unicode list on 8 February, you addressed the question of parameterized invocations of collations:

Date: Thu, 8 Feb 2001 04:49:21 -0800 (GMT-0800)

>In the proposal for better accommodating UCS in SQL, we assumed that a
>comparison performed according to UTR#10, "Unicode Technical Standard #10
>Unicode Collation Algorithm", would require four parameters, viz.
>
>    Two strings to be compared
>
>    A collation element table
>
>    A maximum level as mentioned in UTR#10, section 4.3
>    "Form a sort key for each string", which specifies Step 3.
>
>SQL already uses the term 'collation', each of which is identified by a
><collation name>, but does not accommodate the notion that the same
>collation element table can be applied at different levels.
>
>In our proposal, we have assumed that <collation name> identifies a
>collation element table, and have extended SQL syntax to allow the user to
>specify the fourth parameter (or leave it to be defaulted).
>
>It has been suggested that SQL <collation name> should instead identify both
>collation element table and maximum level.
>
>Perhaps the second approach might be useful in the case where, for reasons
>of performance, sort keys are constructed in advance of being needed, for
>example to be stored as 'shadow columns' in SQL base tables, or in indexes.
>
>On the other hand, the first approach seems to be more user-friendly in the
>case where at least two collation element tables are available, provided
>their levels correspond (i.e. provided level 2 means 'case-blind' in both
>cases).
>
>Would anyone care to comment?

Indeed, I would. I think you probably were told by Hugh Darwen that he had spoken to me and that I stated that I thought it unlikely that the code written to implement the Unicode collation algorithm (more particularly, code written to implement ISO 14651, the collating standard) would be parameterized to allow specification of different levels.

I particularly want to respond to the statement that you made:

>It has been suggested that SQL <collation name> should instead identify both
>collation element table and maximum level.

In this statement, your wording makes it appear that the suggestion was based on some matter of personal taste or something similarly refutable. I did not respond to this aspect of your draft proposal on the basis of any whimsy, but on the basis that I do not believe that it is technically appropriate, even if we can somehow coerce bits of technology into making this happen. In fact, I believe that the "maximum level" is built into the collation element table inseparably.

I monitored the email discussions rather a lot during the development of ISO 14651 and it seemed awfully likely as a result of the discussions (plus conversations that I've had with implementors in at least 3 companies) that a specific collation would be built by constructing the collation element table (as you mentioned in your note) and then "compiling" it into the code that actually does the collation.
That code would *inherently* have built into it the levels that were specified in the collation table that was constructed. It's not like the code can pick and choose which of the levels it wishes to honor.

Of course, if you really want to specify an SQL collation name that somehow identifies 2 or 3 or 4 (or more) collations built in conformance with ISO 14651 and then use an additional parameter to choose between them, I guess that's possible (but not, IMHO, desirable). However, it would be very difficult to enforce a rule that says that the collection of collations so identified are "the same" except for the level chosen. One could be oriented towards, say, French, and the other towards German or Thai and it would be very hard for the SQL engine to know that it was being misled.

I hope that this allays your apparent assumption that my suggestion was somehow based on some aspect of personal preference.

Thanks,
Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Oracle Corporation        Oracle Email: mailto:[EMAIL PROTECTED]
1930 Viscounti Drive      Standards email: mailto:[EMAIL PROTECTED]
Sandy, UT 84093-1063      Personal email: mailto:[EMAIL PROTECTED]
USA                       Fax : +1.801.942.3345
========================================================================
= Facts are facts. However, any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================