In a message dated 11/5/2003 3:42:42 PM Pacific Standard Time, [EMAIL PROTECTED] writes:
> Topic-change alert! I'm not talking about glyph support in fonts, or
Surrogates were defined in Unicode 2.0, which was published in 1996. Did Notepad in Windows 98 support them, two years after Unicode 2.0 was published? No. MS did not even support surrogates in the Notepad that came with WinME. In fact, you need to install a special package on Win2K to enable surrogate support. Why did it take that long? Very simple: because it is not as simple as you thought. If you calculate how long it took MS to add surrogate support to Windows, counting from the time surrogates were defined in Unicode 2.0, you can probably estimate how long it will take a piece of software to add surrogate support if it is only now starting to add Unicode support.
> One of these days I'm going to implement a "Unicode" front end that
There is a huge gap between "not silly" and "make it work". It is not that simple to make a whole piece of software support surrogates correctly in every aspect.
> For back end software which do pure data process without keyboard ok.
Here are some examples to show how difficult it is to support surrogates:
Example 1:
I have this API, where UniChar is defined as a two-byte type holding 16 bits:
UniChar ToLower(UniChar aChar)
Tell me, how can it support surrogates?
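To make the problem concrete, here is a minimal Java sketch, where Java's 16-bit char plays the role of UniChar. The class and method names are hypothetical, and the code point methods it relies on (Character.toLowerCase(int), String.codePointAt, Character.charCount) only arrived with Java 5, so this illustrates the kind of API change involved rather than anything a library already had at the time.

```java
final class CaseMapping {
    // Legacy shape: one 16-bit unit in, one unit out.
    // A lone surrogate has no case mapping, so it comes back unchanged.
    static char toLowerUnit(char c) {
        return Character.toLowerCase(c);
    }

    // Surrogate-aware shape: the parameter and result are whole code points,
    // so the signature itself had to change from 16 bits to int.
    static int toLowerCodePoint(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Walks a UTF-16 string one code point at a time and lowercases each one.
    static String toLowerString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance by 1 or 2 units
        }
        return out.toString();
    }
}
```

The 16-bit version cannot lowercase a supplementary character such as U+10400 (DESERET CAPITAL LETTER LONG I) because it only ever sees one half of the pair at a time. Every caller that passes characters around one unit at a time has the same limitation.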
Example 2:
I have this API:
int FindCharInString( String, UniChar)
Tell me what the return value should mean. Should it be the count of UniChar units from the beginning of the String, or the count of CHARACTERs from the beginning of the String? What should I do when I start to add surrogate support?
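The two possible answers can be sketched in Java as follows. The class and method names are hypothetical. The sketch assumes the Java 5+ behavior of String.indexOf(int), which accepts a supplementary code point and returns an offset counted in 16-bit units.

```java
final class FindChar {
    // Answer A: the offset counted in 16-bit units, or -1 if absent.
    static int findUnitIndex(String s, int codePoint) {
        return s.indexOf(codePoint);
    }

    // Answer B: the offset counted in whole characters (code points), or -1.
    static int findCodePointIndex(String s, int codePoint) {
        int unitIndex = s.indexOf(codePoint);
        return unitIndex < 0 ? -1 : s.codePointCount(0, unitIndex);
    }
}
```

For "b" followed by the surrogate pair for U+10000 and then "c", the two methods disagree about where "c" is: answer A gives 3 and answer B gives 2. Existing callers were written against one of these meanings, and nothing in the signature says which.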
Example 3:
I have this API:
int LengthOfString(String)
Should this API return the number of UniChar units or the number of CHARACTERs?
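As a small Java sketch of the two candidate meanings (hypothetical names, relying on String.codePointCount from Java 5 or later):

```java
final class StringLength {
    // Number of 16-bit units: what a buffer allocation needs to know.
    static int lengthInUnits(String s) {
        return s.length();
    }

    // Number of whole characters (code points): closer to what a user might count.
    static int lengthInCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

The unit count is a constant-time lookup, while the code point count has to scan the string. That cost difference is one reason a library may be reluctant to redefine an existing length function.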
Example 4:
I have this API:
String Left(String, int a)
What should a mean: the index of the UniChar or the index of the CHARACTER?
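Either reading can be implemented, but each needs a decision about pair boundaries. The following Java sketch uses hypothetical names and makes one assumed policy choice: when a unit count would cut a surrogate pair in half, it backs off by one unit. That is only one of several defensible options.

```java
final class LeftPrefix {
    // "a" counts 16-bit units. If the cut would land between the two halves
    // of a surrogate pair, keep one unit fewer instead of emitting a lone half.
    static String leftUnits(String s, int count) {
        int end = Math.min(Math.max(count, 0), s.length());
        if (end > 0 && end < s.length()
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }

    // "a" counts whole characters (code points).
    static String leftCodePoints(String s, int count) {
        int total = s.codePointCount(0, s.length());
        int n = Math.min(Math.max(count, 0), total);
        return s.substring(0, s.offsetByCodePoints(0, n));
    }
}
```

Note that leftUnits may now return fewer units than were requested, which can surprise callers that use the result to fill a fixed-width field.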
1. Dependent technology: for example, your software depends on Tcl, but Tcl 8.4.4 does not support surrogates.
2. Dependent protocols: for example, GSM 03.38 defines only the default alphabet and UCS-2, not UTF-8. What is the point of a GSM gateway accepting surrogates? Why bother, when they will not be shown on people's cell phones because of the GSM protocol anyway?
3. The definition of the API: for example,
you have, in String, int indexOf(int ch), which returns the index within this string of the first occurrence of the specified character. If the string a is "b" + a surrogate pair + "c" and I call a.indexOf("c"), what should it return, 2 or 3? If the caller then calls a.charAt(2), what should I return: the low surrogate, or the "c"?
char charAt(int index) returns the character at the specified index. How can I return the whole surrogate pair if someone calls a.charAt(1)? Or should I just return the high surrogate?
What should we return if someone calls a.substring(2)? The low surrogate and the "c"? The high surrogate plus the low surrogate plus the "c"? An error? What will happen if the software originally did not return an error code for substring and there is no exception model to invoke?
4. Memory and performance trade-offs.
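For item 3, the "error?" option can be sketched in Java as below. The class and both methods are hypothetical helpers and not part of java.lang.String, whose own substring does not perform any such check. The sketch assumes an exception model is available, which is exactly what the older software in question may lack.

```java
final class PairBoundary {
    // True when index falls between the high and low halves of a surrogate pair.
    static boolean splitsPair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    // Like substring(begin), but refuses to start in the middle of a pair.
    static String substringChecked(String s, int begin) {
        if (splitsPair(s, begin)) {
            throw new IllegalArgumentException(
                    "index " + begin + " splits a surrogate pair");
        }
        return s.substring(begin);
    }
}
```

With a = "b" + the pair for U+10000 + "c", substringChecked(a, 2) throws, while plain a.substring(2) quietly returns a lone low surrogate followed by "c". A caller that never expected substring to fail now needs an error path it did not have before.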
You can probably get a sense of the difficulty by looking at how many specification changes MS needed to make to add surrogate support to the OpenType font format. That is just the specification changes, not including code changes or API changes.
It is easy to add surrogate support to your application if your application does nothing. It is difficult (not impossible) to add surrogate support if your application does some data processing. It is hard to add surrogate support if your software is a library with a previously defined API.
Look at
Format 4: Segment mapping to delta values
Supporting 4-byte character codes
I am not saying software should not support surrogates. I am saying don't underestimate the effort. And when a piece of software does support surrogates correctly, give its authors praise instead of taking it for granted. It is hard work.
==================================
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913 Yahoo! Msg: frankyungfongtan
John 3:16 "For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life."
Does your software display Thai language text correctly for Thailand users? -> Basic Concept of Thai Language linked from Frank Tang's Iñtërnâtiônàlizætiøn Secrets
Want to translate your English text to something Thailand users can understand ? -> Try English-to-Thai machine translation at http://c3po.links.nectec.or.th/parsit/