Quiz for Unicode guru

2004-08-19 Thread Frank Yung-Fong Tang
OK, just for fun. Quiz for Unicode Guru: Here is the quiz for the Unicoder. It is not a hard quiz. Everyone will get it right eventually. So, use a stopwatch to measure how long it takes you to figure out the right answer. Note: You can find information about Unicode and UTF-8 from

problems in Public Review 33 UTF Conversion Code Update

2004-05-19 Thread Frank Yung-Fong Tang
Looking at http://www.unicode.org/review/ 33 UTF Conversion Code Update 2004.06.08 The C language source code example for UTF conversions (ConvertUTF.c) has been updated to version 1.2 and is being released for public review and comment. This update

Yet another reason some software treat your UTF-8 xml as US-ASCII

2004-05-06 Thread Frank Yung-Fong Tang
For sure no one on this mailing list wants to see their XML treated as US-ASCII when the data is really in UTF-8. If I have an XML file like the following <?xml version="1.0"?> and send it over the HTTP protocol with the following content type header: Content-Type: text/xml; (without
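The behavior described above matches the rule in RFC 3023, under which a text/xml entity with no charset parameter is treated as US-ASCII regardless of what the XML declaration says. A minimal Python sketch of that rule follows. The helper name `effective_xml_charset` is hypothetical, and only the text/xml default is modeled; other media types simply return None to mean "look inside the document".

```python
from email.message import Message


def effective_xml_charset(content_type):
    """Charset a consumer following RFC 3023 would assume from the header alone.

    Returns None when the header does not settle the question and the
    document itself (BOM or XML declaration) has to be inspected.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        # RFC 3023: text/xml without a charset parameter defaults to US-ASCII.
        return "us-ascii"
    return None


print(effective_xml_charset("text/xml;"))
print(effective_xml_charset("text/xml; charset=UTF-8"))
print(effective_xml_charset("application/xml"))
```

Sending an explicit charset parameter, or using application/xml, is the usual way to avoid the US-ASCII default.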

OT: Standardize TimeZone ID

2004-04-23 Thread Frank Yung-Fong Tang
Is there any standards effort trying to standardize Time Zone IDs? I am not talking about the time zone that refers to a particular time (that could be done by GMT offset or addressed by ISO 8601) itself, but rather about an ID that refers to a particular time zone / daylight saving time rule.

unicode site problem

2004-04-22 Thread Frank Yung-Fong Tang
Does anyone know who can fix http://www.unicode.org/reports/index.html ? All the links are broken.

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
Raymond Mercier wrote on 4/22/2004, 7:35 AM: I enquired about the 'super font' created by a Beijing foundry, http://font.founder.com.cn/english/web/index.htm, and am fairly astonished at the prices, as you see from the attached. The cost of producing these fonts is much higher than

Unicode 4.0 and ISO10646-2003

2004-04-22 Thread Frank Yung-Fong Tang
I saw the announcement of the publication of " ISO/IEC 10646: 2003, Information technology -- Universal Multiple-Octet Coded Character Set (UCS)" From http://anubis.dkuug.dk/jtc1/sc2/open/02n3729.htm I expect there is no difference from Unicode 4.0, am I right?

Re: GB18030 and super font

2004-04-22 Thread Frank Yung-Fong Tang
In case you want to test your GB18030 font, you can use Netscape 7 (or the latest Mozilla) and then visit my GB18030 test pages at http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10 It should be page-for-page compatible with the paper copy of the GB18030-2000 standard. I also

Re: Unicode 4.0 and ISO10646-2003

2004-04-22 Thread Frank Yung-Fong Tang
Kenneth Whistler wrote on 4/22/2004, 3:26 PM: Frank asked: I expect there are no difference from Unicode 4.0, am I right? Correct. Please see Appendix C of Unicode 4.0, p. 1348 and p. 1350, which already explicitly makes this statement. --Ken I don't see ISO10646-2003 in the

Re: help finding radical/stroke index at unicode.org

2004-04-14 Thread Frank Yung-Fong Tang
Are you talking about http://www.unicode.org/charts/unihangridindex.html and http://www.unicode.org/charts/unihanrsindex.html ? Gary P. Grosso wrote on 4/14/2004, 1:18 PM: Hi, I am looking for an up-to-date, online version of the sort of thing I see in the back of the printed Unicode

Re: Novice question

2004-03-23 Thread Frank Yung-Fong Tang
Be careful here: for Unicode support in the browser (at least Netscape/Mozilla) there are some code forks between 2000/XP and Win98/ME. Philippe Verdy wrote on 3/23/2004, 5:39 AM: From: Edward H. Trager [EMAIL PROTECTED] Also, I would not bother testing Windows OSes prior to Windows

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-16 Thread Frank Yung-Fong Tang
Chris Jacobs wrote on 3/15/2004, 10:08 PM: - Original Message - From: Kenneth Whistler [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, March 16, 2004 2:28 AM Subject: Re: in the NEW YORK TIMES today, report of a USA patent for a method to

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-16 Thread Frank Yung-Fong Tang
Maybe I should file a US patent application to write Arabic from left to right to make it more simplified :) I guess that would have a higher adoption rate compared to this font design patent, since most software which does not support Bidi already implements it. :) Mark E. Shoulson wrote on

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-15 Thread Frank Yung-Fong Tang
Wow. It does not seem to be a very new idea. A similar idea was used for Chinese 40 years ago and created the differences between Simplified Chinese and Traditional Chinese. Michael Everson wrote on 3/15/2004, 12:40 PM: In the NEW YORK TIMES today comes a report of a USA patent for a new version of

Re: multibyte char display

2004-03-15 Thread Frank Yung-Fong Tang
There are many different reasons you will see ? there. Read my paper http://people.netscape.com/ftang/paper/unicode25/a302.htm to see a list. Manga wrote on 3/15/2004, 10:07 AM: I use UTF-8 encoding in java code to store multi byte characters in the db . When I retrieve the multi byte characters

RE: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-15 Thread Frank Yung-Fong Tang
Mike Ayers wrote on 3/15/2004, 2:50 PM: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Frank Yung-Fong Tang Sent: Monday, March 15, 2004 11:16 AM It seems not a very new idea. Similar idea have been used in Chinese 40 years ago

Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Frank Yung-Fong Tang
Not sure how to find the information on paper. But one way to check the degree of support is to call GetStringTypeEx against some characters defined in 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 to see whether the returned results reflect what they should be. Antoine Leca wrote on 3/5/2004, 8:35 AM: Hi

Re: commandline converter for gb18030 - utf8 in *nix

2004-03-05 Thread Frank Yung-Fong Tang
You can also use 'nsconv', which comes with the Mozilla source code, with GB18030. See http://www.mozilla.org/projects/l10n/mlp_tools.html for details. Zhang Weiwu wrote on 3/5/2004, 6:43 AM: Hello. I believe this must be a frequent question, but I googled around and I didn't find a satisfying

Re: Font Technology Standards

2004-03-03 Thread Frank Yung-Fong Tang
BDF is also widely used, although its quality and features are not that powerful these days. Also, there are other "standards" for fonts: 1. Glyph set "standard"- how to make sure one font contains all the glyphs for a particular group of users- for example- WGL4 is a glyph set standard from

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
Oh. This is the first time I have heard about this. Thanks for the information. Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined? Or does it only mean wchar_t holds the character in ISO_10646 (which means it could be 2 bytes, 4 bytes or more than that?) Noah Levitt wrote on

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
not prevent someone from making it 16 bits or 64 bits when that macro is defined, right? And what do the year and month mean? On Mar 03, 2004, at 12:38, Frank Yung-Fong Tang wrote: oh. This is the first time I hear about this. Thanks about your information. Does it also mean wchar_t is 4

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Frank Yung-Fong Tang
Clark Cox wrote on 3/3/2004, 4:33 PM: [I swap the reply order to make my new question clearer] And what does the year and month mean? It indicates which version of ISO10646 is used by the implementation. In the above example, it indicates whatever version was in effect in December

Re: What's in a wchar_t string on unix?

2004-03-01 Thread Frank Yung-Fong Tang
I Rick Cameron wrote on 3/1/2004, 2:13 PM: Hi, all This may be an FAQ, but I couldn't find the answer on unicode.org. The reason is that there is "NO answer" to the question you ask. It seems that most flavours of unix define wchar_t to be 4 bytes. Depends on which UNIX
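Since the width of wchar_t depends on the platform, the simplest check is to ask the local C ABI. A small Python sketch, assuming only the standard ctypes module (whose c_wchar type mirrors the platform's wchar_t):

```python
import ctypes
import sys

# ctypes.c_wchar mirrors the C wchar_t type of the platform Python was built for.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

print(f"{sys.platform}: wchar_t is {wchar_bytes} bytes ({wchar_bytes * 8} bits)")
```

This reports the size only. It says nothing about which encoding the wide characters hold, which is the separate question that the __STDC_ISO_10646__ macro addresses in C.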

Re: unicode format

2004-02-23 Thread Frank Yung-Fong Tang
John Cowan wrote: steve scripsit: Could someone please clarify the difference between UTF8 and UTF16 please? If it is possible to encode everything in UTF8 and it is more efficient what is the need for UTF16? It is more efficient to PROCESS in UTF16.
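To make the difference concrete, the sketch below counts how many code units each form needs for a few sample characters. The helper name `code_units` and the choice of samples are illustrative assumptions; this shows storage width only and is not a measurement of processing speed.

```python
def code_units(text):
    """Return (UTF-8 bytes, UTF-16 16-bit units) needed to encode text."""
    utf8_units = len(text.encode("utf-8"))
    utf16_units = len(text.encode("utf-16-le")) // 2
    return utf8_units, utf16_units


samples = {
    "U+0041 LATIN CAPITAL LETTER A": "\u0041",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+4E2D CJK ideograph": "\u4e2d",
    "U+20050 CJK Extension B ideograph": "\U00020050",
}

for name, ch in samples.items():
    utf8_units, utf16_units = code_units(ch)
    print(f"{name}: {utf8_units} UTF-8 bytes, {utf16_units} UTF-16 units")
```

UTF-8 ranges from one to four bytes per character, while UTF-16 uses a single 16-bit unit for everything in the BMP and two units (a surrogate pair) only for supplementary characters.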

RE: Mother Language Day

2004-02-23 Thread Frank Yung-Fong Tang
joe wrote: (Hmm, in Russian mother language (maternij jazik) means something *verry* different. Watch your language! ;-) He wrote this in English, not Russian, right? How can I watch Chinese (my language)? Joe

Re: Codes for Individual Chinese Brushstrokes

2004-02-20 Thread Frank Yung-Fong Tang
As a native Chinese person, I believe: 1. The so-called eight basic strokes are very standard in concept. But that is only 8. 2. They list 8 different variants for each of the 8 basic strokes. But if you read that page carefully, it does not mean that there are only 8 variants for each stroke,

Re: UTF-8 to UTF-16 conversion

2004-02-06 Thread Frank Yung-Fong Tang
Yes, TEC. Look at developer.apple.com and look for Text Encoding Converter. Paramdeep Ahuja wrote: Hi Can anyone tell if there is any API available on MAC to convert from UTF-8 to UTF-16 thnx -P

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
Consider CR and LF too. Mark Davis wrote on 1/14/2004, 9:25 AM: I'm not sure which one suggested heuristic method you are referring to, but you are bounding to conclusions. For example, one of the heuristics is to judge what are more common characters when bytes are interpreted as if

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
Does Thai use CR and LF? Peter Kirk wrote on 1/14/2004, 8:12 AM: On 14/01/2004 07:16, John Burger wrote: ... By the way, I still don't quite understand what's special about Thai. Could someone elaborate? I mentioned Thai because it is the only language I know of which does

Re: Detecting encoding in Plain text

2004-01-14 Thread Frank Yung-Fong Tang
John Burger wrote on 1/14/2004, 7:16 AM: Mark E. Shoulson wrote: If it's a heuristic we're after, then why split hairs and try to make all the rules ourselves? Get a big ol' mess of training data in as many languages as you can and hand it over to a class full of CS graduate

Re: Programmatic description of ideographic characters

2004-01-03 Thread Frank Yung-Fong Tang
Looks like an old idea that people in Taiwan gave up a long time ago, because the quality of the glyphs will never be good enough. Tom Emerson wrote on 1/2/2004, 6:06 PM: The following paper, Chinese Character Synthesis using METAPOST, was recently mentioned in a thread on the teTeX

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
Come on, take my joke. But that is a perfect example of a language-specific variant glyph, right? Michael Everson wrote: At 17:13 -0800 2003-12-02, Frank Yung-Fong Tang wrote: come on, use language specific glyph substitution on the last resort font to show Irish last resort glyph

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
Peter Kirk wrote: On 02/12/2003 16:25, Frank Yung-Fong Tang wrote: ... a barrier to proper internationalisation ? My opinion is the reverse: I think it is a strategy toward proper internationalization. Remember, people can always choose to stay with ISO-8859-1 only or go to UTF-8

Re: MS Windows and Unicode 4.0 ?

2003-12-03 Thread Frank Yung-Fong Tang
, it will be 1% of the effort for me to fix it later, right? :) Michael Everson wrote: At 15:38 -0800 2003-12-03, Frank Yung-Fong Tang wrote: I am encouraging QA to test MES-1 with UTF-8 instead of only ISO-8859-1. I am encouraging products to ship with MES-1 support out of the box instead

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
than 10 scripts ? I think the value is that it shows people it is not an ASCII question mark (?) itself. -- -- Frank Yung-Fong Tang, System Architect, International Development, AOL Interactive Services AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913 Yahoo! Msg: frankyungfongtan

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Subject: Re: MS Windows and Unicode 4.0 ? I'm interested in knowing whether the following features would soon be found in Windows : fonts for scripts covered by Unicode 4.0, corresponding rendering engine to display all Unicode 4.0 scripts

Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

2003-12-02 Thread Frank Yung-Fong Tang
-8 gzip of SCSU gzip of BOCU-1 gzip of Legacy encoding

Re: How can I have OTF for MacOS

2003-12-02 Thread Frank Yung-Fong Tang
John Jenkins wrote: On Dec 1, 2003, at 4:24 PM, Frank Yung-Fong Tang wrote: John What 'cmap' format Apple use in the MacOS X Devanagari and Bangla fonts? The formats are irrelevant; the Mac supports all the 'cmap' subtable formats for all subtables. For rendering complex

RE: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Michael Everson wrote: At 14:23 -0800 2003-12-02, Frank Yung-Fong Tang wrote: It's better than not knowing what range the thing is in. It helps the user know he has received, say, Telugu data or whatever. Only if the user knows what Telugu may look like. How many users other

Re: UTF-16 inside UTF-8

2003-12-02 Thread Frank Yung-Fong Tang
Doug Ewell wrote: Frank Yung-Fong Tang ytang0648 at aol dot com wrote: Then, Frank, the Tcl implementation is *not valid UTF-8* and needs to be fixed. Plain and simple. If a system like Tcl only supports the BMP, that is its choice, but it *must not* accept non-shortest UTF-8 forms
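The requirement quoted here, that a conforming decoder must reject non-shortest forms, can be checked with any strict UTF-8 decoder. The sketch below uses Python's built-in strict decoder as a stand-in; the helper name `is_valid_utf8` is hypothetical, and nothing here reflects how Tcl itself is implemented.

```python
def is_valid_utf8(data):
    """True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Overlong (non-shortest) encodings of U+0000 and U+002F are rejected.
print(is_valid_utf8(b"\xc0\x80"))
print(is_valid_utf8(b"\xe0\x80\xaf"))
# A UTF-16 surrogate code point encoded as three bytes is also rejected.
print(is_valid_utf8(b"\xed\xa0\x80"))
# The shortest form and a proper 4-byte supplementary character are accepted.
print(is_valid_utf8(b"/"))
print(is_valid_utf8("\U00020050".encode("utf-8")))
```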

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
Peter Kirk wrote: On 02/12/2003 14:19, Frank Yung-Fong Tang wrote: A better approach than asking Does product X support Unicode 4.0 which in some way you can always get a NO answer is to 1. Define a smaller set of functionality (Such as MES-1, MES-2, MES-3A) 2. Ask 'Does

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Frank Yung-Fong Tang
://homepage..mac.com/jhjenkins/

RE: UTF-16 inside UTF-8

2003-12-02 Thread Frank Yung-Fong Tang
Philippe Verdy wrote: Frank Yung-Fong Tang writes: But how about the UTF-16 vs UCS4 battle? Forget it: nearly nobody uses UCS-4 except very internally for string processing at the character level. For whole strings, nearly everybody uses UTF-16 as it performs better with less

Re: creating a test font w/ CJKV Extension B characters.

2003-12-01 Thread Frank Yung-Fong Tang
NT\CurrentVersion\LanguagePack] SURROGATE=dword:0002 [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42] IEFixedFontName=Code2001 IEPropFontName=Code2001 /code Andrew

Re: How can I have OTF for MacOS

2003-12-01 Thread Frank Yung-Fong Tang
rendering, it cannot support them. John H. Jenkins John, what 'cmap' format does Apple use in the MacOS X Devanagari and Bangla fonts?

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
should also compare the same for things like keyword searches and file systems even though it is technically incorrect. Carl

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
the questioning party is thinking must be given as a part of said question. oh... really, what kind of Unicode support in Windows 2.0? (since you said- *any*)... No... I don't really care. Don't try to answer me.

Re: Request

2003-11-21 Thread Frank Yung-Fong Tang
with this weird specification - ISCII. (If you don't think it is weird, look at the E-1 Display Attributes section in Annex-E of ISCII, which is worse than the E-2 Font Attributes I mentioned here.)

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
: Frank Yung-Fong Tang wrote, If you visit http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596 and your machine has surrogate support installed correctly and a surrogate font installed correctly, then you should see surrogate characters show up that match the gif

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
Does your

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Frank Yung-Fong Tang
Michael (michka) Kaplan wrote: From: Frank Yung-Fong Tang [EMAIL PROTECTED] so.. in summary, what is your conclusion about the quality of GB18030 support on IE6/Win2K ? If you run the same test on Mozilla / Netscape 7.0, what is your conclusion about that quality of support

Re: UTF-16 inside UTF-8

2003-11-19 Thread Frank Yung-Fong Tang
. If you still think adding 4-byte UTF-8 support is 1% of the task, then please join the Tcl project and help me fix that. I appreciate your efforts there and I believe a lot of people will thank you for your contribution. Doug Ewell wrote: Frank Yung-Fong Tang YTang0648 at aol dot com wrote

Re: Problems encoding the spanish o

2003-11-19 Thread Frank Yung-Fong Tang

Re: What does i18n mean?

2003-11-19 Thread Frank Yung-Fong Tang
bandied about a lot. It is a shorthand for "Irn " because it is too hard for most people to type the "r" part. :) [and if your software can save that string and retrieve it correctly later, 50% of the i18n problem is addressed]

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
about fonts. Could someone recommend a good tutorial or 'font creator' application that addresses surrogate pairs? Thanks, Erik Ostermueller

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Are you using Netscape7 / Mozilla or IE? If you use IE, then IE may have a bug about that. I think Mozilla should not have the problem since I developed and tested it myself. [EMAIL PROTECTED] wrote: . Frank Yung-Fong Tang wrote, If you visit http://people.netscape.com/ftang

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
Philippe Verdy wrote: From: Frank Yung-Fong Tang [EMAIL PROTECTED] It is not that easy to go from "don't know beans about fonts" to creating a test font that contains ... \u20050. If you are lucky, it will take you several months, if not a year. There are commercial font tools
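For readers following the surrogate-pair part of this thread, the sketch below shows how a supplementary code point such as U+20050 (the character mentioned above) maps to a UTF-16 surrogate pair. The function name `to_surrogate_pair` is an illustrative assumption; the arithmetic is the standard UTF-16 encoding rule and says nothing about how any font tool stores its tables.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x20050)
print(f"U+20050 -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x20050).encode("utf-16-be")
print(encoded.hex().upper())
```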

Re: creating a test font w/ CJKV Extension B characters.

2003-11-19 Thread Frank Yung-Fong Tang
# ftxinstalledfonts # ftxruler # ftxvalidator John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage..mac.com/jhjenkins/

Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-17 Thread Frank Yung-Fong Tang
Hum, a very stupid way (but it works): 1. use vi 2. type &#x + the Unicode hex code + ; for each character 3. save it as .html 4. open the file in a browser 5. copy the text 6. paste it into your software.
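The same trick can be scripted instead of typed by hand. The sketch below builds hexadecimal numeric character references of the form &#xHHHH; and then resolves them the way a browser would, using Python's standard html module. The helper name `to_ncr` is hypothetical.

```python
import html


def to_ncr(text):
    """Write each character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


# Going from known hex codes to characters, as in steps 2 to 5 above.
markup = "&#x4E2D;&#x20050;"
characters = html.unescape(markup)
print([f"U+{ord(ch):04X}" for ch in characters])

# And the reverse direction, for checking.
print(to_ncr(characters))
```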