OK, just for fun
Quiz for Unicode Guru
Here is a quiz for Unicoders. It is not a hard quiz. Everyone will
get it right eventually. So, use a stopwatch to measure how long it
takes you to figure out the right answer.
Note: You can find information on Unicode and UTF-8 from
Looking at
http://www.unicode.org/review/
33
UTF Conversion
Code Update
2004.06.08
The C language source code example for UTF conversions (ConvertUTF.c) has been
updated to version 1.2 and is being released for public review and
comment. This update
Surely no one on this
mailing list wants to see their XML treated as US-ASCII when the
data is really in UTF-8.
If I have an XML file like the following
<?xml version="1.0"?>
and send it over HTTP with the following Content-Type header:
Content-Type: text/xml;
(without
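A rough Python sketch of the concern above, assuming the RFC 3023 rule that text/xml sent without a charset parameter defaults to US-ASCII, whatever the document itself declares. The function name effective_xml_charset is made up for illustration, and the header parsing is deliberately simplistic:

```python
def effective_xml_charset(content_type):
    """Return the charset a strict RFC 3023 receiver would assume.

    Hypothetical helper for illustration; it ignores quoting edge cases.
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    if media_type == "text/xml":
        # No charset parameter: RFC 3023 says text/xml falls back to US-ASCII.
        return "us-ascii"
    return None  # other media types are out of scope for this sketch


print(effective_xml_charset("text/xml;"))
print(effective_xml_charset("text/xml; charset=UTF-8"))
```

Under that assumption, the header shown above yields us-ascii, while adding charset=UTF-8 makes the intended encoding explicit.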
Is there any effort to standardize time zone IDs? I am not
talking about the time zone of a particular time (that can
be expressed as a GMT offset or handled by ISO 8601), but rather
about an ID that refers to a particular time zone / daylight saving
time rule.
Does anyone know who can fix
http://www.unicode.org/reports/index.html ?
All the links are broken.
Raymond Mercier wrote on 4/22/2004, 7:35 AM:
I enquired about the 'super font' created by a Beijing foundry,
http://font.founder.com.cn/english/web/index.htm, and am fairly
astonished
at the prices, as you see from the attached.
The cost of producing these fonts is much higher than
I saw the announcement of the publication of
" ISO/IEC 10646: 2003, Information technology --
Universal Multiple-Octet Coded Character Set (UCS)"
From http://anubis.dkuug.dk/jtc1/sc2/open/02n3729.htm
I expect there is no difference from Unicode 4.0, am I right?
In case you want to test
your GB18030 font, you can use Netscape 7 (or the latest Mozilla) and then
visit my GB18030 test pages at
http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=10
It should be page-for-page compatible with the paper copy of the GB18030-2000
standard. I also
Kenneth Whistler wrote on 4/22/2004, 3:26 PM:
Frank asked:
I expect there is no difference from Unicode 4.0, am I right?
Correct. Please see Appendix C of Unicode 4.0, p. 1348 and p. 1350,
which already explicitly makes this statement.
--Ken
I don't see ISO10646-2003 in the
are you talking about
http://www.unicode.org/charts/unihangridindex.html
and
http://www.unicode.org/charts/unihanrsindex.html
?
Gary P. Grosso wrote on 4/14/2004, 1:18 PM:
Hi,
I am looking for an up-to-date, online version of the sort of thing
I see in the back of the printed Unicode
Be careful here: for Unicode support in the browser (at least
Netscape/Mozilla) there is a code fork between Windows 2000/XP and Win98/ME.
Philippe Verdy wrote on 3/23/2004, 5:39 AM:
From: Edward H. Trager [EMAIL PROTECTED]
Also, I would not bother testing Windows OSes prior to Windows
Chris Jacobs wrote on 3/15/2004, 10:08 PM:
- Original Message -
From: Kenneth Whistler [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, March 16, 2004 2:28 AM
Subject: Re: in the NEW YORK TIMES today, report of a USA patent for a
method to
Maybe I should file a US patent application for writing Arabic from left
to right to make it more simplified :) I guess that would have a higher
adoption rate than this font design patent, since most software
that does not support Bidi already implements it. :)
Mark E. Shoulson wrote on
Wow.
It does not seem to be a very new idea. A similar idea was used in Chinese 40
years ago and created the differences between Simplified Chinese and
Traditional Chinese.
Michael Everson wrote on 3/15/2004, 12:40 PM:
In the NEW YORK TIMES today
comes a report of a USA patent for a new version of
many different reasons you will see ? there.
read my paper http://people.netscape.com/ftang/paper/unicode25/a302.htm
to see a list.
Manga wrote on 3/15/2004, 10:07 AM:
I use UTF-8 encoding in Java code to store multi-byte characters in the
db. When I retrieve the multi-byte characters
Mike Ayers wrote on 3/15/2004, 2:50 PM:
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank Yung-Fong Tang
Sent: Monday, March 15, 2004 11:16 AM
It seems not a very new idea. Similar idea have been used in Chinese 40
years ago
Not sure how to find the information paper. But one way to check the
degree of support is to call GetStringTypeEx against some
characters defined in 2.0, 2.1, 3.0, 3.1, 3.2 and 4.0 and see whether the
returned results reflect what they should be.
Antoine Leca wrote on 3/5/2004, 8:35 AM:
Hi
You can also use 'nsconv', which comes with the Mozilla source code, with GB18030.
See http://www.mozilla.org/projects/l10n/mlp_tools.html for details.
Zhang Weiwu wrote on 3/5/2004, 6:43 AM:
Hello. I believe this must be a frequent question, but I googled around
and I didn't find a satisfying
BDF is also widely used,
although its quality and features are not that powerful these days.
Also, there are other font-related "standards":
1. Glyph set "standard": how to make sure one font contains all the
glyphs for a particular group of users. For example, WGL4 is a glyph set
standard from
Oh. This is the first time I have heard about this. Thanks for the
information. Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__
is defined? Or does it only mean wchar_t holds characters in ISO 10646
(which means it could be 2 bytes, 4 bytes or more than that)?
Noah Levitt wrote on
not prevent someone from making it 16
bits or 64 bits when that macro is defined, right?
And what does the year and month mean?
On Mar 03, 2004, at 12:38, Frank Yung-Fong Tang wrote:
Oh. This is the first time I have heard about this. Thanks for the
information. Does it also mean wchar_t is 4
Clark Cox wrote on 3/3/2004, 4:33 PM:
[I swap the reply order to make my new question clearer]
And what does the year and month mean?
It indicates which version of ISO10646 is used by the implementation.
In the above example, it indicates whatever version was in effect in
December
I
Rick Cameron wrote on 3/1/2004, 2:13 PM:
Hi, all
This may be an FAQ,
but I couldn't find the answer on unicode.org.
The reason is that there is
"NO answer" to the question you ask.
It seems that most
flavours of
unix define wchar_t to be 4 bytes.
Depends on which UNIX
John Cowan wrote:
steve scripsit:
Could someone please clarify the difference between UTF8 and UTF16?
If it is possible to encode everything in UTF8 and it is more
efficient, what is the need for UTF16?
It is more efficient to PROCESS in UTF16.
joe wrote:
(Hmm, in Russian mother language (maternij jazik) means something
*verry* different.
Watch your language! ;-)
He wrote this in English, not Russian, right?
How can I watch Chinese (my language) ?
Joe
As a native Chinese person, I believe:
1. The so-called eight basic strokes are very standard in concept.
But that is only 8.
2. They list 8 different variants for each of the 8 basic strokes. But
if you read that page carefully, it does not mean that there are only 8
variants for each stroke,
Yes, TEC. look at developer.apple.com and look at Text Encoding Converter
Paramdeep Ahuja wrote:
Hi
Can anyone tell if there is any API available on MAC to convert from
UTF-8
to UTF-16
thnx
-P
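For readers who only want to see what the conversion itself does, here is a small Python sketch. It is not the Mac Text Encoding Converter API; it uses Python's built-in codecs, and the sample code point U+20050 is chosen only because it lies outside the BMP and so needs a surrogate pair in UTF-16:

```python
def utf8_to_utf16be(data):
    """Decode UTF-8 bytes and re-encode them as big-endian UTF-16 (no BOM)."""
    return data.decode("utf-8").encode("utf-16-be")


utf8_bytes = "\U00020050".encode("utf-8")
utf16_bytes = utf8_to_utf16be(utf8_bytes)
print(utf8_bytes.hex())   # f0a08190
print(utf16_bytes.hex())  # d840dc50
```

The four UTF-8 bytes become the surrogate pair D840 DC50, which is the case a converter has to get right beyond plain BMP text.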
Consider CR and LF too.
Mark Davis wrote on 1/14/2004, 9:25 AM:
I'm not sure which one suggested heuristic method you are referring
to, but
you are bounding to conclusions. For example, one of the heuristics is
to judge
what are more common characters when bytes are interpreted as if
Does Thai use CR and LF?
Peter Kirk wrote on 1/14/2004, 8:12 AM:
On 14/01/2004 07:16, John Burger wrote:
...
By the way, I still don't quite understand what's special about Thai.
Could someone elaborate?
I mentioned Thai because it is the only language I know of which does
John Burger wrote on 1/14/2004, 7:16 AM:
Mark E. Shoulson wrote:
If it's a heuristic we're after, then why split hairs and try to make
all the rules ourselves? Get a big ol' mess of training data in as
many languages as you can and hand it over to a class full of CS
graduate
looks like an old idea that people in Taiwan gave up a long time
ago because the quality of the glyphs would never be
good enough.
Tom Emerson wrote on 1/2/2004, 6:06 PM:
The following paper, Chinese Character Synthesis using METAPOST, was
recently mentioned in a thread on the teTeX
come on, take my joke. but that is a perfect example of language
specific variant glyph, right?
Michael Everson wrote:
At 17:13 -0800 2003-12-02, Frank Yung-Fong Tang wrote:
come on, use language specific glyph substitution on the last resort
font to show Irish last resort glyph
Peter Kirk wrote:
On 02/12/2003 16:25, Frank Yung-Fong Tang wrote:
...
a barrier to proper internationalisation ?
My opinion is the reverse: I think it is a strategy for proper
internationalization. Remember, people can always choose to stay with
ISO-8859-1 only or go to UTF-8
, it will be 1% of efforts for me
to fix it later, right? :)
Michael Everson wrote:
At 15:38 -0800 2003-12-03, Frank Yung-Fong Tang wrote:
I am encouraging QA to test MES-1 with UTF-8 instead of only ISO-8859-1.
I am encouraging products to ship with MES-1 support out of the box instead
than 10 scripts ?
I think the value is that it shows people it is not a ? ASCII
question mark itself.
--
--
Frank Yung-Fong Tang
tm rhtt, Itrntinl Dvlpmet, AOL Intrtv
Srvies
AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913
Yahoo! Msg: frankyungfongtan
Subject: Re: MS Windows and Unicode 4.0 ?
I'm interested in knowing whether the following features
would soon be found
in Windows : fonts for scripts covered by Unicode 4.0,
corresponding
rendering engine to display all Unicode 4.0 scripts
-8
gzip of SCSU
gzip of BOCU-1
gzip of Legacy encoding
John Jenkins wrote:
On Dec 1, 2003, at 4:24 PM, Frank Yung-Fong Tang wrote:
John What 'cmap' format Apple use in the MacOS X
Devanagari and Bangla fonts?
The formats are irrelevant; the Mac supports all the 'cmap' subtable
formats for all subtables. For rendering complex
Michael Everson wrote:
At 14:23 -0800 2003-12-02, Frank Yung-Fong Tang wrote:
It's better than not knowing what range the thing is in. It helps
the
user know he has received, say, Telugu data or whatever.
Only if the user know what Telugu may look like. How many users other
Doug Ewell wrote:
Frank Yung-Fong Tang ytang0648 at aol dot com wrote:
Then, Frank, the Tcl implementation is *not valid UTF-8* and needs to be
fixed. Plain and simple. If a system like Tcl only supports the BMP,
that is its choice, but it *must not* accept non-shortest UTF-8 forms
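To illustrate the point about non-shortest forms, a short Python sketch: Python's built-in strict UTF-8 decoder rejects overlong sequences, so it can serve as a reference check. The helper name is_valid_utf8 is made up for this example and has nothing to do with Tcl's own code:

```python
def is_valid_utf8(data):
    """Return True only if data is well-formed UTF-8 (shortest form only)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"/"))             # True: the shortest form of U+002F
print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong 2-byte form of U+002F
print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: overlong 3-byte form of U+002F
```

A decoder that accepted the last two inputs would let "/" slip past any check that only looks for the single byte 0x2F, which is why conformance forbids them.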
Peter Kirk wrote:
On 02/12/2003 14:19, Frank Yung-Fong Tang wrote:
A better approach than asking Does product X support Unicode 4.0
which in some way you can always get a NO answer is to
1. Define a smaller set of functionality (Such as MES-1, MES-2, MES-3A)
2. Ask 'Does
://homepage..mac.com/jhjenkins/
Philippe Verdy wrote:
Frank Yung-Fong Tang writes:
But how about the UTF-16 vs UCS4 battle?
Forget it: nearly nobody uses UCS-4 except very internally for string
processing at the character level. For whole strings, nearly everybody
uses
UTF-16 as it performs better with less
NT\CurrentVersion\LanguagePack]
SURROGATE=dword:0002
[HKEY_CURRENT_USER\Software\Microsoft\Internet
Explorer\International\Scripts\42]
IEFixedFontName=Code2001
IEPropFontName=Code2001
/code
Andrew
rendering, it cannot support them.
John H. Jenkins
John What 'cmap' format Apple use in the MacOS X
Devanagari and Bangla fonts?
should also compare the
same for
things like keyword searches and file systems even though it is
technically
incorrect.
Carl
the questioning party is thinking must be given as a
part of said question.
oh... really, what kind of Unicode support in Windows 2.0? (since you
said- *any*)... No... I don't really care. Don't try to answer me.
with this
weird specification - ISCII. (If you don't think it is weird, look
at the E-1 Display Attributes section in Annex-E of ISCII, which is worse
than the E-2 Font Attributes I mentioned here.)
:
Frank Yung-Fong Tang wrote,
If you visit
http://people.netscape.com/ftang/testscript/gb18030/gb18030.cgi?page=596
and your machine has surrogate support installed correctly and a
surrogate font installed correctly, then you should see the surrogate
characters show up and
match the gif
John 3:16 For God so loved the world that he gave his one and only Son,
that whoever believes in him shall not perish but have eternal life.
Does your
Michael (michka) Kaplan wrote:
From: Frank Yung-Fong Tang [EMAIL PROTECTED]
so, in summary, what is your conclusion about the quality of GB18030
support on IE6/Win2K? If you run the same test on Mozilla / Netscape
7.0, what is your conclusion about the quality of support
.
If you still think adding 4-byte UTF-8 support is 1% of the task,
then please join the Tcl project and help me fix that. I would appreciate your
efforts there, and I believe a lot of people will thank you for your
contribution.
Doug Ewell wrote:
Frank Yung-Fong Tang YTang0648 at aol dot com wrote
.
bandied about a lot.
It is shorthand for "Irn " because it is too hard for most people to type the "r" part. :) [And if your software can
save that string and retrieve it correctly later, 50% of the i18n problem is
addressed.]
about fonts.
Could someone recommend a good tutorial or 'font creator' application
that addresses surrogate pairs?
Thanks,
Erik Ostermueller
are you using Netscape7 / Mozilla or IE?
If you use IE, then IE may have a bug about that.
I think Mozilla should not have the problem, since I developed and tested it
myself.
[EMAIL PROTECTED] wrote:
.
Frank Yung-Fong Tang wrote,
If you visit
http://people.netscape.com/ftang
Philippe Verdy wrote:
From: Frank Yung-Fong Tang [EMAIL PROTECTED]
It is not that easy for you to go from "don't know beans about fonts" to
creating a test font that contains ... \u20050. If you are lucky, it
will
take you several months, if not years. There are commercial font
tools
# ftxinstalledfonts
# ftxruler
# ftxvalidator
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage..mac.com/jhjenkins/
Hmm, a very stupid way (but it works):
1. use vi
2. type &#x + the Unicode code point in hex + ; for each character
3. save it as .html
4. open the file in a browser
5. copy the text
6. paste it into your software.
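The steps above can also be scripted. A minimal Python sketch, assuming the notation meant in step 2 is the HTML hexadecimal numeric character reference (&#xHHHH;); the function name and the sample characters are made up for illustration:

```python
def to_ncr_html(text):
    """Wrap text, written as hex numeric character references, in minimal HTML."""
    refs = "".join("&#x{:X};".format(ord(ch)) for ch in text)
    return "<html><body>" + refs + "</body></html>"


page = to_ncr_html("\u4e2d\U00020050")
print(page)  # <html><body>&#x4E2D;&#x20050;</body></html>
```

Saving the printed output as an .html file and opening it in a browser gives the same result as steps 3 to 5, with no non-ASCII bytes in the file itself.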
59 matches