Re: Bengali: variants of same conjunct

2000-06-22 Thread Antoine Leca
Michael Kaplan wrote: Thus far it is something that has been implemented in the fonts, rather than anywhere else. For example, there are several ligatures in Tamil that will display one way with the Latha font and the other way with Monotype Tamil Arial (the way set out in Unicode 3.0 is

RE: Case mapping errors?

2000-06-22 Thread Karlsson Kent - keka
(This message is sent in UTF-8. Flames regarding that fact will be deleted without response.) No, those case mappings are not in error. Nor are their canonical mappings in error. (The MICRO SIGN would have had a canonical mapping to Greek mu, if it had not been included in such much-used

Chinese characters in Java Applet

2000-06-22 Thread Parvinder Singh(EHPT)
Hello, I am trying to display Chinese characters stored in Unicode format in an Oracle database through a Java applet in the browser. The applet uses JDBC calls and the thin driver. The Oracle database resides on a Sun Solaris server. But the applet is not

Re: UTF-8N?

2000-06-22 Thread Antoine Leca
John Cowan wrote: Now suppose we have a character sequence beginning with U+FEFF U+0020. This would be encoded as follows: US-ASCII: (not possible) UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ... UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ... UTF-16BE: 0xFE 0xFF 0x00 0x20 ... UTF-16LE: 0xFF
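
The byte sequences quoted above can be reproduced with a short Python sketch. It is only an illustration, not part of the original message: the labels, the `hex_bytes` helper, and the choice to prepend the byte order mark by hand for the plain "UTF-16" cases are assumptions made here.

```python
import codecs

# The character sequence under discussion: U+FEFF followed by U+0020.
text = "\ufeff\u0020"

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values (hypothetical helper)."""
    return " ".join(f"0x{b:02X}" for b in data)

# "UTF-16" carries a byte order mark in front of the text;
# "UTF-16BE" and "UTF-16LE" fix the byte order and add no mark.
encodings = {
    "UTF-16 (big-endian, with BOM)": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-16 (little-endian, with BOM)": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-16BE": text.encode("utf-16-be"),
    "UTF-16LE": text.encode("utf-16-le"),
}

for label, data in encodings.items():
    print(f"{label}: {hex_bytes(data)}")
```

In the two "UTF-16" rows the first U+FEFF is the signature and the second is the leading character of the text itself, which is why the mark appears twice.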

RE: Bengali: variants of same conjunct

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
Thus since people who write the language sent both, cut Do you mean that Tamil writers *purposely* use both the "ancient" and the "modern" forms in the same document? What is the intent? Yes, that is what I am saying. If you go to several of the Tamil resource sites on the web, you can

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 03:09:43 PM [EMAIL PROTECTED] wrote: Appropriate or not, users (you know, those people who don't read the documentation that the programmers don't write) will use text editors to split files. They will then concatenate the files using a non-Unicode aware tool. And they will

Re: Case mapping errors?

2000-06-22 Thread Mark Davis
These characters are purely coded for compatibility. Unicode does not distinguish letters by the abbreviations that they happen to be used in. There is no difference in semantics between the "g" in "go" vs. the "g" in "12g", nor between the "Å" in "Århus" vs. the "Å" in "15Å", nor -- for that

Re: UTF-8N?

2000-06-22 Thread Christopher John Fynn
[EMAIL PROTECTED] wrote: ... I think the suggestion that BOM and ZWNBSP be de-unified, which I have heard before, may make the best sense. *If* that's the solution, it should be done yesterday. The longer it takes the more implementations (and data) there will be that needs to be changed. -

Re: Chinese characters in Java Applet

2000-06-22 Thread Valeriy E. Ushakov
On Thu, Jun 22, 2000 at 02:20:39 -0800, Parvinder Singh(EHPT) wrote: I am trying to display Chinese characters stored in Unicode format in an Oracle database through a Java applet in the browser. The applet uses JDBC calls and the thin driver. The Oracle database resides on a Sun Solaris server. But the

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-22 Thread Robert A. Rosenberg
At 12:12 PM 06/20/2000 -0800, Kenneth Whistler wrote: Bob Rosenberg wrote: This was my concern, there is no way to distinguish UTF-8 from Latin-1 in case of upper ASCII characters here. Yes there is - it's called a "Sanity Check". You parse the file looking for High-ASCII. If you
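
A "sanity check" of the kind described above might look like the following Python sketch. The function name `guess_encoding` and its three return labels are assumptions made for illustration; the message itself is cut off before it spells out the exact procedure.

```python
def guess_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'not-utf-8' (hypothetical helper)."""
    # No bytes above 0x7F: the data reads the same in ASCII, Latin-1, and UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # A strict decode fails unless every high byte is part of a
        # well-formed UTF-8 multi-byte sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"

print(guess_encoding("Århus".encode("utf-8")))    # utf-8
print(guess_encoding("Århus".encode("latin-1")))  # not-utf-8
print(guess_encoding(b"Hello"))                   # ascii
```

This is a heuristic rather than a proof: a Latin-1 file whose high bytes happen to form well-formed UTF-8 sequences (for example the two bytes 0xC3 0xA9) would still be reported as UTF-8.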

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-22 Thread Karlsson Kent - keka
-Original Message- From: Robert A. Rosenberg [mailto:[EMAIL PROTECTED]] ... [on overlong UTF-8 sequences, a few lines down:] faked) files. I agree that I missed the extra sanity check of looking for the shortest string, but if I remember the rules correctly, there is no requirement
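
The "shortest string" check mentioned above can be sketched in Python as follows. The names `decode_lenient`, `is_overlong`, and `MIN_CODE_POINT` are hypothetical, and the decoder is deliberately lenient so that an overlong sequence can be examined at all. Later definitions of UTF-8 (RFC 3629, for instance) do require decoders to reject such sequences.

```python
# Smallest code point that genuinely needs a sequence of each length.
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000}

def decode_lenient(seq: bytes) -> int:
    """Decode one multi-byte UTF-8 sequence WITHOUT the shortest-form check."""
    lead = seq[0]
    if lead >> 5 == 0b110:
        length, code_point = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:
        length, code_point = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:
        length, code_point = 4, lead & 0x07
    else:
        raise ValueError("not a multi-byte lead byte")
    if len(seq) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    for b in seq[1:]:
        if b >> 6 != 0b10:
            raise ValueError("expected a continuation byte")
        code_point = (code_point << 6) | (b & 0x3F)
    return code_point

def is_overlong(seq: bytes) -> bool:
    """True if the sequence encodes a code point that fits in fewer bytes."""
    return decode_lenient(seq) < MIN_CODE_POINT[len(seq)]

print(is_overlong(b"\xc0\xaf"))  # True: two bytes used for U+002F
print(is_overlong(b"\xc3\xa9"))  # False: U+00E9 needs two bytes
```

Python's own strict decoder already applies this rule, so `b"\xc0\xaf".decode("utf-8")` raises `UnicodeDecodeError`.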

Re: UTF-8N?

2000-06-22 Thread John Cowan
"Ayers, Mike" wrote: Am I reading this wrong? Here's what I get: I hand you a UTF-16 document. This document is: FE FF 00 48 00 65 00 6C 00 6C 00 6F ..so it says "Hello". Then I say, "Oh, by the way, that's big-endian." *POOF* The content of the document

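The effect described in the message above (the same bytes read as "UTF-16" versus "UTF-16BE") can be observed with Python's built-in codecs. This is a sketch for illustration only; the variable names are assumptions, and the byte string is the one quoted in the message.

```python
# The document from the message: FE FF 00 48 00 65 00 6C 00 6C 00 6F
document = bytes.fromhex("FEFF00480065006C006C006F")

# Read as "UTF-16": the leading FE FF is taken as a byte order mark
# and is not part of the decoded text.
as_utf16 = document.decode("utf-16")

# Read as "UTF-16BE": the byte order is already known, so FE FF is
# decoded as an ordinary character, U+FEFF.
as_utf16be = document.decode("utf-16-be")

print(repr(as_utf16))    # 'Hello'
print(repr(as_utf16be))  # '\ufeffHello'
```

The bytes are identical in both cases; only the declared encoding changes whether U+FEFF counts as content.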
Re: UTF-8N?

2000-06-22 Thread John Cowan
Kenneth Whistler wrote: Now we are pushing through the long, bureaucratic process of getting this accepted into 10646-1, so that we maintain synchronicity with a joint publication of it as a *standard* character. So a fair statement of what you hope to achieve is: U+2060 will be the zero-width

RE: UTF-8 BOM Nonsense

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
I agree, Gary. Windows 2000 Notepad, however, does not agree and writes one. Since Notepad in prior versions of Windows was in fact the de facto standard for HTML editor (g), clearly it is a program to be reckoned with. People should be aware of the fact that there are going to be MANY files out
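
For readers who need to cope with such files, here is a minimal Python sketch of reading UTF-8 data that may or may not begin with a signature. The sample bytes are made up for illustration; `utf-8-sig` is Python's standard codec name for this behaviour.

```python
import codecs

# A made-up file body as an editor that writes a UTF-8 signature would save it.
with_signature = codecs.BOM_UTF8 + "<html>".encode("utf-8")
without_signature = "<html>".encode("utf-8")

# The plain "utf-8" codec keeps the signature as a leading U+FEFF character.
print(repr(with_signature.decode("utf-8")))         # '\ufeff<html>'

# The "utf-8-sig" codec drops one leading signature if present,
# and leaves unsigned data unchanged.
print(repr(with_signature.decode("utf-8-sig")))     # '<html>'
print(repr(without_signature.decode("utf-8-sig")))  # '<html>'
```

Decoding with `utf-8-sig` is a tolerant choice for input, since it accepts both kinds of file.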

Java, SQL, Unicode and Databases

2000-06-22 Thread Tex Texin
I want to write an application in Java that will store information in a database using Unicode. Ideally the application will run with any database that supports Unicode. One would presume that the JDBC driver would take care of any differences between databases so my application could be

English as she is spoke

2000-06-22 Thread mark . davis
I got some amusing results when I tried out the Altavista translation service on segments of the new language descriptions in http://www.unicode.org/unicode/standard/WhatIsUnicode.html Original (English): What is Unicode? Unicode provides a unique number for every character, no matter

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 06:33:57 PM [EMAIL PROTECTED] wrote: The standard doesn't ever discuss the BOM in the context of UTF-8, See section 13.6 (page 324). Sure enough. Well, there you go: the confusion is officially sanctioned! Peter Constable

Re: Bengali: variants of same conjunct

2000-06-22 Thread Antoine Leca
Michael Kaplan wrote: Thus since people who write the language sent both, cut Do you mean that Tamil writers *purposely* use both the "ancient" and the "modern" forms in the same document? What is the intent? Yes, that is what I am saying. Okay, I did not know (and I did not