Re: French encoding [Was: Chapter on character sets]

2000-06-22 Thread John Wilcock
On Thu, 15 Jun 2000 10:32:39 -0800 (GMT-0800), Alain LaBonté wrote: > EBCDIC can't support more than 191 graphic characters and therefore can't > be extended to support MS-1252 character in which most French and Finnish > PC data is encoded. This data needs to be interchanged with other platf

Re: Bengali: variants of same conjunct

2000-06-22 Thread Antoine Leca
[EMAIL PROTECTED] wrote: > > On 06/18/2000 03:12:13 AM <[EMAIL PROTECTED]> wrote: > > > Unless Michael Everson's idea of variant selector characters is taken > > up, there is probably no way to specify this sort of thing in Unicode > > without the use of additional mark-up. > > The use of varia

Re: Bengali: variants of same conjunct

2000-06-22 Thread Antoine Leca
Michael Kaplan wrote: > > Thus far it is something that has been implemented in the fonts, rather than > anywhere else for example there are several ligatures in Tamil that will > display one way with the Latha font and the other way with Monotype Tamil > Arial (the way set out in Unicode 3.0

Re: Case mapping errors?

2000-06-22 Thread Antoine Leca
John O'Conner wrote: > > The most difficult cases are 2126, 212A, and 212B. These characters are > "letter-like" in their glyph appearance, but it seems that their actual > semantics are not. It seems like someone may have looked at KELVIN SIGN > for example, decided it looked like a Latin-1 'K'

RE: Case mapping errors?

2000-06-22 Thread Karlsson Kent - keka
(This message is sent in UTF-8. Flames regarding that fact will be deleted without response.) No, those case mappings are not in error. Nor are their canonical mappings in error. (The MICRO SIGN would have had a canonical mapping to Greek mu, if it had not been included in such much-used repe

Chinese characters in Java Applet

2000-06-22 Thread Parvinder Singh(EHPT)
Hello, I am trying to display Chinese characters stored in Unicode format in an Oracle database through a Java applet in the browser. The applet uses JDBC calls and the thin driver. The Oracle database resides on a Sun Solaris server. But the applet is not showin

Re: UTF-8N?

2000-06-22 Thread Antoine Leca
John Cowan wrote: > > Now suppose we have a character sequence beginning with U+FEFF U+0020. > This would be encoded as follows: > > US-ASCII: (not possible) > UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ... > UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ... > UTF-16BE: 0xFE 0xFF 0x00 0x20 ... > UTF-16LE
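
The byte sequences listed in the message above can be reproduced with a short Python sketch. This is an illustration only, not part of the original post: `hex_bytes` is a hypothetical display helper, and the BOM is prepended by hand so that both byte orders of plain UTF-16 can be shown. The UTF-8 line is an addition that does not appear in the visible part of the message.

```python
# Hypothetical helper, used only to print bytes in the style of the message.
def hex_bytes(data: bytes) -> str:
    return " ".join(f"0x{b:02X}" for b in data)

text = "\ufeff\u0020"  # U+FEFF followed by U+0020

# Python's "utf-16-be" and "utf-16-le" codecs never add or strip a BOM.
utf16be = text.encode("utf-16-be")
utf16le = text.encode("utf-16-le")

# Plain UTF-16 with an explicit BOM prepended by hand, in each byte order.
utf16_big = b"\xfe\xff" + utf16be
utf16_little = b"\xff\xfe" + utf16le

# Assumption: UTF-8 is shown for comparison; it is not in the visible excerpt.
utf8 = text.encode("utf-8")

# US-ASCII cannot represent U+FEFF at all.
try:
    text.encode("ascii")
    ascii_possible = True
except UnicodeEncodeError:
    ascii_possible = False

for label, data in [
    ("UTF-16", utf16_big),
    ("UTF-16", utf16_little),
    ("UTF-16BE", utf16be),
    ("UTF-16LE", utf16le),
    ("UTF-8", utf8),
]:
    print(f"{label:9}{hex_bytes(data)}")
```
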

RE: Bengali: variants of same conjunct

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
> > Thus since people who write the language sent both, > > > Do you mean that Tamil writers *purposely* use both the "ancient" and the > "modern" forms in the same document? > What is the intent? > Yes, that is what I am saying. If you go to several of the Tamil resource sites on the web, you

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 03:09:43 PM <[EMAIL PROTECTED]> wrote: >Appropriate or not, users (you know, those people who don't read the >documentation that the programmers don't write) will use text editors to split >files. They will then concatenate the files using a non-Unicode aware tool. >And they wi

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote: >It was my understanding that U+FEFF when received as first character should be >seen as BOM and not as a character, and handled accordingly. When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must not* be interpreted as a BO
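
The distinction drawn above, between a stream labelled plain UTF-16 and one known to be UTF-16BE, can be sketched in Python. The byte string below is an illustrative example (a leading FE FF followed by "Hello" in big-endian order), and the behaviour shown is that of Python's own codecs, which happens to follow the same rule; it is not a statement about any other implementation.

```python
# Illustrative input: FE FF followed by "Hello" as big-endian 16-bit units.
data = bytes.fromhex("FEFF00480065006C006C006F")

# Labelled plain UTF-16: the leading FE FF is taken as a BOM, used to pick
# big-endian order, and removed from the decoded text.
as_utf16 = data.decode("utf-16")

# Known to be UTF-16BE: the same two bytes are ordinary content, so the
# decoded text starts with U+FEFF (ZERO WIDTH NO-BREAK SPACE).
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))
```

Under the second label the decoded text is one character longer, which is the practical consequence of declaring the byte order out of band.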

Re: Case mapping errors?

2000-06-22 Thread Mark Davis
These characters are purely coded for compatibility. Unicode does not distinguish letters by the abbreviations that they happen to be used in. There is no difference in semantics between the "g" in "go" vs. the "g" in "12g", nor between the "Å" in "Århus" vs. the "Å" in "15Å", nor -- for that m

Re: UTF-8N?

2000-06-22 Thread Christopher John Fynn
[EMAIL PROTECTED] wrote: > ... I think the suggestion that BOM and ZWNBSP be > de-unified, which I have heard before, may make the best sense. *If* that's the solution, it should be done yesterday. The longer it takes the more implementations (and data) there will be that needs to be changed.

Re: UTF-8N?

2000-06-22 Thread Antoine Leca
[EMAIL PROTECTED] wrote: > > On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote: > > >It was my understanding that U+FEFF when received as first character > should be > >seen as BOM and not as a character, and handled accordingly. > > When the encoding scheme is known to be UTF-16BE or UTF-16L

Re: Bengali: variants of same conjunct

2000-06-22 Thread Arijit Upadhyay
From the readings of the thread since yesterday I find that this is an issue as yet unresolved. But perhaps I could try Abdul's and John Hudson's advice. > recommend using a Stylistic Alternate feature in an OpenType font, (john) >Ka Virama Ya -> Ko zophola >Ka Virama YYa -> KoZophola l

UTF-8 BOM Nonsense

2000-06-22 Thread Gary L. Wade
Please! After hundreds of e-mails on this topic, let it die! The BOM is only useful with UTF-16 or UCS-4 characters. There is no such thing as byte ordering when each character is a byte or a multibyte sequence with a well-documented ordering denoting how to interpret this! For further referen

Re: Chinese characters in Java Applet

2000-06-22 Thread Valeriy E. Ushakov
On Thu, Jun 22, 2000 at 02:20:39 -0800, Parvinder Singh(EHPT) wrote: > I am trying to display Chinese characters stored in Unicode format in > Oracle database through a Java applet in the browser. The applet uses JDBC > calls and thin driver. > The Oracle resides on Sun Solaris server. But th

RE: UTF-8N?

2000-06-22 Thread Ayers, Mike
> > On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote: > > >It was my understanding that U+FEFF when received as first character > should be > >seen as BOM and not as a character, and handled accordingly. > > When the encoding scheme is known to be UTF-16BE or UTF-16LE, > it *must not* > be

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-22 Thread Robert A. Rosenberg
At 12:12 PM 06/20/2000 -0800, Kenneth Whistler wrote: >Bob Rosenberg wrote: > > > > > > >This was my concern, there is no way to distinguish UTF-8 from Latin-1 in > > >case of upper ASCII characters here. > > > > Yes there is - it's called a "Sanity Check". You parse the file looking for > > High-A
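
One common form of such a sanity check is sketched below in Python. The quoted message is cut off, so this is an assumption about the general idea rather than the poster's exact procedure: `guess_encoding` is a hypothetical function that looks for bytes above 0x7F and then tests whether the whole buffer is well-formed UTF-8. Python's strict UTF-8 decoder also rejects overlong (non-shortest) sequences.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic guess: 'ascii', 'utf-8', or 'latin-1' (hypothetical helper)."""
    # No byte above 0x7F: the data is plain ASCII and the question is moot.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Strict decoding fails on stray lead or continuation bytes and on
        # overlong sequences, which Latin-1 text almost always contains.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"
    return "utf-8"


print(guess_encoding("café".encode("utf-8")))
print(guess_encoding("café".encode("latin-1")))
print(guess_encoding(b"plain text"))
```

The check is a heuristic, not a proof: a Latin-1 file whose high bytes happen to form valid UTF-8 sequences (for example C3 A9) would be reported as UTF-8.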

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-22 Thread Karlsson Kent - keka
> -Original Message- > From: Robert A. Rosenberg [mailto:[EMAIL PROTECTED]] ... [on overlong UTF-8 sequences, a few lines down:] > faked) files. I agree that missed the extra sanity check of > looked for > shortest string but if I remember the rules correctly, there is no > requireme

Re: UTF-8N?

2000-06-22 Thread John Cowan
Antoine Leca wrote: > Now I ask a slightly different question then. What is the name of the > encoding where the byte order is known (for example, any application > on an Intel machine that receives its data from the system, as opposed > to from the network or similar hazardous source), and where a

RE: UTF-8 BOM Nonsense

2000-06-22 Thread Karlsson Kent - keka
Well, Gary, if only all were that well. In ISO 10646 view, there is no need for any "BOM", or "signature" as it is called in an informative annex to 10646, at all. UCS-2, UCS-4, and UTF-16, *when* serialised into bytes, all *must* be serialised in big-endian order. That would be the end of stor

Re: UTF-8 BOM Nonsense

2000-06-22 Thread John Cowan
"Gary L. Wade" wrote: > The BOM is only useful with UTF-16 or UCS-4 characters. It's only useful as a mark of byte order. There are others who want to use it as a charset signature, and there is a Well-Known OS that insists on doing so. Attempting to discourage the use of BOMful UTF-8 in interc

Re: UTF-8N?

2000-06-22 Thread John Cowan
"Ayers, Mike" wrote: > Am I reading this wrong? Here's what I get: > > I hand you a UTF-16 document. This document is: > > FE FF 00 48 00 65 00 6C 00 6C 00 6F > > ..so it says "Hello". Then I say, "Oh, by the way, that's > big-endian." *POOF* The content of the doc

Re: UTF-8N?

2000-06-22 Thread Kenneth Whistler
Juliusz wrote: > The problem is not one of broken software. The problem is that, as > John Cowan explained in detail, with the addition of the BOM, UTF-8 > and UTF-16 become ambiguous. This is putting the cart before the horse. The U+FEFF BOM existed in Unicode 1.0, and was carried into ISO/IE

Re: UTF-8N?

2000-06-22 Thread Kenneth Whistler
Chris Fynn wrote: > [EMAIL PROTECTED] wrote: > > > ... I think the suggestion that BOM and ZWNBSP be > > de-unified, which I have heard before, may make the best sense. > > *If* that's the solution, it should be done yesterday. The longer it takes the > more implementations (and data) there wil

Re: UTF-8N?

2000-06-22 Thread John Cowan
Kenneth Whistler wrote: > Now we are pushing through the long, bureaucratic process of getting > this accepted into 10646-1, so that we maintain synchronicity with a > joint publication of it as a *standard* character. So a fair statement of what you hope to achieve is: U+2060 will be the zero-wid

Re: UTF-8N?

2000-06-22 Thread Kenneth Whistler
John Cowan wrote: > Kenneth Whistler wrote: > > > Now we are pushing through the long, bureaucratic process of getting > > this accepted into 10646-1, so that we maintain synchronicity with a > > joint publication of it as a *standard* character. > > So a fair statement of what you hope to achiev

RE: UTF-8 BOM Nonsense

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
I agree, Gary. Windows 2000 Notepad, however, does not agree and writes one. Since Notepad in prior versions of Windows was in fact the de facto standard HTML editor, clearly it is a program to be reckoned with. People should be aware of the fact that there are going to be MANY files out there
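
For readers who have to consume such files, a minimal Python sketch of the practical consequence follows. The sample bytes are an assumption standing in for a file saved with a leading UTF-8 signature; `utf-8-sig` is Python's standard codec that drops the signature on input when it is present.

```python
import codecs

# Assumed sample: "Hello" saved with the three-byte UTF-8 signature in front.
with_signature = codecs.BOM_UTF8 + "Hello".encode("utf-8")
without_signature = "Hello".encode("utf-8")

# A plain UTF-8 decoder keeps the signature as a leading U+FEFF character.
plain = with_signature.decode("utf-8")

# "utf-8-sig" removes a leading signature and accepts input that lacks one.
stripped = with_signature.decode("utf-8-sig")
untouched = without_signature.decode("utf-8-sig")

print(repr(plain))
print(repr(stripped))
print(repr(untouched))
```

Decoding with `utf-8-sig` lets a reader accept both kinds of file without caring which tool wrote them.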

Java, SQL, Unicode and Databases

2000-06-22 Thread Tex Texin
I want to write an application in Java that will store information in a database using Unicode. Ideally the application will run with any database that supports Unicode. One would presume that the JDBC driver would take care of any differences between databases so my application could be independe

English as she is spoke

2000-06-22 Thread mark . davis
I got some amusing results when I tried out the Altavista translation service on segments of the new language descriptions in http://www.unicode.org/unicode/standard/WhatIsUnicode.html Original (English): What is Unicode? Unicode provides a unique number for every character, no matter wh

Re: Java, SQL, Unicode and Databases

2000-06-22 Thread Jianping Yang
Tex, Oracle doesn't have any special datatype requirement in the JDBC driver if you use UTF8 as the database character set. In this case, all the text datatypes in JDBC will support Unicode data. Regards, Jianping. Tex Texin wrote: > I want to write an application in Java that will store information >

Re: Java, SQL, Unicode and Databases

2000-06-22 Thread Kenneth Whistler
Jianping responded: > > Tex, > > Oracle doesn't have special requirement for datatype in JDBC driver if you use UTF8 as database > character set. In this case, all the text datatype in JDBC will support Unicode data. > The same thing is, of course, true for Sybase databases using UTF-8 at t

Re: UTF-8N?

2000-06-22 Thread Peter_Constable
On 06/21/2000 06:33:57 PM <[EMAIL PROTECTED]> wrote: >> The standard doesn't ever discuss the BOM in the context of UTF-8, > >See section 13.6 (page 324). Sure enough. Well, there you go: the confusion is officially sanctioned! Peter Constable

Re: Bengali: variants of same conjunct

2000-06-22 Thread Antoine Leca
Michael Kaplan wrote: > > > > Thus since people who write the language sent both, > > > > > > Do you mean that Tamil writers *purposely* use both the "ancient" and the > > "modern" forms in the same document? > > What is the intent? > > > Yes, that is what I am saying. Okay, I did not know (and

RE: Bengali: variants of same conjunct

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
>But what is the semantic intent, then? >In other words, what may mean the use of "elephant-trunk" ai vs the "normal" one? >What may mean the use of the rounded naa vs the "normal", two parts, one? I do not know enough about Tamil usage to understand THAT part. :-)