On Thu, 15 Jun 2000 10:32:39 -0800 (GMT-0800), Alain LaBonté wrote:
> EBCDIC can't support more than 191 graphic characters and therefore can't
> be extended to support MS-1252 characters, in which most French and Finnish
> PC data is encoded. This data needs to be interchanged with other platforms.
[EMAIL PROTECTED] wrote:
>
> On 06/18/2000 03:12:13 AM <[EMAIL PROTECTED]> wrote:
>
> > Unless Michael Everson's idea of variant selector characters is taken
> > up, there is probably no way to specify this sort of thing in Unicode
> > without the use of additional mark-up.
>
> The use of varia
Michael Kaplan wrote:
>
> Thus far it is something that has been implemented in the fonts, rather than
> anywhere else. For example, there are several ligatures in Tamil that will
> display one way with the Latha font and the other way with Monotype Tamil
> Arial (the way set out in Unicode 3.0
John O'Conner wrote:
>
> The most difficult cases are 2126, 212A, and 212B. These characters are
> "letter-like" in their glyph appearance, but it seems that their actual
> semantics are not. It seems like someone may have looked at KELVIN SIGN
> for example, decided it looked like a Latin-1 'K'
(This message is sent in UTF-8. Flames regarding that fact
will be deleted without response.)
No, those case mappings are not in error. Nor are their
canonical mappings in error. (The MICRO SIGN would have
had a canonical mapping to Greek mu, if it had not been
included in such much-used repertoires
Title: Chinese characters in Java Applet
Hello,
I am trying to display Chinese characters, stored in Unicode format in an Oracle database, through a Java applet in the browser. The applet uses JDBC calls and the thin driver.
The Oracle database resides on a Sun Solaris server. But the applet is not showing the characters.
John Cowan wrote:
>
> Now suppose we have a character sequence beginning with U+FEFF U+0020.
> This would be encoded as follows:
>
> US-ASCII: (not possible)
> UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ...
> UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ...
> UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
> UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
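John Cowan's byte sequences above can be checked mechanically. A minimal Python sketch (my choice of language, not from the thread), using the standard codecs:

```python
# Serializing U+FEFF U+0020 under the UTF-16 encoding schemes.
s = "\ufeff\u0020"

# The plain "utf-16" scheme prepends a BOM and may use either byte
# order; CPython happens to emit little-endian on all platforms.
print(s.encode("utf-16").hex(" "))     # ff fe ff fe 20 00

# "utf-16-be" / "utf-16-le" have a fixed byte order, so no BOM is
# written and the leading U+FEFF is encoded as an ordinary character.
print(s.encode("utf-16-be").hex(" "))  # fe ff 00 20
print(s.encode("utf-16-le").hex(" "))  # ff fe 20 00
```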
> > Thus since people who write the language sent both,
>
>
> Do you mean that Tamil writers *purposely* use both the "ancient" and the
> "modern" forms in the same document?
> What is the intent?
>
Yes, that is what I am saying. If you go to several of the Tamil resource
sites on the web, you
On 06/21/2000 03:09:43 PM <[EMAIL PROTECTED]> wrote:
>Appropriate or not, users (you know, those people who don't read the
>documentation that the programmers don't write) will use text editors to
>split files. They will then concatenate the files using a non-Unicode
>aware tool. And they wi
On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote:
>It was my understanding that U+FEFF, when received as the first character,
>should be seen as a BOM and not as a character, and handled accordingly.
When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must not*
be interpreted as a BOM.
These characters are purely coded for compatibility. Unicode does not distinguish
letters by the abbreviations that they happen to be used in. There is no difference in
semantics between the "g" in "go" vs. the "g" in "12g", nor between the "Å" in "Århus"
vs. the "Å" in "15Å", nor -- for that m
[EMAIL PROTECTED] wrote:
> ... I think the suggestion that BOM and ZWNBSP be
> de-unified, which I have heard before, may make the best sense.
*If* that's the solution, it should be done yesterday. The longer it takes, the
more implementations (and data) there will be that need to be changed.
[EMAIL PROTECTED] wrote:
>
> On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote:
>
> >It was my understanding that U+FEFF, when received as the first character,
> >should be seen as a BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE
From my reading of the thread since yesterday, I find that this is an issue as
yet unresolved. But perhaps I could try Abdul's and John Hudson's advice.
> recommend using a Stylistic Alternate feature in an OpenType font,
(john)
>Ka Virama Ya -> Ko zophola
>Ka Virama YYa -> KoZophola l
Please!
After hundreds of e-mails on this topic, let it die!
The BOM is only useful with UTF-16 or UCS-4 characters.
There is no such thing as byte ordering when each character is a byte or
a multibyte sequence with a well-documented ordering denoting how to
interpret this! For further referen
On Thu, Jun 22, 2000 at 02:20:39 -0800, Parvinder Singh (EHPT) wrote:
> I am trying to display Chinese characters stored in Unicode format in an
> Oracle database through a Java applet in the browser. The applet uses JDBC
> calls and the thin driver.
> The Oracle database resides on a Sun Solaris server. But th
>
> On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote:
>
> >It was my understanding that U+FEFF, when received as the first character,
> >should be seen as a BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE,
> it *must not* be interpreted as a BOM.
At 12:12 PM 06/20/2000 -0800, Kenneth Whistler wrote:
>Bob Rosenberg wrote:
>
> > >
> > >This was my concern: there is no way to distinguish UTF-8 from Latin-1 in
> > >the case of upper-ASCII characters here.
> >
> > Yes there is - it's called a "Sanity Check". You parse the file looking for
> > High-A
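The "Sanity Check" described here can be sketched as a trial decode: bytes that survive strict UTF-8 decoding are overwhelmingly likely to actually be UTF-8, while Latin-1 accepts any byte sequence and so serves as the fallback. A Python illustration (the function name and fallback choice are mine, not from the thread; this is a heuristic, not a guarantee):

```python
# Treat data as UTF-8 only if it decodes cleanly; otherwise fall
# back to Latin-1, in which every byte sequence is valid.
def guess_charset(data: bytes) -> str:
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

print(guess_charset("café".encode("utf-8")))        # utf-8
print(guess_charset("café".encode("iso-8859-1")))   # iso-8859-1
```

Note that pure-ASCII data reports as UTF-8, which is harmless: the two encodings agree there.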
> -Original Message-
> From: Robert A. Rosenberg [mailto:[EMAIL PROTECTED]]
...
[on overlong UTF-8 sequences, a few lines down:]
> faked) files. I agree that I missed the extra sanity check of looking for
> the shortest string, but if I remember the rules correctly, there is no
> requirement
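The "shortest string" check under discussion concerns overlong UTF-8: for example, U+002F encoded as 0xC0 0xAF instead of the required single byte 0x2F. At the time of this thread the requirement was indeed debatable; the standard later made rejection mandatory (UTF-8 Corrigendum #1), and modern decoders refuse such sequences. A Python illustration (not from the thread):

```python
# 0xC0 0xAF is an overlong two-byte encoding of "/" (U+002F).
# A conformant modern decoder rejects it outright.
try:
    b"\xc0\xaf".decode("utf-8")
    print("accepted")   # a lenient pre-3.x decoder would get here
except UnicodeDecodeError:
    print("rejected: overlong/invalid sequence")
```

Lenient decoders that accepted overlong forms were a real security problem (e.g. "../" filters bypassed via 0xC0 0xAF), which is why the rule was tightened.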
Antoine Leca wrote:
> Now I ask a slightly different question then. What is the name of the
> encoding where the byte order is known (for example, any application
> on an Intel machine that receives its data from the system, as opposed
> to from the network or a similar hazardous source), and where a
Well, Gary, if only all were that simple.
In the ISO 10646 view, there is no need for any "BOM", or "signature"
as it is called in an informative annex to 10646, at all. UCS-2,
UCS-4, and UTF-16, *when* serialised into bytes, all *must* be
serialised in big-endian order. That would be the end of the story
"Gary L. Wade" wrote:
> The BOM is only useful with UTF-16 or UCS-4 characters.
It's only useful as a mark of byte order. There are others who want to
use it as a charset signature, and there is a Well-Known OS that insists
on doing so.
Attempting to discourage the use of BOMful UTF-8 in interchange
"Ayers, Mike" wrote:
> Am I reading this wrong? Here's what I get:
>
> I hand you a UTF-16 document. This document is:
>
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
>
> ..so it says "Hello". Then I say, "Oh, by the way, that's
> big-endian." *POOF* The content of the doc
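Mike Ayers's example is easy to reproduce: the same twelve bytes decode differently depending on whether the label permits the leading FE FF to be consumed as a BOM. A Python sketch (illustrative, not from the thread):

```python
# Mike Ayers's document, byte for byte.
data = bytes.fromhex("FEFF00480065006C006C006F")

# Labelled plain "UTF-16": FE FF is consumed as a BOM.
print(repr(data.decode("utf-16")))     # 'Hello'

# Labelled "UTF-16BE": FE FF is not a BOM but content, the
# character U+FEFF (ZERO WIDTH NO-BREAK SPACE) before "Hello".
print(repr(data.decode("utf-16-be")))  # '\ufeffHello'
```

That content change on relabelling is exactly the *POOF* being objected to.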
Juliusz wrote:
> The problem is not one of broken software. The problem is that, as
> John Cowan explained in detail, with the addition of the BOM, UTF-8
> and UTF-16 become ambiguous.
This is putting the cart before the horse.
The U+FEFF BOM existed in Unicode 1.0, and was carried into ISO/IEC 10646
Chris Fynn wrote:
> [EMAIL PROTECTED] wrote:
>
> > ... I think the suggestion that BOM and ZWNBSP be
> > de-unified, which I have heard before, may make the best sense.
>
> *If* that's the solution, it should be done yesterday. The longer it takes the
> more implementations (and data) there wil
Kenneth Whistler wrote:
> Now we are pushing through the long, bureaucratic process of getting
> this accepted into 10646-1, so that we maintain synchronicity with a
> joint publication of it as a *standard* character.
So a fair statement of what you hope to achieve is: U+2060 will be
the zero-wid
John Cowan wrote:
> Kenneth Whistler wrote:
>
> > Now we are pushing through the long, bureaucratic process of getting
> > this accepted into 10646-1, so that we maintain synchronicity with a
> > joint publication of it as a *standard* character.
>
> So a fair statement of what you hope to achiev
I agree, Gary.
Windows 2000 Notepad, however, does not agree, and writes one.
Since Notepad was in fact the de facto standard HTML editor in prior versions
of Windows, clearly it is a program to be reckoned with. People should be
aware of the fact that there are going to be MANY files out there
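For readers who hit such files: the signature Notepad writes is the three bytes EF BB BF. A Python sketch of writing and stripping it (the "utf-8-sig" codec is my choice for illustration, not mentioned in the thread):

```python
# What a BOMful-UTF-8 writer produces: EF BB BF before the text.
notepad_style = "Hello".encode("utf-8-sig")
print(notepad_style.hex(" "))                  # ef bb bf 48 65 6c 6c 6f

# A plain UTF-8 decode keeps the signature as U+FEFF content...
print(repr(notepad_style.decode("utf-8")))     # '\ufeffHello'
# ...while "utf-8-sig" strips a leading signature if present.
print(repr(notepad_style.decode("utf-8-sig"))) # 'Hello'
```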
I want to write an application in Java that will store information
in a database using Unicode. Ideally the application will run
with any database that supports Unicode. One would presume that the
JDBC driver would take care of any differences between databases
so my application could be independent
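The round trip Tex describes can be sketched against any Unicode-capable database; here SQLite stands in (it stores text as UTF-8 internally), since an Oracle/JDBC setup cannot be shown self-contained. Illustrative only; the thread itself concerns Oracle via the JDBC thin driver, which this does not show:

```python
# Round-trip non-ASCII text through a database using parameterized
# statements, so the driver handles any encoding conversion.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("Århus \u4e2d\u6587",))
(row,) = conn.execute("SELECT name FROM t").fetchone()
print(row)   # Århus 中文 -- the Unicode text survives unchanged
```

The portable part is the pattern, not the driver: bind Unicode strings as parameters and let the database layer own the byte-level representation.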
I got some amusing results when I tried out the Altavista translation
service on segments of the new language descriptions in
http://www.unicode.org/unicode/standard/WhatIsUnicode.html
Original (English):
What is Unicode? Unicode provides a unique number for every character,
no matter wh
Tex,
Oracle doesn't have a special requirement for datatypes in the JDBC driver if
you use UTF8 as the database character set. In this case, all the text
datatypes in JDBC will support Unicode data.
Regards,
Jianping.
Tex Texin wrote:
> I want to write an application in Java that will store information
>
Jianping responded:
>
> Tex,
>
> Oracle doesn't have a special requirement for datatypes in the JDBC driver
> if you use UTF8 as the database character set. In this case, all the text
> datatypes in JDBC will support Unicode data.
>
The same thing is, of course, true for Sybase databases using UTF-8
at t
On 06/21/2000 06:33:57 PM <[EMAIL PROTECTED]> wrote:
>> The standard doesn't ever discuss the BOM in the context of UTF-8,
>
>See section 13.6 (page 324).
Sure enough. Well, there you go: the confusion is officially sanctioned!
Peter Constable
Michael Kaplan wrote:
>
> > > Thus since people who write the language sent both,
> >
> >
> > Do you mean that Tamil writers *purposely* use both the "ancient" and the
> > "modern" forms in the same document?
> > What is the intent?
> >
> yes, that is what I am saying.
Okay, I did not know (and
>But what is the semantic intent, then?
>In other words, what might the use of the "elephant-trunk" ai vs. the
>"normal" one mean?
>What might the use of the rounded naa vs. the "normal" two-part one mean?
I do not know enough about Tamil usage to understand THAT part. :-)