Re: Suggestions in Unicode Indic FAQ

2003-02-05 Doug Ewell
Kent Karlsson kentk at md dot chalmers dot se wrote: Consider English. If I write , that may well be a spell error. Or even Ŋŋŋŋ!, as Michael Everson wrote in WG2 N2306. -Doug Ewell, Fullerton, California

Re: Indic Devanagari Query

2003-02-05 Andrew C. West
On Wed, 05 Feb 2003 02:00:30 -0800 (PST), [EMAIL PROTECTED] wrote: If these alternate forms were needed to be displayed in a single multi-lingual plain-text file, wouldn't we need some method of tagging the runs of Latin text for their specific languages? Is this not what the variation

Re: How to convert special characters into unicode?

2003-02-05 Chris Jacobs
- Original Message - From: SRIDHARAN Aravind [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, February 05, 2003 8:27 AM Subject: How to convert special characters into unicode? How to get unicode values for special characters in Java? I have a set of Czech special characters?
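For the Java part of the question, a minimal sketch of listing the Unicode code point of each character in a string. It assumes the Czech text has already been decoded correctly into a Java String; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class CodePoints {
    // Returns a "U+XXXX" label for each code point in s. String.codePoints()
    // combines surrogate pairs, so a character outside the BMP yields one label.
    static List<String> labels(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // The Czech word "cestina" with C-caron and s-caron, written with escapes
        // so the result does not depend on the encoding of this source file.
        String czech = "\u010Ce\u0161tina";
        System.out.println(labels(czech));
    }
}
```

Working with code points rather than individual chars keeps the output correct for supplementary characters as well.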

Re: list etiquette (was Re: Tailoring of normalization

2003-02-05 Lars Marius Garshol
* [EMAIL PROTECTED] | | Please forgive me and others who are on similar set-ups if this is | all just too much of a pain! It is hard for people to avoid giving others two copies of replies to on-list messages. In my case I've solved this, since my email client (Gnus) detects duplicate messages
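One way a mail client could recognize a second copy of the same message is to remember the Message-ID header of every message it has already delivered. The message does not say how Gnus actually does this, so the sketch below is a hypothetical illustration of that approach; `DuplicateFilter` and `isNew` are made-up names.

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateFilter {
    // Message-IDs already seen in this session.
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a Message-ID is offered and false for any
    // later copy (for example, the list copy arriving after the direct CC).
    boolean isNew(String messageId) {
        return seen.add(messageId.trim());
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        System.out.println(filter.isNew("<abc@example.org>")); // true: first copy
        System.out.println(filter.isNew("<abc@example.org>")); // false: duplicate
    }
}
```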

Re: How to convert special characters into unicode?

2003-02-05 Doug Ewell
SRIDHARAN Aravind ASridharan at covansys dot com wrote: I have Czech special characters in an Excel file. I copy them into Notepad. I save them. Now I use native2ascii convertor that is available with JDK. After I run this utility, I am getting some other unicode values or sometimes only
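A common cause of this symptom is a mismatch between the encoding Notepad saved the file in and the encoding used to read it back; native2ascii uses the platform default encoding unless it is given the -encoding option (for example `native2ascii -encoding UTF-8 in.txt out.txt`). The sketch below imitates that step in plain Java, assuming the file was saved as UTF-8: it decodes the same two bytes with the right charset and with a wrong one, then escapes every non-ASCII character. `Native2AsciiSketch` is a hypothetical name, and its escape format only approximates the tool's output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Native2AsciiSketch {
    // Decodes raw file bytes with an explicitly chosen charset, then writes each
    // non-ASCII UTF-16 code unit as a backslash-u escape with four hex digits.
    static String escape(byte[] raw, Charset cs) {
        String text = new String(raw, cs);
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // C-caron (U+010C) as stored in a UTF-8 file: bytes C4 8C.
        byte[] utf8Bytes = {(byte) 0xC4, (byte) 0x8C};
        // Right charset: one escape, for U+010C.
        System.out.println(escape(utf8Bytes, StandardCharsets.UTF_8));
        // Wrong charset: two unrelated characters, U+00C4 and U+008C.
        System.out.println(escape(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The bytes never change; only the charset used to interpret them does, which is why the escaped values come out different.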

Re: Indic Devanagari Query

2003-02-05 Peter_Constable
On 02/04/2003 02:52:25 PM jameskass wrote: If these alternate forms were needed to be displayed in a single multi-lingual plain-text file, wouldn't we need some method of tagging the runs of Latin text for their specific languages? The plain-text file would be legible without that -- I don't

discovering code points with embedded nulls

2003-02-05 Erik.Ostermueller
Hello, all. I'm dealing with an API that claims it doesn't support unicode characters with embedded nulls. I'm trying to figure out how much of a liability this is. What is my best plan of attack for discovering precisely which code points have embedded nulls given a particular encoding?
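A brute-force way to answer the question for any encoding Java supports is to encode every Unicode scalar value on its own and look for a 0x00 byte. This is a sketch rather than an efficient analysis, and `ZeroByteScan` is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ZeroByteScan {
    // True if encoding the single code point cp in cs produces at least one 0x00 byte.
    static boolean hasZeroByte(int cp, Charset cs) {
        byte[] bytes = new String(Character.toChars(cp)).getBytes(cs);
        for (byte b : bytes) {
            if (b == 0) return true;
        }
        return false;
    }

    // Counts the scalar values (every code point except surrogates) whose
    // encoding in cs contains a zero byte.
    static int countZeroByteCodePoints(Charset cs) {
        int count = 0;
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            // Surrogate code points are not scalar values, so skip them.
            if (cp >= 0xD800 && cp <= 0xDFFF) continue;
            if (hasZeroByte(cp, cs)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + countZeroByteCodePoints(StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + countZeroByteCodePoints(StandardCharsets.UTF_16BE));
    }
}
```

In UTF-8 only U+0000 turns up; in UTF-16BE the list includes all of U+0000..U+00FF, every BMP character of the form U+xx00, and many supplementary characters whose surrogates end in 00.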

Re: Indic Devanagari Query

2003-02-05 Peter_Constable
On 02/05/2003 04:05:44 AM Andrew C. West wrote: If these alternate forms were needed to be displayed in a single multi-lingual plain-text file, wouldn't we need some method of tagging the runs of Latin text for their specific languages? Is this not what the variation selectors are available

VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 jameskass
Andrew C. West wrote, Is this not what the variation selectors are available for? And now that we are soon to have 256 of them, perhaps Unicode ought not to be shy about using them for characters other than mathematical symbols. Yes, there seem to be additional variation selectors coming in
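As later encoded in Unicode 4.0, VS1..VS16 occupy U+FE00..U+FE0F and VS17..VS256 occupy U+E0100..U+E01EF. The sketch below maps a selector number to its code point and appends it to a base character. It shows the mechanics only: whether a particular base-plus-selector pair is a meaningful variation sequence is defined by the Unicode data files, not by this code, and the class name is illustrative.

```java
class VariationSelectors {
    // Maps VS1..VS256 to code points: VS1-VS16 are U+FE00..U+FE0F,
    // VS17-VS256 are U+E0100..U+E01EF.
    static int codePointOf(int n) {
        if (n >= 1 && n <= 16) return 0xFE00 + (n - 1);
        if (n >= 17 && n <= 256) return 0xE0100 + (n - 17);
        throw new IllegalArgumentException("no such variation selector: VS" + n);
    }

    // Appends selector VSn to a base character, forming a variation sequence.
    static String withSelector(int baseCp, int n) {
        return new StringBuilder().appendCodePoint(baseCp).appendCodePoint(codePointOf(n)).toString();
    }

    public static void main(String[] args) {
        System.out.printf("VS1 = U+%04X, VS17 = U+%04X%n", codePointOf(1), codePointOf(17));
        // VS17 is outside the BMP, so in UTF-16 it needs a surrogate pair:
        // the sequence is 1 + 2 = 3 UTF-16 code units.
        System.out.println(withSelector('a', 17).length());
    }
}
```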

VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 jameskass
Peter Constable wrote, The plain-text file would be legible without that -- I don't think this is an argument in favour of plane 14 tag characters. Preserving culturally-preferred appearance would certainly require markup of some form, whether lang IDs or for font-face and perhaps
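For readers who have not seen Plane 14 tags: a language tag is U+E0001 LANGUAGE TAG followed by the ASCII language ID, with each character shifted up by 0xE0000 into U+E0020..U+E007E, and U+E007F CANCEL TAG ends the tagging. A minimal sketch that builds such a tagged run; the names are illustrative, and the code shows the encoding only, not a recommendation to use tags.

```java
class LanguageTags {
    static final int LANGUAGE_TAG = 0xE0001;
    static final int CANCEL_TAG = 0xE007F;

    // Spells an ASCII language ID (e.g. "de") as LANGUAGE TAG followed by
    // each character shifted up by 0xE0000.
    static String tag(String langId) {
        StringBuilder sb = new StringBuilder().appendCodePoint(LANGUAGE_TAG);
        for (char c : langId.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                throw new IllegalArgumentException("tag payload must be printable ASCII");
            }
            sb.appendCodePoint(0xE0000 + c);
        }
        return sb.toString();
    }

    // Wraps a run of text: language tag, the text, then CANCEL TAG.
    static String tagRun(String langId, String text) {
        return tag(langId) + text + new String(Character.toChars(CANCEL_TAG));
    }

    public static void main(String[] args) {
        String run = tagRun("de", "Stra\u00DFe");
        run.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```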

RE: discovering code points with embedded nulls

2003-02-05 Rick Cameron
Are you sure the API doesn't support Unicode _characters_ with embedded NULs? Or does it fail to support Unicode _strings_ with embedded NULs? If it really is the former, no character in UTF-8 (except, of course, U+0000) will include a NUL byte. In UTF-16, it will be any character of the form
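The character/string distinction matters as soon as an API stops at the first zero byte, as C's strlen does. This sketch imitates such an API (`truncateAtNul` is a hypothetical name) and shows that plain ASCII text survives intact in UTF-8 but is cut off immediately in UTF-16BE.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NulTerminated {
    // Mimics a C-style API that treats the first 0x00 byte as end of string:
    // returns only the bytes before that NUL (or all bytes if there is none).
    static byte[] truncateAtNul(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) return Arrays.copyOf(bytes, i);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "Hi";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);     // 48 69
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // 00 48 00 69
        System.out.println(truncateAtNul(utf8).length);  // 2: nothing lost
        System.out.println(truncateAtNul(utf16).length); // 0: cut at the very first byte
    }
}
```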

RE: discovering code points with embedded nulls

2003-02-05 Marco Cimarosti
Erik Ostermueller wrote: I'm dealing with an API that claims it doesn't support unicode characters with embedded nulls. I'm trying to figure out how much of a liability this is. If by embedded nulls they mean bytes of value zero, that library can *only* work with UTF-8. The other two UTF's

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 Asmus Freytag
At 06:24 PM 2/5/03 +0000, [EMAIL PROTECTED] wrote: The advantages of using P14 tags (...equals lang IDs mark-up) is that runs of text could be tagged *in a standard fashion* and preserved in plain-text. The minute you have scoped tagging, you are no longer using plain text. The P14 tags are no
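The point about scoped tagging can be made concrete: to know which language tag applies at a given position, a program has to scan everything before that position, and a fragment cut from the middle of the text carries no tag at all. A sketch under those assumptions; `TagScope`, `tag`, and `languageAt` are hypothetical names.

```java
class TagScope {
    static final int LANGUAGE_TAG = 0xE0001;
    static final int CANCEL_TAG = 0xE007F;

    // Builds LANGUAGE TAG followed by the ASCII payload shifted into Plane 14.
    static String tag(String langId) {
        StringBuilder sb = new StringBuilder().appendCodePoint(LANGUAGE_TAG);
        langId.chars().forEach(c -> sb.appendCodePoint(0xE0000 + c));
        return sb.toString();
    }

    // Walks s from the start up to charIndex and returns the language ID in
    // effect there, or null if none applies. The result depends on everything
    // before charIndex, which is what makes the tags stateful.
    static String languageAt(String s, int charIndex) {
        String current = null;
        boolean readingTag = false;
        int i = 0;
        while (i < charIndex) {
            int cp = s.codePointAt(i);
            if (cp == LANGUAGE_TAG) {
                readingTag = true;
                current = "";
            } else if (cp == CANCEL_TAG) {
                readingTag = false;
                current = null;
            } else if (readingTag && cp >= 0xE0020 && cp <= 0xE007E) {
                current += (char) (cp - 0xE0000);
            } else {
                readingTag = false; // ordinary text: the tag payload is complete
            }
            i += Character.charCount(cp);
        }
        return current;
    }

    public static void main(String[] args) {
        String doc = "See " + tag("de") + "Stra\u00DFe";
        String fragment = doc.substring(doc.length() - 6); // the word alone, without its tag
        System.out.println(languageAt(doc, doc.length()));           // de
        System.out.println(languageAt(fragment, fragment.length())); // null
    }
}
```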

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 Peter_Constable
On 02/05/2003 12:24:39 PM jameskass wrote: The advantages of using P14 tags (...equals lang IDs mark-up) is that runs of text could be tagged *in a standard fashion* and preserved in plain-text. Sure, but why do we want to place so much demand on plain text when the vast majority of content we

RE: discovering code points with embedded nulls

2003-02-05 Kenneth Whistler
Erik followed up: From what I'm hearing from you all is that a null in UTF-8 is for termination and termination only. Is this correct? Not quite. A null byte (0x00) in UTF-8 is only a representation of the NULL character (U+0000). It can be present in UTF-8 for whatever purposes one might
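That 0x00 in UTF-8 is simply the encoding of U+0000 is easy to see in Java, which also ships a non-standard variant, "modified UTF-8", used by DataOutputStream.writeUTF: it writes U+0000 as the two bytes C0 80, so the encoded characters never contain a zero byte. Two caveats: the two-byte length prefix that writeUTF emits can itself contain zeros, and modified UTF-8 is not valid standard UTF-8 for interchange. The class and helper names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NulInUtf8 {
    // Formats bytes as space-separated uppercase hex, e.g. "61 00 62".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Standard UTF-8: U+0000 is the single byte 00, like any other character.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's modified UTF-8 via writeUTF: 2-byte length prefix, then the
    // characters, with U+0000 written as C0 80.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory buffer
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        System.out.println(hex(standardUtf8(s))); // 61 00 62
        System.out.println(hex(modifiedUtf8(s))); // 00 04 61 C0 80 62
    }
}
```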

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 Michael Everson
At 16:47 -0500 2003-02-05, Jim Allan wrote: There are often conflicting orthographic usages within a language. Language tagging alone does not indicate whether German text is to be rendered in Roman or Fraktur, whether Gaelic text is to be rendered in Roman or Uncial, and if Uncial, a modern

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 jameskass
Asmus Freytag wrote, Variation selectors also can be ignored based on their code point values, but unlike p14 tags, they don't become invalid when text is cut and pasted from the middle of a string. Excellent point. Unicode 4.0 will be quite specific: P14 tags are reserved for use with
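Ignoring variation selectors by code point value amounts to a simple range check, and the same works for Plane 14 tag characters. A sketch, assuming the ranges of the 256 variation selectors (U+FE00..U+FE0F, U+E0100..U+E01EF) and the Tags block (U+E0000..U+E007F); `stripIgnorables` is a hypothetical name, and a real renderer would more likely skip these characters during display than delete them from the text.

```java
class StripByRange {
    // VS1-VS16 and VS17-VS256.
    static boolean isVariationSelector(int cp) {
        return (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0xE0100 && cp <= 0xE01EF);
    }

    // The Plane 14 Tags block.
    static boolean isTagCharacter(int cp) {
        return cp >= 0xE0000 && cp <= 0xE007F;
    }

    // Removes both kinds of characters, judged purely by code point value.
    static String stripIgnorables(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !isVariationSelector(cp) && !isTagCharacter(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String vs17 = new String(Character.toChars(0xE0100));
        String langTag = new String(Character.toChars(0xE0001));
        String s = "a\uFE00" + "b" + vs17 + langTag + "c";
        System.out.println(stripIgnorables(s)); // abc
    }
}
```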

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-05 jameskass
Peter Constable wrote, Sure, but why do we want to place so much demand on plain text when the vast majority of content we interchange is in some form of marked-up or rich text? Let's let plain text be that -- plain -- and look to the markup conventions that we've invested so much in and

Re: list etiquette (was Re: Tailoring of normalization

2003-02-05 Tex Texin
I don't know about others, but my filters place messages in different folders, EXCEPT when my name is on the cc or to list. In that case, the message is left in my inbox for more immediate review and possible response. The Unicode lists are also slow to send mail so there can be a significant

Re: How to convert special characters into unicode?

2003-02-05 Chris Jacobs
 You mean like this? The following is two times the zodiac [ U+2648 ... U+2653 ] Mortbats Zodiac: 1234567890-= [ Needs Mortbats font to display, http://www.dingbatpages.com] Unicode Zodiac: ♈♉♊♋♌♍♎♏♐♑♒♓ [ Needs e.g. Arial Unicode MS to display ] The upper of these two zodiacs will give wrong
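If the Mortbats font really places the twelve signs on the keys 1234567890-= in the same order as U+2648..U+2653 (an assumption read off the side-by-side listing above, not checked against the font itself), converting font-encoded text to real Unicode characters is a table lookup. `MortbatsZodiac` and `toUnicode` are hypothetical names.

```java
class MortbatsZodiac {
    // Assumed font layout: the i-th key in FONT_KEYS shows zodiac sign U+2648 + i.
    static final String FONT_KEYS = "1234567890-=";

    static String toUnicode(String fontText) {
        StringBuilder sb = new StringBuilder();
        for (char c : fontText.toCharArray()) {
            int i = FONT_KEYS.indexOf(c);
            if (i >= 0) {
                sb.appendCodePoint(0x2648 + i); // Aries (U+2648) .. Pisces (U+2653)
            } else {
                sb.append(c); // characters outside the table pass through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        toUnicode(FONT_KEYS).codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Once converted, the text no longer depends on a particular font to be read correctly; it only needs some font that covers U+2648..U+2653.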