Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread John Hudson
At 23:02 08/07/2003, Jony Rosenne wrote: I mean "see" in the literal sense. I see an orphaned Hiriq squeezed between the Lamed and the Mem. I see an orphaned hiriq carefully positioned relative to the lamed and mem. If it were simply a question of sitting a secondary vowel on the line where the

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Jony Rosenne
I mean "see" in the literal sense. I see an orphaned Hiriq squeezed between the Lamed and the Mem. Similarly for the other examples given, both Biblical and modern. Jony > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk > Sent: Tuesday,

Re: Deprecated vs. strongly discouraged?

2003-07-08 Thread Doug Ewell
Ted Hopp wrote: > From the Unicode Glossary: > > "Deprecated. A coded character whose use is strongly discouraged." > > From http://www.unicode.org/versions/Unicode4.0.0/, section on Unicode > Character Database: > > "Deprecated Characters. Two Khmer characters, U+17A3 khmer independent > vowel q

Re: Chinese language support for Unicode

2003-07-08 Thread Richard Cook
Sourav, You wrote: > > Hi All, > > Does Unicode support both Simplified as well as Traditional Chinese ? > Yes, it does, though the Simplified support is rather lacking in comparison with the Traditional, since the Traditional characterset is rather large, if not completely open-ended, and simp

Re: Chinese language support for Unicode

2003-07-08 Thread Doug Ewell
souravm wrote: > Does Unicode support both Simplified as well as Traditional Chinese ? Yes. > If it supports then could you please let me know what are the > respective character blocks in Unicode support these two? Han characters in Unicode aren't arbitrarily divided into "simplified" and "tr

RE: When is a character a currency sign?

2003-07-08 Thread Asmus Freytag
Unicode assigns the general category value, "Sc", or "Symbol, [c]urrency" to all characters whose *primary* function is to act as a currency symbol. That excludes all characters that have other, unrelated uses, as long as those are not more specialized than the use as currency sign. That's an u

Chinese language support for Unicode

2003-07-08 Thread souravm
Hi All, Does Unicode support both Simplified as well as Traditional Chinese ? If it supports then could you please let me know what are the respective character blocks in Unicode support these two? Thanks in advance. Regards, Sourav

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 10:37, Ted Hopp wrote: On 08/07/2003 13:01, Peter Kirk wrote: [regarding Haralambous] ... can you remind us of the reference and if possible the URL? "Typesetting the Holy Bible in Hebrew, with TEX" Yannis Haralambous EuroTEX Proceedings 1994 TUGboat 15(3):174-191, September

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 12:56, Philippe Verdy wrote: Suppose your character PATAH-HIRIQ is accepted, and is defined as being canonically equivalent to PATAH-HIRIQ. Then the definition of canonical equivalence with all Unicode algorithms would allow any of these algorithms to decompose it to NFD as a pair of c

Deprecated vs. strongly discouraged?

2003-07-08 Thread Ted Hopp
From the Unicode Glossary: "Deprecated. A coded character whose use is strongly discouraged." From http://www.unicode.org/versions/Unicode4.0.0/, section on Unicode Character Database: "Deprecated Characters. Two Khmer characters, U+17A3 khmer independent vowel qaq and U+17D3 khmer sign batha

Re: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)

2003-07-08 Thread Jim Allan
Philippe Verdy posted: And U+2007 is certainly a better space to use after an sentence-ending dot or exclamation/interrogation point, for typesetting usage or in HTML and XML documents when a large space is intended by the author. Not quite. Remember, U+2007 is a non-breaking space. Use at the

Re: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 8:13 PM, Jim Allan <[EMAIL PROTECTED]> wrote: > François Yergeau posted: > > > Jim Allan wrote: > > > U+202F which is always a wide space would be generally less > > > desirable than ordinary non-breaking U+00A0. > > > > Didn't you confuse U+2007 and U+202F here? U+202

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 8:21 PM, Peter Kirk <[EMAIL PROTECTED]> wrote: > On 08/07/2003 11:10, Philippe Verdy wrote: > > > Admit that your proposal of using a canonical decomposition would > > still cause problems with all Unicode algorithms, and with XML > > processing. > > > > Only a NFKD dec

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable
Ted Hopp wrote on 07/08/2003 11:26:14 AM: > Also, there are missing letters and there are missing letters. There are > cases of a single text (e.g., Holzhausen Bible of 1889, Lowe and Brydone > Bible of 1948, as documented by Yannis Haralambous) where the "missing > letters" in some words are simp

Re: 24th Unicode Conference - Atlanta, Georgia USA

2003-07-08 Thread Tex Texin
António Martins-Tuválkin wrote: > > Twenty-fourth Internationalization and Unicode Conference (IUC24) > > Unicode, Internationalization, the Web: Powering Global Business > > > > http://www.unicode.org/iuc/iuc24 > > September 3-5, 2003 > >

RE: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)

2003-07-08 Thread Jim Allan
François Yergeau posted: Jim Allan wrote: U+202F which is always a wide space would be generally less desirable than ordinary non-breaking U+00A0. Didn't you confuse U+2007 and U+202F here? U+202F is the *NARROW* NBSP. Yes. I certainly did paste in the wrong Unicode value. It is U+2007 which w

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 11:10, Philippe Verdy wrote: Admit that your proposal of using a canonical decomposition would still cause problems with all Unicode algorithms, and with XML processing. Only a NFKD decomposition would make your proposed "ligature" character workable for XML processing and Unicode al

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 6:48 PM, Peter Kirk <[EMAIL PROTECTED]> wrote: > On 08/07/2003 09:16, Philippe Verdy wrote: > > > Even if listed in the Canonical Composition Exclusion list, this > > would not work: this list only refers to characters that are > > canonically decomposable into a charact

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Ted Hopp
On 08/07/2003 13:01, Peter Kirk wrote: > [regarding Haralambous] ... can you remind us of the > reference and if possible the URL? "Typesetting the Holy Bible in Hebrew, with TEX" Yannis Haralambous EuroTEX Proceedings 1994 TUGboat 15(3):174-191, September, 1994 I've found it on-line at: http://

RE: When is a character a currency sign?

2003-07-08 Thread Kurosaka, Teruhiko
> But what does one do for a script like Han characters where > those tests > don't apply? e.g., in Chinese, U+938A is used for 'pound'--is that a > word, or a currency sign? U+5713 or U+5143 for 'yuan'? Etc. Are they invented exclusively for the purpose of expressing the currencies? I am not

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable
Peter Kirk wrote on 07/08/2003 08:18:33 AM: > A couple of off list comments have made it clear to me that this > proposal needs some clarification and adjustment... > The solution for this sequence is as follows: Define a new combining > character something like HEBREW LIGATURE PATAH HIRIQ with

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 09:16, Philippe Verdy wrote: Even if listed in the Canonical Composition Exclusion list, this would not work: this list only refers to characters that are canonically decomposable into a character pair, and that MUST be decomposed and MUST NOT be recomposed when creating *either* a N

Re: [OT] When is a character a currency sign?

2003-07-08 Thread Pim Blokland
Thomas Chan schreef: > Would "Euro" also be a (four-character) currency sign? No, that's not a sign, just a name, like "Dollar" or "Pfennig" or "Rijksdaalder". The original question was about characters, though. I saw nobody answered the question with "when it has a general category of Sc". Am I
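The "general category of Sc" test mentioned here can be tried directly. The following is only a sketch using Python's standard unicodedata module; the choice of Python and the helper name is_currency_symbol are assumptions made for illustration, not anything proposed in the thread.

```python
import unicodedata


def is_currency_symbol(ch: str) -> bool:
    """Return True when the character's general category is Sc (Symbol, currency)."""
    return unicodedata.category(ch) == "Sc"


# A few characters from the discussion: dollar, euro, drachma sign,
# and the Han character U+5143 used for 'yuan'.
for ch in ["$", "\u20ac", "\u20af", "\u5143"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_currency_symbol(ch))
```

The Han character comes back as an ordinary letter (Lo), which matches the point that a property test of this kind only identifies characters whose primary function is as a currency sign.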

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 09:26, Ted Hopp wrote: Also, there are missing letters and there are missing letters. There are cases of a single text (e.g., Holzhausen Bible of 1889, Lowe and Brydone Bible of 1948, as documented by Yannis Haralambous) where the "missing letters" in some words are simply not presen

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable
Peter Kirk wrote on 07/08/2003 04:23:59 AM: > Would it work to define a new character, for example, for patah-hiriq > which has a canonical decomposition into patah plus hiriq, or even into > hiriq plus patah? No, because any Unicode normalization form would decompose this, and then apply cano
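The reordering at issue in this thread can be observed with a few lines of code. This is a minimal sketch using Python's standard unicodedata module; the language and the helper names are assumptions made for illustration. Patah and hiriq carry different non-zero canonical combining classes, so every normalization form sorts them into the same order no matter how they were typed.

```python
import unicodedata

LAMED = "\u05dc"  # HEBREW LETTER LAMED
PATAH = "\u05b7"  # HEBREW POINT PATAH
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ


def combining_classes(text: str) -> list:
    """Canonical combining class of each code point, in order."""
    return [unicodedata.combining(ch) for ch in text]


def normalized_names(text: str, form: str = "NFC") -> list:
    """Character names after normalizing to the given form."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize(form, text)]


typed = LAMED + PATAH + HIRIQ      # the order seen in the printed text
print(combining_classes(typed))    # patah has the higher class of the two
print(normalized_names(typed))     # hiriq is moved in front of patah
```

Because the swap happens in NFD and NFC alike, a precomposed character with a canonical decomposition into these two points would not preserve the typed order either.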

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
> > And cannot in the first few characters (legally), since these must be > > " > Wrong: the XML declaration is NOT mandatory, only recommended. > So an XML document can directly start with its actual content > which may be whitespace, an XML comment (starting with "

Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread John Cowan
Jony Rosenne scripsit: > Just a reminder that the statement of the problem has not been agreed to. I > don't see a vowel sequence in Yerushala(y)im. A vowel sequence is just what you do *see* when you look at the text. You may *infer* the presence of a consonant, between them, but you don't *s

Re: UTF-8 to UTF-16LE

2003-07-08 Thread John Cowan
Philippe Verdy scripsit: > Not bogus: the HTTP header is less important than an explicit > declaration in the XML document. You've misread me or RFC 3023 or both. The charset parameter in the MIME header *overrides* the encoding declaration in the XML content. If the header says "ISO 8859-1",

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Ted Hopp
On 08/07/2003 3:19, Jony Rosenne wrote: > Just a reminder that the statement of the problem has not been agreed to. I > don't see a vowel sequence in Yerushala(y)im. Jony, even if you don't accept the problem as regards Yerushala(y)im, you must accept that modern Hebrew typography can use more t

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 5:14 PM, John Cowan <[EMAIL PROTECTED]> wrote: > Peter Kirk scripsit: > Such a character could only be encoded if it were put into the list > of composition exceptions, because it would upset the stability of > normalization. Even if listed in the Canonical Composition Ex

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Philippe Verdy
> And cannot in the first few characters (legally), since these must be > "

Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Karljürgen Feuerherm
- Original Message - Jony Rosenne wrote on Tuesday, July 08, 2003 at 11:38 AM > Just a reminder that the statement of the problem has not been agreed to. I > don't see a vowel sequence in Yerushala(y)im. There is as far as the text--as it is written--is concerned. Just not in the impli

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread John Cowan
Peter Kirk scripsit: > The solution for this sequence is as follows: Define a new combining > character something like HEBREW LIGATURE PATAH HIRIQ with a canonical > decomposition of hiriq - patah (yes, that way round) and a glyph with a > hiriq to the left of a patah. How does this help? Well,

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 08:14, John Cowan wrote: Such a character could only be encoded if it were put into the list of composition exceptions, because it would upset the stability of normalization. ... OK, I understand. So what if it is listed as a composition exception? In that case the NFC as well a

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
> On Tuesday, July 08, 2003 2:22 PM, Jon Hanna <[EMAIL PROTECTED]> wrote: > > > According > > > to XML the > > > default encoding scheme is UTF-8. > > > > Not strictly true. The default encoding scheme's is UTF-8 *or* > > UTF-16LE *or* UTF-16BE, > > Wrong also: UTF-16LE and UTF16-BE are not in the

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 4:17 PM, John Cowan <[EMAIL PROTECTED]> wrote: > XML parsers MUST support UTF-16, with a BOM and in either order, and > UTF-8. All other encodings MUST be properly declared. > (Bogusly IMHO, an HTTP Content-Type: header overrides this rule.) Not bogus: the HTTP header is

Re: UTF-8 to UTF-16LE

2003-07-08 Thread John Cowan
Francois Yergeau scripsit: > John Cowan wrote: > > (Bogusly IMHO, an HTTP Content-Type: header overrides this rule.) > > There seems to be more and more agreement against this bogosity. Perhaps > more than idle chatter is in order, but I'm not sure where to start... [EMAIL PROTECTED] would seem

Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 08:38, Jony Rosenne wrote: Just a reminder that the statement of the problem has not been agreed to. I don't see a vowel sequence in Yerushala(y)im. Jony I take your point. But I think it depends quite what you mean by "see". If you mean "understand", or "hear", you are quite c

Re: RE: UTF-8 to UTF-16LE

2003-07-08 Thread Rick McGowan
> > Can anyone tell me how to convert UTF-8 to UTF-16LE . > > Funnily enough that's just what I'm coding right now. > The encodings are described in Chapter 3 of Unicode, UTF-8 is also described in > RFC 2279 and UTF-16 in RFC 2781 >

Re: UTF-8 to UTF-16LE

2003-07-08 Thread John Cowan
Philippe Verdy scripsit: > - UTF-32: with a recommended byte order mark (00,00,FE,FF or FF,FE,00,00) UTF-32 requires an XML declaration (always assuming there is no MIME header in scope), even though it is easy to autodetect. > With UTF16-BE, UTF16-LE, UTF-32BE, UTF-32LE, the encoding scheme can

RE: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Jony Rosenne
Just a reminder that the statement of the problem has not been agreed to. I don't see a vowel sequence in Yerushala(y)im. Jony > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk > Sent: Tuesday, July 08, 2003 3:19 PM > To: [EMAIL PROTECTED

RE: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)

2003-07-08 Thread Francois Yergeau
Jim Allan wrote: > U+202F which is always a wide space would be generally less > desirable than ordinary non-breaking U+00A0. Didn't you confuse U+2007 and U+202F here? U+202F is the *NARROW* NBSP. -- François Yergeau
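The three space characters being compared in this thread are easy to mix up by number alone. As a small sketch (Python's unicodedata module is an assumption of this illustration, as is the helper name describe_space), the character names and compatibility decompositions can be listed side by side:

```python
import unicodedata


def describe_space(ch: str) -> tuple:
    """Return (code point label, character name, decomposition mapping)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


# NO-BREAK SPACE, FIGURE SPACE and NARROW NO-BREAK SPACE from the discussion.
for ch in ["\u00a0", "\u2007", "\u202f"]:
    print(describe_space(ch))
```

All three are tagged as no-break variants of U+0020 in the character database, so the practical difference between them lies in their intended width rather than in line-breaking behaviour.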

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Francois Yergeau
John Cowan wrote: > (Bogusly IMHO, an HTTP Content-Type: header overrides this rule.) There seems to be more and more agreement against this bogosity. Perhaps more than idle chatter is in order, but I'm not sure where to start... -- François Yergeau

Re: UTF-8 to UTF-16LE

2003-07-08 Thread John Cowan
Jon Hanna scripsit: > Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or* > UTF-16BE, it's trivial to tell which of these an XML document is in by > looking at the first few bytes, as described in Appendix F of the XML Spec > . Yo

Re: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)

2003-07-08 Thread Jim Allan
Tex Texin posted on use of U+2007 FIGURE SPACE for digit-grouping space: Right. I was only thinking that if U+202F wasn't available it might be a better choice than NBSP. I checked some common fonts which confirmed what I believed, that digits are normally equal in width to the lowercase letter _

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 2:22 PM, Jon Hanna <[EMAIL PROTECTED]> wrote: > According > > to XML the > > default encoding scheme is UTF-8. > > Not strictly true. The default encoding scheme's is UTF-8 *or* > UTF-16LE *or* UTF-16BE, Wrong also: UTF-16LE and UTF16-BE are not in the default encoding

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 08/07/2003 02:23, Peter Kirk wrote: Would it work to define a new character, for example, for patah-hiriq which has a canonical decomposition into patah plus hiriq, or even into hiriq plus patah? Would normalisation compose a patah-hiriq sequence into this character and so get round the reor

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
According > to XML the > default encoding scheme is UTF-8. Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or* UTF-16BE; it's trivial to tell which of these an XML document is in by looking at the first few bytes, as described in Appendix F of the XML Spec
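Looking at the first few bytes, as described here, might be sketched as follows. This is a deliberately partial illustration in Python, not the full Appendix F procedure: the function name sniff_xml_encoding and its return labels are assumptions, it ignores any MIME charset parameter, and it does not read the encoding declaration itself.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity."""
    # Check the 4-byte UTF-32 marks first: FF FE 00 00 also begins
    # with the 2-byte UTF-16LE mark.
    if head.startswith((b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00")):
        return "UTF-32"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # No byte order mark: see how the opening "<?" is laid out in memory.
    if head.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if head.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    # Otherwise assume UTF-8 unless an encoding declaration says otherwise.
    return "UTF-8"


print(sniff_xml_encoding('<?xml version="1.0"?>'.encode("utf-16-le")))
```

A real parser would go on to read the encoding declaration, if present, once the byte layout is known.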

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 12:49 PM, santhosh kumar <[EMAIL PROTECTED]> wrote: > Hello, > I am new to this group. Now I am working in OBEX profile > design in Windows platform. I have some issues with XML parsing. > According to XML the default encoding scheme is UTF-8. But I want to > conve

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Dan Kogai
On Tuesday, July 8, 2003, at 07:49 PM, santhosh kumar wrote: Hello, I am new to this group. Now I am working in OBEX profile design in Windows platform. I have some issues with XML parsing. According to XML the default encoding scheme is UTF-8. But I want to convert it in to UTF16-LE. Can

Re: UTF-8 to UTF-16LE

2003-07-08 Thread Venugopala Rao Moram
Santhosh, You can use this command on Unix: iconv -f UTF-8 -t UTF16LE inputfile >outputfile Venu santhosh kumar wrote: Hello, I am new to this group. Now I am working in OBEX profile design in Windows platform. I have some issues with XML parsing. According to XML the defau
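Where a command-line tool is not available, the same conversion can be done in a few lines of code. The sketch below uses Python's built-in codecs; the function name utf8_to_utf16le and the add_bom parameter are hypothetical names chosen for this illustration.

```python
def utf8_to_utf16le(data: bytes, add_bom: bool = False) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16LE.

    Raises UnicodeDecodeError if the input is not well-formed UTF-8.
    """
    # "utf-8-sig" behaves like UTF-8 but drops a leading UTF-8 signature if present.
    text = data.decode("utf-8-sig")
    encoded = text.encode("utf-16-le")  # little-endian, no byte order mark
    # Optionally prepend the little-endian byte order mark FF FE.
    return b"\xff\xfe" + encoded if add_bom else encoded


sample = "A\u20ac".encode("utf-8")   # "A" followed by the euro sign
print(utf8_to_utf16le(sample).hex())
```

Note that if the text being converted is an XML document with an encoding declaration, the declaration would also need to be updated or removed so that it does not contradict the new encoding.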

Re: French group separators

2003-07-08 Thread Martin JD Green
From: "John Burger" <[EMAIL PROTECTED]> > From: "Philippe Verdy" <[EMAIL PROTECTED]> > > > Unicode already defines with character properties those punctuations > > that terminate sentences. Why would you need to recognize sequences of > > two spaces as meaning an end of sentence??? > > Ambiguity

UTF-8 to UTF-16LE

2003-07-08 Thread santhosh kumar
Hello, I am new to this group. Now I am working in OBEX profile design in Windows platform. I have some issues with XML parsing. According to XML the default encoding scheme is UTF-8. But I want to convert it in to UTF16-LE. Can anyone tell me how to convert UTF-8 to UTF-16LE . Santhosh.

Re: Reading Chinese Characters from a browser

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 11:59 AM, SRIDHARAN Aravind <[EMAIL PROTECTED]> wrote: > How can I differentiate whether a given character in chinese is > simplified or traditional? Normally you can't with Unicode/ISO10646: They are unified now by the UniHan working group, to be used for Traditional

Re: When is a character a currency sign?

2003-07-08 Thread Thomas Chan
On Tue, 8 Jul 2003, Philippe Verdy wrote: > On Tuesday, July 08, 2003 3:35 AM, Thomas Chan <[EMAIL PROTECTED]> wrote: > > On Mon, 7 Jul 2003, Philippe Verdy wrote: > > Would "Euro" also be a (four-character) currency sign? > > Certainly not: this would be a word, whose orthography varies with > lan

Re: Reading Chinese Characters from a browser

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 10:58 AM, SRIDHARAN Aravind <[EMAIL PROTECTED]> wrote: > Hi, > I have a web application ( using servlets/ jsp's). > In the HTML pages, I enter Chinese characters and when I read them > and display them, they come as junk. > How can I get rid of this problem? > I converte

Re: [OT] When is a character a currency sign?

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 10:58 AM, Alexandros Diamantidis <[EMAIL PROTECTED]> wrote: > * Philippe Verdy <[EMAIL PROTECTED]> [2003-07-08 02:34]: > > With the Euro, a lot of currency units lost their symbol: > > - the Greek Drachme symbol (or is it really only a currency symbol > > or an alternate

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter Kirk
On 07/07/2003 19:23, John Hudson wrote: At 08:51 07/07/2003, Ted Hopp wrote: Editing would also be an "interesting" experience. Could one search for lamed-patah and find it as part of lamed-? Or would the proposal be to use these new codes only as part of bookend processing around normalizati

Reading Chinese Characters from a browser

2003-07-08 Thread SRIDHARAN Aravind
Hi, I have a web application ( using servlets/ jsp's). In the HTML pages, I enter Chinese characters and when I read them and display them, they come as junk. How can I get rid of this problem? I converted those characters into unicode and to my dismay I found that converted unicode values do not

Re: [OT] When is a character a currency sign?

2003-07-08 Thread Alexandros Diamantidis
* Philippe Verdy <[EMAIL PROTECTED]> [2003-07-08 02:34]: > With the Euro, a lot of currency units lost their symbol: > - the Greek Drachme symbol (or is it really only a currency symbol or > an alternate form of the Delta?) I don't think the glyph shown in the Unicode charts (a cursive "Δρ") was v

Re: When is a character a currency sign?

2003-07-08 Thread Philippe Verdy
On Tuesday, July 08, 2003 3:35 AM, Thomas Chan <[EMAIL PROTECTED]> wrote: > On Mon, 7 Jul 2003, Philippe Verdy wrote: > Would "Euro" also be a (four-character) currency sign? Certainly not: this would be a word, whose orthography varies with language. See the banknotes, where it is written in Greek

RE: French group separators

2003-07-08 Thread jarkko.hietaniemi
> > Don't call me Mr. Roberts is my name. > > > > Don't call me Mr. Roberts is my name. > > In European English Mr is generally not followed by a full stop, > because the abbreviation contains the first and last letter of the > word. (In Finland that would be M:r.) Ummm...? No. Abbreviat