Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 15:07, John Cowan wrote: Not necessarily. A process may check its input for normalization and reject it if it is not normalized, and XML consumers are encouraged (not required) to do so. This looks to me like a clear breach of C9, at least of the derived principle no process ca
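The two positions in this exchange (a consumer may check input for normalization; a process should not treat canonically equivalent strings differently) can be illustrated with a minimal Python sketch. It uses only the standard `unicodedata` module; the helper names `is_nfc` and `canonically_equivalent` are illustrative and do not come from the thread.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# A checking consumer could reject `decomposed` as not normalized,
# while an equivalence-respecting process would treat both alike.
print(is_nfc(precomposed), is_nfc(decomposed))
print(canonically_equivalent(precomposed, decomposed))
```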

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: "Jim Allan" <[EMAIL PROTECTED]> > It seems to me that Cedilla/undercomma folding would be a useful > addition to "Character Foldings" at http://www.unicode.org/reports/tr30. Excellent idea, however it has to be tailored by language: For example, Turkish and French (which almost always and co

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Peter Kirk scripsit: > [A process] must > interpret a non-normalised variant in the same way as the normalised > form; and it cannot assume that the process presenting the data makes a > distinction between the normalised and non-normalised form and does not > reorder the data into an arbitrar

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: "John Hudson" <[EMAIL PROTECTED]> > All of these fonts already include the newer Romanian S/s and T/t > commaaccent characters and correct accent forms for the Latvian diacritics > (although the Arial comma accent is a bit too much like an unattached cedilla). I meant for Windows 9x/ME users

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 14:14, John Cowan wrote: Peter Kirk scripsit: Is this actually a conformance requirement? I thought I understood the following: A rendering engine which fails to render canonical equivalents identically, or fails to render certain orders sensibly, is not doing what the Unicode

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Peter Kirk scripsit: > Is this actually a conformance requirement? I thought I understood the > following: A rendering engine which fails to render canonical > equivalents identically, or fails to render certain orders sensibly, is > not doing what the Unicode standard tells it that it must do.

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Language Analysis Systems, Inc. Unicode list reader scripsit: > It suggests that for many fonts, > > U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA > > and > > U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE > > would have exactly the same rendering. Some applicatio

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Philippe Verdy
From: "Peter Kirk" <[EMAIL PROTECTED]> > On 29/10/2003 10:46, John Hudson wrote: > > > While we're about it, we could propose a spacing, non-breaking ELIDED > > CHARACTER for use in ketiv/qere where combining marks need to be > > applied to empty space within a word. > > How would this differ from

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 29/10/2003 11:53, John Cowan wrote: ... A rendering engine is *not* entitled to misbehave if it receives cedilla> and try to place the dot between the "a" glyph and the cedilla; this is a direct consequence of the conformance requirement that processes not distinguish (unless they have speci

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Hudson
At 12:33 PM 10/29/2003, Philippe Verdy wrote: Even today, it is quite hard to find any Romanian or Latvian web page using the new Unicode characters with a comma-below: even governmental sites use the characters coded with the cedilla, and they support that this comma below is rendered approximate

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Jim Allan
Rich Gilliam wrote: It suggests that for many fonts, U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA and U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE would have exactly the same rendering. Some applications would need to know this and treat U+0067 U+0327 the same as
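As a rough illustration of what "treat U+0067 U+0327 the same as U+0067 U+0312" could mean for matching, here is a hedged Python sketch. The folding table `FOLD_TO_CEDILLA` and the function `fold_key` are hypothetical, not taken from any Unicode report, and a real folding would probably need per-language tailoring.

```python
import unicodedata

# Hypothetical folding table: marks an application might choose to treat
# as one "cedilla/comma" mark for matching purposes only.
FOLD_TO_CEDILLA = {
    "\u0326": "\u0327",  # COMBINING COMMA BELOW -> COMBINING CEDILLA
    "\u0312": "\u0327",  # COMBINING TURNED COMMA ABOVE -> COMBINING CEDILLA
}

def fold_key(text: str) -> str:
    """Build a comparison key: decompose, fold the marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    folded = "".join(FOLD_TO_CEDILLA.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFC", folded)

g_cedilla = "g\u0327"       # g + COMBINING CEDILLA
g_turned_comma = "g\u0312"  # g + COMBINING TURNED COMMA ABOVE
print(fold_key(g_cedilla) == fold_key(g_turned_comma))
```

The key is meant only for comparison; the stored text keeps whichever mark the author typed.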

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Peter Kirk
On 29/10/2003 10:46, John Hudson wrote: At 10:26 AM 10/29/2003, Philippe Verdy wrote: The problem I see here is that ZWJ is not intended to create ligatures between diacritics, only between clusters that would otherwise still be a single combining sequence. Normally CGJ would have fitted better

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Peter Kirk
On 29/10/2003 10:32, Philippe Verdy wrote: From: "Toyin Ryan" <[EMAIL PROTECTED]> To: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, October 29, 2003 6:11 PM Subject: Re: Hacek - Typing from a keyboard... Help Thank you Philippe. The characters you detailed below

Re: Unicode support for Khmer

2003-10-29 Thread Deborah Goldsmith
On Oct 27, 2003, at 12:30 PM, Sue and Maurice Bauhahn wrote: Nothing of the sort is available on Macintosh (partly because Mac applications that support ATSUI appear to be even more rare;-(). Unicode support in applications is widespread on Mac OS X. In particular, all the built-in applications sh

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Peter Kirk
On 29/10/2003 10:26, Philippe Verdy wrote: ... The problem I see here is that ZWJ is not intended to create ligatures between diacritics, only between clusters that would otherwise still be a single combining sequence. Normally CGJ would have fitted better there, but this conflicts with the inten

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
- Original Message - From: "John Hudson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: "'Jim Allan'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, October 29, 2003 6:15 PM Subject: RE: Merging combining classes, was: New contribution N2676 > At 04:04 AM 10/29/2003, Kent Ka

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Cowan
Jim Allan scripsit: > << For example, it is crucial that the combining class of the cedilla be > lower than the combining class of the dot below, although their exact > values of 202 and 220 are not important for implementation. >> > > This is not explained, but obviously the reason why it is "
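The combining-class values quoted here can be inspected directly. The Python sketch below uses the standard `unicodedata` module to read the classes of the cedilla and the dot below, and shows that canonical reordering produces the same NFD string whichever order the two marks were typed in. Variable names are illustrative.

```python
import unicodedata

CEDILLA = "\u0327"    # COMBINING CEDILLA
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

ccc_cedilla = unicodedata.combining(CEDILLA)
ccc_dot_below = unicodedata.combining(DOT_BELOW)
print(ccc_cedilla, ccc_dot_below)

# Canonical ordering sorts adjacent marks by combining class, so the
# lower-class cedilla ends up before the dot below in both cases.
nfd_dot_first = unicodedata.normalize("NFD", "a" + DOT_BELOW + CEDILLA)
nfd_cedilla_first = unicodedata.normalize("NFD", "a" + CEDILLA + DOT_BELOW)
print(nfd_dot_first == nfd_cedilla_first)
```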

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Philippe Verdy
From: "Toyin Ryan" <[EMAIL PROTECTED]> > Thank you Philippe. The characters you detailed below only seem to work in Word. > They don't work in DBArtisan or netscape messenger or outlook. There are a lot of applications that can use and render these characters. At least all those applications that

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Philippe Verdy
From: "Jim Allan" <[EMAIL PROTECTED]> > Kent Karlsson posted: > > > COMBINING COMMA BELOW is not "attached", even though cedilla is. > > A turned comma above is not _attached_ above... > > Correct. COMBINING COMMA BELOW belongs to combining class 220. > > However by Unicode specifications both it a

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Language Analysis Systems, Inc. Unicode list reader
>However by Unicode specifications both it and an attached lower cedilla >on _g_ may be rendered by unattached turned comma above which interacts >with characters not in their respective combining classes. And this new >turned comma above of necessity would always be applied before normal >uppe

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread John Hudson
At 10:26 AM 10/29/2003, Philippe Verdy wrote: The problem I see here is that ZWJ is not intended to create ligatures between diacritics, only between clusters that would otherwise still be a single combining sequence. Normally CGJ would have fitted better there, but this conflicts with the intent

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread John Hudson
At 10:26 AM 10/29/2003, Philippe Verdy wrote: In the sil.org proposal, the medial meteg is missing, but not the right and left meteg, as they are encoded within the same class and their order is preserved when attached to a vowel. It is not missing, per se. It was presumed that the medial meteg wo

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Philippe Verdy
From: "Toyin Ryan" <[EMAIL PROTECTED]> To: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, October 29, 2003 6:11 PM Subject: Re: Hacek - Typing from a keyboard... Help > Thank you Philippe. The characters you detailed below only seem to work in Word. > They don't wo

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Philippe Verdy
From: "John Hudson" <[EMAIL PROTECTED]> > At 08:17 AM 10/29/2003, Peter Kirk wrote: > > >Normally meteg is positioned below and to the left of any other low > >centred mark. Less frequently it is positioned to the right of a low > >centred mark. But it is always to the left of a low right mark i.e.

Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Philippe Verdy
> Normally meteg is positioned below and to the left of any other low > centred mark. Less frequently it is positioned to the right of a low > centred mark. But it is always to the left of a low right mark i.e. > yetiv or dehi. It can also be centred within a hataf vowel. In > http://www.qaya.org/a

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Michael \(michka\) Kaplan
From: "Toyin Ryan" <[EMAIL PROTECTED]> > Thank you Philippe. The characters you detailed below only seem to work in Word. > They don't work in DBArtisan or netscape messenger or outlook. Do you have > any more ideas? As I mentioned before, it looks like DBArtisan is not a Unicode application and y

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread John Hudson
At 04:04 AM 10/29/2003, Kent Karlsson wrote: The Latvian "cedillas" are really commas below, and are best encoded so. Still for lowercase g (not for uppercase) the comma below is _rendered_ as a turned comma above. The 'not for uppercase' rule depends on the design of the uppercase letter. Typica

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Rick McGowan
Hello Tony, Number one question: have you verified that DBArtisan 7 actually has support for Unicode? I find nothing at all about Unicode support on QBS Software web site, nor in the Embarcadero white paper, when I look at the features of their products. Rick > However, I want to t

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Eric Muller
Rick McGowan wrote: "Caron" [...] is *NOT* in current use at all in English. It is widely used in the typography community, for better or for worse. Eric.

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Toyin Ryan
Thank you Philippe. The characters you detailed below only seem to work in Word. They don't work in DBArtisan or netscape messenger or outlook. Do you have any more ideas? Philippe Verdy wrote: > The Unicode English name of the "hacek" character is "caron" (U+030C), and > the "straight line above"

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Rick McGowan
Philippe wrote... > The Unicode English name of the "hacek" character is "caron" (U+030C) Just for the record: Actually, in English, we still call it a hacek. "Caron" is a term apparently invented in an ISO character encoding committee, and is *NOT* in current use at all in English. We call

Re: unicode on Linux

2003-10-29 Thread Markus Scherer
Philippe Verdy wrote: the input:determine strategy will work fine for UTF-8 or SCSU, provided that the leading BOM is explicitly encoded. ... With "determine" I do not mean to restrict to checking for a BOM. There are several ways to determine the input charset, depending on the protocol and docum

Re: UAX #29 beta update (text breaks): apostrophe ./. H

2003-10-29 Thread Markus Scherer
Like German "heute" (="today") where the "eu" sounds like the "oy" in Spanish "hoy"? hui=hoy=heu(te)... Neat! markus Michael Everson wrote: At 23:07 +0100 2003-10-27, Philippe Verdy wrote: The historic French word "hui" is now completely obsoleted, and commonly found only in the single expressio

Re: osmanya script transliteration

2003-10-29 Thread Markus Scherer
[EMAIL PROTECTED] wrote: is it possible to design a program that takes the vaLue of the osmanya script and compare it with the somali latin script. then afterwards, displaying the equivalent. Generally, yes - this is called script transliteration. You could try this online at http://oss.softwar

RE: unicode on Linux

2003-10-29 Thread Francois Yergeau
Philippe Verdy wrote: > The idea that "if a text (without BOM) looks like valid > UTF-8, then it is > UTF-8; else it uses another legacy encoding" does not work in > practice and also leads to too many false positives. Can you point to actual data/cases? I don't mean theoretical, I can make up
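The heuristic under discussion ("if it decodes as valid UTF-8, treat it as UTF-8") can be written down in a few lines. This Python sketch is only an illustration of that heuristic, not anyone's recommended detector. The function name `sniff_utf8` and its return labels are made up for the example.

```python
import codecs

def sniff_utf8(data: bytes) -> str:
    """Classify bytes as 'utf-8-bom', 'utf-8', or 'not-utf-8'."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")  # strict decoding: raises on ill-formed sequences
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8-bom" if has_bom else "utf-8"

print(sniff_utf8("h\u00e9llo".encode("utf-8")))
print(sniff_utf8(codecs.BOM_UTF8 + b"abc"))
print(sniff_utf8("h\u00e9llo".encode("latin-1")))
```

Pure ASCII input also passes the validity test, so the sketch cannot tell ASCII-only legacy text from UTF-8. That limitation is inherent in the heuristic.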

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Jim Allan
Kent Karlsson posted: COMBINING COMMA BELOW is not "attached", even though cedilla is. A turned comma above is not _attached_ above... Correct. COMBINING COMMA BELOW belongs to combining class 220. However by Unicode specifications both it and an attached lower cedilla on _g_ may be rendered by u

Re: osmanya script

2003-10-29 Thread Doug Ewell
wrote: > is it possible to design a program that takes the vaLue of the osmanya > script and compare it with the somali latin script. then afterwards, > displaying the equivalent. > > by this i mean, if you tell the program that the letter A is a given > code in osmanya script as well as in soma

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Jim Allan
Peter Kirk wrote: Rather, it defines that they do not. But since this is not true on any reasonable intuitive definition of "interact typographically" (as we have seen with Hebrew vowel points), this statement makes sense only as a counterintuitive definition of "interact typographically". Exactl

Osmanya Script translation

2003-10-29 Thread aa014
please tell me if it is possible to design a program that takes the value of the osmanya script and compares it with the somali latin script, then afterwards displays the equivalent. by this i mean, if you tell the program that the letter A is a given code in osmanya script as well as in s

osmanya script

2003-10-29 Thread aa014
is it possible to design a program that takes the value of the osmanya script and compares it with the somali latin script, then afterwards displays the equivalent. by this i mean, if you tell the program that the letter A is a given code in osmanya script as well as in somali latin script

Re: Hebrew composition model, with cantillation marks

2003-10-29 Thread Peter Kirk
Thank you, Philippe. I include the full text of your posting plus the attachment for the benefit of those on the Unicode Hebrew list who have missed out on this. Some of the issues here have already been discussed on that list. Also I wonder if you have seen http://scripts.sil.org/cms/sites/nrs

RE: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Kent Karlsson
> << A similar situation can be seen in the Latvian letter U+0123 LATIN > SMALL LETTER G WITH CEDILLA. In good Latvian typography, this > character > is always shown with a rotated comma over the g, rather than > a cedilla > below the g, because of the typographical design and layout issues

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Michael \(michka\) Kaplan
From: "Toyin Ryan" <[EMAIL PROTECTED]> > I am trying to type the 'hacek' diacritic mark above 'c' and 'e' and > also a straight line (not a tilde) above characters too. > > The hacek is a diacritic mark used in the Czech and Lithuanian > languages. It looks like an upside-down circumflex or a poin

Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Philippe Verdy
The Unicode English name of the "hacek" character is "caron" (U+030C), and the "straight line above" is named "macron" (U+0304). Your word processor is probably inserting a decomposed caron or macron after the base letter it modifies. Did you try with the precomposed characters? C with Hacek:
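The difference between a decomposed sequence (base letter followed by a combining mark) and a precomposed character can be checked with Python's standard `unicodedata` module. This is a small sketch; the helper name `compose` is illustrative.

```python
import unicodedata

CARON = "\u030c"   # COMBINING CARON
MACRON = "\u0304"  # COMBINING MACRON (U+030A is COMBINING RING ABOVE)

def compose(base: str, mark: str) -> str:
    """Return the NFC form of base + mark (precomposed where one exists)."""
    return unicodedata.normalize("NFC", base + mark)

for base, mark in (("c", CARON), ("e", CARON), ("e", MACRON)):
    ch = compose(base, mark)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

An application that cannot render combining sequences may still display the precomposed forms, which is why trying them is a reasonable first test.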

Re: unicode on Linux

2003-10-29 Thread Philippe Verdy
- Original Message - From: "Markus Scherer" <[EMAIL PROTECTED]> To: "unicode" <[EMAIL PROTECTED]> Sent: Tuesday, October 28, 2003 11:35 PM Subject: Re: unicode on Linux > You should use Unicode internally - UTF-16 when you use ICU or most other libraries and software. > > Externally, th

Re: Merging combining classes, was: New contribution N2676

2003-10-29 Thread Peter Kirk
On 28/10/2003 20:01, Jim Allan wrote: ... From _The Unicode Standard 4.0_, 3.11 at http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf: << If combining characters have different combining classes--for example, when one nonspacing mark is above a base character form and another is below it--

Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Toyin Ryan
Hello subscribers, I am trying to type the 'hacek' diacritic mark above 'c' and 'e' and also a straight line (not a tilde) above characters too. The hacek is a diacritic mark used in the Czech and Lithuanian languages. It looks like an upside-down circumflex or a pointed breve-essentially a small "