age-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
> Behalf Of Peter Kirk
> Sent: Thursday, November 06, 2003 3:34 AM
> To: Jony Rosenne
> Cc: 'Philippe Verdy'; [EMAIL PROTECTED]
> Subject: Re: Merging combining classes, was: New contribution N2676
>
> On
On 05/11/2003 19:59, Jony Rosenne wrote:
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: Thursday, November 06, 2003 3:46 AM
Is there an initiative in Israel related to the supported
glyphs and rendering features required to supp
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
> Sent: Thursday, November 06, 2003 3:46 AM
>
> Is there an initiative in Israel related to the supported
> glyphs and rendering features required to support Hebrew,
> like it exists
From: "Peter Kirk" <[EMAIL PROTECTED]>
> It seems to me that the Unicode conformance clauses are so weak as to be
> almost useless. An application can claim to conform to Unicode but
> hardly do anything. A font can be sold, for example, as a Unicode Hebrew
> font while successfully rendering only
On 05/11/2003 15:13, Peter Constable wrote:
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
Behalf Of Peter Kirk
But I am not sure that this get-out clause should
be applicable to a process which claims as its very essence "to
support
correct
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
> Behalf Of Peter Kirk
> But I am not sure that this get-out clause should
> be applicable to a process which claims as its very essence "to
support
> correct positioning of nonspacing marks" but actually supports
Philippe Verdy posted:
I do think the opposite: one can fold all commas below to cedillas by
default,
and, in a Romanian or Latvian context, fold all cedillas below to commas
below.
I see no difference.
Folding either way will find all occurrences of cedilla or comma below.
The direction of fold
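A minimal sketch of the folding discussed above, assuming the fold is applied to canonically decomposed text. The function names are hypothetical and not taken from UTR #30. Folding in either direction makes the cedilla and comma-below spellings compare equal:

```python
import unicodedata

CEDILLA = "\u0327"      # COMBINING CEDILLA
COMMA_BELOW = "\u0326"  # COMBINING COMMA BELOW

def fold_cedilla_to_comma(text):
    """Hypothetical fold: decompose, then map every cedilla to comma below."""
    return unicodedata.normalize("NFD", text).replace(CEDILLA, COMMA_BELOW)

def fold_comma_to_cedilla(text):
    """The same fold in the opposite direction."""
    return unicodedata.normalize("NFD", text).replace(COMMA_BELOW, CEDILLA)

def same_under_fold(a, b, fold=fold_cedilla_to_comma):
    """True if the two strings match once the chosen fold is applied to both."""
    return fold(a) == fold(b)

# U+015F (s with cedilla) and U+0219 (s with comma below) differ as code
# points but match under either fold direction.
print(same_under_fold("\u015F", "\u0219"))
print(same_under_fold("\u015F", "\u0219", fold=fold_comma_to_cedilla))
```

This sketch also folds the cedilla of a French or Turkish c, so a real search tool would probably want the language tailoring raised elsewhere in the thread.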
- Original Message -
From: "Jim Allan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 30, 2003 4:48 PM
Subject: Re: Merging combining classes, was: New contribution N2676
> I offered a suggestion on cedilla and combining undercomma:
>
I offered a suggestion on cedilla and combining undercomma:
It seems to me that Cedilla/undercomma folding would be a useful
addition to "Character Foldings" at
http://www.unicode.org/reports/tr30.
and Philippe Verdy responded:
Excellent idea, however it has to be tailored by language:
> On 29/10/2003 15:07, John Cowan wrote:
>
> >Not necessarily. A process may check its input for normalization and
> >reject it if it is not normalized, and XML consumers are encouraged
> >(not required) to do so.
> >
> >
> >
> This looks to me like a clear breach of C9, at least of the derived
On 29/10/2003 15:07, John Cowan wrote:
Not necessarily. A process may check its input for normalization and
reject it if it is not normalized, and XML consumers are encouraged
(not required) to do so.
This looks to me like a clear breach of C9, at least of the derived
principle
no process ca
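For illustration, a process that checks its input and rejects non-normalized text (the behaviour John Cowan describes) might look like the sketch below. The function name and the choice of NFC are assumptions made for the example, not something the XML specification mandates:

```python
import unicodedata

def accept_if_normalized(text, form="NFC"):
    """Return the text unchanged if it is already in the given form,
    otherwise refuse it (hypothetical policy for this example)."""
    if unicodedata.normalize(form, text) != text:
        raise ValueError("input is not in " + form)
    return text

accept_if_normalized("\u00E9")        # precomposed e-acute: accepted
try:
    accept_if_normalized("e\u0301")   # e + combining acute: rejected
except ValueError as err:
    print(err)
```

Whether such a rejection is compatible with C9 is exactly the point under dispute here; the sketch only shows what the check itself does.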
From: "Jim Allan" <[EMAIL PROTECTED]>
> It seems to me that Cedilla/undercomma folding would be a useful
> addition to "Character Foldings" at http://www.unicode.org/reports/tr30.
Excellent idea, however it has to be tailored by language:
For example, Turkish and French (which almost always and co
Peter Kirk scripsit:
> [A process] must
> interpret a non-normalised variant in the same way as the normalised
> form; and it cannot assume that the process presenting the data makes a
> distinction between the normalised and non-normalised form and does not
> reorder the data into an arbitrar
From: "John Hudson" <[EMAIL PROTECTED]>
> All of these fonts already include the newer Romanian S/s and T/t
> commaaccent characters and correct accent forms for the Latvian diacritics
> (although the Arial comma accent is a bit too much like an unattached
cedilla).
I meant for Windows 9x/ME users
On 29/10/2003 14:14, John Cowan wrote:
Peter Kirk scripsit:
Is this actually a conformance requirement? I thought I understood the
following: A rendering engine which fails to render canonical
equivalents identically, or fails to render certain orders sensibly, is
not doing what the Unicode
Peter Kirk scripsit:
> Is this actually a conformance requirement? I thought I understood the
> following: A rendering engine which fails to render canonical
> equivalents identically, or fails to render certain orders sensibly, is
> not doing what the Unicode standard tells it that it must do.
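As a rough illustration of what "canonical equivalents" means in this discussion, the sketch below compares two strings by their NFD forms. The helper name is hypothetical; the combining classes come from Python's bundled Unicode data:

```python
import unicodedata

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

DOT_BELOW = "\u0323"    # combining class 220
CEDILLA = "\u0327"      # combining class 202
COMMA_BELOW = "\u0326"  # combining class 220

# Different classes: the two orders are equivalent, so a renderer should
# treat them alike.
print(canonically_equivalent("a" + DOT_BELOW + CEDILLA, "a" + CEDILLA + DOT_BELOW))

# Same class: the order is significant and the two strings are distinct.
print(canonically_equivalent("a" + DOT_BELOW + COMMA_BELOW, "a" + COMMA_BELOW + DOT_BELOW))
```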
Language Analysis Systems, Inc. Unicode list reader scripsit:
> It suggests that for many fonts,
>
> U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA
>
> and
>
> U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE
>
> would have exactly the same rendering. Some applicatio
On 29/10/2003 11:53, John Cowan wrote:
... A
rendering engine is *not* entitled to misbehave if it receives
cedilla> and try to place the dot between the "a" glyph and the cedilla;
this is a direct consequence of the conformance requirement that processes
not distinguish (unless they have speci
At 12:33 PM 10/29/2003, Philippe Verdy wrote:
Even today, it is quite hard to find any Romanian or Latvian web page using
the new Unicode characters with a comma-below: even governmental sites use
the characters coded with the cedilla, and they accept that this comma
below is rendered approximate
Rich Gilliam wrote:
It suggests that
for many fonts,
U+0067 LATIN SMALL LETTER G + U+0327 COMBINING CEDILLA
and
U+0067 LATIN SMALL LETTER G + U+0312 COMBINING TURNED COMMA ABOVE
would have exactly the same rendering. Some applications would need to
know this and treat U+0067 U+0327 the same as
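A small check, using only Python's standard unicodedata module, of how the two sequences quoted above relate in the character data: the first is the canonical decomposition of U+0123, the second is not equivalent to it. Any identical rendering would therefore come from the font, not from normalization:

```python
import unicodedata

g_cedilla = "g\u0327"       # g + COMBINING CEDILLA
g_turned_comma = "g\u0312"  # g + COMBINING TURNED COMMA ABOVE

# U+0123 LATIN SMALL LETTER G WITH CEDILLA decomposes to g + U+0327.
print(unicodedata.decomposition("\u0123"))

# Composing g + cedilla yields U+0123; g + turned comma above has no
# precomposed form and is left as it is.
print(unicodedata.normalize("NFC", g_cedilla) == "\u0123")
print(unicodedata.normalize("NFC", g_turned_comma) == g_turned_comma)
```

An application that wants to treat the two alike would need its own folding step on top of normalization.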
- Original Message -
From: "John Hudson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "'Jim Allan'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, October 29, 2003 6:15 PM
Subject: RE: Merging combining classes, was: N
Jim Allan scripsit:
> << For example, it is crucial that the combining class of the cedilla be
> lower than the combining class of the dot below, although their exact
> values of 202 and 220 are not important for implementation. >>
>
> This is not explained, but obviously the reason why it is "
From: "Jim Allan" <[EMAIL PROTECTED]>
> Kent Karlsson posted:
>
> > COMBINING COMMA BELOW is not "attached", even though cedilla is.
> > A turned comma above is not _attached_ above...
>
> Correct. COMBINING COMMA BELOW belongs to combining class 220.
>
> However by Unicode specifications both it a
>However by Unicode specifications both it and an attached lower cedilla
>on _g_ may be rendered by unattached turned comma above which interacts
>with characters not in their respective combining classes. And this
new
>turned comma above of necessity would always be applied before normal
>uppe
At 04:04 AM 10/29/2003, Kent Karlsson wrote:
The Latvian "cedillas" are really commas below, and are best encoded so.
Still for lowercase g (not for uppercase) the comma below is _rendered_
as a turned comma above.
The 'not for uppercase' rule depends on the design of the uppercase letter.
Typica
Kent Karlsson posted:
COMBINING COMMA BELOW is not "attached", even though cedilla is.
A turned comma above is not _attached_ above...
Correct. COMBINING COMMA BELOW belongs to combining class 220.
However by Unicode specifications both it and an attached lower cedilla
on _g_ may be rendered by u
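The combining classes mentioned in this exchange can be looked up directly. The sketch below uses Python's unicodedata module; the helper name is made up for the example:

```python
import unicodedata

MARKS = ["\u0327", "\u0326", "\u0312", "\u0323"]

def combining_classes(chars):
    """Map each character's Unicode name to its canonical combining class."""
    return {unicodedata.name(ch): unicodedata.combining(ch) for ch in chars}

for name, ccc in combining_classes(MARKS).items():
    print(ccc, name)
```

With current Unicode data this lists the cedilla in class 202 (attached below), comma below and dot below in class 220 (below), and turned comma above in class 230 (above).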
Peter Kirk wrote:
Rather, it defines that they do not. But since this is not true on any
reasonable intuitive definition of "interact typographically" (as we
have seen with Hebrew vowel points), this statement makes sense only as
a counterintuitive definition of "interact typographically".
Exactl
> << A similar situation can be seen in the Latvian letter U+0123 LATIN
> SMALL LETTER G WITH CEDILLA. In good Latvian typography, this
> character
> is always shown with a rotated comma over the g, rather than
> a cedilla
> below the g, because of the typographical design and layout issues
On 28/10/2003 20:01, Jim Allan wrote:
...
From _The Unicode Standard 4.0_, 3.11 at
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf:
<< If combining characters have different combining classes--for
example, when one nonspacing mark is above a base character form and
another is below it--
I commented on what I saw as a problem in changing the positions of
diacritics in rendering from that shown in the charts from above to
below or from below to above.
John Cowan responded:
True. But that doesn't mean that the glyph that a particular font uses
for
the sequence can't have the ba
On 28/10/2003 13:35, John Cowan wrote:
...
But Unicode specifications currently say nothing about the possibility
of moving under-diacritics to an over-character position for
typographical reasons except for combination of _g_ and cedilla.
Nothing needs to be said, because glyphs are not
jim scripsit:
> Unicode encodes U+1E20 and U+1E21 as combinations of lower and uppercase
> _g_ with macron. The forms have canonical decomposition to _g_ or _G_
> followed by U+0304. This seems to rule out being able to consider a bar
> above and a bar below as variants of the same character w
Peter Kirk wrote:
Also, in the commonly used Hebrew *transliteration*, the same function
(fricative pronunciation) is indicated by a macron above g and p but
below b, d, k and t, for the same reason. It occurs only with these
letters (sometimes also written below h). There might be an argument for
On 28/10/2003 04:49, Kent Karlsson wrote:
Philippe Verdy wrote:
There's a counter example with the position of the circumflex on the
lowercase t (I can't remember for which language it occurs,
sorry), which is
in some cases not the one that its combining class would
normally take.
There
Philippe Verdy wrote:
> But we cannot define it within the UCD, but algorithmically, like for
> Hangul syllables/jamos...
Note that the *arithmetic* specification of the Hangul Syllable
canonical decompositions is just a short way of specifying the
decompositions. They CAN be listed, in a way *ju
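To make the point about the arithmetic specification concrete, here is a sketch of the Hangul syllable decomposition using the constants from the Unicode Standard's conjoining jamo section. The function name is an assumption for the example. The same arithmetic can be used to generate the full explicit list:

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_hangul(syllable):
    """Arithmetically decompose one precomposed Hangul syllable into jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed syllable: leave unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail

# The decompositions can equally be written out as an explicit table.
table = {chr(cp): decompose_hangul(chr(cp)) for cp in range(S_BASE, S_BASE + S_COUNT)}
print(len(table))
print(table["\uD55C"] == unicodedata.normalize("NFD", "\uD55C"))
```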
Philippe Verdy wrote:
> There's a counter example with the position of the circumflex on the
> lowercase t (I can't remember for which language it occurs,
> sorry), which is
> in some cases not the one that its combining class would
> normally take.
There are also the cases of comma below a sma
On 27/10/2003 18:06, Philippe Verdy wrote:
From: "Peter Kirk" <[EMAIL PROTECTED]>
Thanks for the clarification. In principle we might be able to go a
little further: we could define both and as
canonically equivalent to c for all c in combining class zero. This
would have to be some kind
On 27/10/2003 16:39, Philippe Verdy wrote:
...
The backwards marking is not restricted to French accents in collation
level 2. You can use reverse ordering at any tailored level to fit other
needs, and you can also insert an extra collation level.
So I think that Mark is right here as it gives yo
From: "Peter Kirk" <[EMAIL PROTECTED]>
> Thanks for the clarification. In principle we might be able to go a
> little further: we could define both and as
> canonically equivalent to c for all c in combining class zero. This
> would have to be some kind of decomposition exception so that c is
From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 27/10/2003 10:31, Philippe Verdy wrote:
>
> > ...
> >
> >The bad thing is that there's no way to say that a superfluous
> >CGJ character can be "safely" removed if CC(char1) <= CC(char2),
> >so that it will preserve the semantic of the encoded text even
From: "Peter Kirk" <[EMAIL PROTECTED]>
> each possible individually as a contraction. The Logical_Order_Exception
> property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just
One bug report note here:
The UTS#10 contains all references to several character properties,
pointing to http
On 27/10/2003 16:16, Philippe Verdy wrote:
...
So, all we can do is to define compatibility equivalence between:
and:
if and only if:
CC(c1) > CC(c2) > 0.
This won't affect the NFC and NFD conversion algorithms, but it can affect
the NFKC and NFKD conversion algorithms. This means that
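The condition CC(c1) > CC(c2) > 0 is the same test that canonical ordering uses to decide whether two adjacent marks are exchanged. A minimal sketch, with a hypothetical helper name and Hebrew points chosen only as sample data:

```python
import unicodedata

def would_be_swapped(c1, c2):
    """True if canonical ordering would exchange the adjacent pair c1, c2."""
    return unicodedata.combining(c1) > unicodedata.combining(c2) > 0

PATAH = "\u05B7"  # combining class 17
HIRIQ = "\u05B4"  # combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

print(would_be_swapped(PATAH, HIRIQ))  # classes 17 > 14 > 0
print(would_be_swapped(HIRIQ, PATAH))  # already in canonical order
print(would_be_swapped(PATAH, CGJ))    # class 0 is never reordered
```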
From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 27/10/2003 12:28, Mark Davis wrote:
>
> >Collation is very different, and already has mechanisms for dealing with
> >sequences. So no CGJ is needed there (except for case 2).
> >
> >Mark
> >
> >
> >
> Mark, can you outline what these mechanisms are or po
> So, all we can do is to define compatibility equivalence between:
>
> and:
>
> if and only if:
> CC(c1) > CC(c2) > 0.
Oops! Of course, I really meant:
All we can do is to define compatibility equivalence (NFK*)
between:
and:
unless:
CC(c1)
From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 27/10/2003 10:31, Philippe Verdy wrote:
>
> > ...
> >
> >The bad thing is that there's no way to say that a superfluous
> >CGJ character can be "safely" removed if CC(char1) <= CC(char2),
> >so that it will preserve the semantic of the encoded text even
On 27/10/2003 10:31, Philippe Verdy wrote:
...
The bad thing is that there's no way to say that a superfluous
CGJ character can be "safely" removed if CC(char1) <= CC(char2),
so that it will preserve the semantic of the encoded text even
though such filtered text would not be canonically equivale
On 27/10/2003 12:28, Mark Davis wrote:
Collation is very different, and already has mechanisms for dealing with
sequences. So no CGJ is needed there (except for case 2).
Mark
Mark, can you outline what these mechanisms are or point me to a
definition e.g. in a section of UTR #10? As I had und
From: "Peter Constable" <[EMAIL PROTECTED]>
> There is no problem requiring a solution for combining marks used with
> Latin script,* including IPA and Vietnamese, because all of the marks
> that occupy a comparable space relative to the base have the same
> combining class, meaning that normaliza
From: "Mark Davis" <[EMAIL PROTECTED]>
> the UTC decision:
>
> [96-C20] Consensus: Add text to Unicode 4.0.1 which points out that
combining
> grapheme joiner has the effect of preventing the canonical re-ordering of
> combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
>
> [96-
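The effect described in the first consensus item can be observed with any normalizer. The sketch below uses Python's unicodedata module and a lamed with patah and hiriq as sample data; the variable names are illustrative only:

```python
import unicodedata

LAMED = "\u05DC"
PATAH = "\u05B7"  # combining class 17
HIRIQ = "\u05B4"  # combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, normalization reorders the two vowel points by combining class.
print(unicodedata.normalize("NFC", plain) == LAMED + HIRIQ + PATAH)

# With CGJ between them, the original order survives normalization.
print(unicodedata.normalize("NFC", with_cgj) == with_cgj)
```

The CGJ stays in the text after normalization, which is why the thread goes on to ask when it can safely be removed again.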
ROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Mon, 2003 Oct 27 09:09
Subject: Re: Merging combining classes, was: New contribution N2676
> On 27/10/2003 08:45,
Philippe Verdy wrote:
> This principle may help solve the ambiguities in all those affected
> scripts
> (may be there are similar issues in the Latin script for Vietnamese,
which
> would like to better fit the phonetics of words that may be
incorrectly
> rendered by the currently required normaliz
From: "Peter Kirk" <[EMAIL PROTECTED]>
> I don't see any difference between your proposed generic CCO and CGJ. As
> you say, the same function may be needed in several scripts, including
> perhaps IPA which uses complex diacritic stacking. So why not simply use
> CGJ?
Why not indeed, but
From: "Peter Kirk" <[EMAIL PROTECTED]>
> I am not sure what you mean by "further normalization steps for Hebrew".
Of course I don't mean that NF* algorithms must be changed. See below.
> If this means that users will be expected to input Hebrew in this order,
> perhaps with a keyboard driver wh
On 27/10/2003 08:45, Mark Davis wrote:
Thank you for the interesting thoughts. As I understand your suggestion,
and bearing in mind that dagesh (and the rare rafe) are also consonant
modifiers, you are effectively suggesting an order (already normalised):
consonant dagesh rafe shin/sin-dot CGJ rig
-
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Mon, 2003 Oct 27 07:49
Subject: Re: Merging combining classes, was: New contribution N2676
> On 27/10/2003 06:5
On 27/10/2003 06:54, Philippe Verdy wrote:
Thanks a lot for these clarifications on Hebrew usages that need those
combining order overrides.
This demonstrates that this occurs relatively infrequently, and so
introducing an ignorable "combining order override" control makes sense,
without needing to ad
On 27/10/2003 07:28, Philippe Verdy wrote:
From: "Peter Kirk" <[EMAIL PROTECTED]>
So the logical order is
.
But the canonical order is
;
up to three (and in theory
more, at least in biblical Hebrew) other characters may appear between
the base letter and the dot which fundamentally modifies it
From: "Peter Kirk" <[EMAIL PROTECTED]>
> So the logical order is
> .
> But the canonical order is
> ;
> up to three (and in theory
> more, at least in biblical Hebrew) other characters may appear between
> the base letter and the dot which fundamentally modifies it.
Ohh, I forgot the case of the
Monday, October 27, 2003 1:48 PM
Subject: Re: Merging combining classes, was: New contribution N2676
> On 26/10/2003 19:58, John Hudson wrote:
>
> > ...
> > Functionally, inserting a CGJ here resolves the problem fine. I'm just
> > not convinced that CGJ is a good general
I am on a business trip abroad with only limited e-mail access. I will
try to respond next week when I'm back home.
Jony
On 26/10/2003 19:58, John Hudson wrote:
...
Functionally, inserting a CGJ here resolves the problem fine. I'm just
not convinced that CGJ is a good general solution to the normalisation
problem: it works, but it requires deliberate insertion in every place
where unwanted mark re-ordering may oc
On 26/10/2003 12:51, Jony Rosenne wrote:
While the current combining classes may cause some difficulties for Biblical
scholars (and this isn't cut and dried yet - it isn't certain whether these
are Unicode problems, implementation problems, missing characters or
mis-identified characters), I have yet
I remembered there was a lot of discussion about this case, which is why
I brought it up. Can someone remind me why ZWNBSP would be Bad for
this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly
indicates a word-break? (this is probably a problem.)
~mark
John Hudson wrote:
At 0
At 07:45 PM 10/26/2003, Mark E. Shoulson wrote:
I remembered there was a lot of discussion about this case, which is why I
brought it up. Can someone remind me why ZWNBSP would be Bad for
this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly
indicates a word-break? (this is prob
At 04:37 PM 10/26/2003, Jony Rosenne wrote:
There is nothing unusual about this. The only problem is that while the
Hiriq is between the Lamed and the Mem and belongs to the missing Yod, some
people insist that they see two vowels under the Lamed.
No, the problem is not the positioning of the hiri
> Sent: Monday, October 27, 2003 2:07 AM
> To: Jony Rosenne
> Cc: [EMAIL PROTECTED]
> Subject: Re: Merging combining classes, was: New contribution N2676
>
>
> Jony Rosenne wrote:
>
> >While the current combining classes may cause some difficulties for
> >Bib
This is, in my opinion, a missing character.
Jony
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ted Hopp
> Sent: Monday, October 27, 2003 12:53 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Merging combining classes, was: New
Jony Rosenne wrote:
While the current combining classes may cause some difficulties for Biblical
scholars (and this isn't cut and dried yet - it isn't certain whether these
are Unicode problems, implementation problems, missing characters or
mis-identified characters), I have yet to see a claimed pro
On 25/10/2003 19:00, Philippe Verdy wrote:
From: "Peter Kirk" <[EMAIL PROTECTED]>
I can see that there might be some problems in the changeover phase. But
these are basically the same problems as are present anyway, and at
least putting them into a changeover phase means that they go away
grad
From: "Peter Kirk" <[EMAIL PROTECTED]>
> I see the point, but I would think there was something seriously wrong
> with a database setup which could change its ordering algorithm without
> somehow declaring all existing indexes invalid.
Why would such a SQL engine do so, if what has changed is an e
On Sunday, October 26, 2003 3:51 PM, Jony Rosenne wrote:
> While the current combining classes may cause some difficulties for
> Biblical scholars (and this isn't cut and dried yet - it isn't certain
> whether these are Unicode problems, implementation problems,
> missing characters or mis-identified
CTED] On Behalf Of Peter Kirk
> Sent: Sunday, October 26, 2003 9:37 PM
> To: Philippe Verdy
> Cc: [EMAIL PROTECTED]
> Subject: Re: Merging combining classes, was: New contribution N2676
>
>
> On 25/10/2003 19:00, Philippe Verdy wrote:
>
> >From: "Peter Kirk"
From: "Peter Kirk" <[EMAIL PROTECTED]>
> I can see that there might be some problems in the changeover phase. But
> these are basically the same problems as are present anyway, and at
> least putting them into a changeover phase means that they go away
> gradually instead of being standardised for
Philippe Verdy wrote:
The problem with this solution is that stability is not guaranteed across
backward versions of Unicode: if a tool A implements the new version of
combining classes and normalizes its input, it will keep the relative
ordering of characters. If its output is injected into a too
On 25/10/2003 09:11, Philippe Verdy wrote:
From: "Peter Kirk" <[EMAIL PROTECTED]>
...
The problem would then be the interoperability of Unicode-compliant
systems using distinct versions of Unicode (for example between
XML processors, text editors, input methods, renderers, text
converters, ful
From: "Peter Kirk" <[EMAIL PROTECTED]>
> I wonder if it would in fact be possible to merge certain adjacent
> combining classes, as from a future numbered version N of the standard.
> That would not affect the normalisation of existing text; text
> normalised before version N would remain normalise