On Saturday, June 28, 2003 1:15 AM, Kenneth Whistler <[EMAIL PROTECTED]> wrote:
> Philippe Verdy said:
>
> > I understand the frustration: if Unicode had not attempted to define
> > combining classes, which were not necessary to Unicode, all
> > existing combining characters would have been given
Peter responded:
> Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:
>
> > Why is making use of the existing behavior of existing characters
> > a "groanable kludge", if it has the desired effect and makes
> > the required distinctions in text?
>
> Why is it a kludge to insert some cc=0 control
Peter countered:
> > > Could this finally be the missing "killer ap" for the CGJ?
> >
> > It will be perfect to allow an application like XML to encode Hebrew
> > text using Unicode 4.0 rules (and before).
>
> It is not perfect. CGJ is supposed to be significant (and kept in the
> text) for a v
Andrew West wrote:
> I have to agree 100% with Peter on this. The potential fiasco with regards to
> Mongolian Free Variation Selectors is another area where our grandchildren are
> going to be weeping with despair if we are not careful.
Well, I doubt that our grandchildren will be quite *that*
Philippe Verdy said:
> I understand the frustration: if Unicode had not attempted to define
> combining classes, which were not necessary to Unicode, all
> existing combining characters would have been given a CC=0
> (or all the same 220 or 230 value).
Uh, no.
Under this scheme, would be di
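The canonical ordering step that this thread keeps returning to can be sketched in a few lines. This is a simplified illustration, not the UAX #15 reference code. It uses Python's standard `unicodedata.combining()` for the class values and a hypothetical helper name, `canonical_order`, and it covers only the reordering step, not decomposition or composition. Marks with equal non-zero classes keep their relative order because the sort is stable, and any class-0 character ends the run. That is why the "all the same class" scenario would have preserved the author's order.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Hypothetical helper: sort each run of non-starters by combining class.

    Simplified sketch of the canonical ordering step only; it does not
    decompose or compose characters the way full normalization does.
    """
    out = []
    run = []  # pending marks with a non-zero combining class
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter (class 0) closes the run; sorted() is stable, so
            # marks sharing a class keep the order the author typed.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# SHIN + SIN DOT (class 25) + QAMATS (class 18): the vowel is moved first.
src = "\u05E9\u05C2\u05B8"
print([f"U+{ord(c):04X}" for c in canonical_order(src)])
```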
At 01:45 PM 6/27/2003, Philippe Verdy wrote:
I understand the frustration:
Similar to the frustration of having private, off-list messages replied to
in public.
if Unicode had not attempted to define
combining classes, which were not necessary to Unicode, all
existing combining characters would
On Friday, June 27, 2003 10:29 PM, Rick McGowan <[EMAIL PROTECTED]>
wrote:
> The Unicode Technical Committee has posted a new issue for public
> review and comment. Details are on the following web page:
>
> http://www.unicode.org/review/
>
> Briefly, the new issue is:
>
> Issue #11 Soft Dott
On Friday, June 27, 2003 10:28 PM, John Hudson <[EMAIL PROTECTED]> wrote:
> I don't think it would break any modern Hebrew document, because it
> is not in any way essential to modern Hebrew that the vowels have
> fixed position combining classes as in Unicode. That is part of the
> frustration: th
Rick McGowan <[EMAIL PROTECTED]> has privately suggested moving
the discussion of Combining Classes of *Tibetan* Characters
from the main Unicode list [EMAIL PROTECTED] to the TIBEX list
[EMAIL PROTECTED] - an "experts" list which was set up several
years ago specifically to discuss proposals for
Kenneth Whistler said on June 27, 2003 at 4:08 PM
>Karljürgen,
>> 2. Consequently ANY OTHER solution than 'FIX the obvious mistake(s)' is a
>> kludge (contra Philippe's (?) recent comment). One *pays* for all kludges,
>> one way or the other.
>Digital encoding of writing systems is a kludge. An
The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:
http://www.unicode.org/review/
Review periods for the new items close on August 18, 2003.
Please see the page for links to discussion and relevant documents.
Brief
Karljürgen,
> 2. Consequently ANY OTHER solution than 'FIX the obvious mistake(s)' is a
> kludge (contra Philippe's (?) recent comment). One *pays* for all kludges,
> one way or the other.
Digital encoding of writing systems is a kludge. And boy, do we
seem to be paying for the Unicode version o
Jony Rosenne wrote on 06/27/2003 08:32:11 AM:
> I am under the impression that the existing scientific encodings of the
> Bible are encoded with the help of some kind of markup, and maybe this is
> how they should continue.
The existing eBHS texts use an encoding in which the order of characters
At 10:20 AM 6/27/2003, John Cowan wrote:
> What if the request to change the Hebrew combining classes came *from* W3C
> and/or IETF? I'm not saying that this is likely, but I'm wondering whether
> they might, in fact, not insist on stability for characters for which
> normalisation is currently br
Peter replied:
> Karljürgen Feuerherm wrote on 06/27/2003 08:23:08 AM:
>
> > Now, Q: I take it the combining classes are linked to the script, rather
> > than say to a dialect
>
> They're linked to the character.
> > --e.g. one can't define BH as a separate dialect from
> > MH with its own set o
(repost. last word missing, sorry)
> John Cowan said on June 27, 2003 at 12:56 PM
>
> Michael Everson had said:
> > > This is not analogous to the present situation, it seems to me. In
> > > the first place, what else is the \ for? :-)
> >
> > Escaping special characters, since you ask.
>
> But
Jony Rosenne said on June 27, 2003 at 2:17 PM
> > 1. Everyone is more or less agreed that the present combining
> > class rules as they apply to BH contain mistakes.
>
> I don't. I do agree that there are some cases that are not handled.
Well, ok, 'omissions,' then. If BH was intended to be covered in
[EMAIL PROTECTED] scripsit:
> Of course, the point is, this is a particular situation where
>
>
>
> is canonically equivalent to
>
>
No, Mark had it right. These two are canonically equivalent and therefore
no normal Unicode process (including rendering) can treat them differently.
That's
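The claim that such sequences are canonically equivalent can be checked with Python's standard `unicodedata` module. Two spellings that differ only in the order of marks with different combining classes normalize to the same string, so a conforming process may not distinguish them. The characters used here (bet, qamats, dagesh) are an illustrative assumption. They are not the sequences elided from the quoted message.

```python
import unicodedata

# Illustrative sequences (not the ones elided above):
# BET + QAMATS (class 18) + DAGESH (class 21), and the same marks reversed.
a = "\u05D1\u05B8\u05BC"
b = "\u05D1\u05BC\u05B8"

for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
    print(form, same)  # True for both forms: a and b are canonically equivalent
```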
Philippe said on June 27, 2003 at 10:25 AM
Do you then propose to create a specific character, for use within the
Hebrew script only, as a way to specify an alternate order for Hebrew
cantillation? In that case, it would be more appropriate to define new
standard variants of these cantillation mar
Karljürgen Feuerherm wrote on 06/27/2003 08:23:08 AM:
> Now, Q: I take it the combining classes are linked to the script, rather
> than say to a dialect
They're linked to the character.
> --e.g. one can't define BH as a separate dialect from
> MH with its own set of rules?
No, not unless BH is
Philippe wrote:
> When I just look at the history of combining classes, they did not exist in
> the first Unicode standard, and they still don't exist in ISO10646 as well.
> This was a technology developed by IBM and offered for free to the community
Excuse me Philippe, but you are wrong. Please
John Cowan wrote on 06/27/2003 06:29:12 AM:
> Since the use of non-ASCII characters in things like XML and the DNS
I suspect the users of Biblical Hebrew would rather be told they can't use
Hebrew vowels and accents in markup or URIs than deal with a hack to fix
errors in the combining classes.
Philippe Verdy wrote on 06/27/2003 04:46:56 AM:
> > Could this finally be the missing "killer ap" for the CGJ?
>
> It will be perfect to allow an application like XML to encode Hebrew
> text using Unicode 4.0 rules (and before).
It is not perfect. CGJ is supposed to be significant (and kept in t
Philippe Verdy said on June 27, 2003 at 12:38 PM
Subject: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of
Tibetan Vowels)
> On Friday, June 27, 2003 5:53 PM, Karljürgen Feuerherm <[EMAIL PROTECTED]> wrote:
> > And in any case this should NOT muck things up which aren't broken,
> >
Michael Everson wrote on 06/27/2003 09:39:16 AM:
> But you might trot on over with a white flag to parley about a problem.
>
> They [IETF] 're only human beings over there, just as we are over here.
Every time I have referred to IETF as "them" in his presence, Misha Wolf
has reminded me, "WE ar
John Cowan wrote on 06/27/2003 08:24:35 AM:
> The IETF has an explicit contract with Unicode: "We'll use your
> normalization algorithm if you promise NEVER, NEVER to change
> the normalization status of a single character." Unicode has already
> broken that promise four times, so its credibili
John Cowan said on June 27, 2003 at 12:56 PM
Michael Everson had said:
> > This is not analogous to the present situation, it seems to me. In
> > the first place, what else is the \ for? :-)
>
> Escaping special characters, since you ask.
But in a completely different.
K
Philippe Verdy scripsit:
> Given that XML will require normalization for texts identified as
> being Unicode encoded (UTF-8 and others), couldn't a document be
> labelled so that the normalization step be removed from the XML
> processing, using a "ISO-10646-8" encoding name (for the UTF-8
> encod
At 05:48 AM 6/27/2003, Michael Everson wrote:
The W3C would also hit the roof if Unicode normalization changed radically.
I don't think anyone is proposing a *radical* change.
I have uploaded the relevant draft pages of the SBL Hebrew user manual to
http://www.tiro.com/transfer/SBLappendi
John Hudson scripsit:
> What if the request to change the Hebrew combining classes came *from* W3C
> and/or IETF? I'm not saying that this is likely, but I'm wondering whether
> they might, in fact, not insist on stability for characters for which
> normalisation is currently broken anyway?
Th
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Karljürgen Feuerherm
> Sent: Friday, June 27, 2003 3:23 PM
> To: [EMAIL PROTECTED]
> Subject: SPAM: Re: Biblical Hebrew (Was: Major Defect in
> Combining Classes of Tibetan Vowels)
>
>
>
> 1. Ever
John Cowan said on June 27, 2003 at 12:48 PM
> Karljürgen Feuerherm scripsit:
> > > > Several people have expressed reasons why this can't be
> > > > (practically) be
> > > > done--which mainly seem to stem from political concerns.
> > >
> > > All concerns involving human beings -- ho bios politikos -- a
At 03:12 AM 6/27/2003, Michael Everson wrote:
Who is it who will kill the Unicode Consortium if UAX #15 were to be
revised? Did it occur to anyone to *ask* about the possible revision of
classes for the dozen or so instances that would be affected?
My understanding is that stability promises hav
At 02:53 AM 6/27/2003, [EMAIL PROTECTED] wrote:
ISO: Then, obviously they need to correct their errors. I mean, it's not
like the wrong characters got encoded or something. Tell them to just fix
the errors; that can't be difficult to do, and is obviously the right
thing to do.
That seems to be exa
Michael Everson scripsit:
> No, but you're not making a technical argument, either.
"The life of [Unicode] has not been logic but experience."
--Oliver Wendell Holmes, somewhat mutated
> >Not when their core values -- correctness vs. stability -- are made to
> >be at odds.
>
> And shift
Michael Everson scripsit:
> And sometimes not, then. What four characters have been corrected so
> far? Were they "important" characters to some company? Are there no
> Christians or Jews in the IETF who might care about a problem like
> this, where a simple solution might be effected? Particul
Karljürgen Feuerherm scripsit:
> > The use of
> > the backslash character in DOS/Windows systems as a path separator is
> > arguably a mistake
>
> I hardly think so. It was a matter of a necessary alternative. It could only
> be viewed as a mistake on the assumption that somehow the Unix way was
On Friday, June 27, 2003 6:01 PM, Philippe Verdy <[EMAIL PROTECTED]>
wrote:
Given that XML will require normalization for texts identified as
being Unicode encoded (UTF-8 and others), couldn't a document be
labelled so that the normalization step be removed from the XML
processing, using a "ISO-1
On Friday, June 27, 2003 5:53 PM, Karljürgen Feuerherm <[EMAIL PROTECTED]> wrote:
> And in any case this should NOT muck things up which aren't broken,
> like MH.
Not breaking Modern Hebrew means not changing the combining classes
of the characters it uses.
Adding a distinct set for Traditional
On Friday, June 27, 2003 5:05 PM, Michael Everson <[EMAIL PROTECTED]> wrote:
> At 10:40 -0400 2003-06-27, John Cowan wrote:
> > Karljürgen Feuerherm scripsit:
> >
> > > 1. Everyone is more or less agreed that the present combining
> > > class rules as they apply to BH contain mistakes. The clea
Philippe Verdy scripsit:
> May be Unicode should be more prudent with Normalization Forms: if
> new characters are added, their combining classes should be
> documented as informative before there is a consensus and
> experimentation. This will not break the stability pact with XML, which
> will s
Philippe said on June 27, 2003 at 10:25 AM
> On Friday, June 27, 2003 3:23 PM, Karljürgen Feuerherm <[EMAIL PROTECTED]> wrote:
> > I REALLY think that option 1 [FIX the combining classes] should be
> > beaten to death with a stick,
> > then beaten to death again, before settling for one of the others.
On Friday, June 27, 2003 4:40 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> Not so. Sometimes stability is more important than correctness.
Very well answered. I don't see why we need to sacrifice stability when
correcting something. As the error is not in ISO10646, it is definitely not
reasonable
Andrew C. West wrote:
> I have to agree 100% with Peter on this. The potential fiasco with
> regards to Mongolian Free Variation Selectors is another area where
> our grandchildren are going to be weeping with despair if we are
> not careful. The standardized variants for Mongolian were set in
>
Michael Everson scripsit:
> But you might trot on over with a white flag to parley about a problem.
>
> They're only human beings over there, just as we are over here.
Michael, I *am* the guy carrying the white flag to the W3C, and I have
made promises about what the Unicode Consortium will and
On Friday, June 27, 2003 4:44 PM, Ben Dougall <[EMAIL PROTECTED]> wrote:
> i'm a bit confused. i thought that this type of thing was already
> pretty well covered by the various unicode resources? (i guess there's
> a strong chance not, if you're asking this question).
I'm not discussing about ho
At 10:40 -0400 2003-06-27, John Cowan wrote:
Karljürgen Feuerherm scripsit:
1. Everyone is more or less agreed that the present combining class rules as
they apply to BH contain mistakes. The clearly preferential way to deal with
mistakes in any technological/computing software environment is t
At 09:24 -0400 2003-06-27, John Cowan wrote:
Michael Everson scripsit:
So, you're saying, no one has asked IETF whether or not they would be
able to countenance a dozen or so changes for unimplemented things
like biblical accents.
The IETF has an explicit contract with Unicode: "We'll use your
i'm a bit confused. i thought that this type of thing was already
pretty well covered by the various unicode resources? (i guess there's
a strong chance not, if you're asking this question).
this is the way i see it:
it's for you to decide which format you internally normalise to (i'm
not even
Philippe Verdy wrote:
> The current use of CGJ is for sequences like:
> and
> which still encode the French words "boeuf" and "effet", where the
> author gives a hint to display the sequence "oe" as a single ligated
> form instead of two separate grapheme clusters, despite this
> corres
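Whatever role one assigns to CGJ in sequences like these, normalization leaves it in place, because U+034F has combining class 0 and no decomposition. A quick check with Python's standard `unicodedata` module shows that a CGJ-marked spelling of "boeuf" survives NFC and NFKC. It is therefore not canonically equivalent to the plain spelling. Whether a renderer actually forms a ligature is a separate matter that this sketch does not test.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
plain = "boeuf"
hinted = "bo" + CGJ + "euf"

for form in ("NFC", "NFKC"):
    # CGJ has no canonical or compatibility decomposition, so it is kept.
    print(form, "keeps CGJ:", CGJ in unicodedata.normalize(form, hinted))

print("combining class of CGJ:", unicodedata.combining(CGJ))
print("equivalent to plain:", unicodedata.normalize("NFC", hinted) == plain)
```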
At 09:16 -0400 2003-06-27, John Cowan wrote:
Michael Everson scripsit:
Oh, come on. Let's not put words in people's mouths. Ifs and mights
are not facts.
Expressed attitudes are facts, and it's reasonable to extrapolate people's
future behaviors, at least the general trend thereof, from their ex
Karljürgen Feuerherm scripsit:
> 1. Everyone is more or less agreed that the present combining class rules as
> they apply to BH contain mistakes. The clearly preferential way to deal with
> mistakes in any technological/computing software environment is to FIX them.
Not so. Sometimes stability
On Friday, June 27, 2003 3:23 PM, Karljürgen Feuerherm <[EMAIL PROTECTED]> wrote:
> > At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
> Now, Q: I take it the combining classes are linked to the script,
> rather than say to a dialect--e.g. one can't define BH as a separate
> dialect from MH with
On Friday, June 27, 2003 3:36 PM, Jony Rosenne <[EMAIL PROTECTED]> wrote:
> For Hebrew and Arabic, add a step: Find the root, remove prefixes,
> suffixes and other grammatical artifacts and obtain the base form of
> the word.
Removing common suffixes is a separate issue (this requires unificatio
(Regret I hadn't yet read this post prior to my last post)
Peter said, in response to Ken:
> Why is it a kludge to insert some cc=0 control character into the text for
> the sole purpose of preventing reordering during canonical ordering of two
> combining marks that do interact typographically an
> At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
>
> >I just have a hard time believing that 50 years from now our
> >grandchildren won't look back [...]
I am in complete agreement with the spirit of what Peter says, though
realistically, 50 years from now, this is likely to be all neither h
Michael Everson scripsit:
> Oh, come on. Let's not put words in people's mouths. Ifs and mights
> are not facts.
Expressed attitudes are facts, and it's reasonable to extrapolate people's
future behaviors, at least the general trend thereof, from their expressed
attitudes. When someone draws a
Michael Everson scripsit:
> So, you're saying, no one has asked IETF whether or not they would be
> able to countenance a dozen or so changes for unimplemented things
> like biblical accents.
The IETF has an explicit contract with Unicode: "We'll use your
normalization algorithm if you promise
At 07:28 -0400 2003-06-27, John Cowan wrote:
Michael Everson scripsit:
Who is it who will kill the Unicode Consortium if UAX #15 were to be
revised? Did it occur to anyone to *ask* about the possible revision
of classes for the dozen or so instances that would be affected?
The IETF, for one. I
At 14:34 +0200 2003-06-27, Philippe Verdy wrote:
On Friday, June 27, 2003 1:29 PM, John Cowan <[EMAIL PROTECTED]> wrote:
Michael Everson scripsit:
Change the character classes in Unicode 4.1, and they *might* decide
to freeze support at, say, Unicode 3.0.
Or they may simply opt to define their *
For Hebrew and Arabic, add a step: Find the root, remove prefixes, suffixes
and other grammatical artifacts and obtain the base form of the word.
Nearly nobody does it, and searches in these languages are less useful than
parallel searches in other languages.
Jony
> -Original Message-
>
On Friday, June 27, 2003 1:29 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> Michael Everson scripsit:
> Change the character classes in Unicode 4.1, and they *might* decide
> to freeze support at, say, Unicode 3.0.
Or they may simply opt to define their *OWN* normalization standard, distinct from
U
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
> Sent: Friday, June 27, 2003 12:31 PM
> To: [EMAIL PROTECTED]
> Subject: SPAM: About combining classes
>
>
> When I just look at the history of combining classes, they
> did not exi
On Fri, 27 Jun 2003 04:22:30 -0500, [EMAIL PROTECTED] wrote:
> I just have a hard time believing that 50 years from now our grandchildren
> won't look back, "What were they thinking? So it took them a couple of
> years to figure out canonical ordering and normalization; why on earth
> didn't th
In order to implement a plain-text search algorithm in a language-neutral way that
would still work with all scripts, I am looking for advice on how this can be done
"safely" (notably for automated search engines), to allow searching for text matching
some basic encoding styles.
My first a
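One common language-neutral starting point for loose matching is to compare strings after compatibility decomposition, case folding, and removal of combining marks. The sketch below uses a hypothetical function name, `search_key`. It assumes that dropping every nonspacing mark (general category Mn) is acceptable. That assumption fails for pointed Hebrew whenever vowels or accents must be matched, and it also silently drops CGJ, which is category Mn. It is one simple technique and not a substitute for the Unicode Collation Algorithm.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical loose-match key: NFKD, drop Mn marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop nonspacing marks (category Mn). This also removes Hebrew points,
    # cantillation marks and CGJ, which may not be what every user wants.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

print(search_key("Café") == search_key("CAFE"))  # True
```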
Michael Everson scripsit:
> Who is it who will kill the Unicode Consortium if UAX #15 were to be
> revised? Did it occur to anyone to *ask* about the possible revision
> of classes for the dozen or so instances that would be affected?
The IETF, for one. IETF is already very wary of Unicode, ev
When I just look at the history of combining classes, they did not exist in the first
Unicode standard, and they still don't exist in ISO10646 as well.
This was a technology developed by IBM and offered for free to the community to allow
a simplified management of encoded texts, and it has long b
At 04:53 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
If they're so unaware of combining classes, might it not seem
reasonable to think that the dialog might continue as follows?
- [gives explanation of combining classes and the related problem for Hebrew]
ISO: So, you're saying you're coming to us
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
Are we saying that ISO doesn't give a rip for implementation issues?
Duplication of characters is not the way to fix (forgive me, UTC)
*Unicode's* error in combining characters.
Or that their notion of ordering distinctions is different from
U
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
I just have a hard time believing that 50 years from now our
grandchildren won't look back, "What were they thinking? So it took
them a couple of years to figure out canonical ordering and
normalization; why on earth didn't they work that o
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
In discussing these issues among Biblical Hebrew implementers,
content providers and users, I have had to explain repeatedly why
UTC doesn't want to consider this. It is completely obvious to them
that this is the right solution. Even on ex
On Friday, June 27, 2003 3:54 AM, Kenneth Whistler <[EMAIL PROTECTED]> wrote:
> John,
>
> > At 03:36 PM 6/26/2003, Kenneth Whistler wrote:
> >
> > > Why is making use of the existing behavior of existing characters
> > > a "groanable kludge", if it has the desired effect and makes
> > > the requ
At 10:09 +0200 2003-06-27, Jony Rosenne wrote:
Whatever you do, any new characters designed for solving these problems
should not be in the Hebrew block. Add a new Biblical Hebrew block, clearly
labeled as not intended for regular Hebrew use.
And I suggest that whenever a proposal comes up to the U
At 23:59 -0700 2003-06-26, John Hudson wrote:
I think there is a reasonable case to be made for treating modern
Hebrew and Biblical Hebrew as separate languages for pretty much all
purposes. The existing codepoints with the fixed position combining
classes work fine for Modern Hebrew, and there
Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:
> But in the 10646 WG2 context... You can always come in
> with the proposal to encode BIBLICAL HEBREW POINT PATAH and
> say, even though the glyph is identical, see, the name is
> different, so the character is different. But this is a pretty
> th
Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:
> Why is making use of the existing behavior of existing characters
> a "groanable kludge", if it has the desired effect and makes
> the required distinctions in text?
Why is it a kludge to insert some cc=0 control character into the text for
the
> but premature standardization can
> also be a problem if the wrong choices get codified too soon.
As in canonical combining classes? :-)
- Peter
---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W
Kenneth Whistler wrote on 06/26/2003 10:15:12 PM:
> How does a user of pointed Hebrew text know whether they are
> dealing with the legacy points...
Ken, corresponding arguments apply equally to your suggestion of putting
CGJ everywhere and letting software make it transparent to the user: how
Ken Whistler wrote on 06/26/2003 05:04:55 PM:
> Another possibility to consider is U+2060 WORD JOINER, the
> version of the zero width non-breaking space unfreighted with
> the BOM confusion of U+FEFF.
It wouldn't allow line breaks, but it would indicate an unwanted word
boundary, no? (I don't h
Kenneth Whistler wrote on 06/26/2003 08:54:08 PM:
> Actually, in casting around for the solution to the problem of
> introduction of format controls creating defective combining
> character sequences, it finally occurred to me that:
>
> U+034F COMBINING GRAPHEME JOINER
>
> has the requisite prop
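The property Ken points to can be checked with Python's standard `unicodedata` module. U+034F COMBINING GRAPHEME JOINER has combining class 0 and no decomposition, so placing it between two marks stops canonical reordering from swapping them. The same sequence without the joiner is reordered. The Hebrew characters chosen (alef, meteg, hataf patah) are illustrative. The sketch says nothing about how fonts or shaping engines treat the joiner.

```python
import unicodedata

CGJ = "\u034F"
ALEF, METEG, HATAF_PATAH = "\u05D0", "\u05BD", "\u05B2"

bare = ALEF + METEG + HATAF_PATAH          # meteg (class 22) typed first
joined = ALEF + METEG + CGJ + HATAF_PATAH  # same, with CGJ between the marks

def show(s):
    return [f"U+{ord(c):04X}" for c in s]

print(show(unicodedata.normalize("NFD", bare)))    # hataf patah (class 12) moves first
print(show(unicodedata.normalize("NFD", joined)))  # typed order is preserved
```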
John Hudson wrote on 06/26/2003 03:19:44 PM:
> >That is a potential solution, though it would have to be *two* additional
> >metegs.
>
> Can you explain your thinking here, Peter?
I was thinking of the three-way distinction for hataf vowels, but you were
correct in pointing out earlier that c
Rick McGowan wrote on 06/26/2003 05:52:32 PM:
> The *best* thing to do, in my personal opinion and I know it'll get shot
> down so don't bother telling me so, is to fix the combining classes of the
> Hebrew points.
In discussing these issues among Biblical Hebrew implementers, content
provi
Whatever you do, any new characters designed for solving these problems
should not be in the Hebrew block. Add a new Biblical Hebrew block, clearly
labeled as not intended for regular Hebrew use.
And I suggest that whenever a proposal comes up to the UTC, it would be
advantageous to involve Israel
John,
You just discovered one more shortcoming of UniScribe. As you say, the
authors did not consider this particular case. I suppose it will be fixed
sooner or later.
I don't see how this affects the discussion, though. UniScribe and most
current fonts do not process the simple case of Holam cor
At 08:15 PM 6/26/2003, Kenneth Whistler wrote:
But who then does end up carrying the can eventually, if we go
the cloning route? Cloning 14 characters creates a *new*
normalization problem, and forces non-Biblical-scholar users of
pointed Hebrew text to carry *that* particular can.
...
I think if
It is not a problem, this is how it should be.
Jony
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Mark Davis
> Sent: Thursday, June 26, 2003 11:46 PM
> To: Kenneth Whistler; [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: