Some of this seems to refer to an earlier contention that
Text Boundaries (including Lines) break between the space and the
non-spacing mark. I think this was attributed to Philippe.

[This may not be true: I don't actually read his email, because the
information content per line falls below my email threshold; not to
say that there may not be information there, but I cannot afford to
take the time to find out -- sadly, one of my character flaws.]

All of the text boundaries preserve grapheme cluster boundaries, which
never separate a base character (including space and NBSP) from a
following NSM. In addition, each of the boundary types above the
grapheme cluster level makes some statement about the behavior of the
grapheme cluster. For example, with line boundaries a SPACE + NSM has
special behavior. With the others, the behavior is the same as that of
the base character.
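
To make the grapheme cluster point concrete, here is a rough Python
sketch. It is only an approximation that I am assuming for
illustration, not the UAX #29 rules themselves: it attaches any
character of General Category Mn or Me to whatever precedes it, and
ignores Hangul, CR LF, and the other cases. The function name
simple_clusters is hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Approximate grapheme clusters: attach each nonspacing or
    enclosing mark (General Category Mn or Me) to the character
    before it. This is an illustrative simplification, not the
    full default algorithm."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            # A mark never starts a new cluster, even after SPACE or NBSP.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# SPACE + COMBINING ACUTE ACCENT stays together as one cluster.
print(simple_clusters("a \u0301b"))
# NBSP + HEBREW POINT HOLAM stays together, followed by ALEF.
print(simple_clusters("\u00A0\u05B9\u05D0"))
```

Under this sketch a boundary can fall before the space or after the
mark, but not between them, which is the property the higher-level
boundaries then build on.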

As Ken points out, in any event these are default boundaries, and can
be tailored. That being said, if the normal behavior of the default
can be improved, and someone has a concrete proposal for doing so,
then it can be considered.

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, August 11, 2003 12:26
Subject: Re: Questions on ZWNBS - for line initial holam plus alef


> Peter Kirk wrote:
>
> > I think this may be a "Peter mistake". I meant to refer to spacing
> > diacritics. Sorry.
> >
> > It is certainly highly inappropriate for spacing diacritics to
> > be considered word boundaries.
>
> Why? It is entirely dependent on the orthography and conventions
> involved. There is probably as much (or more) bad ASCII usage
> of spacing diacritics like `this', where a grave accent character
> is being misapplied to make a directional quotation mark, as
> there is actual, linguistically appropriate use of spacing
> diacritics.
>
> Also, everyone should consider carefully the status of UAX #29,
> Text Boundaries.
>
> <quote>
> 2 Conformance
>
> This is informative material. There are many different ways to
> divide text elements corresponding to grapheme clusters, words
> and sentences, and the Unicode Standard and this document do not
> restrict the ways in which implementations can do this.
>
> This specification is a <emphasis>default</emphasis> mechanism;
> more sophisticated engines can and should tailor it for particular
> locales or environments. ...
> </quote>
>
> The whole UAX is informative. It is a here's-how-you-can-approach-
> the-problem implementation guide with some suggestions for
> rules and classes.
>
> *If* you are working with an orthography that uses one or more
> spacing diacritics, and
> *If* those spacing diacritics need to be represented by
> <SPACE, NSM> sequences,
>
> then you are in the situation where your implementation of
> text boundaries should take <SPACE, NSM> sequences explicitly
> into account, so as to result in expected behavior for that
> orthography.
>
> Everyone has had experiences with their platform UI producing
> bad results for text boundaries. The Solaris platform I am
> writing this on right now, for example, implements a double-click
> word selection that treats the string "`this'," above, including
> the grave accent, the apostrophe, and the comma, as a "word".
> Is that right or wrong? Well, it depends on what you are trying
> to do, I expect.
>
> But even the most sophisticated platform implementers can only
> do so much with processes like default word selection. It is
> bound to be wrong for one purpose or another and for one
> orthography or another. Ultimately you need to have tailored
> processes that can be orthography-specific if you want to
> get best results.
>
> --Ken
>
>
>

