On 04/08/2003 14:59, Kenneth Whistler wrote:

Peter Kirk asked:



In other words, if what you need is to glue things together,
i.e. a zero width no-break space *function*, then use
U+2060. If what you need is a BOM for the encoding scheme
specifications, then use U+FEFF.

What is *discouraged*, but not prohibited, of course, is
using U+FEFF for a zero width no-break space *function*,
precisely because that interacts so confusingly with
the BOM.

--Ken



And what if you need a ZWNBS function for something other than gluing things together? For example, as a carrier for a string-initial or line-initial diacritical mark when no spacing is required?


This is not something sanctioned by the standard.


The carrier for a combining mark that is to display in isolation without
a base character is U+0020 SPACE. If you want to also indicate the
absence of a line break opportunity, then the carrier is U+00A0
NO-BREAK SPACE (NBSP).

Neither of these is appropriate to the case I have in mind (described in greater detail below) as they are not zero width and therefore give an unwanted indent at the start of a line. U+200B ZERO WIDTH SPACE might be appropriate, but this has the problem that it is a break opportunity, which is not always appropriate.
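For readers who want to look at the candidate carriers discussed here, the following is a minimal Python sketch, not anything taken from the standard. It assumes HEBREW POINT HOLAM (U+05B9) as a sample combining mark, and the helper names `describe` and `isolated` are made up for illustration. The properties reported come from whatever Unicode version the Python interpreter bundles, which may differ from the version under discussion in this thread.

```python
import unicodedata

# Sample combining mark, chosen only for illustration (an assumption).
HOLAM = "\u05B9"  # HEBREW POINT HOLAM

# Candidate carriers mentioned in the thread: SPACE, NBSP, ZERO WIDTH SPACE.
CARRIERS = ["\u0020", "\u00A0", "\u200B"]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def isolated(carrier, mark=HOLAM):
    """Build the sequence carrier + mark used to display a mark in isolation."""
    return carrier + mark


for carrier in CARRIERS:
    sequence = isolated(carrier)
    print(describe(carrier), [f"U+{ord(c):04X}" for c in sequence])

# A non-zero canonical combining class confirms the sample mark is a combining mark.
print("combining class of holam:", unicodedata.combining(HOLAM))
```

The sketch only reports character properties and builds the sequences; it says nothing about rendered width, which depends on the font and the rendering engine.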


Despite its name, U+FEFF ZWNBS is *NOT* a space character. It is formally gc=Cf, not gc=Zs. It also does not have the White_Space property.

So "a ZWNBS function for something other than gluing things together"
is a contradiction in terms of the current definition of the standard.
The *meaning* of the "ZWNBS function" is its behavior in the
context of UAX #14, Line Breaking Properties. See the WJ Word joiner
entry (normative) of UAX #14:

http://www.unicode.org/reports/tr14/
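The general category claim above (gc=Cf rather than gc=Zs for U+FEFF) can be checked with a short Python sketch. Two caveats: the `unicodedata` module reflects the Unicode version bundled with the interpreter, and `str.isspace()` is used here only as an approximation of the White_Space property (it is defined in terms of general category and bidirectional class, not White_Space itself). The helper name `properties` is hypothetical. The line breaking classes of UAX #14 are not exposed by the standard library, so they are not shown.

```python
import unicodedata

# Characters discussed in the thread.
CHARS = ["\u0020", "\u00A0", "\u2060", "\uFEFF"]


def properties(ch):
    """Return (general category, isspace() result) for one character.

    isspace() is only a stand-in for the White_Space property (an assumption).
    """
    return unicodedata.category(ch), ch.isspace()


for ch in CHARS:
    gc, spacey = properties(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={gc}, isspace={spacey}")
```

On this view U+FEFF and U+2060 pattern together as format characters, while U+0020 and U+00A0 are the space separators.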


Thank you, Ken, and also Mark. I didn't know where to find these details. Mark wrote:

Their
names may be misleading; people intending to use them for any other
function should carefully read the sections of the Unicode Standard
that discuss their usage.

But which sections? Where is the index, online? It is unfortunate that there are no links from the character charts or the database to the various places where the uses of the characters are explained. All there is is a character name and, as I have found quite often, that name is seriously misleading if not actually incorrect. It is highly unfortunate that it is not permitted to change these misleading names.

As it is, the note at U+FEFF in the character charts reads "use as an indication of non-breaking is deprecated...", although you wrote that this was not deprecated. But there is no note that use of ZERO WIDTH NO-BREAK SPACE as a zero width no-break space is deprecated or "a contradiction in terms of the current definition of the standard". Are you surprised that I am confused?

Ken continued:

This is one of the suggestions for some of the Hebrew problems, but I have had no response to my suggestion of using U+2060, which is inappropriately named for the function I have in mind.



The function I think you have in mind is not isolated display of
a combining mark, but rather trying to find a mechanism for
getting around the conformance strictures of the standard, to
get a combining mark to apply to a *following* base
character, rather than to a *preceding* base character.


If by "apply" in the above you mean "be positioned adjacent to", there is already a problem with the standard: the EXISTING Hebrew page of the standard is in contravention of its own conformance strictures. Under the existing standard (irrespective of any changes being proposed) and in legacy encodings, the combining mark holam is usually positioned graphically above the preceding base character, but in certain environments, specifically when followed by a silent alef (holam male is a separate issue), it is positioned graphically above the following base character. The standard has anticipated this kind of difficulty by recognising that positioning is not always consistent with logical ordering; see the note on Indic vowel signs in The Unicode Standard 4.0, section 2.10, subsection "Sequence of Base Characters and Diacritics", http://www.unicode.org/book/preview/ch02.pdf. That is a documented special case. Hebrew holam followed by silent alef is also a special case, whether you like it or not; it just hasn't been documented. It could be removed, but that would require changes to every existing (ancient or modern) pointed Hebrew text.
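To make the logical-order point concrete, here is a small Python sketch. The sequence lamed + holam + alef is my own choice of example, and the helper name `base_of_each_mark` is hypothetical. It shows only what the encoding says: the holam follows the lamed in the character sequence, and normalization does not move it. Where the dot is actually drawn is a matter for the font and renderer and cannot be seen from the code points.

```python
import unicodedata

LAMED, HOLAM, ALEF = "\u05DC", "\u05B9", "\u05D0"

# Assumed example: lamed, then holam, then a silent alef, in logical order.
word = LAMED + HOLAM + ALEF


def base_of_each_mark(text):
    """Pair each combining mark with the nearest preceding non-mark character."""
    pairs = []
    base = None
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            base_name = unicodedata.name(base) if base is not None else None
            pairs.append((unicodedata.name(ch), base_name))
        else:
            base = ch
    return pairs


print(base_of_each_mark(word))

# Normalization leaves the mark where it is in the sequence.
for form in ("NFC", "NFD"):
    print(form, unicodedata.normalize(form, word) == word)
```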

Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.


Understood. I await alternative suggestions.

--Ken






--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




