Peter Kirk asked:
Neither of these is appropriate to the case I have in mind (described in greater detail below), as they are not zero width and therefore give an unwanted indent at the start of a line. U+200B ZERO WIDTH SPACE might be appropriate, but it has the problem that it is a break opportunity, which is not always appropriate.
And what if you need a ZWNBS function for something other than gluing things together? For example, as a carrier for a string-initial or line-initial diacritical mark when no spacing is required?
In other words, if what you need is to glue things together, i.e. a zero width no-break space *function*, then use U+2060. If what you need is a BOM for the encoding scheme specifications, then use U+FEFF.
What is *discouraged*, but not prohibited, of course, is using U+FEFF for a zero width no-break space *function*, precisely because that interacts so confusingly with the BOM.
--Ken
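The division of labour Ken describes can be sketched in Python. This is only an illustration, assuming UTF-8 data and Python's standard `utf-8-sig` codec; the sample string and variable names are made up for the example.

```python
import codecs

WORD_JOINER = "\u2060"   # the "glue" function: no break, zero width
BOM = "\ufeff"           # byte order mark / signature at the start of a stream

# A UTF-8 byte stream with a signature, whose text uses U+2060 as the joiner.
data = codecs.BOM_UTF8 + ("a" + WORD_JOINER + "b").encode("utf-8")

# A plain UTF-8 decode keeps the leading U+FEFF as an ordinary character.
raw_text = data.decode("utf-8")

# The "utf-8-sig" codec treats a leading U+FEFF as a signature and drops it,
# while the U+2060 inside the text is left alone.
text = data.decode("utf-8-sig")

print(repr(raw_text))
print(repr(text))
```

If U+FEFF had been used as the joiner instead, a process that strips signatures could not tell a leading joiner from a BOM, which is the confusing interaction mentioned above.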
This is not something sanctioned by the standard.
The carrier for a combining mark that is to display in isolation without a base character is U+0020 SPACE. If you want to also indicate the absence of a line break opportunity, then the carrier is U+00A0 NO-BREAK SPACE (NBSP).
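As a small sketch of the rule just stated, the following Python builds an isolated-mark sequence by putting the combining mark after U+0020 or U+00A0. The helper name `isolated_mark` is hypothetical, and the check on the general category is just a convenience for the example, not something the standard prescribes.

```python
import unicodedata

SPACE = "\u0020"  # carrier for an isolated combining mark
NBSP = "\u00A0"   # carrier that also prevents a line break

def isolated_mark(mark, no_break=False):
    """Return carrier + mark for a single combining character (hypothetical helper)."""
    # General categories Mn, Mc and Me are the combining marks.
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return (NBSP if no_break else SPACE) + mark

acute = isolated_mark("\u0301")                 # SPACE + COMBINING ACUTE ACCENT
holam = isolated_mark("\u05B9", no_break=True)  # NBSP + HEBREW POINT HOLAM

print([f"U+{ord(c):04X}" for c in acute])
print([f"U+{ord(c):04X}" for c in holam])
```

Both carriers have an advance width, so this sequence does not give a zero-width result.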
Thank you, Ken, and also Mark. I didn't know where to find these details. Mark wrote:
Despite its name, U+FEFF ZWNBS is *NOT* a space character. It is formally gc=Cf, not gc=Zs. It also does not have the White_Space property.
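These property values can be checked against the Unicode Character Database that ships with Python. This is a rough sketch: the results depend on the Unicode version bundled with the interpreter, and `str.isspace()` is used here only as an approximation of the White_Space property, not as its definition.

```python
import unicodedata

# Code points discussed in this thread.
CANDIDATES = (0xFEFF, 0x2060, 0x200B, 0x0020, 0x00A0)

def describe(cp):
    """Return (name, general category, str.isspace()) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

for cp in CANDIDATES:
    name, gc, is_space = describe(cp)
    print(f"U+{cp:04X} {name}: gc={gc}, isspace={is_space}")
```

On a current Python, U+FEFF and U+2060 report gc=Cf and are not treated as whitespace, while U+0020 and U+00A0 report gc=Zs.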
So "a ZWNBS function for something other than gluing things together" is a contradiction in terms of the current definition of the standard. The *meaning* of the "ZWNBS function" is its behavior in the context of UAX #14, Line Breaking Properties. See the WJ Word joiner entry (normative) of UAX #14:
http://www.unicode.org/reports/tr14/
But which sections? Where is the index, online? It is unfortunate that there are no links from the character charts or the database to the various places where the uses of the characters are explained. All there is is a character name and, as I have found quite often, this character name is seriously misleading if not actually incorrect. It is highly unfortunate that it is not permitted to change these misleading names.
Their names may be misleading; people intending to use them for any other function should carefully read the sections of the Unicode Standard that discuss their usage.
As it is, the note at U+FEFF in the character charts reads "use as an indication of non-breaking is deprecated...", although you wrote that this was not deprecated. But there is no note that use of ZERO WIDTH NO-BREAK SPACE as a zero width no-break space is deprecated or "a contradiction in terms of the current definition of the standard". Are you surprised that I am confused?
Ken continued:
If by "apply" in the above you mean "be positioned adjacent to", there is already a problem with the standard: the EXISTING Hebrew page of the standard is in contravention of its conformance strictures. This is because under the existing standard (irrespective of any changes being proposed) and in legacy encodings, the combining mark holam, which is usually graphically positioned above the preceding base character, is in certain environments graphically positioned above the following base character, specifically when followed by a silent alef (holam male is a separate issue). But the standard has anticipated this kind of difficulty by recognising that positioning is not always consistent with logical ordering; see the note on Indic vowel signs in The Unicode Standard 4.0 section 2.10, subsection "Sequence of Base Characters and Diacritics", http://www.unicode.org/book/preview/ch02.pdf. This is a documented special case. Hebrew holam followed by silent alef is also a special case whether you like it or not; it just hasn't been documented. It could be removed, but that would require changes to every existing (ancient or modern) pointed Hebrew text.
This is one of the suggestions for some of the Hebrew problems, but I have had no response to my suggestion of using U+2060, which is inappropriately named for the function I have in mind.
The function I think you have in mind is not isolated display of a combining mark, but rather trying to find a mechanism for getting around the conformance strictures of the standard, to get a combining mark to apply to a *following* base character, rather than to a *preceding* base character. Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.
Understood. I await alternative suggestions.
--Ken
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/