Suppose I have a paragraph (uppercase = RTL): CARROT IS car\u00ADrot IN ENGLISH
and the paragraph gets broken at the soft hyphen. Is the correct ordering for the first line "car- SI TORRAC" or "-car SI TORRAC"?

I did not succeed in deducing the answer from UAX #9. Soft hyphen has bidi class BN, which means it gets removed by rule X9, and so, if I have understood correctly, it doesn't have a defined embedding level. I'm guessing the correct ordering is the first one, but I don't trust my instincts here. (In particular, I wondered whether this was analogous to the case where rule L1 resets embedding levels so that trailing whitespace is at the visual end of the line.)

More generally, suppose you have a markup language which has a construct for discretionary breaks, as in TeX, with pre-break, post-break and no-break text. Soft hyphen is a special case of this (where the pre-break text consists of a hyphen, and the post-break and no-break texts are empty). You can also regard space as a kind of discretionary break (post-break text empty, no-break text contains the space, pre-break text either contains the space or is empty, depending on how you want to think about it).

Obviously the embedding levels for the no-break text should be resolved as if the discretionary break were replaced by the no-break text (which is consistent with a bidi class of BN for soft hyphen). However, for the pre-break and post-break text, it is not clear to me what the right way to resolve embedding levels is (or how their content should be restricted so that there is a sensible way to resolve them).

I would be grateful for any suggestions.

James
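
To make the two candidates concrete, here is a minimal sketch in Python. It does not run the full bidi algorithm and does not settle which answer is correct. The embedding levels are assigned by hand (an assumption: paragraph level 1, "CARROT IS " at level 1, "car" at level 2), and the function reorder_line is a hypothetical helper that applies only the reversal step of rule L2. The only thing that differs between the two runs is the level given to the visible hyphen.

```python
import unicodedata

# Soft hyphen is bidi class BN, as stated in the question.
print(unicodedata.bidirectional("\u00ad"))

def reorder_line(chars, levels):
    """Reversal step of UAX #9 rule L2: from the highest level down to the
    lowest odd level, reverse every contiguous run at that level or higher."""
    items = list(zip(chars, levels))
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return "".join(chars)
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(items):
            if items[i][1] >= level:
                j = i
                while j < len(items) and items[j][1] >= level:
                    j += 1
                items[i:j] = items[i:j][::-1]  # reverse this run in place
                i = j
            else:
                i += 1
    return "".join(ch for ch, _ in items)

# First line in logical order, with the soft hyphen shown as a visible "-".
line = "CARROT IS car-"

# Hand-assigned levels (assumption): 10 characters at level 1, "car" at level 2.
base_levels = [1] * 10 + [2] * 3

# Candidate A: the hyphen takes the level of the preceding character (2).
hyphen_like_preceding = reorder_line(line, base_levels + [2])

# Candidate B: the hyphen is reset to the paragraph level (1),
# by analogy with trailing whitespace under rule L1.
hyphen_at_paragraph_level = reorder_line(line, base_levels + [1])

print(hyphen_like_preceding)      # car- SI TORRAC
print(hyphen_at_paragraph_level)  # -car SI TORRAC
```

Seen this way, the question reduces to which embedding level the pre-break text should receive, which is exactly what rule X9 leaves open for a removed BN character.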