2015-01-30 9:32 GMT+01:00 Mark Davis ☕️ <[email protected]>:

> 2. Also, the following 2 rules are not equivalent:
>
> a) Any × (Format | Extend)
> b) X (Extend | Format)* → X
>

That's what I replied in the first message, but using an "as if" that was not clear enough; my second reply reformulated it by being explicit about the right-hand side (the substitution occurring in the next rules, which you view as a "shortcut").

Your first argument about convolution is not well justified between WB56 and WB57, which are also clear when rewritten by separating ALetter and HebrewLetter. But I also note that this case of Hebrew's handling of apostrophes/quotes also exists in the Latin script (including in English alone), in the context of word breaking only (it does not apply to line breaking or to syllable breaking for hyphenation, which are other types of breakers).

The rule about Format and Extend is still kept separate in WB56 and listed first only because it correctly preserves the canonical equivalences for extenders, which include all combining characters with a non-zero combining class. It also covers the golden rule of not breaking in the middle of default grapheme clusters (which also include joiners like CGJ and ZWJ, with any breaker algorithm except code point breakers for some conforming UTFs like UTF-16).

WB57 is evidently subject to tailoring. It just provides a default behavior where the single quote/apostrophe is handled as an elision mark, most often used at the end of words and glued to the next word without space separation. WB57 also handles the case where the single quote is followed by spaces or other punctuation; it is then not an orthographic elision mark but a punctuation mark ending a quotation.

One problem is that the SingleQuote class used in WB57 is possibly too large: only a smaller number of single-quote-like characters act as an elision mark (apostrophe). The other problem with WB57 is that it assumes that elision marked by apostrophes occurs only at the end of words (not true even for English), and this is where per-language tailoring is not only possible but most probably recommended.
Such tailoring will affect the behavior of WB56 (notably in English, French, Italian... where the apostrophe is lexicalized and its usage is regulated by their standard grammar).

----

But I wonder whether tailoring of WB57 is not also needed for Hebrew. I see WB57 only as an initial default tailoring for the script itself, not for the actual language (which may also be Yiddish). It could also cover usual transcriptions of foreign words, or common but informal abbreviations/contractions (the apostrophe is highly preferred to the dot for abbreviating/contracting in the middle of a word, notably when the abbreviated part is not even pronounced but completely elided).

It also seems that Swedish may use the colon in the middle of a word, without space separation, instead of an apostrophe. Other languages may prefer other signs for elisions (including a hyphen, which does not break words but only syllables for candidate breaking of long lines), notably if there is a risk of confusion with quote-like letters.

Another common notation (found in French typography) uses superscripts for the final letters when the elision occurs in the middle of a word, but this is in fact just a written abbreviation (it totally replaces the abbreviation dot, which is normally never used in the middle of a word and is completely eliminated in acronyms). It is not really an elision: the abbreviated word with superscript is still fully read without the elision, so the apostrophe cannot be used.
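
PS: to make the non-equivalence of the two quoted rules (a) and (b) concrete, here is a minimal Python sketch. It rests on loud assumptions: `wb_class` is a hypothetical and very rough stand-in for the real Word_Break property (it only looks at general categories), and the only later rule modeled is a simplified "letter × letter" rule. It is not the UAX #29 algorithm, just an illustration of why the substitution on the right-hand side of (b) matters for the rules that follow.

```python
import unicodedata


def wb_class(ch):
    """Hypothetical, rough stand-in for Word_Break (assumption, not real data)."""
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me", "Mc"):
        return "Extend"
    if cat == "Cf":
        return "Format"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def breaks_rule_a(text):
    """Rule (a) only: Any x (Format | Extend). Later rules see raw neighbours."""
    classes = [wb_class(c) for c in text]
    out = []
    for i in range(1, len(text)):
        left, right = classes[i - 1], classes[i]
        if right in ("Extend", "Format"):
            continue  # rule (a): never break before Extend/Format
        if left == "ALetter" and right == "ALetter":
            continue  # simplified letter x letter rule
        out.append(i)  # otherwise break everywhere
    return out


def breaks_rule_b(text):
    """Rule (b): X (Extend | Format)* -> X. Later rules see the base X."""
    classes = [wb_class(c) for c in text]
    out = []
    base = classes[0] if text else None
    for i in range(1, len(text)):
        right = classes[i]
        if right in ("Extend", "Format"):
            continue  # absorbed into the preceding X; base is unchanged
        if not (base == "ALetter" and right == "ALetter"):
            out.append(i)
        base = right
    return out


sample = "a\u0301b"  # 'a', COMBINING ACUTE ACCENT, 'b'
print(breaks_rule_a(sample))  # break offsets with rule (a) alone
print(breaks_rule_b(sample))  # break offsets with rule (b)
```

With rule (a) alone, the letter rule sees the pair (Extend, ALetter) in front of "b" and does not match, so a break is reported between the combining accent and "b". With rule (b), the accent is absorbed into the preceding "a", the letter rule sees (ALetter, ALetter), and no break is reported.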

