On Wed, Sep 3, 2025 at 20:10, Henry via Unicode <[email protected]> wrote:
> × 200B ÷ 0308 × 0024 ÷ # × [0.3] ZERO WIDTH SPACE (ZW_NotEastAsian) ÷
> [8.0] COMBINING DIAERESIS (CM1_NotEastAsian_CM) × [24.03] DOLLAR SIGN
> (PR_NotEastAsian) ÷ [0.3]
>
> LB24 states "Do not break between alphabetics (“at”)" with the following
> break rule:
>
> (PR | PO) × (AL | HL)
> (AL | HL) × (PR | PO)
>
> However, neither U+200B nor U+0308 has break class PR, PO, AL, or HL (they
> have break class ZW and CM).

You missed rule LB10.

LB9: Treat X (CM | ZWJ)* as if it were X, where X is any line break class except BK, CR, LF, NL, SP, or ZW.

LB10: Treat any remaining CM or ZWJ as if it had the properties of U+0041 A LATIN CAPITAL LETTER A, that is, Line_Break=AL, General_Category=Lu, East_Asian_Width=Na, Extended_Pictographic=N.

U+0308 is CM. The character before it, U+200B, is ZW, which LB9 excludes as a base, so LB9 does not apply. Therefore LB10 applies, and U+0308 is treated as AL for subsequent rules.

LB24 therefore applies: (AL | HL) × (PR | PO).

The same holds for the other example you cite: a CM becomes AL.

Best regards,

Robin Leroy
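
P.S. For illustration, the LB9/LB10 resolution described above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the reference implementation: the names LINE_BREAK_CLASS, resolve_combining, and lb24_forbids_break are hypothetical, the class table covers only the three characters from the test line, and giving each absorbed CM the class of its base is a simplification that models how later rules see the sequence, not the "no break inside the sequence" part of LB9.

```python
# Hypothetical, hand-written table covering only the characters in the example.
LINE_BREAK_CLASS = {0x200B: "ZW", 0x0308: "CM", 0x0024: "PR"}

# Classes that LB9 excludes as a base X for a following CM or ZWJ.
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def resolve_combining(classes):
    """Return the classes that later rules see after LB9 and LB10 (simplified)."""
    resolved = []
    base = None  # class a following CM/ZWJ attaches to under LB9, if any
    for cls in classes:
        if cls in ("CM", "ZWJ"):
            if base is None:
                # LB10: no eligible base, so the CM/ZWJ is treated as AL,
                # and it becomes the base for any CM/ZWJ that follows it.
                base = "AL"
            # LB9 (or LB10 just above): the character takes the base class.
            resolved.append(base)
        else:
            resolved.append(cls)
            base = None if cls in NO_ATTACH else cls
    return resolved


def lb24_forbids_break(left, right):
    """LB24 as quoted above: (PR | PO) × (AL | HL) and (AL | HL) × (PR | PO)."""
    prefix_postfix = {"PR", "PO"}
    letters = {"AL", "HL"}
    return (left in prefix_postfix and right in letters) or (
        left in letters and right in prefix_postfix
    )


classes = [LINE_BREAK_CLASS[cp] for cp in (0x200B, 0x0308, 0x0024)]
resolved = resolve_combining(classes)
print(resolved)  # → ['ZW', 'AL', 'PR']
print(lb24_forbids_break(resolved[1], resolved[2]))  # → True
```

With the classes resolved this way, the pair U+0308, U+0024 is seen as AL followed by PR, which is exactly the second LB24 pattern, hence the × at [24.03] in the test line.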
