On Tue, 2023-10-10 at 10:24 -0600, Karl Williamson wrote:
> The two relevant rules, I believe are
>
> Keep horizontal whitespace together.
>
> WB3d    WSegSpace × WSegSpace
>
> Ignore Format and Extend characters, except after sot, CR, LF, and
> Newline. (See Section 6.2, Replacing Ignore Rules.) This also has the
> effect of: Any × (Format | Extend | ZWJ)
>
> WB4     X (Extend | Format | ZWJ)* → X
>
> [Rule 4] says to pretend that the
> Extend doesn't exist except after certain classes. The character
> preceding the Extend one is a WSegSpace character, so we get
>
> X Extend → X
> WSegSpace Extend → WSegSpace

Not quite. Section 6.2, referenced in the comment, says that the ignore
rule 4 means two things:

- First, don't break before (Extend | Format | ZWJ) unless a preceding
  (higher-priority) rule mandates that;

- Second, in every *subsequent* (lower-priority) rule, replace every
  boundary property X by X (Extend | Format | ZWJ)* .

As rule 3d precedes rule 4, we don't get to "pretend" that the combining
diaeresis doesn't exist for the purposes of rule 3d, as you say; that
substitution only holds for rules 5, ..., 999. Thus rule 3d does not
apply anywhere, then rule 4 applies (no break) between the first space
and the combining diaeresis, then 999 applies (break) between the
diaeresis and the second space.

(And IIUC this makes some sense: putting a combining accent on a space
is a way to typeset that combining accent by itself that doesn't
require its standalone form to be encoded separately.)

--
Good luck,
Alex
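
P.S. A minimal sketch of that reading in Python, in case it helps to see
the rule ordering spelled out. It is a toy model, not an implementation
of the full algorithm: it knows only U+0020 (WSegSpace) and U+0308
(Extend), it rejects every other character, and the names wb_class,
boundary_rule and boundaries are made up for this illustration.

```python
# Toy model of the rule ordering discussed above. Assumptions: only
# U+0020 (WSegSpace) and U+0308 COMBINING DIAERESIS (Extend) are known;
# Format, ZWJ, CR, LF, Newline and rules 5..17 are deliberately left out.
WSEGSPACE = "WSegSpace"
EXTEND = "Extend"


def wb_class(ch):
    """Return the word-break class of ch (hypothetical two-entry table)."""
    if ch == " ":
        return WSEGSPACE
    if ch == "\u0308":
        return EXTEND
    raise ValueError(f"character {ch!r} is outside this toy model")


def boundary_rule(text, i):
    """Decide the position between text[i-1] and text[i].

    Returns (rule_name, is_break). Rules are tried in priority order.
    """
    left, right = wb_class(text[i - 1]), wb_class(text[i])

    # WB3d comes before WB4, so it sees the raw neighbours: an Extend
    # sitting between two spaces is NOT skipped here.
    if left == WSEGSPACE and right == WSEGSPACE:
        return ("WB3d", False)

    # WB4, first effect: no break before Extend.
    if right == EXTEND:
        return ("WB4", False)

    # WB4, second effect: rules 5..17 would treat "X Extend*" as X.
    # None of them mentions WSegSpace, so for this alphabet nothing
    # matches and the default rule applies: break everywhere.
    return ("WB999", True)


def boundaries(text):
    """List the decision for every interior position of text."""
    return [boundary_rule(text, i) for i in range(1, len(text))]


if __name__ == "__main__":
    for rule, is_break in boundaries(" \u0308 "):
        print(rule, "break" if is_break else "no break")
```

Run on space, diaeresis, space, this reports WB4 (no break) for the
first position and WB999 (break) for the second; WB3d fires only when
the two spaces are directly adjacent.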
