(Oops, pressed send accidentally; trying to complete the message here)

From: Unicode On Behalf Of Jukka K. Korpela via Unicode
Sent: Friday, January 16, 2026 8:46 PM
…
> Generally, whether a character is closing, final, initial, or opening
> punctuation should be based on language-specific
> information, such as CLDR.

I would advise against that, since
1) language information is not always available,
2) even when available, it is not reliable,
3) even when available and correct, people often use their primary language’s quotation convention, even for their second/third/… language…

For quotation marks, it is an unfortunate historical accident that different typographic traditions (not languages, really) have different conventions.

For “ambiguous” quote marks (and, for that matter, apostrophes, also when not used as quotation marks) and line breaking, I have proposed an update to the Unicode line breaking rules (not dependent on language or typographic tradition) in https://www.unicode.org/L2/L2025/25261r-line-breaking.pdf. That should take care of the line breaking issue (very annoying at present) for “ambiguous” quote marks.

When it comes to the bidi issue with these marks, I note that other brackets now seem to be treated specially (I haven’t yet checked the latest version of the bidi algorithm); at least there is a new data file: https://www.unicode.org/Public/UNIDATA/BidiBrackets.txt. But “ambiguous” quote marks are not handled. One would still need some bidi control characters (like RLM, LRM) to fix the issue. But people will generally not be knowledgeable and meticulous enough to input them. So I would suggest adding to bidi processing that
* <bol/SPACE/Bidi_Paired_Bracket:o><ambiguous quote mark, i.e. line break class QU> be treated as Bidi_Paired_Bracket:o for the <ambiguous quote mark, i.e. line break class QU> part.
* <ambiguous quote mark, i.e. line break class QU><eol/SPACE/Bidi_Paired_Bracket:c/,/./?/!/;/:/Arabic semicolon/Arabic full stop/Arabic comma> be treated as Bidi_Paired_Bracket:c for the <ambiguous quote mark, i.e. line break class QU> part.

While this might not be completely ideal, these two changes (to Unicode line breaking and to Unicode bidi processing) would make the mess with ambiguous quote marks somewhat better handled, without needing special quirks (odd control characters nobody inputs) to get the appropriate layout of characters/glyphs.

/Kent Karlsson
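
To make the two context rules above concrete, here is a minimal Python sketch, not a definitive implementation. It only classifies each ambiguous quote mark as opening ("o"), closing ("c"), or unchanged ("n"); it does not run the bidi algorithm itself. The function name classify_quotes and the character sets are hypothetical: the sets are small hand-picked subsets, whereas a real implementation would take line break class QU from LineBreak.txt and the bracket types from BidiBrackets.txt. Checking the opening rule first when both contexts match is also an assumption, since the rules above do not say which should win.

```python
# Illustrative subsets only (assumptions, not the full Unicode data).
QUOTES = set("\"'\u201C\u201D\u2018\u2019\u00AB\u00BB")  # some line break class QU characters
OPEN_BRACKETS = set("([{")    # some Bidi_Paired_Bracket_Type=Open characters
CLOSE_BRACKETS = set(")]}")   # some Bidi_Paired_Bracket_Type=Close characters
# , . ? ! ; : plus Arabic semicolon, Arabic full stop, Arabic comma
TRAILING_PUNCT = set(",.?!;:\u061B\u06D4\u060C")


def classify_quotes(text: str) -> list[tuple[int, str]]:
    """Return (index, kind) for each ambiguous quote mark in text.

    kind is "o" (treat as opening), "c" (treat as closing), or
    "n" (neither rule applies, e.g. an apostrophe inside a word).
    """
    result = []
    for i, ch in enumerate(text):
        if ch not in QUOTES:
            continue
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        # Rule 1: preceded by beginning of line, SPACE, or an opening bracket.
        if prev is None or prev in "\n " or prev in OPEN_BRACKETS:
            kind = "o"
        # Rule 2: followed by end of line, SPACE, a closing bracket,
        # or one of the listed punctuation marks.
        elif (nxt is None or nxt in "\n " or nxt in CLOSE_BRACKETS
              or nxt in TRAILING_PUNCT):
            kind = "c"
        else:
            kind = "n"
        result.append((i, kind))
    return result


sample = "He said \"abc\", then ('x')."
print(classify_quotes(sample))
```

In this sketch the quote marks around "abc" and around 'x' come out as an opening/closing pair each, while the apostrophe in a word like "don't" is left unclassified because neither context matches.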
