(Oops, pressed send accidentally; trying to complete the message here)

From: Unicode <[email protected]> On Behalf Of Jukka K. Korpela via Unicode
Sent: Friday, January 16, 2026 8:46 PM
…
> Generally, whether a character is closing, final, initial, or opening
> punctuation should be based on language-specific information, such as
> CLDR.
I would advise against that, since 1) language information is not always
available, 2) even when available, it is not reliable, and 3) even when
available and correct, people often use their primary language’s
quotation convention, even for their second/third/… language.
For quotation marks, it is an unfortunate historical accident that
different typographic traditions (not really languages) have different
conventions.
For “ambiguous” quote marks (and, for that matter, apostrophes when not
used as quotation marks) and line breaking, I have proposed an update to
the Unicode line breaking rules (not dependent on language or
typographic tradition) in
https://www.unicode.org/L2/L2025/25261r-line-breaking.pdf.
That should take care of the line breaking issue (very annoying at
present) for “ambiguous” quote marks.
When it comes to the bidi issue with these marks, I note that other
brackets now seem to be treated specially (I haven’t yet checked the
latest version of the bidi algorithm); at least there is a new data file:
https://www.unicode.org/Public/UNIDATA/BidiBrackets.txt. But “ambiguous”
quote marks are not handled. One would still need some bidi control
characters (like RLM, LRM) to fix the issue. But people will generally
not be knowledgeable and meticulous enough to input them. So I would
suggest adding to bidi processing that

  *   <bol/SPACE/Bidi_Paired_Bracket:o><ambiguous quote mark, i.e. l.b. QU>
      be treated as Bidi_Paired_Bracket:o for the
      <ambiguous quote mark, i.e. l.b. QU> part.
  *   <ambiguous quote mark, i.e. l.b. QU><eol/SPACE/Bidi_Paired_Bracket:c/,/./?/!/;/:/Arabic semicolon/Arabic full stop/Arabic comma>
      be treated as Bidi_Paired_Bracket:c for the
      <ambiguous quote mark, i.e. l.b. QU> part.
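
To illustrate, the two rules above could be sketched roughly as follows
in Python. This is only a sketch under assumptions, not part of any
Unicode specification: the character sets are small hand-picked subsets
(Python's unicodedata module exposes neither the line break class nor
Bidi_Paired_Bracket_Type), the name classify_quotes is made up for the
example, and leaving a quote mark unresolved when both rules match, or
neither does, is an assumption, since the rules do not say what should
happen then.

```python
# Illustrative subsets only (assumptions); a real implementation would
# take these from LineBreak.txt (class QU) and BidiBrackets.txt.
AMBIGUOUS_QUOTES = set("\"'\u2018\u2019\u201C\u201D\u00AB\u00BB")
OPEN_BRACKETS = set("([{")
CLOSE_BRACKETS = set(")]}")
# , . ? ! ; : plus Arabic semicolon, Arabic full stop, Arabic comma
TRAILING_PUNCT = set(",.?!;:\u061B\u06D4\u060C")


def classify_quotes(line):
    """Return (index, kind) for each ambiguous quote mark in one line.

    kind is "o" (treat as opening bracket), "c" (treat as closing
    bracket), or None when the context does not decide.
    """
    result = []
    for i, ch in enumerate(line):
        if ch not in AMBIGUOUS_QUOTES:
            continue
        prev = line[i - 1] if i > 0 else None            # None = bol
        nxt = line[i + 1] if i + 1 < len(line) else None  # None = eol

        # Rule 1: preceded by bol, SPACE, or an opening bracket.
        opens = prev is None or prev == " " or prev in OPEN_BRACKETS
        # Rule 2: followed by eol, SPACE, a closing bracket, or
        # one of the listed punctuation marks.
        closes = (nxt is None or nxt == " "
                  or nxt in CLOSE_BRACKETS or nxt in TRAILING_PUNCT)

        if opens and not closes:
            kind = "o"
        elif closes and not opens:
            kind = "c"
        else:
            kind = None  # both or neither rule matched (assumption)
        result.append((i, kind))
    return result
```

With these definitions, the quote marks in 'say "hello", ok' come out as
opening and closing respectively, while the apostrophe in "don't" stays
unresolved because neither rule matches it.
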
While this might not be completely ideal, these two changes (to Unicode
line breaking and to Unicode bidi processing) would handle the mess with
ambiguous quote marks somewhat better, without needing special quirks
(odd control characters nobody inputs) to get the appropriate layout of
characters/glyphs.
/Kent Karlsson
