Re: Proposing new arrow characters with Bidi_Mirrored=Yes

Mark E. Shoulson via Unicode Tue, 08 Apr 2025 11:53:59 -0700

On 4/8/25 1:56 PM, NeatNit via Unicode wrote:

Users just type what gives them the correct appearance.
Even then, the problem with encoding duplicate characters based on layout properties is 
that "users just type what gives them the correct appearance" at the time they 
enter the character. The only context a user has is the text being typed. If that happens 
to give the correct direction, a user wouldn't know to shift to a different character, 
just in case the context might change.
wouldn't whoever enters the arrow just use the right^wcorrect one? Does text 
get converted from LTR to RTL? If so, isn't that part of the translator's 
responsibility?

You guys are mostly right: in a context of users typing in text and manually 
choosing to insert an arrow, they would choose the arrow that looks correct, 
and it doesn't matter if they use a mirroring or non-mirroring arrow. This is 
not the issue I mean to solve.

The question then is "what software processes are unavoidable and known to interfere 
with this user choice" for arrows in a bidirectional context?

(The above quoted-quotes from Asmus)

The issue is with software that programmatically inserts arrows in text that 
comes from unpredictable sources. Developers usually never think of this case, 
causing the arrow to point in the wrong direction. Real world examples:

https://github.com/deevroman/better-osm-org/issues/241 - solved by 
bidi-isolating both sides of the arrow, and programmatically selecting the 
correct arrow based on the layout direction
https://github.com/OSMCha/osmcha-frontend/issues/765 - solved by bidi-isolating 
both sides of the arrow, and relying on the fact that the interface is always 
LTR
https://meta.discourse.org/t/wrong-arrow-direction-in-rtl-text-contexts/360760 
- which I've already mentioned, **no simple way to solve it** without mirroring 
arrows!
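
For illustration, the workaround described in the first two links (isolate both sides, then pick the arrow from the layout direction) might look roughly like the following Python sketch. The function name `join_with_arrow` and the `rtl_layout` flag are hypothetical and not taken from either project; the sketch also assumes the caller already knows the layout direction, which is exactly what the third case lacks.

```python
# Bidi isolate controls defined by the Unicode Bidirectional Algorithm.
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

RIGHT_ARROW = "\u2192"  # RIGHTWARDS ARROW
LEFT_ARROW = "\u2190"   # LEFTWARDS ARROW


def join_with_arrow(source: str, target: str, rtl_layout: bool = False) -> str:
    """Join two untrusted strings with an arrow pointing from source to target.

    Each side is wrapped in its own isolate so its content cannot reorder
    the arrow or the other side. The arrow is chosen by the caller-supplied
    layout direction (hypothetical parameter), since the arrow characters
    themselves do not mirror.
    """
    arrow = LEFT_ARROW if rtl_layout else RIGHT_ARROW
    return f"{FSI}{source}{PDI} {arrow} {FSI}{target}{PDI}"


ltr = join_with_arrow("point A", "point B")
rtl = join_with_arrow("point A", "point B", rtl_layout=True)
```

With a mirroring arrow, the `rtl_layout` branch would be unnecessary: the same character could be emitted in both cases.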

Obviously I don't expect developers to suddenly know to switch to the mirroring arrows 
overnight, if they are added. But I would love to be able to tell them "all you have 
to do to fix it is replace this character with that one".

Ah! OK, now we're talking. I see the use case. I haven't read details on the software in question, but I take it the point is that you're presenting a route and there's a list of waypoints and it's presented as "And now go from point A → point B" and needs to be localized/internationalized. This actually... sounds like a reasonable use? I mean, it makes sense why this wouldn't be served by the current situation and why people would want something smarter.

If replacing "->" by an arrow character can change its direction, isn't it up 
to the autocorrect software to analyze the bidi context and select the correct arrow? The 
rule should be to select whatever substitution gives the same appearance (direction) as what 
the user would see for the string they typed.

The problem is this replacement is done (as far as I know) outside of any 
rendering context, when the text is just a sequence of character codes. It's 
not reasonable to know which direction the text goes. Sometimes it's completely 
impossible, if the text direction depends on context that isn't available at 
the time of replacement.
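
A rough sketch of why the direction can be unknowable from the character codes alone: the usual fallback is a "first strong character" guess, similar in spirit to the paragraph-level rule in the bidi algorithm. The helper name `first_strong_direction` is made up for this example, and the sketch deliberately ignores isolates and any higher-level direction set by markup.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Guess direction from the first strongly directional character.

    Returns "ltr", "rtl", or None when the text has no strong character,
    in which case the direction comes from context the replacement code
    cannot see.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


print(first_strong_direction("abc -> def"))
print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd -> abc"))
print(first_strong_direction("123 -> 456"))
```

The last call finds only digits, spaces and punctuation, so the guess fails; the text would take whatever direction the surrounding document imposes later.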

This gets back to the problem that some arrows should be mirrored ("and then turn left (←)") and some should not. That would require some user-smarts.

Here's a possibly disastrous idea: arrows mirror when they are within the 
domain of a Directional Override character (U+202D, U+202E).

Let's say this was implemented... Would it help solve the issues linked above 
in some way?


(this quoted-quote is from me)

Now that I see your intended situation, I think what I was imagining would not, in fact, help you. Just like there are directionality-isolates and embeddings, there are also directionality overrides so you can force ordinarily LTR text to be RTL or vice-versa, ‮like this‬. (the last two words in the last sentence were typed and are encoded in the same order the letters would be in English, but probably show up reversed for you.) And I was thinking that with a right-to-left override region, arrows would be reversed. But that wouldn't help you here, except if you sorta joined the two halves of your expression by having them start and end an override region. But that would be messy and defeat the purpose of having them in different spans and generally treating the two parts as independent pieces of information that are being joined by an arrow.

In retrospect, my original thought was a pretty stupid idea, since it essentially winds up assuming that the writer knows when the arrow should point this way or that... in which case they could have used the correct arrow in the first place! The advantage of what you're proposing is that the decision should be handled by the BiDi/mirroring algorithm, the same algorithm that decides what direction your parentheses face.
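
As a small illustration of that last point, the property the mirroring step consults can be inspected with Python's standard `unicodedata` module. This is only a sketch of how to look up Bidi_Mirrored for a few characters; the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# unicodedata.mirrored() returns 1 when a character has Bidi_Mirrored=Yes,
# i.e. its glyph is swapped for a mirror image in right-to-left runs.
samples = ["(", ")", "<", "\u2192", "\u2190"]  # parens, less-than, arrows

for ch in samples:
    name = unicodedata.name(ch)
    flag = "Yes" if unicodedata.mirrored(ch) else "No"
    print(f"U+{ord(ch):04X} {name}: Bidi_Mirrored={flag}")
```

Parentheses and the less-than sign report Yes, while the existing arrows report No, which is why a renderer flips the former automatically and leaves the latter alone.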

A similar[ly bad] idea might be to have markup-type characters, something like 
<MIRRORED SELECTOR> or some such, to indicate that an attached character should 
be mirrored (or a pair of them that indicate direction).

I actually love that idea! It would solve the issue for all arrows (and any other 
glyphs in ExtraMirroring.txt), while only introducing one or two new code points. 
Maybe also <NON MIRRORED SELECTOR> to disable mirroring even on characters with 
Bidi_Mirrored=Yes.

And this would work better, if we take it to mean "the character this is attached to is _subject_ to mirroring." But markup-type characters in Unicode are a grey area and those which exist are not widely loved either. As Markus Scherer writes:

Encoding characters that look the same but behave differently is a bad idea. We have tried this, for example with letter-behavior clones of some of the typographic quotes (U+02BB, U+02BC). People use them inconsistently, because they can't tell the difference while typing or reading, and so we get problems with having to treat both equally in some places, text search, spoofing, "why does it say I am using an invalid character?", etc.
Unicode also has some magic invisible control characters that were supposed to change the behavior of affected characters in ways that violated their identity. These control codes are Deprecated with prejudice.

The directionality isolates and overrides and such are in this category of control characters, though I think not actually deprecated because they're needed(?) but still looked at a bit askance, and you don't want your kids playing with them...

And Markus' point about "Encoding characters that look the same but behave differently is a bad idea" is an extremely good one, too.

I don't even want to know about handling this in TTB contexts...

What is TTB? Couldn't quickly find it.


Top-To-Bottom.  Vertical text.  Just one more way for things to be confused.

~mark
