On 4/8/25 1:56 PM, NeatNit via Unicode wrote:

Users just type what gives them the correct appearance.
Even then, the problem with encoding duplicate characters based on layout properties is 
that "users just type what gives them the correct appearance" at the time they 
enter the character. The only context a user has is the text being typed. If that happens 
to give the correct direction, a user wouldn't know to shift to a different character, 
just in case the context might change.
Wouldn't whoever enters the arrow just use the right^wcorrect one? Does text 
get converted from LTR to RTL? If so, isn't that part of the translator's 
responsibility?
You guys are mostly right: in the context of users typing text and manually 
choosing to insert an arrow, they would choose the arrow that looks correct, 
and it doesn't matter whether they use a mirroring or non-mirroring arrow. This is 
not the issue I mean to solve.

The question then is "what software processes are unavoidable and known to interfere 
with this user choice" for arrows in a bidirectional context?
(The above quoted quotes are from Asmus.)
The issue is with software that programmatically inserts arrows into text that 
comes from unpredictable sources. Developers rarely think of this case, so the 
arrow ends up pointing in the wrong direction. Real-world examples:

https://github.com/deevroman/better-osm-org/issues/241 - solved by 
bidi-isolating both sides of the arrow, and programmatically selecting the 
correct arrow based on the layout direction
https://github.com/OSMCha/osmcha-frontend/issues/765 - solved by bidi-isolating 
both sides of the arrow, and relying on the fact that the interface is always 
LTR
https://meta.discourse.org/t/wrong-arrow-direction-in-rtl-text-contexts/360760 
- which I've already mentioned, **no simple way to solve it** without mirroring 
arrows!
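A minimal Python sketch of the first fix listed above (bidi-isolating both sides and choosing the arrow from the layout direction). This is not code from that project: the function name `join_with_arrow` is hypothetical, and it assumes the caller already knows whether the surrounding layout is RTL, which is exactly the information the Discourse case doesn't have.

```python
# Sketch only: join two independently sourced strings with an arrow.
# ASSUMPTION: the caller knows the direction of the surrounding layout.

FSI = "\u2068"  # FIRST STRONG ISOLATE: takes its direction from the first strong char inside
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate

RIGHT_ARROW = "\u2192"  # →
LEFT_ARROW = "\u2190"   # ←


def join_with_arrow(before: str, after: str, layout_rtl: bool) -> str:
    """Return "before -> after" in logical order, with a suitable arrow.

    Each side is isolated so that its own strong characters cannot reorder
    the arrow or the other side. The arrow and spaces are neutrals, so they
    normally resolve to the paragraph direction: in an RTL paragraph `before`
    is shown on the right, and the left-pointing arrow still points from it
    to `after`.
    """
    arrow = LEFT_ARROW if layout_rtl else RIGHT_ARROW
    return f"{FSI}{before}{PDI} {arrow} {FSI}{after}{PDI}"


if __name__ == "__main__":
    # Mixed-script waypoint names; repr() makes the invisible isolates visible.
    print(repr(join_with_arrow("Main St", "\u05e8\u05d7\u05d5\u05d1", layout_rtl=False)))
    print(repr(join_with_arrow("Main St", "\u05e8\u05d7\u05d5\u05d1", layout_rtl=True)))
```

The weak point is the `layout_rtl` parameter: when the code that builds the string can't know where it will be displayed, there is nothing correct to pass in.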

Obviously I don't expect developers to suddenly know to switch to the mirroring arrows 
overnight, if they are added. But I would love to be able to tell them "all you have 
to do to fix it is replace this character with that one".

Ah!  OK, now we're talking.  I see the use case.  I haven't read the details of the software in question, but I take it the point is that you're presenting a route: there's a list of waypoints, each leg is presented as "And now go from point A → point B", and it all needs to be localized/internationalized.  This actually... sounds like a reasonable use?  I mean, I can see why this isn't served by the current situation and why people would want something smarter.
If replacing "->" by an arrow character can change its direction, isn't it up 
to the autocorrect software to analyze the bidi context and select the correct arrow? The 
rule should be to select whatever substitution gives the same appearance (direction) as what 
the user would see for the string they typed.
The problem is that this replacement is done (as far as I know) outside of any 
rendering context, when the text is just a sequence of character codes. The 
software can't reasonably know which direction the text goes. Sometimes it's completely 
impossible, if the text direction depends on context that isn't available at 
the time of replacement.
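To make the "just a sequence of character codes" point concrete, here is a rough Python sketch of what an autocorrect step could do with nothing but the code points: guess a direction from the first strong character, approximately as rules P2/P3 of UAX #9 do for a paragraph. Both function names are hypothetical. The guess fails outright when the fragment has no strong character (e.g. `-> 42`), and even when it succeeds it only describes the fragment, not the paragraph or page the text will end up in.

```python
import unicodedata
from typing import Optional

RIGHT_ARROW = "\u2192"  # →
LEFT_ARROW = "\u2190"   # ←


def first_strong_direction(text: str) -> Optional[str]:
    """Approximate UAX #9 rules P2/P3: return "ltr" or "rtl" from the first
    strong character outside any isolate, or None if there is none."""
    depth = 0  # nesting depth of LRI/RLI/FSI ... PDI
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("LRI", "RLI", "FSI"):
            depth += 1
        elif cls == "PDI":
            depth = max(depth - 1, 0)  # an unmatched PDI is ignored
        elif depth == 0:
            if cls == "L":
                return "ltr"
            if cls in ("R", "AL"):
                return "rtl"
    return None


def arrow_for_typed_ascii(text: str) -> Optional[str]:
    """Hypothetical autocorrect helper: pick the arrow that matches how "->"
    looks in this fragment. Where the text resolves to RTL, "->" is shown
    reversed with ">" mirrored (i.e. as "<-"), so a left arrow keeps the look.
    Returns None when the fragment alone cannot decide."""
    direction = first_strong_direction(text)
    if direction is None:
        return None
    return LEFT_ARROW if direction == "rtl" else RIGHT_ARROW


if __name__ == "__main__":
    for sample in ["-> next", "\u05e9\u05dc\u05d5\u05dd -> x", "-> 42"]:
        print(repr(sample), first_strong_direction(sample), arrow_for_typed_ascii(sample))
```

The `None` case is the "completely impossible" situation: digits, punctuation and spaces carry no direction of their own.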
This gets back to the problem that some arrows should be mirrored ("and then turn left (←)") and some should not.  That would require some user-smarts.
Here's a possibly disastrous idea: arrows mirror when they are within the 
domain of a Directional Override character (U+202D, U+202E).
Let's say this was implemented... Would it help solve the issues linked above 
in some way?

(this quoted-quote is from me)

Now that I see your intended situation, I think what I was imagining would not, in fact, help you.  Just like there are directionality-isolates and embeddings, there are also directionality overrides so you can force ordinarily LTR text to be RTL or vice-versa, ‮like this‬. (the last two words in the last sentence were typed and are encoded in the same order the letters would be in English, but probably show up reversed for you.)  And I was thinking that with a right-to-left override region, arrows would be reversed.  But that wouldn't help you here, except if you sorta joined the two halves of your expression by having them start and end an override region.  But that would be messy and defeat the purpose of having them in different spans and generally treating the two parts as independent pieces of information that are being joined by an arrow.

In retrospect, my original thought was a pretty stupid idea, since it essentially winds up assuming that the writer knows when the arrow should point this way or that... in which case they could have used the correct arrow in the first place!  The advantage of what you're proposing is that the decision would be handled by the BiDi/mirroring algorithm, the same algorithm that decides what direction your parentheses face.
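The parenthesis behavior comes from the Bidi_Mirrored character property, which the arrows in question don't have. A quick Python check against the Unicode Character Database bundled with the interpreter (its Unicode version depends on the Python build); the helper name `bidi_mirrored` is just for illustration:

```python
import unicodedata

# Characters from the discussion: brackets mirror in RTL, these arrows do not.
SAMPLES = ["(", ")", "[", "<", "\u2192", "\u2190"]  # last two: → and ←


def bidi_mirrored(ch: str) -> bool:
    """True if `ch` has Bidi_Mirrored=Yes in Python's bundled UCD."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    for ch in SAMPLES:
        flag = "Yes" if bidi_mirrored(ch) else "No"
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: Bidi_Mirrored={flag}")
```

Under rule L4 of UAX #9, a character is drawn with a mirrored glyph only when its resolved direction is R and it has Bidi_Mirrored=Yes, which is why `(` flips in RTL text and `→` does not.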

A similar[ly bad] idea might be to have markup-type characters, something like 
<MIRRORED SELECTOR> or some such, to indicate that an attached character should 
be mirrored (or a pair of them that indicate direction).
I actually love that idea! It would solve the issue for all arrows (and any other 
glyphs in ExtraMirroring.txt), while only introducing one or two new code points. 
Maybe also a <NON MIRRORED SELECTOR> to disable mirroring even on characters with 
Bidi_Mirrored=Yes.
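A toy Python model of how a renderer might honor such selectors at the glyph-selection step, in the spirit of rule L4. Everything here is hypothetical: no MIRRORED SELECTOR or NON MIRRORED SELECTOR exists, so two private-use code points (U+E000, U+E001) stand in for them, the function `select_glyphs` is made up for illustration, and the resolved embedding levels are passed in rather than computed by a real bidi implementation. Reordering is left out.

```python
import unicodedata
from typing import List

# HYPOTHETICAL stand-ins: private-use code points, not real Unicode selectors.
MIRRORED_SELECTOR = "\uE000"      # models <MIRRORED SELECTOR>
NON_MIRRORED_SELECTOR = "\uE001"  # models <NON MIRRORED SELECTOR>
SELECTORS = (MIRRORED_SELECTOR, NON_MIRRORED_SELECTOR)

# Small illustrative set of mirror-glyph pairs: a few brackets, plus the
# arrow pair the proposal is about.
MIRROR_GLYPH = {"(": ")", ")": "(", "<": ">", ">": "<",
                "\u2192": "\u2190", "\u2190": "\u2192"}


def select_glyphs(text: str, levels: List[int]) -> str:
    """Choose a display glyph for each character, rule-L4 style.

    `levels` gives the resolved embedding level of each character in `text`
    (odd means RTL). A selector right after a character overrides its
    Bidi_Mirrored property; selectors themselves produce no glyph.
    Characters stay in logical order (no reordering)."""
    if len(levels) != len(text):
        raise ValueError("need one resolved level per character")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in SELECTORS:  # stray selector with no base character: drop it
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == MIRRORED_SELECTOR:
            eligible = True
        elif nxt == NON_MIRRORED_SELECTOR:
            eligible = False
        else:
            eligible = unicodedata.mirrored(ch) == 1
        is_rtl = levels[i] % 2 == 1
        out.append(MIRROR_GLYPH.get(ch, ch) if eligible and is_rtl else ch)
        i += 2 if nxt in SELECTORS else 1
    return "".join(out)


if __name__ == "__main__":
    print(select_glyphs("\u2192", [1]))              # plain arrow at RTL level
    print(select_glyphs("\u2192\uE000", [1, 1]))     # arrow + hypothetical selector
```

Note that the selector only makes a character eligible for mirroring; whether it actually flips still depends on the resolved level.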

And this would work better if we take it to mean "the character this is attached to is _subject_ to mirroring."  But markup-type characters in Unicode are a grey area, and those which exist are not widely loved either.  As Marcus Scherer writes:

Encoding characters that look the same but behave differently is a bad idea. We have tried this, for example with letter-behavior clones of some of the typographic quotes (U+02BB, U+02BC). People use them inconsistently, because they can't tell the difference while typing or reading, and so we get problems with having to treat both equally in some places, text search, spoofing, "why does it say I am using an invalid character?", etc.

Unicode also has some magic invisible control characters that were supposed to change the behavior of affected characters in ways that violated their identity. These control codes are Deprecated with prejudice.

The directionality isolates and overrides and such are in this category of control characters.  I don't think they're actually deprecated, since they're needed(?), but they're still looked at a bit askance, and you don't want your kids playing with them...

And Marcus' point about "Encoding characters that look the same but behave differently is a bad idea" is an extremely good one, too.

I don't even want to know about handling this in TTB contexts...
What is TTB? Couldn't quickly find it.

Top-To-Bottom.  Vertical text.  Just one more way for things to be confused.

~mark
