On 4/8/25 1:56 PM, NeatNit via Unicode wrote:
Users just type what gives them the correct appearance.
Even then, the problem with encoding duplicate characters based on layout properties is
that "users just type what gives them the correct appearance" at the time they
enter the character. The only context a user has is the text being typed. If that happens
to give the correct direction, a user wouldn't know to shift to a different character,
just in case the context might change.
wouldn't whoever enters the arrow just use the right^wcorrect one? Does text
get converted from LTR to RTL? If so, isn't that part of the translator's
responsibility?
You guys are mostly right: in a context of users typing in text and manually
choosing to insert an arrow, they would choose the arrow that looks correct,
and it doesn't matter if they use a mirroring or non-mirroring arrow. This is
not the issue I mean to solve.
The question then is "what software processes are unavoidable and known to interfere
with this user choice" for arrows in a bidirectional context?
(The above quoted-quotes from Asmus)
The issue is with software that programmatically inserts arrows in text that
comes from unpredictable sources. Developers rarely think of this case,
so the arrow ends up pointing in the wrong direction. Real world examples:
https://github.com/deevroman/better-osm-org/issues/241 - solved by
bidi-isolating both sides of the arrow, and programmatically selecting the
correct arrow based on the layout direction
https://github.com/OSMCha/osmcha-frontend/issues/765 - solved by bidi-isolating
both sides of the arrow, and relying on the fact that the interface is always
LTR
https://meta.discourse.org/t/wrong-arrow-direction-in-rtl-text-contexts/360760
- which I've already mentioned, **no simple way to solve it** without mirroring
arrows!
Obviously I don't expect developers to suddenly know to switch to the mirroring arrows
overnight, if they are added. But I would love to be able to tell them "all you have
to do to fix it is replace this character with that one".
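
For readers who haven't followed the links, the workaround described for the first issue (isolate both sides, then pick the arrow from the layout direction) might look roughly like the sketch below. This is not the code from any of the linked projects; the helper name join_with_arrow and the layout_rtl flag are made up for illustration, and it assumes the caller already knows the layout direction.

```python
# Bidi isolate controls from UAX #9.
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the isolated text itself
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate

def join_with_arrow(before: str, after: str, layout_rtl: bool) -> str:
    """Hypothetical helper: join two untrusted strings with a 'from -> to' arrow.

    Each side is wrapped in its own isolate so its content cannot reorder
    the arrow or the other side. Because today's arrows do not mirror,
    the arrow character itself has to be chosen by hand.
    """
    # In an RTL layout 'before' is drawn on the right, so the arrow must point left.
    arrow = "\u2190" if layout_rtl else "\u2192"
    return f"{FSI}{before}{PDI} {arrow} {FSI}{after}{PDI}"
```

With a mirroring arrow, the layout_rtl parameter and the arrow selection would disappear, which is the "replace this character with that one" fix.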
Ah! OK, now we're talking. I see the use case. I haven't read details
on the software in question, but I take it the point is that you're
presenting a route and there's a list of waypoints and it's presented as
"And now go from point A → point B" and needs to be
localized/internationalized. This actually... sounds like a reasonable
use? I mean, it makes sense why this wouldn't be served by the current
situation and why people would want something smarter.
If replacing "->" by an arrow character can change its direction, isn't it up
to the autocorrect software to analyze the bidi context and select the correct arrow? The
rule should be to select whatever substitution gives the same appearance (direction) as what
the user would see for the string they typed.
The problem is this replacement is done (as far as I know) outside of any
rendering context, when the text is just a sequence of character codes. It's
not reasonable to know which direction the text goes. Sometimes it's completely
impossible, if the text direction depends on context that isn't available at
the time of replacement.
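
To illustrate why this is hard: about the best a replacement step can do with a bare string is guess the paragraph direction from the first strong character, loosely in the spirit of rules P2/P3 of UAX #9. The sketch below is a simplification (it does not skip over isolated runs as the real rule does), and the function name is made up. The point is the None case, where the string itself carries no direction at all.

```python
import unicodedata

def first_strong_direction(text: str):
    """Guess direction from the first strong character; None if there is none.

    Simplified: unlike the full UAX #9 rule, characters inside bidi
    isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":  # strong left-to-right letter
            return "ltr"
    return None  # only neutrals/digits: direction depends on outside context
```

A string such as "-> " contains only neutral characters, so the guess comes back as None and the direction is decided later by whatever the text gets embedded in.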
This gets back to the problem that some arrows should be mirrored ("and
then turn left (←)") and some should not. That would require some
user-smarts.
Here's a possibly disastrous idea: arrows mirror when they are within the
domain of a Directional Override character (U+202D, U+202E).
Let's say this was implemented... Would it help solve the issues linked above
in some way?
(this quoted-quote is from me)
Now that I see your intended situation, I think what I was imagining
would not, in fact, help you. Just like there are
directionality-isolates and embeddings, there are also directionality
overrides so you can force ordinarily LTR text to be RTL or vice-versa,
like this. (the last two words in the last sentence were typed and are
encoded in the same order the letters would be in English, but probably
show up reversed for you.) And I was thinking that with a right-to-left
override region, arrows would be reversed. But that wouldn't help you
here, except if you sorta joined the two halves of your expression by
having them start and end an override region. But that would be messy
and defeat the purpose of having them in different spans and generally
treating the two parts as independent pieces of information that are
being joined by an arrow.
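
For anyone unfamiliar with overrides, here is a minimal sketch of what "forcing" text looks like at the code-point level. The helper name is made up; the two control characters are the standard ones.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE: treat following characters as strong RTL
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: ends the override

def force_rtl(text: str) -> str:
    """Hypothetical helper: wrap text in a right-to-left override region."""
    # The stored order of the letters is unchanged; only the display order flips.
    return f"{RLO}{text}{PDF}"
```

The letters stay in their typed order in memory, and a conforming renderer displays them reversed.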
In retrospect, my original thought was a pretty stupid idea, since it
essentially winds up assuming that the writer knows when the arrow
should point this way or that... in which case they could have used the
correct arrow in the first place! The advantage of what you're
proposing is that the decision should be handled by the BiDi/mirroring
algorithm, the same algorithm that decides what direction your
parentheses face.
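
The difference between parentheses and arrows is visible in the character properties themselves. A quick check with Python's unicodedata module, which reports the Bidi_Mirrored property (the wrapper name is only for illustration):

```python
import unicodedata

def is_bidi_mirrored(ch: str) -> bool:
    """True if the character has Bidi_Mirrored=Yes in the Unicode database."""
    return bool(unicodedata.mirrored(ch))

# Parentheses are mirrored by the bidi algorithm; the plain arrows are not.
paren_mirrors = is_bidi_mirrored("(")       # LEFT PARENTHESIS
arrow_mirrors = is_bidi_mirrored("\u2192")  # RIGHTWARDS ARROW
```

Here paren_mirrors is True and arrow_mirrors is False, which is why a renderer flips "(" in RTL runs but leaves "→" pointing the same way.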
A similar[ly bad] idea might be to have markup-type characters, something like
<MIRRORED SELECTOR> or some such, to indicate that an attached character should
be mirrored (or a pair of them that indicate direction).
I actually love that idea! It would solve the issue for all arrows (and any other
glyphs in ExtraMirroring.txt), while only introducing one or two new code points.
Maybe also <NON MIRRORED SELECTOR> to disable mirroring even on characters with
Bidi_Mirrored=Yes.
And this would work better, if we take it to mean "the character this is
attached to is _subject_ to mirroring." But markup-type characters in
Unicode are a grey area and those which exist are not widely loved
either. As Markus Scherer writes:
Encoding characters that look the same but behave differently is a bad
idea. We have tried this, for example with letter-behavior clones of
some of the typographic quotes (U+02BB, U+02BC). People use them
inconsistently, because they can't tell the difference while typing or
reading, and so we get problems with having to treat both equally in
some places, text search, spoofing, "why does it say I am using an
invalid character?", etc.
Unicode also has some magic invisible control characters that were
supposed to change the behavior of affected characters in ways that
violated their identity. These control codes are Deprecated with
prejudice.
The directionality isolates and overrides and such are in this category
of control characters, though I think not actually deprecated because
they're needed(?) but still looked at a bit askance, and you don't want
your kids playing with them...
And Markus's point about "Encoding characters that look the same but
behave differently is a bad idea" is an extremely good one, too.
I don't even want to know about handling this in TTB contexts...
What is TTB? Couldn't quickly find it.
Top-To-Bottom. Vertical text. Just one more way for things to be confused.
~mark