[unicode] Re: Bidi editing (was Re: Unicode editing)

Roozbeh Pournader Thu, 22 Mar 2001 15:40:24 -0800

On Wed, 21 Mar 2001, Marco Cimarosti wrote:

> Visual:       she said i need water and expired
> Levels:       000000000222222222222000000000000
> Logic:        she said <LRE>i need water<PDF> and expired
> 
> I don't see how such an embedding could be useful, so I would iron
> level "2" to the surrounding "0" and, consequently, remove the
> embedding controls from the logical string.

I'm not talking about beautiful examples. I'm talking about the situation
where you open a file from some other source, from someone who stays
compliant with the spec, but interprets it some other way, and considers
that embedding information useful. This is what I have in mind: there
should be something like normalization for bidi also. Considering
uppercase letters to be Arabic, "A<RLO>B<PDF>C" is equal to "ABC" in both
semantics and presentation. You should be permitted to remove the
controls.
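
As a rough sketch of what such a "bidi normalization" step might look
like, the Python below strips an RLO...PDF pair when everything inside it
is already strongly right-to-left. It uses the toy convention of this
thread (uppercase stands for Arabic), handles only the simplest non-nested
case, and the function name is made up for illustration. It is not a full
implementation of the bidi algorithm, and it assumes that an override
around text that is already RTL changes neither meaning nor display.

```python
import re

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Toy convention: uppercase ASCII letters stand in for Arabic (RTL) letters.
_REDUNDANT_RLO = re.compile(RLO + "([A-Z]+)" + PDF)

def strip_redundant_rlo(text):
    """Drop RLO...PDF pairs whose content is only toy-RTL letters.

    Hypothetical helper: overrides that enclose anything else
    (lowercase letters, spaces, digits, nested controls) are kept.
    """
    return _REDUNDANT_RLO.sub(r"\1", text)

print(strip_redundant_rlo("A" + RLO + "B" + PDF + "C"))  # ABC
```

An override around lowercase (LTR) text does change the display, so the
sketch leaves it in place.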

> I don't know. The reason for having that virtual zero-width character is
> exactly to make it visible to the user, so that she can act on it (change
> its embedding level).
> 
> If this is to be hidden, then what is it for?

I thought you needed it for your buffer, to ease the implementation. I
think users don't like zero-width characters. There are some that are
frequently unavoidable, like the ZWNJ in Persian, but that pain can be
eased if the software tries to be more intelligent, treating it more like
a function than a character.

For example, the application may have a key that disconnects the previous
and next letters, and another one that joins them. Compare the previous
experience of clicking in the middle of a word, where there is a ZWNJ,
pressing backspace, and finding that it's the letter before that you have
deleted! You will then retype the letter, move the cursor forward (which
will have no visual effect on the screen), and press backspace again.
With the new idea, you just click there and press the join-removing
hotkey.
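
A minimal sketch of the two hotkeys, assuming the buffer is a plain Python
string and the cursor is an index into it. The function names and the
(text, cursor) return convention are invented for illustration and are not
taken from any real editor.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def disconnect(text, cursor):
    """'Disconnect' hotkey: make sure a ZWNJ sits at the cursor.

    Returns the new text and cursor. Does nothing if a ZWNJ is already
    directly before or after the cursor.
    """
    if text[cursor:cursor + 1] == ZWNJ:
        return text, cursor
    if cursor > 0 and text[cursor - 1] == ZWNJ:
        return text, cursor
    return text[:cursor] + ZWNJ + text[cursor:], cursor + 1

def join(text, cursor):
    """'Join' hotkey: remove a ZWNJ directly before or after the cursor."""
    if cursor > 0 and text[cursor - 1] == ZWNJ:
        return text[:cursor - 1] + text[cursor:], cursor - 1
    if text[cursor:cursor + 1] == ZWNJ:
        return text[:cursor] + text[cursor + 1:], cursor
    return text, cursor

# Persian "mi-khaham", with a ZWNJ at index 2.
word = "\u0645\u06CC" + ZWNJ + "\u062E\u0648\u0627\u0647\u0645"
joined, pos = join(word, 3)   # cursor just after the invisible ZWNJ
```

The user never has to know on which side of the invisible character the
cursor happens to be: the hotkey gives the same result from index 2 or 3.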

So if you want the functionality, provide it in some other way, not by
introducing a new zero-width character.

> I am not the person who can help you with this: I don't even know Arabic
> editing, and I am adjusting my opinions each time I discover some new fact
> from you.

I'm CC-ing Tfazrir Cohen. Tfazrir, would you help?

This is the exact question: How do Hebrew users use the shifted state of
their keyboard, which usually produces Latin letters? Do they switch
between keyboards, or do they often prefer to use the shifted keyboard?
When in a Hebrew keyboard mode, do they like their space to behave like a
right-to-left character, or a neutral one (so they may be able to use it
in an English phrase)?

> - Else, if it is typed just after another directional character (i.e., the
> cursor hasn't been moved since the last letter was entered), then it gets
> the same directionality as the last character;

No, if the space is a neutral space, you will need more intelligence:
consider that the user is typing "ABC abc DEF" which will get displayed as
"FED abc CBA". After she has typed her "ABC", your rule works OK. But not
with the second space. She will get something like "FEDabc  CBA" on the
screen, and "ABC<rtl space>abc<ltr space>DEF" in the buffer.
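
The difference can be reproduced with a small Python sketch. It is a toy
model, not the real bidi algorithm: uppercase is RTL, lowercase is LTR,
the paragraph is RTL, a neutral between two different directions takes the
paragraph direction, and reordering is plain reversal by level. All
function names are made up for illustration.

```python
def classify(ch):
    """Toy convention from this thread: uppercase RTL, lowercase LTR."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"  # neutral, e.g. a space

def neutral_cells(text):
    """Buffer in which spaces stay neutral."""
    return [(ch, classify(ch)) for ch in text]

def sticky_cells(text, para="R"):
    """Buffer in which each space copies the direction typed just before."""
    cells, last = [], para
    for ch in text:
        d = classify(ch)
        if d == "N":
            d = last
        cells.append((ch, d))
        last = d
    return cells

def resolve(cells, para):
    """Give each neutral the direction of its strong neighbours if they
    agree, otherwise the paragraph direction."""
    dirs = [d for _, d in cells]
    out = []
    for i, d in enumerate(dirs):
        if d == "N":
            before = next((x for x in reversed(dirs[:i]) if x != "N"), para)
            after = next((x for x in dirs[i + 1:] if x != "N"), para)
            d = before if before == after else para
        out.append(d)
    return out

def display(cells, para="R"):
    """Return the visual string for a list of (char, direction) cells."""
    base = 1 if para == "R" else 0
    levels = [base if d == para else base + 1 for d in resolve(cells, para)]
    chars = [ch for ch, _ in cells]
    # Reverse every run at or above each level, highest level first.
    for lvl in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= lvl:
                j = i
                while j < len(chars) and levels[j] >= lvl:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)

print(display(neutral_cells("ABC abc DEF")))  # FED abc CBA
print(display(sticky_cells("ABC abc DEF")))   # FEDabc  CBA
```

In the second case the space typed after "abc" is stored as LTR, so it is
reordered together with "abc" and ends up next to the other space.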

You may like to eliminate this rule, sticking to the default paragraph
rule. But again, if she wants "ABC abc def DEF" ("FED abc def CBA" on the
display), that rule is needed. Other problems will arise when you also
consider cursor movement.

What I believe is that you should keep more state if something like a
"neutral space" comes from the keyboard. Whether we should keep some info
inside the buffer, or can find a way to keep O(1) info outside the
buffer, I don't know. What I can surely say is that it gets much more
complex than what we have come to so far.

--roozbeh