Reversible bidi (was RE: Unicode and Security)

Marco Cimarosti Thu, 07 Feb 2002 04:10:25 -0800

Otto Stolz wrote:
> Gaspar Sinai wrote:
> > Just because some companies who have influence on Unicode
> > Consortium use some algorithm, like backing store and re-mapping,
> > it does not mean that this is the only way. [...]
> > Yudit does convert the input to view order and back.
> 
> Now, this reveals the real problem.
> 
> From this description, I gather that Gaspar's editor does not
> preserve the backing store, hence it has to reconstruct it from
> the rendering. As the rendering process is an n->1 mapping, its
> reverse is intrinsically ambiguous. So the attempt to reconstruct
> the original character sequence from the visual appearance
> is bound to fail in the general case.


Dankeschön, Otto!

I have been wondering throughout this discussion what the heck
Gaspar and everybody else were talking about. Now I begin to understand.
Could we please drop all this garbage about security (this is not the
Anti-fraud Mailing List!) and talk about this implementation problem?

As I see it, dropping the backing store after running the bidi algorithm is
not necessarily a bad idea. But a condition must be respected: each
character's *embedding* levels and *override* information should be
preserved together with the text.

With this additional data in hand, it is possible to define a
*reversed* Bidi algorithm which recovers the backing store from
the visual order.

Roozbeh Pournader, I, and other people have discussed this at length on
this list, and a very similar algorithm is actually implemented as part of
ICU.

Such a reversed Bidi technique does not necessarily restore a bit-wise copy
of the original backing store. However, the resulting backing store is
guaranteed to (a) have the same logical order as the original and (b) have
the same nesting of bidi embedding and overrides. The only things that this
approach drops are redundant bidi controls (such as an LTR embedding within
an already LTR segment), but is that so bad?

Even John Cowan's example becomes perfectly unambiguous, if the bidi
embedding levels are retained:

Case 1:
        From visual order:      the Arabs = BARA-LA
        And bidi levels:        1111111111112222222
        Get logical order:      the Arabs = AL-ARAB

Case 2:
        From visual order:      the Arabs = BARA-LA
        And bidi levels:        3333333332222222222
        Get logical order:      AL-ARAB = the Arabs
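
The two cases above can be reproduced with a short sketch. This is only an
illustration in Python, not the ICU code and not a full bidi implementation:
it simply undoes the run reversals of reordering rule L2 of UAX #9, given the
level of each character in visual order. The function names are hypothetical.
It also assumes that the digits in the two cases are one higher than UAX #9
levels (where even means left-to-right and odd means right-to-left); that
reading is an assumption, made because it is the one under which both cases
come out as shown.

```python
def levels_from_digits(digits):
    """Convert a digit string such as '1112222' to UAX #9 style levels.

    Assumption: the digits are one higher than UAX #9 levels,
    so '1' maps to level 0 (LTR) and '2' maps to level 1 (RTL).
    """
    return [int(d) - 1 for d in digits]


def _reverse_runs(chars, levels, threshold):
    """Reverse, in place, every maximal run whose level is >= threshold."""
    i, n = 0, len(chars)
    while i < n:
        if levels[i] < threshold:
            i += 1
            continue
        j = i
        while j < n and levels[j] >= threshold:
            j += 1
        # Levels travel with their characters, so both lists are reversed.
        chars[i:j] = chars[i:j][::-1]
        levels[i:j] = levels[i:j][::-1]
        i = j


def visual_to_logical(visual, levels):
    """Recover logical order from visual order plus per-character levels.

    Rule L2 reverses runs from the highest level down to the lowest odd
    level. Each reversal is its own inverse and leaves the run boundaries
    in place, so applying the same reversals from the lowest odd level up
    to the highest undoes the reordering.
    """
    if len(visual) != len(levels):
        raise ValueError("one level is needed per character")
    chars, lv = list(visual), list(levels)
    odd = [level for level in lv if level % 2 == 1]
    if odd:
        for threshold in range(min(odd), max(lv) + 1):
            _reverse_runs(chars, lv, threshold)
    return "".join(chars)


if __name__ == "__main__":
    visual = "the Arabs = BARA-LA"
    print(visual_to_logical(visual, levels_from_digits("1111111111112222222")))
    print(visual_to_logical(visual, levels_from_digits("3333333332222222222")))
```

Run as a script, this prints the logical order of Case 1 and then of Case 2.
The same visual string yields two different backing stores only because the
levels differ, which is the whole point of keeping them.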

It is not perfectly clear whether this approach is more or less functional
than the traditional approach of maintaining the backing store. What is
important is that the two techniques have the same result.

My impression is that, although this reverse bidi requires more processing
(text must undergo two bidi algorithms vs. one), it makes the editing of
text a little bit easier, both for the programmer and for the user.

Roozbeh and I also considered that, as the embedding levels are available
during the editing process, it would also be possible to *display* them
(e.g., in the form of stacks of horizontal arrows drawn under the text), and
this would make clear to the user the exact reading order.
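
As a rough idea of what such a display could look like, here is a plain-text
sketch in Python. The function name and the choice of `>` and `<` as arrows
are hypothetical, and levels are assumed to follow UAX #9 numbering (even is
left-to-right, odd is right-to-left). One row is produced per level, from the
lowest level present upward, so that deeper embeddings appear as taller stacks.

```python
def level_arrows(levels):
    """Return one row of arrows per embedding level, lowest level first.

    A character at level N gets an arrow in every row from the lowest
    level present up to N. Even levels point right, odd levels point left.
    """
    if not levels:
        return []
    rows = []
    for level in range(min(levels), max(levels) + 1):
        arrow = "<" if level % 2 else ">"
        row = "".join(arrow if lv >= level else " " for lv in levels)
        rows.append(row.rstrip())
    return rows


if __name__ == "__main__":
    text = "the Arabs = BARA-LA"
    levels = [0] * 12 + [1] * 7  # visual-order levels, UAX #9 numbering
    print(text)
    for row in level_arrows(levels):
        print(row)
```

With these levels, the first row runs rightward under the whole line and the
second row runs leftward under the Arabic word only.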

_ Marco
