Otto Stolz wrote:
> Gaspar Sinai wrote:
> > Just because some companies who have influence on Unicode
> > Consortium use some algorithm, like backing store and re-mapping,
> > it does not mean that this is the only way. [...]
> > Yudit does convert the input to view order and back.
>
> Now, this reveals the real problem.
>
> From this description, I gather that Gaspar's editor does not
> preserve the backing store, hence it has to reconstruct it from
> the rendering. As the rendering process is a n->1 mapping, its
> reverse is, intrinsically, ambiguous. So, the attempt to reconstruct
> the original character sequence from the visual appearance
> is bound to fail, in the general case.

Dankeschön, Otto! I have been wondering for the whole duration of this
discussion what the heck Gaspar and everybody else were talking about.
Now I begin to understand.

Could we please drop all this garbage about security (this is not the
Anti-fraud Mailing List!) and talk about this implementation problem?

As I see it, dropping the backing store after running the bidi algorithm
is not necessarily a bad idea. But one condition must be respected: each
character's *embedding* level and *override* information should be
preserved together with the text.

With this additional data in hand, it is possible to define a *reversed*
Bidi algorithm which effectively recovers the backing store from the
visual order. Roozbeh Pournader, I, and other people have discussed this
at length on this list, and a very similar algorithm is actually
implemented as part of ICU.

Such a reversed Bidi technique does not necessarily restore a bit-wise
copy of the original backing store. However, the resulting backing store
is guaranteed to (a) have the same logical order as the original and
(b) have the same nesting of bidi embeddings and overrides.

The only things that this approach drops are redundant bidi controls
(such as an LTR embedding within an already LTR segment), but is that
such a bad thing?

Even John Cowan's example becomes perfectly unambiguous if the bidi
embedding levels are retained:

Case 1:
    From visual order:  the Arabs = BARA-LA
    And bidi levels:    1111111111112222222
    Get logical order:  the Arabs = AL-ARAB

Case 2:
    From visual order:  the Arabs = BARA-LA
    And bidi levels:    3333333332222222222
    Get logical order:  AL-ARAB = the Arabs

It is not perfectly clear whether this approach is more or less
functional than the traditional approach of maintaining the backing
store. What matters is that the two techniques give the same result.

My impression is that, although this reverse bidi requires more
processing (text must undergo two bidi algorithms vs. one), it makes the
editing of text a little bit easier, both for the programmer and for the
user.

Roozbeh and I also considered that, as the embedding levels are available
during the editing process, it would also be possible to *display* them
(e.g., in the form of stacks of horizontal arrows drawn under the text),
and this would make the exact reading order clear to the user.

_ Marco
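
P.S. The reordering step behind the two cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ICU implementation and not a full reverse Bidi: it does not re-insert embedding or override controls, and the name reorder_by_levels is made up for this example. It uses the zero-based levels of UAX #9 (even = left-to-right, odd = right-to-left) and assumes that the level digits shown in the two cases are counted from 1, so each is reduced by one here. The sketch reverses every maximal run at level k or higher, from the highest level down to the lowest odd one. Because the levels travel with the characters, running the same procedure on the result undoes it.

```python
def reorder_by_levels(text, levels):
    """Reverse each maximal run of characters at level >= k, for k from the
    highest level down to the lowest odd level (the reordering used by rule
    L2 of UAX #9). Returns the reordered text and the reordered levels."""
    if len(text) != len(levels):
        raise ValueError("need exactly one level per character")
    chars, lv = list(text), list(levels)
    odd_levels = [l for l in lv if l % 2 == 1]
    if not odd_levels:
        # Purely left-to-right text: nothing to reorder.
        return text, lv
    for k in range(max(lv), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] < k:
                i += 1
                continue
            # Find the end of the run at level >= k and reverse it in place.
            j = i
            while j < len(chars) and lv[j] >= k:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return "".join(chars), lv


visual = "the Arabs = BARA-LA"

# Assumption: the digits in Case 1 and Case 2 are one-based levels,
# so 1 -> 0, 2 -> 1, 3 -> 2 in the zero-based numbering used here.
case1 = [0] * len("the Arabs = ") + [1] * len("BARA-LA")
case2 = [2] * len("the Arabs") + [1] * len(" = BARA-LA")

logical1, levels1 = reorder_by_levels(visual, case1)
logical2, levels2 = reorder_by_levels(visual, case2)
print(logical1)  # the Arabs = AL-ARAB
print(logical2)  # AL-ARAB = the Arabs

# Going the other way with the levels kept alongside the text
# gives back the visual order.
print(reorder_by_levels(logical2, levels2)[0])  # the Arabs = BARA-LA
```

The same visual string yields two different logical strings only because the levels differ, which is the whole point of keeping them with the text.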