Mark Davis wrote:

> [L2] is not the following: ...

I'm glad to hear that "bug" 1 is not how L2 is intended to work (this means that the answer to FAQ question 12, "Is Bytext bidirectionality compatible with Unicode bidirectionality?", is simply yes, instead of a qualified yes). I don't wish to give the impression that I care too much about semantic errors, but if you can't acknowledge that what was said in L2 was not what was intended (instead of just being unclear), I'm going to have to call you on that:

Let's say you have a line consisting of characters that all have embedding level 4... How is "3" considered to be the lowest odd level on that line? It's no more the lowest odd level than 5 or 1 is. At best, if you consider a character with embedding level 4 to actually consist of 4 and each lower embedding level (4, 3, 2, 1, and zero), which is not entirely unreasonable, then 1 will always be the lowest odd embedding level on every line except a line consisting of all zeros. But since L2 doesn't say "...to 1", it rules out this interpretation.

A function implementing L2 might go through the following steps on each line:

1. find the highest level
2. find the lowest odd level
...

For a line consisting of all 4's as above, step 1 will return 4 and step 2 should return null, since there are no odd levels on the line. A list consisting of "from 4 to null" can only reasonably be interpreted as consisting only of 4. Going on with this, you get the "bugs" I describe. If you are familiar with each implementation of the algorithm, it might be reassuring to users if you can state that none actually works in the manner above. Any other implementations might want to test for this.

> I believe other people addressed the other two items you thought were
> bugs.

Other people have not addressed "bug" 2 accurately. Here's an impromptu shorthand to summarize the issue:

RLE..."LRE...PDF"
looks ok on 1 or more lines, unless a strong L character precedes or follows the quotation, as in:
RLE...L "LRE...PDF"

LRE..."RLE...PDF"
looks ok on 1 or more lines, unless a strong R character precedes or follows the quotation, as in:
LRE...R "RLE...PDF"

LRE...RLE"..."PDF
looks ok on 1 line, looks messed up on multiple lines

RLE...LRE"..."PDF
looks ok on 1 line, looks messed up on multiple lines

This "bug" is weaker than I originally thought, but it still belongs in question 13 of the Bytext FAQ, "How is using bidirectionality in Bytext easier than in Unicode?"...
even Tim Partridge didn't get it right as to how to spell embedded quotations ("Surely if the quotation is meant to be right to left the RLE and PDF should be outside the entire thing, including the quotes").

These kinds of issues can be summarized as an overdependence on character properties, language-specific conventions, and formatting characters with overlapping functionality that allow multiple spellings for the same formatting. In other words, as others have said, the Unicode bidirectional algorithm is too complex. The (new) Bytext encoding of bidirectionality shifts the complexity to the level of transcoding and to input methods. It effectively eliminates multiple encodings that achieve the same embedding levels, so like everything else in Bytext it is more regular-expression friendly.

Bernard

---
Bernard Rafael Miller, email: [EMAIL PROTECTED]
Format enabling simplified 8 bit regexes of UCS characters: www.bytext.org
---
“We believed that the cybernetic approach to consciousness, whipped up frothy, would carry us to a plateau overlooking a pleasant mirror, but instead left us blathering in the dressed up solitude of mannequin planets.”
--Steven Jesse Bernstein