On 10/17/2010 7:01 AM, Michael D. Adams wrote:
This is something that not even the C++ and Java reference
implementations do (though it appears that the C++ implementation of
the W rules was originally derived from a regular expression as it
uses state tables, but if so it is undocumented).  (Which by the way
they have not been proven to be equivalent, they have merely been
tested.  Proof is a much more complicated formalism.)
Having written the C++ reference implementation, I know a thing or two about it :)

The two implementations were not formally proven to be equivalent - in a way that wasn't the purpose. The purpose was to see whether several (in this case two) implementers could read the rules and come up with the same results.

When someone creates a real-world implementation to a specification like the Bidi Algorithm, it's not usually verified by formal proof, but by testing. Therefore, the exercise had to with finding out what level of testing was sufficient to capture inadvertent misapplication of some of the less-well-worded rules. (Some of them have since been rewritten to make their intent less ambiguous).

Most of the issues were found with the test pass that simply compared all possible sequences up to length 6. That is better than the BidiTest.txt file, which I understand only goes to length 4. Stochastic sampling of sequences up to length 20 resulted in fewer reported discrepancies - again, all of this is from memory.

For the test, the maximal depth of embeddings was set to 15 instead of 63, and the input were strings of bidi classes, not raw characters - therefore cutting down on the number of possible sequences.

The Java implementation was deliberately designed to be transparent - matching the way the rules are formulated in an obvious way. For the C++ implementation I wanted to do something different, and possibly faster, so I hand-coded a few state tables. The biggest challenge was not in creating those tables, but in understanding the nuances of the rules, by the way.

The situation today is not fully comparable, since there was some feedback from the reference implementation project to the rules and especially their wording.

A./

Reply via email to