On 10/17/2010 7:01 AM, Michael D. Adams wrote:
This is something that not even the C++ and Java reference
implementations do (though it appears that the C++ implementation of
the W rules was originally derived from a regular expression as it
uses state tables, but if so it is undocumented). (Which by the way
they have not been proven to be equivalent, they have merely been
tested. Proof is a much more complicated formalism.)
Having written the C++ reference implementation, I know a thing or two
about it :)
The two implementations were not formally proven to be equivalent - in a
way that wasn't the purpose. The purpose was to see whether several (in
this case two) implementers could read the rules and come up with the
same results.
When someone creates a real-world implementation to a specification like
the Bidi Algorithm, it's not usually verified by formal proof, but by
testing. Therefore, the exercise had to with finding out what level of
testing was sufficient to capture inadvertent misapplication of some of
the less-well-worded rules. (Some of them have since been rewritten to
make their intent less ambiguous).
Most of the issues were found with the test pass that simply compared
all possible sequences up to length 6. That is better than the
BidiTest.txt file, which I understand only goes to length 4. Stochastic
sampling of sequences up to length 20 resulted in fewer reported
discrepancies - again, all of this is from memory.
For the test, the maximal depth of embeddings was set to 15 instead of
63, and the input were strings of bidi classes, not raw characters -
therefore cutting down on the number of possible sequences.
The Java implementation was deliberately designed to be transparent -
matching the way the rules are formulated in an obvious way. For the C++
implementation I wanted to do something different, and possibly faster,
so I hand-coded a few state tables. The biggest challenge was not in
creating those tables, but in understanding the nuances of the rules, by
the way.
The situation today is not fully comparable, since there was some
feedback from the reference implementation project to the rules and
especially their wording.
A./