> > I disagree that this makes N0 a "recursive" rule. It is a rule with > > repeatedly > > applicable subparts. And like nearly all the rules in the UBA (except ones > > which explicitly state that they apply to *original* Bidi_Class values, > > which thus have to be stored across the life of the processing of > > the string in question), all rules apply to the *current* Bidi_Class > > values of the examined context. > > Can you point out where this is stated in the UBA?
It isn't explicitly stated, as I think is evident from folks agreeing that further clarification of the text would be helpful. And I am not an author/editor of UAX #9 -- the complaints about lack of clarity need to go to them, preferably through explicit text improvement suggestions provided as feedback on PRI #279: http://www.unicode.org/review/pri279/ And I see that Laurentiu Iancu has already helpfully summarized this text issue and already provided some explicit feedback there -- so I think the upcoming UTC is covered for that. I did, however, write a reference implementation for the UBA, including the bidi bracket pairing for UBA 6.3 -- and reading the UBA spec to do so (without looking at anybody else's implementation), I came to the same conclusion regarding the processing of Bidi_Class values in rule N0 as the other author of a reference implementation did. > > According to my reading of the UBA, only W7 could qualify as something > similar to the "recursive" interpretation of N0. All the other rules > are either defined in a way that the "recursion" cannot happen > (because the conditions for applying the rule disappear after it is > applied once), or explicitly speak about a sequence of similar > characters whose bidi types are modified in the same manner. > > > Trace: Exiting br_SortPairList > > Pair list: {1,16} {2,8} {6,7} {10,14} {12,13} > > Debug: Strong direction e between brackets > > Debug: Strong direction o between brackets > > Debug: No strong direction between brackets > > Debug: Strong direction o between brackets > > Debug: No strong direction between brackets > > This doesn't explain _why_ the decision was that the direction between > brackets was one or the other. Which is at the core of the issue at > hand. So this debugging output doesn't really help here. Well, as author of the code that produced that debug output, I can agree that the debug output doesn't explain *why* it made the decisions it did -- I didn't think it was necessary. But I can see that this is a confusing part of the algorithm, and it would be fairly simple to further enhance the debugging output the reference implementation provides in future revisions, so I will endeavor to do so. In the meantime, however, the source code for that reference implementation is posted and is easily available. The relevant part of the rule processing we are talking about can be found in brrule.c. http://www.unicode.org/Public/PROGRAMS/BidiReferenceC/6.3.0/source/brrule.c The function br_ResolvePairEmbeddingLevels() (line 4368), reads down the sorted pair list, and processes each pair in sequence, calling the function br_ResolveOnePair() (line 4235). And if you examine the code in br_ResolveOnePair, you can see that it simply searches the string between the bracket pair for a *current* Bidi_Class value that counts as a strong value, and then, if necessary, searches back to find a *current* Bidi_Class value to the left of the left bracket pair that counts as a strong value. And depending on the results of those searches, it then resolves the Bidi_Class of the bracket pair itself. Following that logic, then, it is pretty clear that the behavior of each successive call to br_ResolveOnePair could, in principle, depend in checking a Bidi_Class value that had been changed by a prior call to br_ResolveOnePair. So yes, multiple, successive calls that depend on the results of the prior rule subpart. > > In any case, when designing an implementation, one normally expects to > read some formal requirements, not learn those requirements from > another implementation. The UBA has *always* been a difficult algorithm to write a clear and complete specification for. One of the reasons why several of us, over the years, have done the work to write *reference* implementations for it is so that examination of the exact behavior of those coded implementations for the various odd edge cases of the algorithm can be referred to in instances (just like this one) where the implications of a specific attempt to write out the specification of a rule in English in UAX #9 might still have some lingering ambiguity in it. And in turn, the experience of the writers of implementations has often come back around and led to suggestions to improve the wording of the specification, precisely because questions arose as to what the correct choices were for edge case behavior. > > Anyway, I'm glad we all agree that, once again, the new additions to > the UBA, and the BPA-related ones in particular, are not described > well enough to avoid misinterpretations and misunderstanding such as > this one, and that the language should be improved and clarified, > hopefully sooner rather than later. I've just lost 20 hours of work > due to that. I'm sure that the editors of UAX #9 will take this on board, and let's hope that some clearer wording can be agreed on. --Ken _______________________________________________ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode