On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
  a parsing is good if it satisfies all conditions below:

    0) Some delimiters in the string are marked as “non-matching”; the rest
       is broken into disjoint “matched” pairs;

    MATCH) A “matched” pair consists of an open-delimiter and matching close-
           delimiter (in this order in the string).

    NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
          positioned as Open1 Open2 Close1 Close2 in the string order).

    MINLEN) “Inside” a “matched” pair, every delimiter which could match
            elements of the pair but is marked as “non-matching” must nest
            inside some deeper-nested “matched” pair.

(I hope that the meaning of the word “inside” in MINLEN is clear.)

    GREED) Given any close-delimiter marked as “non-matching”, its
           pre-context does not contain any open-delimiter which could
           match it.

      Here pre-context of a position is a concatenation of substrings of the
      initial string:
      • Take the most deeply nested “matched pair” containing the position
        (if none, the whole string);
      • take the part of the string inside this pair AND before the position;
      • remove all “matched” pairs completely contained inside this substring
        together with what they enclose.
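
The conditions above can be read as a checker for a proposed parsing. The following is only a minimal sketch under stated assumptions: the table PAIRS is a toy stand-in for the Unicode properties that decide which codepoints match, a parsing is given as a list of (open index, close index) pairs, and the names could_match, pre_context and is_good are hypothetical, not taken from the thread.

```python
# Assumption: a toy delimiter table stands in for the Unicode properties.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def could_match(open_ch, close_ch):
    """Character-level test only; it ignores how the string is parsed."""
    return PAIRS.get(open_ch) == close_ch


def pre_context(s, pairs, pos):
    """Indices forming the pre-context of pos, following the three bullets."""
    enclosing = [(i, j) for i, j in pairs if i < pos < j]
    # The most deeply nested enclosing pair is the one that opens last.
    start = max(enclosing)[0] if enclosing else -1
    removed = set()
    for i, j in pairs:
        if start < i and j < pos:  # pair lies wholly inside the substring
            removed.update(range(i, j + 1))
    return [p for p in range(start + 1, pos) if p not in removed]


def is_good(s, pairs):
    used = [p for pair in pairs for p in pair]
    if len(used) != len(set(used)):  # 0) matched pairs are disjoint
        return False
    for i, j in pairs:  # MATCH
        if not (i < j and could_match(s[i], s[j])):
            return False
    for a, b in pairs:  # NEST: no Open1 Open2 Close1 Close2
        for c, d in pairs:
            if a < c < b < d:
                return False
    unmatched = [p for p, ch in enumerate(s)
                 if (ch in PAIRS or ch in CLOSERS) and p not in used]
    for i, j in pairs:  # MINLEN
        for k in unmatched:
            fits = could_match(s[k], s[j]) or could_match(s[i], s[k])
            if i < k < j and fits:
                if not any(i < c < k < d < j for c, d in pairs):
                    return False
    for k in unmatched:  # GREED
        if s[k] in CLOSERS:
            if any(could_match(s[p], s[k]) for p in pre_context(s, pairs, k)):
                return False
    return True
```

With this sketch, is_good("(()", [(1, 2)]) returns True, while is_good("(()", [(0, 2)]) returns False because the unmatched "(" at index 1 sits directly inside a pair it could have matched (MINLEN). The empty parsing of "(a)" is rejected by GREED, since the unmatched ")" has a "(" in its pre-context.
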
This is a very nice formal definition. I'm surprised that your "GREED"
statement needs such a complex auxiliary concept (pre-context).

Can you explain why, if you make "pre-context" simply the part of the
whole string that precedes the unmatched close-delimiter, the words
"which could match it" are insufficient?
Aha, this means that my description is INCOMPLETE: you got the wrong
impression of what “match” means!  Everywhere, this word means exactly
the same as in the MATCH rule: that the Unicode codepoints match
according to their Unicode properties.

This is a non-recursive definition.  All rules are independent.
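
To illustrate the difference between the two readings of GREED, here is a small sketch, again assuming a toy table PAIRS in place of the Unicode properties and using hypothetical helper names. Because "could match" is a test on characters only, a rule that looked at the plain prefix would also see open-delimiters that are already part of a "matched" pair; the pre-context removes them.

```python
# Assumption: a toy delimiter table stands in for the Unicode properties.
PAIRS = {"(": ")", "[": "]"}


def could_match(open_ch, close_ch):
    """Purely a property of the two characters, not of the parsing."""
    return PAIRS.get(open_ch) == close_ch


def prefix_has_opener(s, pos):
    """Simplified reading: look at everything before pos."""
    return any(could_match(ch, s[pos]) for ch in s[:pos])


def pre_context_has_opener(s, pairs, pos):
    """Reading with pre-context: matched pairs before pos are removed."""
    enclosing = [(i, j) for i, j in pairs if i < pos < j]
    start = max(enclosing)[0] if enclosing else -1
    removed = set()
    for i, j in pairs:
        if start < i and j < pos:
            removed.update(range(i, j + 1))
    kept = [p for p in range(start + 1, pos) if p not in removed]
    return any(could_match(s[p], s[pos]) for p in kept)


s = "(a)b)"
pairs = [(0, 2)]  # the first "(" and ")" are a matched pair
print(prefix_has_opener(s, 4))              # True
print(pre_context_has_opener(s, pairs, 4))  # False
```

In "(a)b)" the last ")" is necessarily "non-matching". The plain prefix still contains a "(" that could match it as a character, so the simplified rule would reject this parsing; the pre-context drops the already matched pair and the rule is satisfied.
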

That explains why you repeat most of the other constraints in your pre-context.

Without the complicated notion of pre-context, matching [] in

   ( [ ) ]

would be an acceptable match.

Thanks for your corrections,
Ilya

For a static definition, would it have been simpler to break the
definition into two: say, a "tentative parsing" (all conditions but
GREED) and a "selected parsing", which could then be defined as the
parsing that starts closest to the left?

(I don't have the time as I write this to work out whether that's the
correct condition, as I am about to board a ride; I offer it just as a
trigger for thought about what a split definition might achieve.)

A./
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
