On Wed, Apr 23, 2014 at 06:15:44PM -0700, Asmus Freytag wrote:
> On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> >>>    GREED) Given any close-delimiter marked as “non-matching”, its
> >>>           pre-context does not contain any open-delimiter which could
> >>>           match it.
> >>>
> >>>      Here pre-context of a position is a concatenation of substrings of
> >>>      the initial string:
> >>>      • Take the most deeply nested “matched pair” containing the position
> >>>        (if none, the whole string);
> >>>      • take the part of the string inside this pair AND before the
> >>>        position;
> >>>      • remove all “matched” pairs completely contained inside this
> >>>        substring together with what they enclose.
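
The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the table PAIRS is a hypothetical stand-in for the MATCH rule (which is really defined through Unicode properties), the matched pairs are assumed to be properly nested and given as (open_index, close_index) tuples, and the names pre_context and satisfies_greed are made up for this sketch.

```python
# Hypothetical stand-in for the MATCH rule: a fixed table of delimiter pairs.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def pre_context(text, matched, pos):
    """Return the pre-context of index `pos` as a list of (index, char).

    `matched` is a set of (open_index, close_index) pairs that the parsing
    marks as matched; the pairs are assumed to be properly nested.
    """
    # Step 1: most deeply nested matched pair containing pos
    # (if there is none, the whole string is used).
    enclosing = [(o, c) for (o, c) in matched if o < pos < c]
    start = max(o for o, _ in enclosing) + 1 if enclosing else 0
    # Step 2: the part inside this pair and before pos is text[start:pos].
    # Step 3: drop matched pairs completely contained in that part,
    # together with what they enclose.
    removed = set()
    for o, c in matched:
        if start <= o and c < pos:
            removed.update(range(o, c + 1))
    return [(i, text[i]) for i in range(start, pos) if i not in removed]


def satisfies_greed(text, matched, unmatched_closers):
    """Check GREED for every close-delimiter marked as non-matching."""
    for pos in unmatched_closers:
        for _, ch in pre_context(text, matched, pos):
            if PAIRS.get(ch) == text[pos]:
                return False  # an opener in the pre-context could match it
    return True


# "a(b)c)" with (1, 3) matched: the last ")" sees only "a" and "c".
print(pre_context("a(b)c)", {(1, 3)}, 5))
# Marking both ")" as non-matching violates GREED: "(" precedes the first.
print(satisfies_greed("a(b)c)", set(), [3, 5]))
```

Note that the check is static: it takes a complete marking of the string and verifies it, without describing how the marking was found.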

> >>Can you explain why, if you make "pre-context" simply the part of the
> >>whole string that precedes the unmatched close-delimiter, the words
> >>"which could match it" are insufficient?
> >Aha, this means that my description is INCOMPLETE: you got a wrong
> >impression of what “match” means!  Everywhere, this word means exactly
> >the same as in the MATCH rule: that Unicode codepoints match following
> >Unicode properties.

> >This is a non-recursive definition.  All rules are independent.

> That explains why you repeat most of the other constraints in your
> pre-context.

Frankly speaking, I do not see any such repetition.

> For a static definition, would it have been simpler to break the
> definition into two - say a "tentative parsing" (all conditions but
> greed) and "selected parsing", which then could be defined as the
> parsing that starts closest to the left?

I do not see how: to know whether a closing delimiter may be matched
or not, it is not enough to know the “tentative” parsing of what precedes
it; one must know the **actual** parsing.  Eventually, you would end
up with either a recursive definition, or a definition of a “process” of
parsing.
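
For contrast, a definition by “process” could look roughly like the Python sketch below. It is not the definition discussed in this thread, just the usual left-to-right stack scan; the table PAIRS is again a hypothetical stand-in for the MATCH rule, and the choice to pair a closer with the nearest pending opener that could match it is an assumption of the sketch.

```python
# Hypothetical stand-in for the MATCH rule: a fixed table of delimiter pairs.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def parse(text):
    """Scan left to right; return (matched_pairs, unmatched_indices)."""
    stack = []      # indices of open-delimiters still waiting for a partner
    matched = []    # (open_index, close_index) pairs
    unmatched = []  # indices of delimiters left without a partner
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append(i)
        elif ch in CLOSERS:
            # Look for the nearest pending opener that could match this closer.
            for depth in range(len(stack) - 1, -1, -1):
                if PAIRS[text[stack[depth]]] == ch:
                    matched.append((stack[depth], i))
                    # Openers skipped over can no longer be matched.
                    unmatched.extend(stack[depth + 1:])
                    del stack[depth:]
                    break
            else:
                unmatched.append(i)  # no pending opener could match
    unmatched.extend(stack)          # openers never closed
    return sorted(matched), sorted(unmatched)


print(parse("a(b)c)"))
print(parse("([)]"))
```

Here the decision about each closer depends on the actual state left by everything scanned before it, which is exactly what a “tentative” parsing of the preceding text does not provide.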

Anyway, I’ve written my share of definitions which combine
“tentative” stuff with the “best choice” among tentative variants.  One
ends up with monsters like
  http://perldoc.perl.org/perlre.html#Combining-RE-Pieces
(and, Eli, the fact that I wrote it does not imply that I must like it :-[ ).

In the case of Perl RExes, there is no alternative.  IMO, if there IS
a way to define what a “standalone” GOOD THING is, it is __much__
better than the “best of many” way.  Defining it as “the best of
potentially good things” requires the reader to imagine first ALL the
potentially good things; only when this (otherwise not very useful)
universe has settled down in the reader’s mind would they be able to
pick out the best guy…

Ilya
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
