Paul McGuire wrote:
> Question 1:
> format_code        := '+' | '-' | '*' | '#'
> I need to specify that a single, identical format_code may be
> repeated,
> not that several different ones may appear in a sequence.
> format             := (format_code)+
> would catch '+-', which is wrong. I want only patterns such as '--',
> '+++',...
>
>
> This interpretation of '+' in your BNF is a bit out of the norm.  Usually
> this notation 'format_code+' would accept 1 or more of any of your
> format_code symbols, so '+-+--++' would match.
That's what I intended to write above: "(format_code)+ would catch '+-', which is wrong." I need a pattern that matches a repetition of the same token, this token being an item of a set. Of course, I could write a pattern for each token... but it is supposed to be programming, not cooking ;-)
What I'm looking for is a format that may not exist:
format          := (format_code)++
where '++' means 'repetition of an identical token'

> In pyparsing, you could match things like '----' using the Word class and
> specifying a string containing the single character '-':  Word('-').  That
> is, parse a word made up of '-' characters.  There is no pyparsing construct
> that exactly matches your (format_code)+ repetition, but you could use Word
> and MatchFirst as in:
>
> format = MatchFirst(Word(c) for c in "+-*#")

That's it! I had not realized that, as pyparsing is real Python, one can also use Python idioms /inside/ the grammar... good! Thank you. So it is also possible to have variables, no? Then my question #2 should be solved, too.

> A corresponding regular expression might be:
> formatRE = '|'.join(re.escape(c)+'+' for c in "+-*#")
>
> which you could then parse using the re module, or wrap in a pyparsing Regex
> object:
>
> format = Regex(formatRE)
>
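
A minimal sketch of the regular-expression route quoted above, using only the standard re module. The names FORMAT_CODES, format_re, format_backref_re and is_format are illustrative assumptions, not part of the original suggestion. The second pattern is an alternative spelling that captures one code and repeats it with a backreference.

```python
import re

# Assumption: the format codes are held in a plain string, as in the
# quoted example; all names here are illustrative.
FORMAT_CODES = "+-*#"

# One alternative per code, each meaning "one or more of this exact character".
format_re = re.compile("|".join(re.escape(c) + "+" for c in FORMAT_CODES))

# Alternative spelling: capture a single code, then repeat that same code.
format_backref_re = re.compile("([" + re.escape(FORMAT_CODES) + r"])\1*")


def is_format(text):
    """Return True if text is a run of one single, repeated format code."""
    return format_re.fullmatch(text) is not None


print(is_format("+++"))  # True
print(is_format("+-"))   # False
```

Both patterns accept '--' or '+++' and reject '+-', which is the behaviour asked for in Question 1.
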
>
> Question 2:
> style_code := '/' | '!' | '_'
> Similar case, but different. I want patterns like:
> styled_text        := style plain_text style
> where both style instances are identical. As the number of styles may grow
> (and even be unpredictable: the style_code line will actually be written at
> runtime according to a config file) I don't want, and anyway can't, specify
> all possible kinds of styled_text. Even if possible, it would be ugly!
>
> pyparsing includes two methods to help you match the same text that was
> matched before - matchPreviousLiteral and matchPreviousExpr.  Here is how
> your example would look:
>
> plain_text = Word(alphanums + " ")
> styled_text = style + plain_text + matchPreviousLiteral(style)
>
> (There is similar capability in regular expressions, too.)

Good, thank you again. Do you know if there is any way to express such things in ordinary E/BNF, or in any dialect derived from BNF? It's like a variable inside a pattern, and I personally have never seen that. Pattern variables would also be very helpful since (as said before) I need to write, or at least reconfigure, the grammar at runtime.
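
As a rough illustration of the "similar capability in regular expressions" mentioned in the quoted answer, here is a sketch with the standard re module in which the style codes are supplied at runtime. The function name build_styled_text_re and the list style_codes are hypothetical; in a real program the list would presumably be read from the config file. The named backreference (?P=style) plays the role of the "variable inside a pattern".

```python
import re

# Assumption: stands in for the style codes read from a config file.
style_codes = ["/", "!", "_"]


def build_styled_text_re(codes):
    """Build a pattern for: style plain_text style (same style both times)."""
    style = "|".join(re.escape(c) for c in codes)
    # (?P<style>...) captures the opening code;
    # (?P=style) only matches that exact same text again.
    return re.compile(
        r"(?P<style>" + style + r")(?P<text>[A-Za-z0-9 ]+)(?P=style)"
    )


styled_text_re = build_styled_text_re(style_codes)

match = styled_text_re.fullmatch("/some text/")
print(match.group("style"), match.group("text"))  # / some text
print(styled_text_re.fullmatch("/some text!"))    # None
```

Rebuilding the pattern with a longer list, for example build_styled_text_re(style_codes + ["*"]), is enough to add a style without touching the rest of the grammar.
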

> Question 3:
> I would like to specify a "side-condition" for a pattern, meaning that it
> should only match when a specific token lies aside. For instance:
> A  := A_pattern {X}
> X is not part of the pattern, thus should not be extracted. If X is just
> "garbage", I can write an enlarged pattern, then discard X later:
> A  := A_pattern
> A_X        := A X
>
> I think you might be looking for some kind of lookahead.  In pyparsing, this
> is supported using the FollowedBy class.
>
> from pyparsing import Word, Literal, FollowedBy, alphas
>
> A_pattern = Word(alphas)
> X = Literal(".")
> A = A_pattern + FollowedBy(X).leaveWhitespace()
>
> print(A.searchString("alskd sldjf sldfj. slfdj . slfjd slfkj."))
>
> prints
>
> [['sldfj'], ['slfkj']]

I guess there is the same for left-side conditions. I'm going to search for it myself. This guy who develops pyparsing thinks of everything. There are so many helper functions and processing methods -- how can you know all of that by heart, Paul?
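
For comparison, regular expressions offer both directions of such a side-condition: (?=...) is a lookahead (right side) and (?<=...) is a lookbehind (left side). The sketch below uses the standard re module, not pyparsing, and makes no claim about which pyparsing helper covers the left-side case. The first input string is the one from the quoted example; the second is made up for illustration.

```python
import re

# Right-side condition: a word only counts if a "." follows immediately.
# The "." is checked but not consumed, so it is not part of the result.
followed_by_dot = re.compile(r"[A-Za-z]+(?=\.)")
print(followed_by_dot.findall("alskd sldjf sldfj. slfdj . slfjd slfkj."))
# ['sldfj', 'slfkj']

# Left-side condition: a word only counts if a "." comes just before it.
# (Hypothetical input, chosen only to show the lookbehind.)
preceded_by_dot = re.compile(r"(?<=\.)[A-Za-z]+")
print(preceded_by_dot.findall("one .two three.four five"))
# ['two', 'four']
```
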

Denis


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
