Question 1:

    format_code := '+' | '-' | '*' | '#'

I need to specify that a single, identical format_code may be repeated, not that several different ones may appear in a sequence.

    format := (format_code)+

would catch '+-', which is wrong. I want only patterns such as '--', '+++', ...
This interpretation of '+' in your BNF is a bit out of the norm. Usually the notation 'format_code+' would accept 1 or more of any of your format_code symbols, so '+-+--++' would match.

In pyparsing, you could match things like '----' using the Word class and specifying a string containing the single character '-': Word('-'). That is, parse a word made up only of '-' characters. There is no pyparsing construct that exactly matches your (format_code)+ repetition, but you could combine Word and MatchFirst, as in:

    from pyparsing import MatchFirst, Word

    # one Word per code, so each match is a run of a single repeated character
    format = MatchFirst(Word(c) for c in "+-*#")

A corresponding regular expression might be:

    import re

    # '\+\+*' style alternatives: one run of a single escaped code per branch
    formatRE = '|'.join(re.escape(c) + '+' for c in "+-*#")

which you could then parse using the re module, or wrap in a pyparsing Regex object:

    from pyparsing import Regex

    format = Regex(formatRE)

Question 2:

    style_code := '/' | '!' | '_'

Similar case, but different. I want patterns like:

    styled_text := style plain_text style

where both style instances are identical. As the number of styles may grow (and may even be unpredictable: the style_code line will actually be written at runtime according to a config file), I don't want to (and anyway can't) specify all possible kinds of styled_text. Even if it were possible, it would be ugly!

pyparsing includes two methods to help you match the same text that was matched before: matchPreviousLiteral and matchPreviousExpr. Here is how your example would look:

    from pyparsing import Word, alphanums, matchPreviousLiteral, oneOf

    # assumed definition of style: any one of the style codes
    style = oneOf("/ ! _")
    plain_text = Word(alphanums + " ")
    # the closing style must be the same literal text as the opening one
    styled_text = style + plain_text + matchPreviousLiteral(style)

(There is similar capability in regular expressions, too.)

Question 3:

I would like to specify a "side-condition" for a pattern, meaning that it should only match when a specific token lies next to it. For instance:

    A := A_pattern {X}

X is not part of the pattern, and thus should not be extracted. If X is just "garbage", I can write an enlarged pattern and then drop X later:

    A := A_pattern
    A_X := A X

I think you might be looking for some kind of lookahead. In pyparsing, this is supported using the FollowedBy class.
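The remark under Question 2 that regular expressions have a similar capability applies to the lookahead in Question 3 as well. As a rough sketch using only Python's standard re module (not pyparsing), a backreference can stand in for matchPreviousLiteral and a lookahead assertion for FollowedBy. The helper names styled_text_regex and words_followed_by are hypothetical, and the style codes are passed in as a plain string standing in for the config file read at runtime.

```python
import re


def styled_text_regex(style_codes):
    """Compile a pattern for: style plain_text style, with both styles identical.

    style_codes is a string of single-character codes such as "/!_"
    (hypothetical stand-in for the codes read from a config file at runtime).
    """
    # One alternative per style code; re.escape keeps '/', '!', '*', etc. literal.
    style = "|".join(re.escape(c) for c in style_codes)
    # Group 1 captures the opening style; the backreference \1 demands the very
    # same text at the end, the regex analogue of matchPreviousLiteral.
    return re.compile("(" + style + r")([A-Za-z0-9 ]+)\1")


def words_followed_by(marker):
    """Compile a pattern for a word that must be immediately followed by marker.

    The lookahead (?=...) checks for marker without consuming it, so the
    marker is not part of the match, like pyparsing's FollowedBy.
    """
    return re.compile(r"[A-Za-z]+(?=" + re.escape(marker) + ")")


if __name__ == "__main__":
    styled = styled_text_regex("/!_")
    text = "plain /italic text/ more !bold! and _mixed/ end"
    # '_mixed/' is rejected because the opening and closing styles differ
    print([m.group(0) for m in styled.finditer(text)])
    # ['/italic text/', '!bold!']

    side = words_followed_by(".")
    print(side.findall("alskd sldjf sldfj. slfdj . slfjd slfkj."))
    # ['sldfj', 'slfkj']
```

Because a backreference compares the captured text literally, it corresponds to matchPreviousLiteral rather than matchPreviousExpr.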
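A pyparsing example using FollowedBy:

    from pyparsing import FollowedBy, Literal, Word, alphas

    A_pattern = Word(alphas)
    X = Literal(".")
    # leaveWhitespace() makes the lookahead require X immediately after the word,
    # so "slfdj ." (with a space before the period) does not match
    A = A_pattern + FollowedBy(X).leaveWhitespace()
    print(A.searchString("alskd sldjf sldfj. slfdj . slfjd slfkj."))

prints:

    [['sldfj'], ['slfkj']]

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor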