----- Original Message -----
> From: Andreas Perstinger <andiper...@gmail.com>
> To: tutor@python.org
> Cc:
> Sent: Thursday, June 13, 2013 8:09 PM
> Subject: Re: [Tutor] regex grouping/capturing
>
> On 13.06.2013 17:09, Albert-Jan Roskam wrote:
>> I have a string of the form "required optional3 optional2 optional1
>> optional3" ('optional' may be any kind of string, so it's not simply
>> 'optional\d+'.
>> I would like to use a regex so I can distinguish groups. Desired
>> outcome: ('required', 'optional3', 'optional2', 'optional1',
>> 'optional3'). Below is a fragment of the many things I have tried.
> [SNIP]
>> How can I make this work?
>
> If you really want to use a regex:
> >>> import re
> >>> s = "required optional3 optional2 optional1 optional3"
> >>> s2 = "required optional1 optional2 optional3"
> >>> pattern = "required|optional1|optional2|optional3"
> >>> re.findall(pattern, s)
> ['required', 'optional3', 'optional2', 'optional1', 'optional3']
> >>> re.findall(pattern, s2)
> ['required', 'optional1', 'optional2', 'optional3']

Hi Andreas, thanks for your reply. I am trying to create a pygments regex lexer. It parses code and classifies it into (in my case) commands, subcommands and keywords. AFAIK, re.findall can't be used with pygments, but maybe I am mistaken.

The quantifier on a group (a plus sign in my case) just works differently from what I expected. It seems that only optional groups (with a "?") can be used, not groups with other quantifiers. Here's a simplified example of the 'set' command that I would like to parse.

>>> import re
>>> s = 'set workspace = 6148 header on.'
>>> r = r"(set)\s+(header|workspace)+\s*=?\s*.*\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace')  # desired output: ('set', 'workspace', 'header')
>>> r = r"(set)\s+(?:(header|workspace)\s*=?\s*.*)+\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace')  # grrr, still no luck
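
For what it's worth, the results above are consistent with how Python's re module treats a quantified capturing group: the group keeps the same number on every repetition, so groups() reports only the text from the last repetition. In the two patterns above the greedy ".*" also swallows 'header' before the group can repeat at all. The sketch below first shows the "last repetition wins" behaviour with a deliberately simplified pattern, then works around it in two steps: match the whole command, then scan the remainder for subcommands. The names SUBCOMMAND and parse_set are made up for illustration, and the sketch uses plain re, not pygments.

```python
import re

s = 'set workspace = 6148 header on.'

# 1) A repeated capturing group keeps only its last repetition.
#    The group matches 'workspace' first and 'header' second,
#    but only 'header' survives in groups().
m = re.search(r"(?:(header|workspace)\W+\w+\W*)+", s, re.I)
last_only = m.groups()
print(last_only)  # ('header',)

# 2) Two-step workaround (hypothetical helper names):
#    first match the command as a whole, then collect every
#    subcommand keyword from the rest of the statement.
SUBCOMMAND = re.compile(r"\b(header|workspace)\b", re.I)

def parse_set(command):
    """Return ('set', sub1, sub2, ...) or None if it is not a set command."""
    m = re.match(r"(set)\s+(.*)\.$", command, re.I)
    if m is None:
        return None
    # findall with a single group returns a list of that group's matches
    return (m.group(1),) + tuple(SUBCOMMAND.findall(m.group(2)))

print(parse_set(s))  # ('set', 'workspace', 'header')
```

As far as I understand it, a pygments RegexLexer applies its rules one after another at the current position in the text, so the usual approach there would be one rule per kind of token (command, subcommand, keyword) rather than a single regex that captures the whole statement.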