Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-04 Thread Yingjie Lan
--- On Wed, 11/3/10, MRAB wrote: > [snip] > The outer group is repeated, so it can match again, but the inner group can't match again because it captured all it could the previous time. > > Therefore the outer group matches and captures an empty string and the inner group remembers its
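MRAB's point, that a group which is not re-entered on a later repetition keeps its earlier capture, can be checked in isolation. A minimal sketch, assuming CPython's `re` module; the pattern `(?:(a)|b)*` and the string `'ab'` are illustrative choices, not taken from the thread:

```python
import re

# Iteration 1 takes the (a) branch, so group 1 captures 'a'.
# Iteration 2 takes the b branch; group 1 is not set again,
# so it keeps the value from iteration 1.
m = re.match(r'(?:(a)|b)*', 'ab')
print(m.group(0))  # 'ab'
print(m.group(1))  # 'a'
```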

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-03 Thread MRAB
On 03/11/2010 06:32, Yingjie Lan wrote: --- On Wed, 11/3/10, John Bond wrote: I just explained that (I think!)! The outer capturing group uses repetition, so it returns the last thing that was matched by the inner group, which was an empty string. According to yourself, the last match of the
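The rule John relies on, that a repeated capturing group reports only its final repetition, can be seen on the thread's own string. A small sketch, assuming a compiled pattern so that `Pattern.match` can take a start index (the module-level `re.match` treats its third argument as flags, not a position):

```python
import re

s = 'Mary has a lamb'
# From index 5, (.a.)* repeats three times: 'has', ' a ', 'lam'.
m = re.compile(r'(.a.)*').match(s, 5)
print(m.span())    # (5, 14): the whole run 'has a lam'
print(m.group(0))  # 'has a lam'
print(m.group(1))  # 'lam': only the last repetition is kept
```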

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-03 Thread MRAB
On 03/11/2010 04:10, John Bond wrote: On 3/11/2010 4:02 AM, MRAB wrote: On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. How

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
hen the last match for the inner group must also be empty. Regards, Yingjie --- On Wed, 11/3/10, John Bond wrote: > From: John Bond > Subject: Re: Must be a bug in the re module [was: Why this result with the re module] > To: python-list@python.org > Date: Wednesday, November

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, John Bond wrote: > 3) then said there must be >=0 occurrences of what's inside it, which of course there is, so that has no effect. > > ((.a.)*)* Hi, I think there should be a difference: unlike before, now what's inside the outer group can match an empty s
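Yingjie's observation that the body of the outer group can match an empty string is easy to confirm: the inner `(.a.)*` succeeds with zero repetitions wherever no `.a.` triple starts. A sketch on the thread's string; the final check only shows that the doubled pattern accepts the empty string, and relies on Python's engine not looping forever on an empty repetition:

```python
import re

s = 'Mary has a lamb'
inner = re.compile(r'(.a.)*')

# No '.a.' triple starts at index 3 ('y'), so (.a.)* still succeeds by
# taking zero repetitions: an empty match whose group never participated.
m = inner.match(s, 3)
print(m.span())    # (3, 3)
print(m.group(1))  # None

# Because the inner pattern can match '', so can the doubled pattern.
print(re.fullmatch(r'((.a.)*)*', '') is not None)  # True
```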

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 4:23 AM, Yingjie Lan wrote: --- On Wed, 11/3/10, MRAB wrote: From: MRAB Subject: Re: Must be a bug in the re module [was: Why this result with the re module] To: python-list@python.org Date: Wednesday, November 3, 2010, 8:02 AM On 03/11/2010 03:42, Yingjie Lan wrote: Therefore

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
OK, I've got that, and I have no problem with the capturing part. My real problem is with the total number of matches. *I think it should be 4 matches in total but findall gives 6 matches*, for the new regex '((.a.)*)*'. I'd love to know what you think about this. Many thanks! Yingjie We'v
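Listing where each match starts and ends shows where the count of 6 comes from. Only two matches consume text ('Mar' and 'has a lam'; the repetition swallows 'has', ' a ' and 'lam' as one run), and the other four are empty matches at index 3 ('y'), index 4 (the space), index 14 ('b') and index 15 (the end of the string). A sketch using `re.finditer`; only the single-star spans are annotated, the doubled pattern's spans are printed for comparison:

```python
import re

s = 'Mary has a lamb'
single = [m.span() for m in re.finditer(r'(.a.)*', s)]
double = [m.span() for m in re.finditer(r'((.a.)*)*', s)]

# A span with start == end is an empty match.
print(single)  # [(0, 3), (3, 3), (4, 4), (5, 14), (14, 14), (15, 15)]
print(double)
```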

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, MRAB wrote: > From: MRAB > Subject: Re: Must be a bug in the re module [was: Why this result with the re module] > To: python-list@python.org > Date: Wednesday, November 3, 2010, 8:02 AM > On 03/11/2010 03:42, Yingjie Lan wrote: > Therefore th

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, John Bond wrote:
> Just to clarify - findall is returning:
>
> [ (only match in outer group, 1st match in inner group)
> , (only match in outer group, 2nd match in inner group)
> , (only match in outer group, 3rd match in inner group)
> , (only match in outer group, 4th mat
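The pairs John describes are `findall`'s per-match tuples: with two or more groups, `findall` returns one tuple of groups per match and turns groups that did not participate into `''`. A sketch that rebuilds that list from `re.finditer`, so each `(group 1, group 2)` tuple can be printed next to the span it came from; the group values are printed rather than asserted, since the nested-group captures are exactly what the thread disputes:

```python
import re

s = 'Mary has a lamb'
pattern = re.compile(r'((.a.)*)*')

# Rebuild findall's result: one tuple per match, None replaced by ''.
matches = list(pattern.finditer(s))
rebuilt = [tuple('' if g is None else g for g in m.groups()) for m in matches]

print(rebuilt == pattern.findall(s))  # True
for m, groups in zip(matches, rebuilt):
    print(m.span(), groups)
```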

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 4:02 AM, MRAB wrote: On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ?

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 3:55 AM, John Bond wrote: Could you please reconsider how you would work through this new one and see if my steps are correct? If you agree with my 7-step execution for the new regex, then we have finally found a real bug in re.findall: re.findall('((.a.)*)*', 'Mary has a lamb') [(''

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread MRAB
On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ? Actually, in hindsight, my explan

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
Could you please reconsider how you would work through this new one and see if my steps are correct? If you agree with my 7-step execution for the new regex, then we have finally found a real bug in re.findall: re.findall('((.a.)*)*', 'Mary has a lamb') [('', 'Mar'), ('', ''), ('', ''), ('', 'lam'
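Whatever one concludes about the nested captures, the empty entries disappear once the pattern can no longer match the empty string. A hedged sketch of two rewrites that are not from the thread: dropping the `*` reports each triple separately, and a non-capturing inner group with `+` captures each whole run instead of only its last repetition:

```python
import re

s = 'Mary has a lamb'

# Every match must consume a full triple, so no empty matches occur.
triples = re.findall(r'(.a.)', s)
# (?:...)+ needs at least one triple; the outer group captures the whole run.
runs = re.findall(r'((?:.a.)+)', s)

print(triples)  # ['Mar', 'has', ' a ', 'lam']
print(runs)     # ['Mar', 'has a lam']
```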

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> Matches an empty string, returns '' > > The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ? -- http://mail.python.org/mailman/listinfo/python-list
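The single-group result quoted here can be reproduced directly. A minimal sketch, with each entry annotated by the match that produced it:

```python
import re

s = 'Mary has a lamb'
result = re.findall(r'(.a.)*', s)
# 'Mar'     : match at index 0, one repetition
# '', ''    : empty matches at indexes 3 and 4
# 'lam'     : match 'has a lam'; the group keeps only its last repetition
# '', ''    : empty matches at index 14 and at the end of the string
print(result)  # ['Mar', '', '', 'lam', '', '']
```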

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> Your regex says "Zero or more consecutive occurrences of something, always returning the most possible". That's what it does, at every position - only matching emptiness where it couldn't match anything (findall then skips a character to avoid overlapping/infinite empty matches), and
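John's description of the scan can be written out as a loop. This is a simplified model, not CPython's actual implementation, and `findall_spans` is a hypothetical helper: it searches from the current position, records the match, and steps past one character after an empty match.

```python
import re

def findall_spans(pattern, s):
    """Simplified model of the scan behind findall/finditer (hypothetical).

    Python 3.7+ also lets a non-empty match begin exactly where an empty
    match ended; that cannot happen for the greedy pattern used below,
    because the greedy match at that index would already have been taken.
    """
    p = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(s):  # <= so the end of the string is also tried
        m = p.search(s, pos)
        if m is None:
            break
        spans.append(m.span())
        # After an empty match, skip one character so the scan advances.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return spans

s = 'Mary has a lamb'
print(findall_spans(r'(.a.)*', s))
```

For this pattern the model yields the same six spans as `re.finditer`.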

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread MRAB
On 03/11/2010 01:41, Yingjie Lan wrote: From: John Bond Subject: Re: Why this result with the re module To: "Yingjie Lan" Cc: python-list@python.org Date: Tuesday, November 2, 2010, 8:09 PM On 2/11/2010 12:19 PM, Yingjie Lan wrote: From: John Bond Subject: Re: Why this result with the re module

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Chris Melville
Disagree in this case, where the whole regex matches an empty string. Greediness will match as much as possible. So it will also match the empty strings between consecutive characters as much as possible, once we have properly defined all the unique empty strings. Because of greediness, fewer ma
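Greediness governs how far a single match attempt extends from its starting index; it does not make `findall` pick the split of the string with the fewest matches. A sketch contrasting the greedy `*` with the lazy `*?` on one attempt at a time (the non-capturing form `(?:.a.)` is used only to keep the output simple):

```python
import re

s = 'Mary has a lamb'
greedy = re.compile(r'(?:.a.)*')
lazy = re.compile(r'(?:.a.)*?')

print(greedy.match(s).span())     # (0, 3): takes 'Mar', then cannot continue
print(lazy.match(s).span())       # (0, 0): settles for zero repetitions
print(greedy.match(s, 5).span())  # (5, 14): 'has a lam' from index 5
```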

Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> From: John Bond > Subject: Re: Why this result with the re module > To: "Yingjie Lan" > Cc: python-list@python.org > Date: Tuesday, November 2, 2010, 8:09 PM > On 2/11/2010 12:19 PM, Yingjie Lan wrote: > >> From: John Bond > >> Subject: Re: Why this result with the re module > > Firstly, than