Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-04 Thread Yingjie Lan
--- On Wed, 11/3/10, MRAB wrote:
> [snip]
> The outer group is repeated, so it can match again, but the inner group
> can't match again because it captured all it could the previous time.
>
> Therefore the outer group matches and captures an empty string and the
> inner group remembers its

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-03 Thread MRAB
On 03/11/2010 06:32, Yingjie Lan wrote: --- On Wed, 11/3/10, John Bond wrote: I just explained that (I think!)! The outer capturing group uses repetition, so it returns the last thing that was matched by the inner group, which was an empty string. According to yourself, the last match of the

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-03 Thread MRAB
On 03/11/2010 04:10, John Bond wrote: On 3/11/2010 4:02 AM, MRAB wrote: On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. How

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
hen the last match for the inner group must also be empty.

Regards, Yingjie

--- On Wed, 11/3/10, John Bond wrote:
> From: John Bond
> Subject: Re: Must be a bug in the re module [was: Why this result with the re module]
> To: python-list@python.org
> Date: Wednesday, November

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, John Bond wrote:
> 3) then said there must be >=0 occurrences of what's inside it,
> which of course there is, so that has no effect.
>
> ((.a.)*)*

Hi, I think there should be a difference: unlike before, now what's inside the outer group can match an empty s

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 4:23 AM, Yingjie Lan wrote: --- On Wed, 11/3/10, MRAB wrote: From: MRAB Subject: Re: Must be a bug in the re module [was: Why this result with the re module] To: python-list@python.org Date: Wednesday, November 3, 2010, 8:02 AM On 03/11/2010 03:42, Yingjie Lan wrote: Therefore

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
OK, I've got that, and I have no problem with the capturing part. My real problem is with the number of total matches. *I think it should be 4 matches in total but findall gives 6 matches*, for the new regex '((.a.)*)*'. I'd love to know what you think about this. Many thanks! Yingjie We'v
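
The match count in question can be checked with a small sketch, assuming a current CPython `re` module. It compares the spans of the overall matches for `(.a.)*` and `((.a.)*)*`. The helper name `full_spans` and the variable names are illustrative only. Group contents are deliberately left out, since the thread reports them for Python 3.1.2 and they may differ between versions.

```python
import re

text = 'Mary has a lamb'

def full_spans(pattern, s):
    """Return the (start, end) span of every overall match, in order."""
    return [m.span() for m in re.finditer(pattern, s)]

spans_single = full_spans(r'(.a.)*', text)
spans_nested = full_spans(r'((.a.)*)*', text)

# Number of overall matches for each pattern.
print(len(spans_single), len(spans_nested))
# Whether wrapping the pattern in another starred group changes the spans.
print(spans_single == spans_nested)
```

The scan itself (try a match, record it, move on) is the same for both patterns, so the extra outer `*` would not be expected to change how many overall matches are found.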

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, MRAB wrote:
> From: MRAB
> Subject: Re: Must be a bug in the re module [was: Why this result with the re module]
> To: python-list@python.org
> Date: Wednesday, November 3, 2010, 8:02 AM
> On 03/11/2010 03:42, Yingjie Lan wrote:
> Therefore th

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
--- On Wed, 11/3/10, John Bond wrote:
> Just to clarify - findall is returning:
>
> [ (only match in outer group, 1st match in inner group)
> , (only match in outer group, 2nd match in inner group)
> , (only match in outer group, 3rd match in inner group)
> , (only match in outer group, 4th mat

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 4:02 AM, MRAB wrote: On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ?

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
On 3/11/2010 3:55 AM, John Bond wrote:
Could you please reconsider how you would work with this new one and see if my steps are correct? If you agree with my 7-step execution for the new regex, then we have finally found a real bug in re.findall:
re.findall('((.a.)*)*', 'Mary has a lamb')
[(''

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread MRAB
On 03/11/2010 03:42, Yingjie Lan wrote: Matches an empty string, returns '' The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ? Actually, in hindsight, my explan

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread John Bond
Could you please reconsider how you would work with this new one and see if my steps are correct? If you agree with my 7-step execution for the new regex, then we have finally found a real bug in re.findall:
re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> Matches an empty string, returns '' > > The result is therefore ['Mar', '', '', 'lam', '', ''] Thanks, now I see it through with clarity. Both you and JB are right about this case. However, what if the regex is ((.a.)*)* ? -- http://mail.python.org/mailman/listinfo/python-list

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> Your regex says "Zero or more consecutive occurrences of something,
> always returning the most possible". That's what it does, at every
> position - only matching emptiness where it couldn't match anything
> (findall then skips a character to avoid overlapping/infinite empty
> matches), and
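
To see where each match lands, here is a minimal sketch, assuming a current CPython `re` module, that lists the span and full text of every match of `(.a.)*`. The variable names are illustrative and not taken from the thread.

```python
import re

text = 'Mary has a lamb'

# Collect (span, whole-match text) for every non-overlapping match.
matches = [(m.span(), m.group(0)) for m in re.finditer(r'(.a.)*', text)]

for span, whole in matches:
    # Zero-width matches show up as spans whose start equals their end.
    print(span, repr(whole))
```

The empty matches sit at positions where no `.a.` can start, and the scan moves forward after each one instead of reporting the same position again.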

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread MRAB
On 03/11/2010 01:41, Yingjie Lan wrote: From: John Bond Subject: Re: Why this result with the re module To: "Yingjie Lan" Cc: python-list@python.org Date: Tuesday, November 2, 2010, 8:09 PM On 2/11/2010 12:19 PM, Yingjie Lan wrote: From: John Bond Subject: Re: Why this result with the

Re: Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Chris Melville
Disagree in this case, where the whole regex matches an empty string. Greediness will match as much as possible. So it will also match the empty strings between consecutive characters as much as possible, once we have properly defined all the unique empty strings. Because of greediness, fewer ma

Must be a bug in the re module [was: Why this result with the re module]

2010-11-02 Thread Yingjie Lan
> From: John Bond > Subject: Re: Why this result with the re module > To: "Yingjie Lan" > Cc: python-list@python.org > Date: Tuesday, November 2, 2010, 8:09 PM > On 2/11/2010 12:19 PM, Yingjie Lan > wrote: > >> From: John Bond > >> Subject: R

Re: Why this result with the re module

2010-11-02 Thread John Bond
On 2/11/2010 12:19 PM, Yingjie Lan wrote:
From: John Bond
Subject: Re: Why this result with the re module
Firstly, thanks a lot for your patient explanation. This time I have understood all your points perfectly. Secondly, I'd like to clarify some of my points, which did not get th

Re: Why this result with the re module

2010-11-02 Thread Yingjie Lan
> From: Vlastimil Brom
> Subject: Re: Why this result with the re module
> in that case you may use re.finditer(...)

Thanks for pointing this out. Still, I'd love to see re.findall never discard the whole match, even if a tuple is returned.

Yingjie

Re: Why this result with the re module

2010-11-02 Thread Vlastimil Brom
2010/11/2 Yingjie Lan:
>> From: John Bond
>> Subject: Re: Why this result with the re module
> ...
> I suggested findall return a tuple of re.MatchObject(s),
> with each MatchObject instance representing a match.
> This is consistent with the re.match() function anyway.
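
As a rough illustration of the match-object approach, the sketch below uses `re.finditer`, which yields one match object per match, so the whole match and the captured group are both available. The variable names are illustrative.

```python
import re

text = 'Mary has a lamb'

# Each match object exposes the whole match (group 0) and the
# capturing group (group 1) side by side.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'(.a.)+', text)]

for whole, last_group in pairs:
    print(repr(whole), repr(last_group))
```

This keeps the whole match available even when the pattern contains a capturing group, which `re.findall` does not report in that case.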

Re: Why this result with the re module

2010-11-02 Thread Yingjie Lan
> From: John Bond
> Subject: Re: Why this result with the re module

Firstly, thanks a lot for your patient explanation. This time I have understood all your points perfectly. Secondly, I'd like to clarify some of my points, which did not get through because of my poor prese

Re: Why this result with the re module

2010-11-02 Thread John Bond
On 2/11/2010 8:53 AM, Yingjie Lan wrote:
BUT, but. 1. I expected findall to find matches of the whole regex '(.a.)+', not just the subgroup (.a.) from re.findall('(.a.)+', 'Mary has a lamb') Thus it is probably a misunderstanding/bug??

Again, as soon as you put a capturing group in your exp
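
One way to get whole matches back from `re.findall` is to avoid the capturing group altogether. The sketch below assumes the standard non-capturing group syntax `(?:...)`, and its variable names are illustrative.

```python
import re

text = 'Mary has a lamb'

# With one capturing group, findall reports that group (its last iteration).
with_group = re.findall(r'(.a.)+', text)

# With a non-capturing group there are no groups to report,
# so findall returns the whole match instead.
without_group = re.findall(r'(?:.a.)+', text)

print(with_group)
print(without_group)
```

The pattern matches the same text in both cases; only what `findall` chooses to return differs.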

Re: Why this result with the re module

2010-11-02 Thread Yingjie Lan
> From: John Bond
> You might wonder why something that can match no input
> text, doesn't return an infinite number of those matches at
> every possible position, but they would be overlapping, and
> findall explicitly says matches have to be non-overlapping.

That scratched my itch, though the

Re: Why this result with the re module

2010-11-02 Thread John Bond
On 2/11/2010 7:00 AM, Yingjie Lan wrote:
re.findall('(.a.)*','  ') #two spaces
['', '', '']
I must need more details of the matching algorithm to explain this? Regards, Yingjie

Sorry - I hit enter prematurely on my last message. To take the above as an example (all your examples boil dow
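
The two-space case can be inspected position by position. This is a minimal sketch, assuming a current CPython `re` module, that prints the span of each match. The variable names are illustrative.

```python
import re

two_spaces = '  '

# '.a.' can never match here, so '(.a.)*' only produces zero-width matches:
# one before each character and one at the end of the string.
spans = [m.span() for m in re.finditer(r'(.a.)*', two_spaces)]
print(spans)

# findall reports the (unmatched) group as an empty string for each of them.
print(re.findall(r'(.a.)*', two_spaces))
```

A string of length n has n + 1 positions, which is where the three results for two spaces come from.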

Re: Why this result with the re module

2010-11-02 Thread Yingjie Lan
> From: John Bond
> Subject: Re: Why this result with the re module
> >>> re.findall('(.a.)*', 'Mary has a lamb')
> ['Mar', '', '', 'lam', '', '']
> So - see if you can explain the

Re: Why this result with the re module

2010-11-02 Thread Yingjie Lan
> From: John Bond
> re.findall('(.a.)+', 'Mary has a lamb')
> ['Mar', 'lam']
> It's because you're using capturing groups, and because of
> how they work - specifically they only return the LAST match
> if used with repetition (and multiple matches occur).

It seems capturing groups is ass
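
The "last match only" behaviour of a repeated capturing group can be seen on a single match object. This is a small sketch with an illustrative input string, not code from the thread.

```python
import re

# The group is repeated three times here: 'has', ' a ', 'lam'.
m = re.search(r'(.a.)+', 'has a lamb')

whole = m.group(0)  # everything the repeated group consumed
last = m.group(1)   # only the final iteration of the group

print(repr(whole))
print(repr(last))
```

The earlier iterations are consumed as part of the whole match but are not retained by the group, which is why `re.findall` with this pattern reports `'lam'` and not `'has a lam'`.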

Re: Why this result with the re module

2010-11-01 Thread John Bond
On 2/11/2010 4:31 AM, Yingjie Lan wrote:
Hi, I am rather confused by these results below. I am not a re expert at all. The module version of re is 2.2.1 with Python 3.1.2.
import re
re.findall('.a.', 'Mary has a lamb') #OK
['Mar', 'has', ' a ', 'lam']
re.findall('(.a.)*', 'Mary has a lamb') #?

Why this result with the re module

2010-11-01 Thread Yingjie Lan
Hi, I am rather confused by these results below. I am not a re expert at all. The module version of re is 2.2.1 with Python 3.1.2.
>>> import re
>>> re.findall('.a.', 'Mary has a lamb') #OK
['Mar', 'has', ' a ', 'lam']
>>> re.findall('(.a.)*', 'Mary has a lamb') #??
['Mar', '', '', 'lam', '', ''