--- On Wed, 11/3/10, MRAB wrote:
> [snip]
> The outer group is repeated, so it can match again, but the
> inner group
> can't match again because it captured all it could the
> previous time.
>
> Therefore the outer group matches and captures an empty
> string and the
> inner group remembers its
On 03/11/2010 06:32, Yingjie Lan wrote:
--- On Wed, 11/3/10, John Bond wrote:
I just explained that (I think!)! The outer capturing group
uses
repetition, so it returns the last thing that was matched
by the inner
group, which was an empty string.
According to yourself, the last match of the
On 03/11/2010 04:10, John Bond wrote:
On 3/11/2010 4:02 AM, MRAB wrote:
On 03/11/2010 03:42, Yingjie Lan wrote:
Matches an empty string, returns ''
The result is therefore ['Mar', '', '', 'lam', '', '']
Thanks, now I see it through with clarity.
Both you and JB are right about this case.
How
hen the last
match for the inner group must
also be empty.
Regards,
Yingjie
--- On Wed, 11/3/10, John Bond wrote:
> From: John Bond
> Subject: Re: Must be a bug in the re module [was: Why this result with the re
> module]
> To: python-list@python.org
> Date: Wednesday, November
--- On Wed, 11/3/10, John Bond wrote:
>3) then said there must be >=0 occurrences of what's inside it,
>which of course there is, so that has no effect.
>
>((.a.)*)*
Hi,
I think there should be a difference: unlike before,
now what's inside the outer group can match an empty
s
On 3/11/2010 4:23 AM, Yingjie Lan wrote:
--- On Wed, 11/3/10, MRAB wrote:
From: MRAB
Subject: Re: Must be a bug in the re module [was: Why this result with the re
module]
To: python-list@python.org
Date: Wednesday, November 3, 2010, 8:02 AM
On 03/11/2010 03:42, Yingjie Lan
wrote:
Therefore
OK, I've got that, and I have no problem with the capturing part.
My real problem is with the number of total matches.
*I think it should be 4 matches in total but findall
gives 6 matches*, for the new regex '((.a.)*)*'.
I'd love to know what you think about this.
Many thanks!
Yingjie
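One way to check the count independently of how findall reports groups is to list the span of each whole match with re.finditer. This is only a sketch; the variable names are illustrative, and the group contents are deliberately not shown because they may vary between Python versions.

```python
import re

text = 'Mary has a lamb'

# Each match object covers one non-overlapping match of the whole regex.
# span() gives its (start, end) offsets; start == end means an empty match.
spans = [m.span() for m in re.finditer('((.a.)*)*', text)]

print(len(spans))
for start, end in spans:
    print(start, end, repr(text[start:end]))
```

Empty matches are counted as matches too, which is where a count larger than the number of visible substrings comes from.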
We'v
--- On Wed, 11/3/10, MRAB wrote:
> From: MRAB
> Subject: Re: Must be a bug in the re module [was: Why this result with the re
> module]
> To: python-list@python.org
> Date: Wednesday, November 3, 2010, 8:02 AM
> On 03/11/2010 03:42, Yingjie Lan
> wrote:
> Therefore th
--- On Wed, 11/3/10, John Bond wrote:
> Just to clarify - findall is returning:
>
> [ (only match in outer group, 1st match in inner group)
> , (only match in outer group, 2nd match in inner group)
> , (only match in outer group, 3rd match in inner group)
> , (only match in outer group, 4th mat
On 3/11/2010 4:02 AM, MRAB wrote:
On 03/11/2010 03:42, Yingjie Lan wrote:
Matches an empty string, returns ''
The result is therefore ['Mar', '', '', 'lam', '', '']
Thanks, now I see it through with clarity.
Both you and JB are right about this case.
However, what if the regex is ((.a.)*)* ?
On 3/11/2010 3:55 AM, John Bond wrote:
Could you please reconsider how you would
work with this new one and see if my steps
are correct? If you agree with my 7-step
execution for the new regex, then:
We finally found a real bug for re.findall:
re.findall('((.a.)*)*', 'Mary has a lamb')
[(''
On 03/11/2010 03:42, Yingjie Lan wrote:
Matches an empty string, returns ''
The result is therefore ['Mar', '', '', 'lam', '', '']
Thanks, now I see it through with clarity.
Both you and JB are right about this case.
However, what if the regex is ((.a.)*)* ?
Actually, in hindsight, my explan
Could you please reconsider how you would
work with this new one and see if my steps
are correct? If you agree with my 7-step
execution for the new regex, then:
We finally found a real bug for re.findall:
re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'
> Matches an empty string, returns ''
>
> The result is therefore ['Mar', '', '', 'lam', '', '']
Thanks, now I see it through with clarity.
Both you and JB are right about this case.
However, what if the regex is ((.a.)*)* ?
--
http://mail.python.org/mailman/listinfo/python-list
> Your regex says "Zero or more consecutive occurrences of
> something, always returning the most possible". That's
> what it does, at every position - only matching emptiness
> where it couldn't match anything (findall then skips a
> character to avoid overlapping/infinite empty
> matches), and
On 03/11/2010 01:41, Yingjie Lan wrote:
From: John Bond
Subject: Re: Why this result with the re module
To: "Yingjie Lan"
Cc: python-list@python.org
Date: Tuesday, November 2, 2010, 8:09 PM
On 2/11/2010 12:19 PM, Yingjie Lan
wrote:
From: John Bond
Subject: Re: Why this result with the
Disagree in this case, where the whole regex
matches an empty string. Greediness will match
as much as possible. So it will also match
the empty strings between consecutive
characters as much as possible, once
we have properly defined all the unique
empty strings. Because of greediness,
fewer ma
> From: John Bond
> Subject: Re: Why this result with the re module
> To: "Yingjie Lan"
> Cc: python-list@python.org
> Date: Tuesday, November 2, 2010, 8:09 PM
> On 2/11/2010 12:19 PM, Yingjie Lan
> wrote:
> >> From: John Bond
> >> Subject: R
On 2/11/2010 12:19 PM, Yingjie Lan wrote:
From: John Bond
Subject: Re: Why this result with the re module
Firstly, thanks a lot for your patient explanation.
This time I have understood all your points perfectly.
Secondly, I'd like to clarify some of my points, which
did not get th
> From: Vlastimil Brom
> Subject: Re: Why this result with the re module
> in that case you may use re.finditer(...)
Thanks for pointing this out.
Still, I'd love to see re.findall never
discard the whole match, even if
a tuple is returned.
Yingjie
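For reference, a minimal sketch of the re.finditer approach mentioned above. It assumes the '(.a.)+' pattern from earlier in the thread; the variable names are illustrative only.

```python
import re

text = 'Mary has a lamb'

# finditer yields match objects, so the whole match (group 0) stays
# available next to the captured group (group 1).
pairs = [(m.group(0), m.group(1)) for m in re.finditer('(.a.)+', text)]

print(pairs)  # [('Mar', 'Mar'), ('has a lam', 'lam')]
```

Each pair holds the whole match first and the last capture of the repeated group second, which is the information findall drops when the pattern contains a group.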
2010/11/2 Yingjie Lan :
>> From: John Bond
>> Subject: Re: Why this result with the re module
> ...
> I suggested findall return a tuple of re.MatchObject(s),
> with each MatchObject instance representing a match.
> This is consistent with the re.match() function anyway.
> From: John Bond
> Subject: Re: Why this result with the re module
Firstly, thanks a lot for your patient explanation.
This time I have understood all your points perfectly.
Secondly, I'd like to clarify some of my points, which
did not get through because of my poor prese
On 2/11/2010 8:53 AM, Yingjie Lan wrote:
BUT, but.
1. I expected findall to find matches of the whole
regex '(.a.)+', not just the subgroup (.a.) from
re.findall('(.a.)+', 'Mary has a lamb')
Thus it is probably a misunderstanding/bug??
Again, as soon as you put a capturing group in your exp
> From: John Bond
> You might wonder why something that can match no input
> text, doesn't return an infinite number of those matches at
> every possible position, but they would be overlapping, and
> findall explicitly says matches have to be non-overlapping.
That scratched my itch, though the
On 2/11/2010 7:00 AM, Yingjie Lan wrote:
re.findall('(.a.)*', '  ') #two spaces
['', '', '']
I suppose I need more details of the matching algorithm to explain this?
Regards,
Yingjie
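A small sketch that may help with the two-space example: re.finditer can show where each match starts and ends, and what the group actually captured. The names here are illustrative only.

```python
import re

# Two spaces: '.a.' can never match, so '(.a.)*' only matches
# the empty string, once at each of the three positions 0, 1 and 2.
matches = list(re.finditer('(.a.)*', '  '))

spans = [m.span() for m in matches]
groups = [m.group(1) for m in matches]

print(spans)   # [(0, 0), (1, 1), (2, 2)]
print(groups)  # [None, None, None]
```

The group never takes part in these matches, so group(1) is None for each of them; findall reports such a group as '', which gives the three empty strings.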
Sorry - I hit enter prematurely on my last message.
To take the above as an example (all your examples boil dow
> From: John Bond
> Subject: Re: Why this result with the re module
> >>>> re.findall('(.a.)*', 'Mary has a lamb')
> > ['Mar', '', '', 'lam', '', '']
> So - see if you can explain the
> From: John Bond
> re.findall('(.a.)+', 'Mary has a lamb')
> > ['Mar', 'lam']
> It's because you're using capturing groups, and because of
> how they work - specifically they only return the LAST match
> if used with repetition (and multiple matches occur).
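To make the quoted point concrete, here is a hedged sketch comparing the capturing group with a non-capturing group, (?:...). The non-capturing variant is not taken from the thread; it is added only for contrast.

```python
import re

text = 'Mary has a lamb'

# Capturing group under repetition: findall reports only the LAST
# thing the group captured within each match.
last_only = re.findall('(.a.)+', text)

# Non-capturing group: with no groups in the pattern, findall
# reports the whole match instead.
whole = re.findall('(?:.a.)+', text)

print(last_only)  # ['Mar', 'lam']
print(whole)      # ['Mar', 'has a lam']
```

The second match covers 'has a lam' in both cases; only what findall reports for it differs.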
It seems capturing groups is ass
On 2/11/2010 4:31 AM, Yingjie Lan wrote:
Hi, I am rather confused by these results below.
I am not a re expert at all. The module version
of re is 2.2.1 with Python 3.1.2
import re
re.findall('.a.', 'Mary has a lamb') #OK
['Mar', 'has', ' a ', 'lam']
re.findall('(.a.)*', 'Mary has a lamb') #?
Hi, I am rather confused by these results below.
I am not a re expert at all. The module version
of re is 2.2.1 with Python 3.1.2
>>> import re
>>> re.findall('.a.', 'Mary has a lamb') #OK
['Mar', 'has', ' a ', 'lam']
>>> re.findall('(.a.)*', 'Mary has a lamb') #??
['Mar', '', '', 'lam', '', ''