On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga<tiagosab...@gmail.com> wrote:
> Hi!
>
> I am trying to split some lists out of a single text file, and I am
> having a hard time. I have reduced the problem to the following one:
>
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>
> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:
>
> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
> Out[33]: 'a45453b. a325643b. '

group(0) is the entire match so this returns what you expect. But what
is group(1)?

In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
Out[6]: 'a325643b. '

Repeated groups are tricky; the returned value contains only the last
match for the group, not the earlier repeats.
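
To see this for yourself, here is a small sketch. It assumes Python 3
and writes the pattern as a raw string (my substitution; the session
above uses a plain string), and the names whole and last are only for
illustration:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# The group is repeated twice, but it keeps only its final repetition.
m = re.search(r"(a[^.]*?b\.\s?){2}", text)

whole = m.group(0)  # everything the pattern matched
last = m.group(1)   # just the second (last) repetition of the group

print(repr(whole))
print(repr(last))
```

The first repetition, 'a45453b. ', is part of group(0) but is not
available through group(1).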

> In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
> Out[34]: ['a325643b. ']

When the re contains a group, re.findall() returns the group, not the
whole match (with several groups it returns one tuple per match). So
this is giving group(1), not group(0). You can get the whole match by
explicitly grouping it:

In [4]: re.findall("((a[^.]*?b\.\s?){2})", text)
Out[4]: [('a45453b. a325643b. ', 'a325643b. ')]
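
The three shapes findall() can return are easier to compare side by
side. This is only a sketch, assuming Python 3 and raw-string patterns;
the variable names are mine:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# No groups: findall() returns each whole match as a string.
no_group = re.findall(r"a[^.]*?b\.", text)

# One group: findall() returns that group's text for each match.
one_group = re.findall(r"(a[^.]*?b\.\s?){2}", text)

# Two groups: findall() returns one tuple per match, one item per group.
two_groups = re.findall(r"((a[^.]*?b\.\s?){2})", text)

print(no_group)
print(one_group)
print(two_groups)
```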

> In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
> Out[35]: 'a2345b. '

re.search() returns only the first match, and the first run of matching
words is just 'a2345b. ' because "f325." ends it, so this is correct.
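
If you want group(0) for every match rather than just the first, one
option is re.finditer(), which yields a match object per match. A
minimal sketch, assuming Python 3 and a raw-string pattern; the names
pattern, first and runs are hypothetical:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
pattern = re.compile(r"(a[^.]*?b\.\s?)+")

# search() stops at the first match.
first = pattern.search(text).group(0)

# finditer() walks over every non-overlapping match in turn.
runs = [m.group(0) for m in pattern.finditer(text)]

print(repr(first))
print(runs)
```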

> In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
> Out[36]: ['a2345b. ', 'a435643b. ']

This is finding both matches, but as with the previous findall() you
see only the last repeat of the group for each one. This is closer:

In [7]: re.findall("((a[^.]*?b\.\s?)+)", text)
Out[7]: [('a2345b. ', 'a2345b. '), ('a45453b. a325643b. a435643b. ',
'a435643b. ')]

If you change the inner parentheses to be non-capturing, (?:...), then
you get pretty much what you want (each string keeps a trailing space):

In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']
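
To get exactly the list you asked for, you could strip the trailing
whitespace from each result. A sketch, assuming Python 3 and a raw
string for the pattern; the name runs is mine:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# The inner group is non-capturing, so findall() returns only the outer
# group, which spans the whole run of "a...b." words.
runs = [run.strip() for run in re.findall(r"((?:a[^.]*?b\.\s?)+)", text)]

print(runs)
```

Since the only remaining group wraps the entire pattern, dropping the
outer parentheses should give the same list, because findall() returns
whole matches when the pattern has no capturing groups.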

Kent