On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga <tiagosab...@gmail.com> wrote:
> Hi!
>
> I am trying to split some lists out of a single text file, and I am
> having a hard time. I have reduced the problem to the following one:
>
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>
> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:
>
> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
> Out[33]: 'a45453b. a325643b. '
group(0) is the entire match, so this returns what you expect. But what is group(1)?

In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
Out[6]: 'a325643b. '

Repeated groups are tricky: the returned value contains only the last match for the group, not the earlier repeats.

> In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
> Out[34]: ['a325643b. ']

When the pattern contains groups, re.findall() returns the groups (a tuple of them if there is more than one), not the whole match. So this is giving group(1), not group(0). You can get the whole match by explicitly grouping it:

In [4]: re.findall("((a[^.]*?b\.\s?){2})", text)
Out[4]: [('a45453b. a325643b. ', 'a325643b. ')]

> In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
> Out[35]: 'a2345b. '

re.search() only finds the first match, so this is correct.

> In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
> Out[36]: ['a2345b. ', 'a435643b. ']

This is finding both matches, but the grouping has the same difficulty as the previous findall(): each result is only the last repetition of the group. This is closer:

In [7]: re.findall("((a[^.]*?b\.\s?)+)", text)
Out[7]: [('a2345b. ', 'a2345b. '), ('a45453b. a325643b. a435643b. ', 'a435643b. ')]

If you change the inner parentheses to be non-grouping, then you get pretty much what you want (apart from the trailing spaces):

In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
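The behaviour described in the reply can be checked with a short script. This is a sketch, not part of the original thread: it reuses Tiago's text and the thread's patterns, but writes them as raw strings (r"...") so the backslashes reach re unchanged (recent Python versions warn about "\." in ordinary string literals). The helper name runs_of_ab_words() is hypothetical, and it drops the outer capturing group from Kent's final pattern, since a pattern with no capturing groups already makes findall()/finditer() return the whole match.

```python
import re

# Text from the original question.
text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# 1. A repeated capturing group keeps only its LAST repetition.
m = re.search(r"(a[^.]*?b\.\s?){2}", text)
print(repr(m.group(0)))  # 'a45453b. a325643b. '
print(repr(m.group(1)))  # 'a325643b. '

# 2. With one capturing group, findall() returns that group, not the whole match.
print(re.findall(r"(a[^.]*?b\.\s?)+", text))
# ['a2345b. ', 'a435643b. ']

# 3. With only a non-capturing group (?:...), findall() returns whole matches.
print(re.findall(r"(?:a[^.]*?b\.\s?)+", text))
# ['a2345b. ', 'a45453b. a325643b. a435643b. ']


def runs_of_ab_words(s):
    """Hypothetical helper: return each maximal run of words that start
    with 'a' and end with 'b.', without the trailing whitespace."""
    pattern = re.compile(r"(?:a[^.]*?b\.\s?)+")
    # group(0) is always the whole match, whatever groups the pattern has.
    return [match.group(0).rstrip() for match in pattern.finditer(s)]


print(runs_of_ab_words(text))
# ['a2345b.', 'a45453b. a325643b. a435643b.']
```

Note that [^.] also matches spaces, so with this pattern a string such as "ax yb." counts as a single "word"; a stricter word pattern may be needed if the real data can contain that.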