Hi! I am trying to split some lists out of a single text file, and I am having a hard time. I have reduced the problem to the following one:
text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

From this line of text, I want to extract the runs of consecutive words that start with "a" and end with "b.". But I don't want a list of individual words. I want this:

["a2345b.", "a45453b. a325643b. a435643b."]

I feel I still don't fully understand the logic of regular expressions. I do not understand the results below:

In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '

In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']

In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '

In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']

What is the difference between search and findall in [33] and [34]? And why can't I generalize [33] to [35]? Out[35] would make sense to me if I had used a non-greedy +, but why does re get only one word?

Thanks,

Tiago Saboga.
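One possible reading of the results above, offered as a sketch rather than a definitive answer: re.search returns only the first match in the string, and re.findall, when the pattern contains a capturing group, returns the contents of that group instead of the whole match. A repeated group keeps only its last repetition, which would explain Out[34] and Out[36]. Assuming that reading is right, a non-capturing group (?:...) should give the wanted list. The names first, captured and runs are only illustrative, and the strip() call is an added step to drop the trailing space that \s? picks up.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# search() reports only the first match; the run stops at "f325."
# because the next word does not start with "a".
first = re.search(r"(a[^.]*?b\.\s?)+", text).group(0)

# With a capturing group, findall() returns group 1 for each match,
# and a repeated group holds only its last repetition.
captured = re.findall(r"(a[^.]*?b\.\s?)+", text)

# A non-capturing group makes findall() return each whole match.
# strip() removes the trailing space matched by \s? (illustrative cleanup).
runs = [run.strip() for run in re.findall(r"(?:a[^.]*?b\.\s?)+", text)]

print(first)
print(captured)
print(runs)
```

On the sample text this should leave runs holding the two strings asked for, while first and captured reproduce Out[35] and Out[36].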