On Mon, Jan 5, 2009 at 11:16 AM, Omer <jaggojaggo...@gmail.com> wrote:
> Bob, I tried your way.
>
> >>> import re
> >>> urlMask = r"http://[\w\Q./\?=\R]+(<br>)?"
> >>> text=u"Not working example<br>http://this.is.a/url?header=null<br>And another line<br>http://and.another.url"
> >>> re.findall(urlMask,text)
> [u'<br>', u'']
>
> spir, I did understand it. What I'm not understanding is why isn't this
> working.

There is a bit of a gotcha in re.findall() - its behaviour changes
depending on whether there are groups in the re. If the re contains
groups, re.findall() only returns the matches for the groups. If you
enclose the entire re in parentheses (making it a group) you get a
better result:

In [2]: urlMask = r"(http://[\w\Q./\?=\R]+(<br>)?)"

In [3]: text=u"Not working example<br>http://this.is.a/url?header=null<br>And another line<br>http://and.another.url"

In [4]: re.findall(urlMask,text)
Out[4]:
[(u'http://this.is.a/url?header=null<br>', u'<br>'),
 (u'http://and.another.url', u'')]

You can also use non-grouping parentheses around the <br>:

In [5]: urlMask = r"http://[\w\Q./\?=\R]+(?:<br>)?"

In [6]: re.findall(urlMask,text)
Out[6]: [u'http://this.is.a/url?header=null<br>', u'http://and.another.url']

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
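
For anyone reading this in the archives, here is a minimal sketch that puts the three findall() behaviours side by side, written for Python 3. It is an illustration under a few assumptions, not the exact code from the thread: the character class is simplified to [\w./?=] because \Q and \R are not Python regex escapes (recent Python 3 versions reject them in a pattern), and the names CHARS, group_only, wrapped, non_capturing and via_finditer are made up for the example. The last line uses re.finditer(), which was not mentioned above but is another way to get the whole match regardless of groups.

```python
import re

# Assumption: simplified character class. The \Q and \R from the thread are
# not Python regex escapes, so they are dropped here.
CHARS = r"[\w./?=]+"

text = ("Not working example<br>http://this.is.a/url?header=null<br>"
        "And another line<br>http://and.another.url")

# One capturing group: findall() returns only that group's text,
# and an empty string where the optional group did not match.
group_only = re.findall(r"http://" + CHARS + r"(<br>)?", text)

# Whole pattern wrapped in a group: findall() returns one tuple per match,
# with one item per group (outer group first, then the inner <br> group).
wrapped = re.findall(r"(http://" + CHARS + r"(<br>)?)", text)

# Non-capturing (?:...) group: no groups at all, so findall() returns
# the whole match as a plain string.
non_capturing = re.findall(r"http://" + CHARS + r"(?:<br>)?", text)

# finditer() yields match objects; group(0) is always the whole match,
# whether or not the pattern contains capturing groups.
via_finditer = [m.group(0)
                for m in re.finditer(r"http://" + CHARS + r"(<br>)?", text)]

print(group_only)
print(wrapped)
print(non_capturing)
print(via_finditer)
```

If you only ever want the full URL text, the non-capturing form is probably the least surprising choice, since adding or removing an unrelated group later will not change the shape of the result.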