Re: [Tutor] RE Silliness

Kent Johnson Mon, 05 Jan 2009 08:46:28 -0800

On Mon, Jan 5, 2009 at 11:16 AM, Omer <jaggojaggo...@gmail.com> wrote:
> Bob, I tried your way.
>
>>>> import re
>>>> urlMask = r"http://[\w\Q./\?=\R]+(<br>)?"
>>>> text=u"Not working example<br>http://this.is.a/url?header=null<br>And another line<br>http://and.another.url"
>>>> re.findall(urlMask,text)
> [u'<br>', u'']
>
> spir, I did understand it. What I don't understand is why this isn't
> working.


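There is a bit of a gotcha in re.findall(): its behaviour changes
depending on whether the re contains groups. With no groups,
re.findall() returns the full text of each match. With one group, it
returns only what that group matched. With two or more groups, it
returns a tuple of the groups for each match. Your pattern has one
group, (<br>), so you get the <br> (or an empty string where the
optional group didn't match) instead of the URL.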
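
A minimal sketch of the three cases in Python 3 (so no u'' prefixes). The string and the variable names are made up purely for illustration:

```python
import re

text = "a1 b2 c3"

# No groups: findall() returns the whole match for each hit.
no_groups = re.findall(r"[a-z]\d", text)

# One group: findall() returns only that group's text for each hit.
one_group = re.findall(r"[a-z](\d)", text)

# Two or more groups: findall() returns a tuple of all groups per hit.
two_groups = re.findall(r"([a-z])(\d)", text)

print(no_groups)   # ['a1', 'b2', 'c3']
print(one_group)   # ['1', '2', '3']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```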

If you enclose the entire re in parentheses (making it a group), you
get a better result, although each item is now a tuple with one entry
per group:
In [2]: urlMask = r"(http://[\w\Q./\?=\R]+(<br>)?)"

In [3]: text=u"Not working example<br>http://this.is.a/url?header=null<br>And another line<br>http://and.another.url"

In [4]: re.findall(urlMask,text)
Out[4]:
[(u'http://this.is.a/url?header=null<br>', u'<br>'),
 (u'http://and.another.url', u'')]
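
If you keep the outer group, you still have to pull the URL out of each tuple. Here is a sketch in Python 3 showing two ways to do that. It assumes a simplified character class `[\w./?=]` instead of the thread's `[\w\Q./\?=\R]`, since Python's re has no `\Q` or `\R` escapes:

```python
import re

text = ("Not working example<br>http://this.is.a/url?header=null<br>"
        "And another line<br>http://and.another.url")

# Group 1 is the whole URL plus the optional <br>; group 2 is the <br> alone.
url_mask = r"(http://[\w./?=]+(<br>)?)"

# With two groups, findall() yields (group1, group2) tuples; keep group 1.
urls = [whole for whole, _br in re.findall(url_mask, text)]

# finditer() yields match objects; group(0) is always the full match,
# however many groups the pattern has.
urls_via_finditer = [m.group(0) for m in re.finditer(url_mask, text)]

print(urls)
print(urls_via_finditer)
```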

Alternatively, use non-grouping (non-capturing) parentheses, (?:...),
around the <br>. The re then has no groups, and re.findall() returns
the whole matches:
In [5]: urlMask = r"http://[\w\Q./\?=\R]+(?:<br>)?"

In [6]: re.findall(urlMask,text)
Out[6]: [u'http://this.is.a/url?header=null<br>', u'http://and.another.url']
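
Here is the same non-capturing approach as a Python 3 sketch. One assumption differs from the session: the `\Q` and `\R` are dropped from the character class. They look like Perl's quoting escapes, but Python 2 simply read them as the literal letters Q and R, and Python 3.6+ rejects them as bad escapes. The second part of the block checks that:

```python
import re

text = ("Not working example<br>http://this.is.a/url?header=null<br>"
        "And another line<br>http://and.another.url")

# (?:...) groups without capturing, so the pattern has no capture groups
# and findall() returns each whole match.
url_mask = r"http://[\w./?=]+(?:<br>)?"
urls = re.findall(url_mask, text)
print(urls)  # ['http://this.is.a/url?header=null<br>', 'http://and.another.url']

# The thread's character class: \Q and \R are not Python escapes, and
# Python 3.6+ refuses to compile them (raises re.error).
try:
    re.compile(r"http://[\w\Q./\?=\R]+")
    thread_class_ok = True
except re.error as exc:
    thread_class_ok = False
    print("rejected:", exc)
```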

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
