[Tutor] RegEx query

Liam Clarke Sat, 17 Dec 2005 00:00:50 -0800

Hi all,

I'm using Beautiful Soup with regexes, and I've noticed that all the
examples compile the regex inline, like so - anchors = parseTree.fetch("a",
{"href":re.compile("pattern")} ) - instead of precompiling the pattern.


Myself, I have the following code -
>>> z = []
>>> x = q.findNext("a", {"href":re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)})

>>> while x:
...     num = x.findNext("td", "tableColA")
...     h = (x.contents[0],x.attrMap["href"],num.contents[0])
...     z.append(h)
...     x = x.findNext("a",{"href":re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)})
...

This gives me a correct set of results. However, using the following -

>>> z = []
>>> pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
>>> x = q.findNext("a", {"href":pattern})

>>> while x:
...     num = x.findNext("td", "tableColA")
...     h = (x.contents[0],x.attrMap["href"],num.contents[0])
...     z.append(h)
...     x = x.findNext("a",{"href":pattern} )

will only return the first found tag.

Is the regex only evaluated once or similar?
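
As a sanity check on that question, here is a minimal sketch using only
the standard re module, with no Beautiful Soup involved. The sample hrefs
are made up for illustration. It compares reusing one compiled pattern
against compiling a fresh one for every test, the way the two loops do.

```python
import re

# Hypothetical sample hrefs, invented for this check.
hrefs = ["/forum/thread/28606/", "/forum/index/", "/Forum/THREAD/9/page2"]

# One compiled pattern object, reused for every href (second loop).
pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
reused = [bool(pattern.search(h)) for h in hrefs]

# A pattern compiled again for every href (first loop).
fresh = [
    bool(re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE).search(h))
    for h in hrefs
]

print(reused)
print(reused == fresh)
```

Both lists come out the same, so a compiled pattern does not seem to be
"used up" after its first match. That suggests, without proving it, that
the difference between the two loops comes from somewhere other than the
re module.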

(Also, any pointers on how to get negative lookahead matching working
would be great.
The regex (/thread/[0-9]*)(?!\/) still matches "/thread/28606/", and
I'd assumed it wouldn't.)
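
A likely explanation for the lookahead result is backtracking: when the
lookahead fails after "28606", [0-9]* can give back the last digit, and
the next character is then "6" rather than "/". The sketch below shows
this with the standard re module. The stricter pattern in it is an
assumption about what was intended (a thread number not followed by a
slash), not something taken from the Beautiful Soup docs.

```python
import re

url = "/thread/28606/"

# Original pattern: [0-9]* backtracks by one digit so the lookahead passes.
m = re.search(r"(/thread/[0-9]*)(?!\/)", url)
loose = m.group(1)  # stops one digit short of the slash

# Assumed fix: also forbid a digit after the number, so giving back
# digits can no longer satisfy the lookahead.
strict = re.compile(r"(/thread/[0-9]+)(?![0-9/])")
with_slash = strict.search("/thread/28606/")  # trailing slash present
no_slash = strict.search("/thread/28606")     # no trailing slash

print(loose)
print(with_slash)
print(no_slash.group(1))
```

Here loose is "/thread/2860", with_slash is None, and no_slash matches
the full "/thread/28606".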

Regards,

Liam Clarke
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
