Hi all,

I'm using Beautiful Soup with regexes, and I've noticed that all the
examples compile the regex inline, like so - anchors = parseTree.fetch("a",
{"href":re.compile("pattern")} ) - instead of precompiling the pattern.

Myself, I have the following code -
>>> z = []
>>> x = q.findNext("a", {"href":re.compile(".*?thread/[0-9]*?/.*",
...                                        re.IGNORECASE)})

>>> while x:
...     num = x.findNext("td", "tableColA")
...     h = (x.contents[0],x.attrMap["href"],num.contents[0])
...     z.append(h)
...     x = x.findNext("a",{"href":re.compile(".*?thread/[0-9]*?/.*",
...                                           re.IGNORECASE)})
...

This gives me a correct set of results. However, using the following -

>>> z = []
>>> pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
>>> x = q.findNext("a", {"href":pattern})

>>> while x:
...     num = x.findNext("td", "tableColA")
...     h = (x.contents[0],x.attrMap["href"],num.contents[0])
...     z.append(h)
...     x = x.findNext("a",{"href":pattern} )

will only return the first found tag.

Is the regex only evaluated once or similar?
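
As a sanity check outside Beautiful Soup, here is a minimal sketch using
only the re module. The hrefs list and the matching_hrefs helper are made
up for illustration; they are not part of the page being parsed. It
suggests that a precompiled pattern object can be reused for any number
of searches:

```python
import re

# Same pattern as in the session above, compiled once.
pattern = re.compile(r".*?thread/[0-9]*?/.*", re.IGNORECASE)

# Hypothetical href values, invented for this check.
hrefs = ["/thread/28606/", "/Thread/17/page2", "/forum/index.html", "/thread/abc/"]

def matching_hrefs(candidates, compiled):
    """Return the candidates that the compiled pattern finds a match in."""
    return [h for h in candidates if compiled.search(h)]

# Reuse the very same compiled object for two separate passes.
first_pass = matching_hrefs(hrefs, pattern)
second_pass = matching_hrefs(hrefs, pattern)

print(first_pass)
print(second_pass)
```

Both passes give the same result here, which would point at something
other than the re module itself as the reason the second loop stops early.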

(Also, any pointers on how to get negative lookahead matching working
would be great.
The regex (/thread/[0-9]*)(?!\/) still matches "/thread/28606/" and
I'd assumed it wouldn't.)
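
A small standalone reproduction of the lookahead behaviour, as a sketch
rather than a definitive answer. The names loose and strict are invented
here, and the strict variant is only one possible rewrite, assuming the
goal is to reject a thread number that is followed by a slash:

```python
import re

url = "/thread/28606/"

# The pattern from the question. [0-9]* is free to give back a digit, so
# the engine can stop one digit early, where the next character is "6"
# rather than "/", and the negative lookahead then succeeds.
loose = re.compile(r"(/thread/[0-9]*)(?!/)")
loose_match = loose.search(url)
print(loose_match.group(1))

# One possible variant (an assumption about the intent): also forbid a
# digit after the captured number, so the digits cannot be cut short.
strict = re.compile(r"(/thread/[0-9]+)(?![0-9/])")
strict_with_slash = strict.search(url)
strict_without_slash = strict.search("/thread/28606")
print(strict_with_slash)
print(strict_without_slash.group(1))
```

With the original pattern the captured group is "/thread/2860", not the
full number, which is why it still reports a match.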

Regards,

Liam Clarke
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
