Adam Monsen wrote:
> The following script is a high-performance link (<a href="...">...</a>) extractor. [...]
> * extract links from text (most likely valid HTML)
[...]
> import re
> import urllib
>
> whiteout = re.compile(r'\s+')
>
> # grabs hyperlinks from text
> href_re = re.compile(r'''
>
Pretty nice. However, you won't capture the increasingly common
JavaScript-driven redirections, like
click me
nor
http://www.yahoo.com";>
nor
http://www.yahoo.com"; name=x>
.
I'm guessing it also won't correctly handle things like:
click
But you probably already knew all this stuff, didn't you?
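
For what it's worth, here is a rough sketch of the kind of cases I mean. The markup in SAMPLE is my own assumption (a javascript: href, an unquoted href that comes after another attribute, and a meta refresh), and NAIVE_HREF_RE is a hypothetical simple pattern, not the regex from your script. The sketch uses Python 3's html.parser instead of a regex, just to show how a tag-aware approach might pick these up:

```python
import re
from html.parser import HTMLParser

# Hypothetical naive pattern for comparison; NOT the regex from the quoted script.
NAIVE_HREF_RE = re.compile(r'<a\s+href="(http[^"]*)"', re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags and URLs from meta refresh tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Keep the raw value, including javascript: pseudo-URLs.
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # content typically looks like "0; url=http://..."
            match = re.search(r"url\s*=\s*(\S+)", attrs.get("content") or "", re.IGNORECASE)
            if match:
                self.links.append(match.group(1).strip("'\""))


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed sample markup, made up for illustration.
SAMPLE = '''
<a href="http://www.example.com/plain">plain</a>
<a href="javascript:location.href='http://www.example.com/js'">click me</a>
<a name=x href=http://www.example.com/unquoted>click</a>
<meta http-equiv="refresh" content="0; url=http://www.yahoo.com">
'''

if __name__ == "__main__":
    print("naive regex:", NAIVE_HREF_RE.findall(SAMPLE))
    print("parser:     ", extract_links(SAMPLE))
```

Note that the javascript: link comes back as the raw attribute value. Pulling the real target out of the script text is a separate problem, and this will almost certainly be slower than a tuned regex.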