Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Stefan Behnel
Andrew Berg, 19.05.2011 02:39: "On 2011.05.18 03:30 AM, Stefan Behnel wrote: 'Well, it pretty clearly states that on the PyPI page, but I also added it to the project home page now. lxml 2.3 works with any CPython version from 2.3 to 3.2.' Thank you. I never would've looked at PyPI for info on a …"

Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Andrew Berg
On 2011.05.16 02:26 AM, Karim wrote: "Use regular expressions for bad HTML or BeautifulSoup (google it); below, an example to extract all HTML links." Actually, using regex wasn't so bad: import re import urllib.request url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth' page = …
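
The archive preview cuts off right after the fetch, so here is a minimal sketch (Python 3) of the approach Andrew describes; only the URL and the imports are quoted from his post, the regex pattern and the decoding step are my own assumptions.

import re
import urllib.request

url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth'
with urllib.request.urlopen(url) as response:
    # The directory listing is plain ASCII/UTF-8; undecodable bytes are dropped.
    page = response.read().decode('utf-8', errors='ignore')

# Pull the href value out of every anchor tag in the fetched page.
links = re.findall(r'<a href="(.*?)"', page)
for link in links:
    print(link)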

Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Karim
On 05/19/2011 11:35 PM, Andrew Berg wrote: "On 2011.05.16 02:26 AM, Karim wrote: 'Use regular expressions for bad HTML or BeautifulSoup (google it); below, an example to extract all HTML links.' Actually, using regex wasn't so bad: import re import urllib.request url = …"

Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Ethan Furman
Andrew Berg wrote: "ElementTree doesn't seem to have been updated in a long time, so I'll assume it won't work with Python 3." I don't know how to use it, but you'll find ElementTree as xml.etree in Python 3. ~Ethan~
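
To illustrate Ethan's point, a minimal sketch (not from the thread) of using the standard-library module; note that ElementTree expects well-formed XML or XHTML, so it will not cope with the sloppy real-world HTML that started this thread.

import xml.etree.ElementTree as ET

# ElementTree ships with Python 3 as xml.etree.ElementTree.
xhtml = '<html><body><a href="foo.html">foo</a></body></html>'
root = ET.fromstring(xhtml)
for a in root.iter('a'):      # every <a> element in the document
    print(a.get('href'))      # prints: foo.html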

Re: Trying to understand html.parser.HTMLParser

2011-05-18 Thread Stefan Behnel
Andrew Berg, 17.05.2011 03:05: "lxml looks promising, but it doesn't say anywhere whether it'll work on Python 3 or not." Well, it pretty clearly states that on the PyPI page, but I also added it to the project home page now. lxml 2.3 works with any CPython version from 2.3 to 3.2. Stefan
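
As a concrete illustration (not part of Stefan's message), a sketch of how lxml could do the same link extraction on Python 3; the URL is taken from Andrew's later post, and the XPath expression is my own choice.

import urllib.request
import lxml.html   # third-party package: lxml 2.3 or later

url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth'   # URL from Andrew's post
page = urllib.request.urlopen(url).read()

# lxml's HTML parser is tolerant of broken markup, unlike xml.etree.
doc = lxml.html.fromstring(page)
for href in doc.xpath('//a/@href'):
    print(href)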

Re: Trying to understand html.parser.HTMLParser

2011-05-18 Thread Andrew Berg
On 2011.05.18 03:30 AM, Stefan Behnel wrote: "Well, it pretty clearly states that on the PyPI page, but I also added it to the project home page now. lxml 2.3 works with any CPython version from 2.3 to 3.2." Thank you. I never would've looked at PyPI for info on a project that has its own site.

Re: Trying to understand html.parser.HTMLParser

2011-05-17 Thread Karim
On 05/17/2011 03:05 AM, Andrew Berg wrote: "On 2011.05.16 02:26 AM, Karim wrote: Use regular expressions for bad HTML or BeautifulSoup (google it); below, an example to extract all HTML links: linksList = re.findall('<a href=(.*?)>.*?</a>', htmlSource) for link in linksList: print link. I was …"

Re: Trying to understand html.parser.HTMLParser

2011-05-16 Thread Karim
On 05/16/2011 03:06 AM, David Robinow wrote: "On Sun, May 15, 2011 at 4:45 PM, Andrew Berg bahamutzero8...@gmail.com wrote: I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this: import urllib.request import html.parser class …"

Re: Trying to understand html.parser.HTMLParser

2011-05-16 Thread Andrew Berg
On 2011.05.16 02:26 AM, Karim wrote: "Use regular expressions for bad HTML or BeautifulSoup (google it); below, an example to extract all HTML links: linksList = re.findall('<a href=(.*?)>.*?</a>', htmlSource) for link in linksList: print link" I was afraid I might have to use regexes (mostly …
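
For completeness, a sketch of the BeautifulSoup route Karim mentions, assuming BeautifulSoup 4 (the third-party bs4 package), since the older BeautifulSoup 3 does not run on Python 3; the URL is again the one from Andrew's post.

import urllib.request
from bs4 import BeautifulSoup   # third-party: BeautifulSoup 4

url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth'   # URL from Andrew's post
page = urllib.request.urlopen(url).read()

soup = BeautifulSoup(page, 'html.parser')   # use the stdlib parser backend
for a in soup.find_all('a', href=True):     # every <a> tag that has an href
    print(a['href'])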

Trying to understand html.parser.HTMLParser

2011-05-15 Thread Andrew Berg
I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this: import urllib.request import html.parser class MyHTMLParser(html.parser.HTMLParser): def handle_starttag(self, tag, attrs): if tag == 'a' and attrs: …
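
A sketch along the lines of Andrew's test script (the body of his handle_starttag is cut off in the archive, so the collecting logic here is an assumption). The key point the thread turns on: feed() always returns None; the parser reports what it sees through callbacks such as handle_starttag, so you have to accumulate results yourself, and in Python 3 the fetched bytes must be decoded to str before being fed to the parser.

import urllib.request
import html.parser

class MyHTMLParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a' and attrs:
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth'
html_source = urllib.request.urlopen(url).read().decode('utf-8', errors='ignore')

parser = MyHTMLParser()
parser.feed(html_source)    # returns None; results land in parser.links
print(parser.links)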

Re: Trying to understand html.parser.HTMLParser

2011-05-15 Thread David Robinow
On Sun, May 15, 2011 at 4:45 PM, Andrew Berg bahamutzero8...@gmail.com wrote: "I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this: import urllib.request import html.parser class MyHTMLParser(html.parser.HTMLParser): def …"