Re: Output of HTML parsing

2007-06-19 Thread Jackie
On June 15, 2:01, Stefan Behnel [EMAIL PROTECTED] wrote: Jackie wrote: import lxml.etree as et; url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"; tree = et.parse(url) Stefan. Thank you. But when I tried to run the above part, the following

Re: Output of HTML parsing

2007-06-19 Thread Stefan Behnel
Jackie wrote: On June 15, 2:01, Stefan Behnel [EMAIL PROTECTED] wrote: Jackie wrote: import lxml.etree as et; url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"; tree = et.parse(url) Stefan. Thank you. But when I tried to run the above

Output of html parsing

2007-06-16 Thread Jackie Wang
Hi all, I want to get the professors' information (name, title) from the following link: http://www.economics.utoronto.ca/index.php/index/person/faculty/ Ideally, I'd like to have an output file where each line is one Prof, including his name and title. In practice, I use

Output of HTML parsing

2007-06-15 Thread Jackie
Hi all, I want to get the professors' information (name, title) from the following link: http://www.economics.utoronto.ca/index.php/index/person/faculty/ Ideally, I'd like to have an output file where each line is one Prof, including his name and title. In practice, I use the CSV module.
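
The one-line-per-professor output file described in this post could be written roughly as follows. This is only a sketch: the sample records, the `write_professors` helper and the header row are assumptions made for illustration, not taken from the original post.

```python
import csv
import io

# Hypothetical sample records; real names and titles would come from the parsed page.
professors = [
    ("A. Example", "Professor"),
    ("B. Sample", ""),  # a missing title is stored as an empty string
]


def write_professors(rows, out):
    """Write one CSV line per professor to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "title"])  # header row (an assumption, optional)
    writer.writerows(rows)


# Write to an in-memory buffer here; a real script would open a file instead,
# e.g. open("faculty.csv", "w", newline="").
buffer = io.StringIO()
write_professors(professors, buffer)
csv_text = buffer.getvalue()
```

Keeping each record as a (name, title) pair means an empty title simply produces an empty second column instead of shifting later rows.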

Re: Output of HTML parsing

2007-06-15 Thread Sebastian Wiesner
[ Jackie [EMAIL PROTECTED] ] 1. The code above assumes that each Prof has a title. If any one of them does not, the name and title will be mismatched. How can I program it so that the title can be empty? 2. Is there any easier way to get the data I want other than using lists? Use BeautifulSoup.
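
One way to avoid the name/title mismatch raised in question 1 is to look up the title inside each person's own element rather than building two parallel lists. The sketch below uses the standard-library `xml.etree.ElementTree` in place of the BeautifulSoup suggested here, purely to stay self-contained. The sample markup, the `person`/`name`/`title` class names and the `extract_people` helper are all hypothetical; the real faculty page will be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; the second person deliberately has no title.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="person"><span class="name">A. Example</span><span class="title">Professor</span></div>
<div class="person"><span class="name">B. Sample</span></div>
</body></html>"""

# XHTML elements live in this namespace, so lookups must be namespace-qualified.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_people(xhtml_text):
    """Return a list of (name, title) tuples; a missing title becomes ''."""
    root = ET.fromstring(xhtml_text)
    people = []
    for person in root.iterfind(".//x:div[@class='person']", NS):
        # Search only inside this person's element, so a missing title
        # cannot be filled in with the next person's title.
        name = person.findtext("x:span[@class='name']", default="", namespaces=NS)
        title = person.findtext("x:span[@class='title']", default="", namespaces=NS)
        people.append((name.strip(), title.strip()))
    return people


people = extract_people(SAMPLE)
```

Because each tuple is built from a single container element, the result can be passed straight to a CSV writer without any realignment.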

Re: Output of HTML parsing

2007-06-15 Thread Stefan Behnel
Jackie wrote: I want to get the information of the professors (name,title) from the following link: http://www.economics.utoronto.ca/index.php/index/person/faculty/ That page is even XHTML, so there's no need to go through BeautifulSoup. Use lxml instead. http://codespeak.net/lxml Ideally, I'd like to