Re: HTML Parsing and Indexing

2006-11-16 Thread Paul McGuire
On Nov 13, 1:12 pm, [EMAIL PROTECTED] wrote: I need help with an HTML parser. [snip] I saw a couple of Python parsers like pyparsing, yappy, yapps, etc., but they haven't given any example for HTML parsing. Geez, how hard did you look? pyparsing's wiki menu includes an 'Examples' link, which take

HTML Parsing and Indexing

2006-11-13 Thread mailtogops
Hi all, I am involved in a project that aims to collect news published on selected, known web sites in the form of HTML, RSS, etc., shortlist the items, and create bookmarks on our website for the news content (we will use Django for web development). Currently this project is

Re: HTML Parsing and Indexing

2006-11-13 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: I need help with an HTML parser. http://www.effbot.org/pyfaq/tutor-how-do-i-get-data-out-of-html.htm /F -- http://mail.python.org/mailman/listinfo/python-list

Re: HTML Parsing and Indexing

2006-11-13 Thread Bernard
A combination of urllib, urllib2 and BeautifulSoup should do it. Read BeautifulSoup's documentation to learn how to navigate the DOM. [EMAIL PROTECTED] wrote: Hi all, I am involved in a project that aims to collect news published on selected, known web sites in the

Re: HTML Parsing and Indexing

2006-11-13 Thread Andy Dingley
[EMAIL PROTECTED] wrote: I am involved in a project that aims to collect news published on selected, known web sites in the form of HTML, RSS, etc. I just can't imagine why anyone would still want to do this. With RSS, it's an easy (if not trivial) problem. With HTML

Re: HTML Parsing and Indexing

2006-11-13 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: I am involved in a project that aims to collect news published on selected, known web sites in the form of HTML, RSS, etc., shortlist the items, and create bookmarks on our website for the news content (we will use Django for web development). Currently