Re: HTML parsing/scraping & python

2005-12-09 Thread alex_f_il
Take a look at SW Explorer Automation (http://home.comcast.net/~furmana/SWIEAutomation.htm)(SWEA). SWEA creates an object model (automation interface) for any Web application running in Internet Explorer. It supports all IE functionality: frames, JavaScript, dialogs, downloads. The runtime can a

Re: HTML parsing/scraping & python

2005-12-04 Thread gene tani
John J. Lee wrote: > Sanjay Arora <[EMAIL PROTECTED]> writes: > > > We are looking to select the language & toolset most suitable for a > > project that requires getting data from several web-sites in real-time > > HTML parsing/scraping. It would require full emulation of the > > browser, incl

Re: HTML parsing/scraping & python

2005-12-04 Thread John J. Lee
Sanjay Arora <[EMAIL PROTECTED]> writes: > We are looking to select the language & toolset most suitable for a > project that requires getting data from several web-sites in real-time > HTML parsing/scraping. It would require full emulation of the > browser, including handling cookies, automat

Re: HTML parsing/scraping & python

2005-12-01 Thread Mike Meyer
"Fuzzyman" <[EMAIL PROTECTED]> writes: > The standard library module for fetching HTML is urllib2. Does urllib2 replace everything in urllib? I thought there was some urllib functionality that urllib2 didn't do. > There is a project called mechanize, built by John Lee on top of > urllib2 and othe

Re: HTML parsing/scraping & python

2005-12-01 Thread Fuzzyman
The standard library module for fetching HTML is urllib2. The best module for scraping the HTML is BeautifulSoup. There is a project called mechanize, built by John Lee on top of urllib2 and other standard modules. It will emulate a browser's behaviour - including history, cookies, basic authenti
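A minimal sketch of the "fetch, then scrape" split described above, using only the standard library. Assumptions: it is written for Python 3, where urllib2's role is taken by urllib.request, and it uses html.parser.HTMLParser as a stdlib stand-in for BeautifulSoup (it is not BeautifulSoup's API). The helper names LinkCollector and extract_links are hypothetical. It collects absolute link targets, which is the raw material for following multiple link paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets of <a> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs and value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical page; in practice the HTML would come from
    # urllib.request.urlopen(url).read().decode(charset).
    page = '<p><a href="/login">Log in</a> <a href="next.html">Next</a></p>'
    print(extract_links(page, "http://example.com/dir/"))
    # prints ['http://example.com/login', 'http://example.com/dir/next.html']
```

A real HTML-scraping library copes better with badly broken markup; the stdlib parser is shown only because it needs no third-party install.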

Re: HTML parsing/scraping & python

2005-11-30 Thread Mike Meyer
Sanjay Arora <[EMAIL PROTECTED]> writes: > We are looking to select the language & toolset most suitable for a > project that requires getting data from several web-sites in real-time > HTML parsing/scraping. It would require full emulation of the > browser, including handling cookies, automat

HTML parsing/scraping & python

2005-11-30 Thread Sanjay Arora
We are looking to select the language & toolset most suitable for a project that requires getting data from several web-sites in real-time HTML parsing/scraping. It would require full emulation of the browser, including handling cookies, automated logins & following multiple web-link paths. Mul
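For the cookie-handling and automated-login requirements, here is a minimal sketch using only the Python 3 standard library (http.cookiejar plus urllib.request, the successors of Python 2's cookielib and urllib2). Assumptions: the login form is a plain POST with fields named "username" and "password", the login URL is whatever the target site uses, and the User-Agent string is arbitrary. All of these are hypothetical, as are the helper names make_session, build_login_request and login. It does not run JavaScript; sites that need that call for a real browser-automation tool.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some sites reject urllib's default User-Agent; this value is an assumption.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; scraper-sketch)")]
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST for a hypothetical form with 'username'/'password' fields."""
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(login_url, data=data, method="POST")


def login(opener, login_url, username, password):
    # Set-Cookie headers in the response are captured by the opener's jar,
    # so later opener.open(...) calls send the session cookie back.
    request = build_login_request(login_url, username, password)
    with opener.open(request) as resp:
        return resp.status
```

Usage would be `opener, jar = make_session()`, then `login(opener, url, user, pw)`, then further `opener.open(...)` calls for the pages behind the login; mechanize wraps this same pattern with form filling and history.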