Re: HTML scraping in python

2009-06-11 Thread Lloyd Kvam
On Thu, 2009-06-11 at 12:25 -0400, Paul Lussier wrote: > Lloyd Kvam writes: > > > easy_install mechanize > > should simply do the right thing. If it does not, you're > > probably better off doing a distutils install: > > This is what all the docs said, however, I couldn't find

Re: HTML scraping in python

2009-06-11 Thread Paul Lussier
So, I have the tables from the page in a list. Taking a hint from Shawn's example, I can get this: (Pdb) tables[0].input I need to now parse this input tag into its separate elements so I can get at 'name' and 'value'. Ooh, it appears I can do this: tables[0].input.get('name') tables[0].i
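
For reference, the same idea of reading 'name' and 'value' off an input tag can be sketched with only the standard library's html.parser, which is a swapped-in technique here and not the BeautifulSoup call from the message above. The class name InputCollector and the sample markup are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the attributes of every <input> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        if tag == "input":
            self.inputs.append(dict(attrs))


# Hypothetical markup, invented for this sketch.
SAMPLE = (
    '<table><tr><td>'
    '<input type="hidden" name="id" value="42">'
    '</td></tr></table>'
)

collector = InputCollector()
collector.feed(SAMPLE)
collector.close()

first = collector.inputs[0]
print(first.get("name"), first.get("value"))
```

Because each tag's attributes end up in a plain dict, .get('name') behaves much like the BeautifulSoup call in the message: it returns None when the attribute is absent.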

Re: HTML scraping in python

2009-06-11 Thread Paul Lussier
Lloyd Kvam writes: > easy_install mechanize > should simply do the right thing. If it does not, you're > probably better off doing a distutils install: This is what all the docs said, however, I couldn't find easy_install. It turns out that when I installed python, python creat

Re: HTML scraping in python

2009-06-11 Thread Lloyd Kvam
On Thu, 2009-06-11 at 08:59 -0400, Paul Lussier wrote: > However, mechanize seems dependent upon ClientForm, and I can't figure > out how to get the ClientForm*.egg installed. I placed it in > sys.path, but it's not getting picked up, I tried to manually test > that it would work using pkg_resourc
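
One quick way to see whether Python can locate a module at all is to ask the import machinery directly. This is only a sketch using importlib.util.find_spec from the standard library of current Python 3, not the pkg_resources test mentioned in the message. The helper name is_importable is hypothetical.

```python
import importlib.util
import sys


def is_importable(name):
    """Return True if the import system can find a top-level module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for some malformed or partially initialised names.
        return False


# The module names come from the thread; the result depends on the local install.
for module in ("mechanize", "ClientForm"):
    print(module, "found" if is_importable(module) else "not found")

# Entries on sys.path are searched in order, so printing them shows
# whether the directory or egg you expected is actually listed.
for entry in sys.path:
    print(entry)
```

If a module is reported as not found even though its egg sits in a directory on sys.path, one thing to check is whether the egg file itself, not just its parent directory, needs to be on the path.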

Re: HTML scraping in python

2009-06-11 Thread Paul Lussier
"Shawn O'Shea" writes: > There is. The BeautifulSoup docs/examples page has been invaluable to me Hmm, I didn't find that page quite as helpful as you seem to have. Perhaps I spend more time with it... > the past for learning BS. Anyway, here's an example that should help. > > $ python > Python

Re: HTML scraping in python

2009-06-11 Thread Shawn O'Shea
On Thu, Jun 11, 2009 at 10:20 AM, Paul Lussier wrote: > Paul Lussier writes: > > > I stumbled upon BeautifulSoup and am now trying to get that and the > > mechanize module installed. > > Okay, I've got that installed. I've figured out enough BS to get me a > single row of the table into a list com

Re: HTML scraping in python

2009-06-11 Thread Paul Lussier
Paul Lussier writes: > I stumbled upon BeautifulSoup and am now trying to get that and the > mechanize module installed. Okay, I've got that installed. I've figured out enough BS to get me a single row of the table into a list comprised of elements like: 'data' Now I just need to figure out how

Re: HTML scraping in python

2009-06-11 Thread Paul Lussier
Lloyd Kvam writes: > I assume you want a dict for each row. Yes, with the column headers as the keys. > I have not seen a table extract module. BeautifulSoup is a third party > module that is usually effective in dealing with any HTML. Hopefully > the table is reasonably simple with no colspa
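
A dict per row, keyed by the column headers, can be sketched roughly as follows. This version uses only the standard library's html.parser in place of BeautifulSoup, and it assumes a simple table: the first row holds the headers, and there are no colspan/rowspan cells and no nested tables. The names TableRows and table_to_dicts and the sample data are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside an open cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_dicts(html):
    """Return one dict per data row, keyed by the first row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical table, invented for this sketch.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Topic</th></tr>
  <tr><td>2009-05-01</td><td>Example A</td></tr>
  <tr><td>2009-06-01</td><td>Example B</td></tr>
</table>
"""

for record in table_to_dicts(SAMPLE):
    print(record)
```

zip() stops at the shorter of the two sequences, so a row with fewer cells than the header simply yields a dict with fewer keys. A table with spanning cells would need extra handling that this sketch does not attempt.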

Re: HTML scraping in python

2009-06-11 Thread Ben Scott
On Thu, Jun 11, 2009 at 7:21 AM, Paul Lussier wrote: > I would like to extract a table from an HTML document and break it > down to a dict for further processing. The PySIG group did something like this with the GNHLUG meeting history, which is maintained in tabular form. Perhaps it will be use

Re: HTML scraping in python

2009-06-11 Thread Lloyd Kvam
On Thu, 2009-06-11 at 07:21 -0400, Paul Lussier wrote: > Hi Folks, > > I would like to extract a table from an HTML document and break it > down to a dict for further processing. I assume you want a dict for each row. > I've googled around a bit and found about 4 different modules that do > ht

HTML scraping in python

2009-06-11 Thread Paul Lussier
Hi Folks, I would like to extract a table from an HTML document and break it down to a dict for further processing. I've googled around a bit and found about 4 different modules that do html processing, but nothing on dealing explicitly with tables (something like Perl's HTML::TableExtract modul