Re: [Tutor] html scrapeing

Bob Gailer Thu, 30 Jun 2005 17:50:48 -0700

At 10:36 AM 6/26/2005, Nathan Hughes wrote:

I've been looking for a way to scrape the data from an HTML table, but I don't even know where to start or how to do it.

An example of the table can be found here ( http://www.dragon256.plus.com/timer.html ). I'd like to extract all the data except for the delete column and then just print each row.

Use module urllib2 for obtaining the page source:

import urllib2
# open the URL; the result is a file-like object
page = urllib2.urlopen("http://www.dragon256.plus.com/timer.html")
# read the whole page as a list of lines
html = page.readlines()

You now have a list of lines.
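
For anyone on Python 3, where urllib2 no longer exists, here is a rough equivalent of the fetch above. It is only a sketch: fetch_lines is a made-up helper name, and the utf-8 encoding is an assumption about the page, not something I have checked.

```python
from urllib.request import urlopen


def fetch_lines(url, encoding="utf-8"):
    """Return the page at `url` as a list of text lines.

    `encoding` is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    with urlopen(url) as page:
        # readlines() yields bytes in Python 3, so decode each line
        return [raw.decode(encoding, errors="replace")
                for raw in page.readlines()]


# Example call (needs network access):
# html = fetch_lines("http://www.dragon256.plus.com/timer.html")
```

As before, html ends up as a list of lines, each still carrying its trailing newline.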

Now you can use any number of string-parsing tools: find the lines starting with <tr> to locate each new row, then <td> to locate each cell, then skip past the tag(s) to reach the cell text.
You have 3 cases to deal with: a cell whose text sits inside a link, a cell with plain text, and a cell that holds only a checkbox:

<td class='normal' align='left'><a href=''>Glastonbury 2005</a></td>


<td class='normal' align='left'>BBC THREE</td>

<td class='normal' align='middle'><input type='checkbox' ></td>
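
The three cases above can be sketched with the standard library's HTMLParser (Python 3) instead of hand-rolled string searches. This is one possible approach, not the only one. The names TableRows and extract_rows are invented for the example, and it assumes the checkbox cell is the delete column you want to leave out.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # cells of the row being read, or None
        self._cell = None    # text pieces of the cell being read, or None
        self._skip = False   # True when the current cell holds a checkbox

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []
            self._skip = False
        elif tag == "input" and self._cell is not None:
            # Assumption: a checkbox marks the delete column
            if dict(attrs).get("type") == "checkbox":
                self._skip = True

    def handle_data(self, data):
        # Text inside <a>...</a> arrives here too, so links need no special case
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if not self._skip:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html_text):
    """Return a list of rows; each row is a list of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

If html is the list of lines read earlier, something like `for row in extract_rows("".join(html)): print(row)` would print each row without the checkbox column.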

Is that enough to get you started?

Bob Gailer
mailto:[EMAIL PROTECTED]
510 558 3275 home
720 938 2625 cell

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
