> The posts basically say go and look at the HTML and find the
> right tags for the data you need. This is fundamental to any kind of web
> scraping: you need to understand the HTML tree well enough to identify
> where your data exists.
>
> How familiar are you with HTML and its structures?
Reasonably familiar.
> Can you view the source in your browser and identify the hierarchy
> of tags to the place where your data lives?
I can view the source, and have made my own web pages (HTML, CSS).
I am less sure about the hierarchy of tags. For example, here is the
section around the current temperature:
<div class="blueBox">
<div id="curcondbox">
<div class="subG b">West of Town, Jamestown, Pennsylvania (PWS)</div>
<div class="bm10">Updated: <span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="lu" value="1247814018">3:00 AM EDT on July 17, 2009</span></div>
<table cellspacing="0" cellpadding="0" class="full">
<tr>
<td class="vaT full">
<table cellspacing="0" cellpadding="5" class="full">
<tr>
<td class="vaM taC"><img src="http://icons-pe.wxug.com/i/c/a/nt_clear.gif" width="42" height="42" alt="Clear" class="condIcon" /></td>
<td class="vaM taC full">
<div style="font-size: 17px;"><span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="tempf" english="°F" metric="°C" value="60.3">
<span class="nobr"><span class="b">60.3</span> °F</span>
</span></div>
The 60.3 is the value I want to extract. It appears to be nested within a
hierarchy something like this:
<body>
<div class="blueBox">
<div id="curcondbox">
<table>
<table>
<div>
<span class="nobr">
<span class="b">
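Rather than matching each <div> with its </div> by eye, a parser can keep track of which tags are still open when the text 60.3 turns up. The sketch below is a swapped-in technique: it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra installs. The class name PathFinder, the tag.class / tag#id labels, and the short list of void tags are illustrative choices, not part of any library.

```python
from html.parser import HTMLParser

# The excerpt from the page, with the email line-wrapping removed.
SNIPPET = """
<div class="blueBox">
<div id="curcondbox">
<div class="subG b">West of Town, Jamestown, Pennsylvania (PWS)</div>
<div class="bm10">Updated: <span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="lu" value="1247814018">3:00 AM EDT on July 17, 2009</span></div>
<table cellspacing="0" cellpadding="0" class="full">
<tr>
<td class="vaT full">
<table cellspacing="0" cellpadding="5" class="full">
<tr>
<td class="vaM taC"><img src="http://icons-pe.wxug.com/i/c/a/nt_clear.gif" width="42" height="42" alt="Clear" class="condIcon" /></td>
<td class="vaM taC full">
<div style="font-size: 17px;"><span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="tempf" english="°F" metric="°C" value="60.3">
<span class="nobr"><span class="b">60.3</span> °F</span>
</span></div>
"""

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class PathFinder(HTMLParser):
    """Record which tags are still open when a given piece of text appears."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []   # open tags, outermost first, as (tag, label) pairs
        self.path = None  # labels of the open tags when target is found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        if attrs.get("class"):
            label += "." + ".".join(attrs["class"].split())
        self.stack.append((tag, label))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <img ... /> do not change nesting

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name (and anything
        # left open inside it), which tolerates slightly sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.path is None and data.strip() == self.target:
            self.path = [label for _, label in self.stack]


finder = PathFinder("60.3")
finder.feed(SNIPPET)
finder.close()
print(" > ".join(finder.path))
```

For this excerpt it reports div.blueBox > div#curcondbox > table.full > tr > td.vaT.full > table.full > tr > td.vaM.taC.full > div > span.pwsrt > span.nobr > span.b, which includes the <tr>/<td> levels and the outer span.pwsrt that are easy to skip when reading the source by hand.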
But I am far from sure I got all that right; it is not easy to
look at HTML and match each <div> with its </div>. Am I missing
something? Do I have to spell out all of the above in my Beautiful
Soup code?
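The full chain of tags is probably not needed. What matters is finding something that singles out the data, and in the excerpt the <span> wrapping the reading carries pwsvariable="tempf" and repeats the number in its value attribute. A minimal sketch, again using the standard library's html.parser in place of Beautiful Soup, and assuming the live page keeps the same attributes as the excerpt (the class name PwsValueFinder is hypothetical):

```python
from html.parser import HTMLParser

# Assumption: the live page marks each reading with a pwsvariable
# attribute and repeats the number in value=, as in the excerpt above.
SNIPPET = """
<div class="bm10">Updated: <span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="lu" value="1247814018">3:00 AM EDT on July 17, 2009</span></div>
<div style="font-size: 17px;"><span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="tempf" english="°F" metric="°C" value="60.3">
<span class="nobr"><span class="b">60.3</span> °F</span>
</span></div>
"""


class PwsValueFinder(HTMLParser):
    """Map each span's pwsvariable attribute to its value attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}  # pwsvariable name -> value string

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "pwsvariable" in attrs and "value" in attrs:
            # Keep only the first span seen for each variable name.
            self.values.setdefault(attrs["pwsvariable"], attrs["value"])


finder = PwsValueFinder()
finder.feed(SNIPPET)
finder.close()
temperature = float(finder.values["tempf"])
print(temperature)  # 60.3
```

Keying on a distinctive attribute rather than on the position in the tree means the lookup keeps working if the surrounding tables are rearranged. In Beautiful Soup the same idea should be a single find() call with an attribute filter, along the lines of soup.find('span', attrs={'pwsvariable': 'tempf'})['value'] (untested against the live page).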
CM
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor