On 18/05/13 22:44, Peter Otten wrote:
You can use a tool like lxml that "understands" html (though in this case
you'd need a javascript parser on top of that) -- or hack something together
with string methods or regular expressions. For example:
import urllib2
import json
# Fetch the results page.
s = urllib2.urlopen("http://*********/goldencasket").read()
# Keep what follows the JavaScript variable name and drop the " = " after it.
s = s.partition("latestResults_productResults")[2].lstrip(" =")
# The object literal ends at the first semicolon.
s = s.partition(";")[0]
data = json.loads(s)
lotto = data["GoldLottoSaturday"]
print lotto["drawDayDateNumber"]
print map(int, lotto["primaryNumbers"])
print map(int, lotto["secondaryNumbers"])
While this is brittle, I've found that doing it "right" is usually not
worthwhile as it won't survive the next website redesign either.
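For anyone on Python 3, the same partition-and-parse idea can be tried
without touching the network. This is only a sketch: the page fragment and
the numbers in it are invented for illustration, and extract_results is a
hypothetical helper name, not something from the site or a library.

```python
import json

# Invented page fragment; the real page's markup and values may differ.
SAMPLE_PAGE = '''
<script>
var latestResults_productResults = {"GoldLottoSaturday": {"drawDayDateNumber": "sample draw 0001", "primaryNumbers": ["3", "11", "19", "27", "35", "41"], "secondaryNumbers": ["8", "22"]}};
var somethingElse = 1;
</script>
'''


def extract_results(page_text, marker="latestResults_productResults"):
    """Return the JSON object assigned to `marker` in the page's JavaScript."""
    # Keep everything after the variable name, then drop the " = " after it.
    tail = page_text.partition(marker)[2].lstrip(" =")
    # The object literal ends at the first semicolon. This breaks if the
    # data itself ever contains a ';', which is part of why it is brittle.
    literal = tail.partition(";")[0]
    return json.loads(literal)


data = extract_results(SAMPLE_PAGE)
lotto = data["GoldLottoSaturday"]
primary = [int(n) for n in lotto["primaryNumbers"]]
secondary = [int(n) for n in lotto["secondaryNumbers"]]
print(lotto["drawDayDateNumber"])
print(primary)
print(secondary)
```

With a live page, the text would come from urllib.request.urlopen(url).read()
decoded to str, in place of SAMPLE_PAGE.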
PS: <http://*********/goldencasket/results/download-results>
has links to zipped csv files with the results. Downloading, inflating and
reading these should be the simplest and best way to get your data.
Thanks again Peter and Walter,
The results download link points to a historical file of past results
although the latest results are included at the bottom of the file. The
file is quite large and it's zipped so I imagine unzipping would be another
problem. I've come across Beautiful Soup and it may also offer a simple
solution.
Thanks for your response, Walter. I'd like to download the Australian
Lotto results and there isn't a simple way, as far as I can see, to do
this. I'll read up on curl; maybe I can use it.
I'll experiment with Peter's code and Beautiful Soup and see what I
can come up with. Maybe unzipping the file could be the best solution;
I'll experiment with that option as well.
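
A rough Python 3 sketch of the unzipping route, using only the standard
library. The archive is built in memory here so the example runs on its
own; the file name "results.csv", the column names and the values are all
made up, and read_zipped_csv is a hypothetical helper. The real download
may be laid out differently.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes):
    """Return all rows of the first .csv member found in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no csv file in archive")
        with archive.open(csv_names[0]) as raw:
            # Archive members are read as bytes; wrap them so csv gets text.
            # The utf-8 encoding is an assumption about the file.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Build a tiny stand-in archive in memory (invented contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("results.csv", "draw,n1,n2\n1,5,17\n2,8,23\n")

rows = read_zipped_csv(buffer.getvalue())
latest = rows[-1]  # the newest results sit at the bottom of the file
print(latest)
```

For the real file, the bytes passed to read_zipped_csv would come from
something like urllib.request.urlopen(url).read(). Since the whole archive
is held in memory, a very large file might be better saved to disk first.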
--
Regards,
Phil