Re: [Tutor] Retrieving data from a web site

Phil Sat, 18 May 2013 17:06:30 -0700

On 18/05/13 22:44, Peter Otten wrote:

You can use a tool like lxml that "understands" html (though in this case
you'd need a javascript parser on top of that) -- or hack something together
with string methods or regular expressions. For example:


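import urllib2
import json

# Fetch the page, then cut out the JSON assigned to the
# latestResults_productResults variable in its javascript.
s = urllib2.urlopen("http://*********/goldencasket").read()
s = s.partition("latestResults_productResults")[2].lstrip(" =")
s = s.partition(";")[0]  # keep everything up to the first semicolon
data = json.loads(s)
lotto = data["GoldLottoSaturday"]

print lotto["drawDayDateNumber"]
# The numbers are stored as strings; convert them to ints.
print map(int, lotto["primaryNumbers"])
print map(int, lotto["secondaryNumbers"])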

While this is brittle, I've found that doing it "right" is usually not
worthwhile as it won't survive the next website redesign either.
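
To see what the two partition calls do without touching the site, here is a
minimal Python 3 sketch run against a made-up snippet. SAMPLE_PAGE, its values
and the extract_results helper are assumptions for illustration only; the real
page may be laid out differently.

```python
import json

# Hypothetical stand-in for the downloaded page; the values are invented.
SAMPLE_PAGE = '''
<script>
var latestResults_productResults = {"GoldLottoSaturday": {"drawDayDateNumber": "Draw 0000 - Sat 01 Jan 2000", "primaryNumbers": ["3", "11", "19", "24", "30", "41"], "secondaryNumbers": ["7", "22"]}};
var somethingElse = 1;
</script>
'''


def extract_results(page, marker="latestResults_productResults"):
    """Return the object assigned to `marker` in the page's javascript."""
    # Split at the variable name; `tail` is everything that follows it.
    before, found, tail = page.partition(marker)
    if not found:
        raise ValueError("marker %r not found in page" % marker)
    # Drop the " = " between the variable name and the JSON text.
    tail = tail.lstrip(" =")
    # Keep everything up to the first semicolon. This assumes the JSON
    # itself contains no ';', which is one reason the approach is brittle.
    return json.loads(tail.partition(";")[0])


lotto = extract_results(SAMPLE_PAGE)["GoldLottoSaturday"]
print(lotto["drawDayDateNumber"])
print([int(n) for n in lotto["primaryNumbers"]])
print([int(n) for n in lotto["secondaryNumbers"]])
```

If the marker is missing, for instance after a redesign, the helper raises
ValueError instead of handing an empty string to json.loads.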

PS: <http://*********/goldencasket/results/download-results>
has links to zipped csv files with the results. Downloading, inflating and
reading these should be the simplest and best way to get your data.


Thanks again Peter and Walter,

The results download link points to a historical file of past results, although the latest results are included at the bottom of the file. The file is quite large and it's zipped, so I imagine unzipping would be another problem. I've come across Beautiful Soup and it may also offer a simple solution.
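
On the unzipping question, Python's standard library can read a zip archive
directly, so no separate tool should be needed. The following is a minimal
Python 3 sketch, not a tested solution for the real download: the
read_zipped_csv and make_sample_zip helpers, the file name results.csv, the
column names and the UTF-8 encoding are all assumptions, and the sample archive
is built in memory so the example runs without a network connection.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, encoding="utf-8"):
    """Return the rows of the first .csv file in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Pick the first archive member whose name ends in .csv.
        names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not names:
            raise ValueError("no .csv file in archive")
        # archive.read() returns bytes; decode them so csv.reader gets text.
        text = archive.read(names[0]).decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


def make_sample_zip():
    """Build a tiny zip archive in memory (invented data, for the demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("results.csv", "draw,n1,n2\r\n1,5,12\r\n2,7,33\r\n")
    return buf.getvalue()


rows = read_zipped_csv(make_sample_zip())
# The latest results are said to sit at the bottom of the file.
print(rows[-1])
```

For the real file, the bytes passed to read_zipped_csv would come from the
download itself, for example urllib.request.urlopen(url).read() in Python 3.
Reading the whole archive into memory keeps the sketch short; a very large file
might be better processed row by row.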

Thanks for your response, Walter. I'd like to download the Australian Lotto results and there isn't a simple way, as far as I can see, to do this. I'll read up on curl; maybe I can use it.

I'll experiment with Peter's code and Beautiful Soup and see what I can come up with. Maybe unzipping the file could be the best solution; I'll experiment with that option as well.


--
Regards,
Phil
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
