Phil wrote:

> On 18/05/13 19:25, Peter Otten wrote:
>>
>> Are there alternatives that give the number as plain text?
>
> Further investigation shows that the numbers are available if I view the
> source of the page. So, all I have to do is parse the page and extract
> the drawn numbers. I'm not sure, at the moment, how I might do that but
> I have something to work with.

You can use a tool like lxml that "understands" HTML (though in this case
you'd need a JavaScript parser on top of that) -- or hack something together
with string methods or regular expressions. For example:

    import urllib2
    import json

    s = urllib2.urlopen("http://*********/goldencasket").read()
    # Keep only what follows the "latestResults_productResults =" assignment...
    s = s.partition("latestResults_productResults")[2].lstrip(" =")
    # ...and cut at the first semicolon, which should leave just the JSON data.
    s = s.partition(";")[0]
    data = json.loads(s)
    lotto = data["GoldLottoSaturday"]
    print lotto["drawDayDateNumber"]
    print map(int, lotto["primaryNumbers"])
    print map(int, lotto["secondaryNumbers"])

While this is brittle, I've found that doing it "right" is usually not
worthwhile, as it won't survive the next website redesign either.

PS: <http://*********/goldencasket/results/download-results> has links to
zipped CSV files with the results. Downloading, inflating and reading these
should be the simplest and best way to get your data.
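
For the zipped CSV route, something along these lines might do as a starting
point. It is only a sketch, written for Python 3 rather than the Python 2 used
above. The function names read_zipped_csv and fetch_results are made up for
illustration, the UTF-8 encoding is an assumption, and I haven't checked the
column layout of the actual files, so the rows are returned as plain lists of
strings (header rows included). The url argument has to be the address of one
of the zip files linked from the download page.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def read_zipped_csv(zip_bytes):
    """Return the rows of every .csv member of a zip archive held in memory."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as member:
                # csv.reader needs text, so decode the binary member stream.
                # Assumption: the files are UTF-8 encoded.
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                rows.extend(csv.reader(text))
    return rows


def fetch_results(url):
    """Download the zip archive at url and return its CSV rows."""
    with urlopen(url) as response:
        return read_zipped_csv(response.read())
```

Keeping the download separate from the parsing means you can try
read_zipped_csv() on a file you saved by hand before touching the network.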