Dear list,

I have been trying to understand how to use iterators, and in particular groupby statements. I am, however, quite lost.
I wish to subset the list below, selecting the observations whose ID ('realtime_start') value is no later than some date (I've used the variable name maxDate) and, where there is more than one such record for the same 'date', returning only the one with the largest ID ('realtime_start'). The code below does the job, but I have the impression that it could be done in a more Pythonic way using iterators and groupby statements. Could someone please help me understand how to go from this code to the Pythonic idiom?

Thanks in advance,
Matt Johnson

_________________
## Code example
import pprint

obs = [{'date': '2012-09-01', 'realtime_end': '2013-02-18', 'realtime_start': '2012-10-15', 'value': '231.951'},
       {'date': '2012-09-01', 'realtime_end': '2013-02-18', 'realtime_start': '2012-11-15', 'value': '231.881'},
       {'date': '2012-10-01', 'realtime_end': '2013-02-18', 'realtime_start': '2012-11-15', 'value': '231.751'},
       {'date': '2012-10-01', 'realtime_end': '9999-12-31', 'realtime_start': '2012-12-19', 'value': '231.623'},
       {'date': '2013-02-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-03-21', 'value': '231.157'},
       {'date': '2012-11-01', 'realtime_end': '2013-02-18', 'realtime_start': '2012-12-14', 'value': '231.025'},
       {'date': '2012-11-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-01-19', 'value': '231.071'},
       {'date': '2012-12-01', 'realtime_end': '2013-02-18', 'realtime_start': '2013-01-16', 'value': '230.979'},
       {'date': '2012-12-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-02-19', 'value': '231.137'},
       {'date': '2012-12-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-03-19', 'value': '231.197'},
       {'date': '2013-01-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-02-21', 'value': '231.198'},
       {'date': '2013-01-01', 'realtime_end': '9999-12-31', 'realtime_start': '2013-03-21', 'value': '231.222'}]

maxDate = "2013-03-21"

# Group the observations by 'date': one (initially empty) list per distinct date.
dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
for o in obs:
    dobs[o['date']].append(o)

# Within each group, keep only the records with realtime_start <= maxDate.
# The dates are ISO-formatted strings, so string comparison orders them correctly.
dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
                    for k, v in dobs.items()])

# From each non-empty group, take the record with the largest realtime_start.
rts = lambda x: x['realtime_start']
mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]

# Present the result in 'date' order.
mmax.sort(key=lambda x: x['date'])
pprint.pprint(mmax)

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
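
For reference, here is a minimal sketch of how the selection described above might be written with itertools.groupby. It assumes records shaped like those in obs; the function name latest_per_date and the shortened sample list (four records copied from obs) are illustrative only and not part of the original code.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(obs, max_date):
    """For each 'date', return the record with the largest
    'realtime_start' that is <= max_date (hypothetical helper)."""
    by_date = itemgetter('date')
    by_start = itemgetter('realtime_start')
    # Keep only records available as of max_date. ISO-formatted
    # date strings compare correctly as plain strings.
    eligible = (o for o in obs if o['realtime_start'] <= max_date)
    # groupby only merges *adjacent* items with equal keys,
    # so the input has to be sorted by the grouping key first.
    ordered = sorted(eligible, key=by_date)
    return [max(group, key=by_start)
            for _, group in groupby(ordered, key=by_date)]


# Illustrative subset of the obs list from the post.
sample = [
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2012-10-01', 'realtime_end': '9999-12-31',
     'realtime_start': '2012-12-19', 'value': '231.623'},
    {'date': '2013-02-01', 'realtime_end': '9999-12-31',
     'realtime_start': '2013-03-21', 'value': '231.157'},
]

result = latest_per_date(sample, '2013-01-01')
```

Because the list is sorted by 'date' before grouping, the result already comes out in date order. One difference worth noting: if two records in a group share the same realtime_start, max() returns the first of them, whereas sorted(...)[-1] returns the last. The posted data has no such ties within a date.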