Dear list,

I have been trying to understand how to use iterators, and in
particular groupby statements.  I am, however, quite lost.

I wish to subset the list below, selecting the observations whose
ID ('realtime_start') value is no later than some date (I've
used the variable name maxDate) and, where there is more than one
such record for a given 'date', returning only the one that has the
largest ID ('realtime_start').

The code below does the job; however, I have the impression that it
might be done in a more Pythonic way using iterators and groupby
statements.

Could someone please help me understand how to go from this code to
the Pythonic idiom?

Thanks in advance,

Matt Johnson

_________________

## Code example

import pprint

obs = [{'date': '2012-09-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-10-15',
  'value': '231.951'},
 {'date': '2012-09-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-11-15',
  'value': '231.881'},
 {'date': '2012-10-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-11-15',
  'value': '231.751'},
 {'date': '2012-10-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2012-12-19',
  'value': '231.623'},
 {'date': '2013-02-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-03-21',
  'value': '231.157'},
 {'date': '2012-11-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2012-12-14',
  'value': '231.025'},
 {'date': '2012-11-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-01-19',
  'value': '231.071'},
 {'date': '2012-12-01',
  'realtime_end': '2013-02-18',
  'realtime_start': '2013-01-16',
  'value': '230.979'},
 {'date': '2012-12-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-02-19',
  'value': '231.137'},
 {'date': '2012-12-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-03-19',
  'value': '231.197'},
 {'date': '2013-01-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-02-21',
  'value': '231.198'},
 {'date': '2013-01-01',
  'realtime_end': '9999-12-31',
  'realtime_start': '2013-03-21',
  'value': '231.222'}]

maxDate = "2013-03-21"

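# Group the observations by 'date': one (initially empty) list per distinct date.
dobs = dict([(d, []) for d in set([e['date'] for e in obs])])

for o in obs:
    dobs[o['date']].append(o)

# Within each group, keep only records with realtime_start on or before maxDate.
# ISO-format date strings compare chronologically as plain strings.
dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
                for k, v in dobs.items()])

rts = lambda x: x['realtime_start']

# For each non-empty group, take the record with the latest realtime_start.
mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]

# Order the result by observation date.
mmax.sort(key = lambda x: x['date'])

pprint.pprint(mmax)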
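
For comparison, here is a minimal sketch of how the same selection might
be written with itertools.groupby.  The function name latest_per_date and
the shortened sample list are illustrative assumptions, not part of the
code above; the sample reuses four records from obs with the
'realtime_end' key left out.  It assumes all dates are ISO-format strings,
so plain string comparison orders them chronologically.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(obs, max_date):
    """For each 'date', return the record with the largest
    'realtime_start' that is not later than max_date.

    latest_per_date is a hypothetical helper name used for this sketch.
    """
    # Drop records released after max_date.
    eligible = (o for o in obs if o['realtime_start'] <= max_date)
    # groupby only merges adjacent items, so sort by the grouping key first.
    by_date = sorted(eligible, key=itemgetter('date'))
    # One group per date; keep the record with the latest realtime_start.
    return [max(group, key=itemgetter('realtime_start'))
            for _, group in groupby(by_date, key=itemgetter('date'))]


# Shortened sample taken from the obs list above (assumption: fewer keys).
sample = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2012-10-01', 'realtime_start': '2012-11-15', 'value': '231.751'},
    {'date': '2012-10-01', 'realtime_start': '2012-12-19', 'value': '231.623'},
]

result = latest_per_date(sample, '2012-12-01')
```

Because the list is sorted by 'date' before grouping, the result comes
back ordered by date without a separate sort.  Dates with no eligible
record never form a group, so no emptiness check is needed.  If two
records for one date share the same realtime_start, max() keeps the first
of them, whereas sorted(...)[-1] keeps the last.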
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
