At 05:02 AM 8/2/2008, Kent Johnson wrote:
On Sat, Aug 2, 2008 at 6:07 AM, Dick Moores <[EMAIL PROTECTED]> wrote:
> I'm pretty new to Python's dictionaries, but I had a need for a function
> that would find the values in a dict that have more than one key each.

From your sample output it appears that you want not just the values,
but a list of (value, keys) pairs for which there are more than one
key. It is easy to just build a reverse dict and filter its items:

In [15]: from collections import defaultdict
In [16]: rev=defaultdict(list)
In [18]: for k, v in d.iteritems():
    rev[v].append(k)

In [20]: [ [v, keys] for v, keys in rev.iteritems() if len(keys) > 1 ]

Out[20]:
[[1, ['a', 'e', 'g']],
 [2, ['b', 'f', 'i', 'h']],
 [4, ['d', 'j']],
 ['U.S. Senator', ['John McCain', 'Barack Obama']],
 [56, [45, 55]]]

This also has the advantage of making only two passes over the data so
its performance should be O(n). Your solution and Andre's make one or
more passes over the data for each data element so they will have
O(n*n) performance meaning for a large dict they could be very slow.

Wow, the genius' genius appears!

You're using some Python tools I didn't know about. More study!

I made a function of your code and time tested it against mine, using as d that dict of colors at <http://py77.python.pastebin.com/f796752ff>.

Yours is 73 times faster!

In [12]: run -t -N10 fcn_values_dupe_keys.py
Time was 0.297 seconds
Time was 0.328 seconds
Time was 0.344 seconds
Time was 0.328 seconds
Time was 0.313 seconds
Time was 0.344 seconds
Time was 0.39 seconds
Time was 0.297 seconds
Time was 0.312 seconds
Time was 0.297 seconds

IPython CPU timings (estimated):
Total runs performed: 10
  Times :      Total       Per run
  User  : 3.3092253345 s, 0.33092253345 s.
  System:        0.0 s,        0.0 s.

In [13]: run -t -N10 kent1.py
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0.015 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds

IPython CPU timings (estimated):
Total runs performed: 10
  Times :      Total       Per run
  User  : 0.044969961266 s, 0.0044969961266 s.
  System:        0.0 s,        0.0 s.

BTW Kent, I'm going to take this opportunity to ask you about "System" in the IPython timing results. It's always zero for the code I time. What's an example of code that would have System be greater than zero? And what's the distinction between User and System? (I'm using Win XP, if that's relevant.)

Thanks,

Dick

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Reply via email to