I'm relatively new at Python and I'm trying to write a function that fills a
dictionary acording the following rules and (example) data:
Rules:
* No duplicate values in field1
* No duplicates values in field2 and field3 simultaneous (highest value in
field4 has to be preserved)
Rec.no field1, field2, field3, field4
1. abc, def123, ghi123, 120 <-- new, insert in dictionary
2. abc, def123, ghi123, 120 <-- duplicate with 1. field4 same value. Do not
insert in dictionary
3. bcd, def123, jkl125, 154 <-- new, insert in dictionary
4. efg, def123, jkl125, 175 <-- duplicate with 3 in field 2 and 3, but higher
value in field4. Remove 3. from dict and replace with 4.
5. hij, ghi345, jkl125, 175 <-- duplicate field3, but not in field4. New,
insert in dict.
The resulting dictionary should be:
hij {'F2': ' ghi345', 'F3': ' jkl125', 'F4': 175}
abc {'F2': ' def123', 'F3': ' ghi123', 'F4': 120}
efg {'F2': ' def123', 'F3': ' jkl125', 'F4': 175}
This is wat I came up with up to now, but there is something wrong with it. The
'bcd' should have been removed. When I run it it says:
bcd {'F2': ' def123', 'F3': ' jkl125', 'F4': 154}
hij {'F2': ' ghi345', 'F3': ' jkl125', 'F4': 175}
abc {'F2': ' def123', 'F3': ' ghi123', 'F4': 120}
efg {'F2': ' def123', 'F3': ' jkl125', 'F4': 175}
Below is wat I brew (simplified). It took me some time to figure out that I was
looking at the wrong values the wrong dictionary. I started again, but am
ending up with a lot of dictionaries and for x in y-loops. I think there is a
simpler way to do this.
Can somebody point me in the right direction and explain to me how to do this?
(and maybe have an alternative for the nesting. Because I may need to compare
more fields. This is only a simplified dataset).
######### not working
def createResults(field1, field2, field3, field4):
#check if field1 exists.
if not results.has_key(field1):
if results.has_key(field2):
#check if field2 already exists
if results.has_key(field3):
#check if field3 already exists
#retrieve value field4
existing_field4 = results[field2][F4]
#retrieve value existing field1 in dict
existing_field1 = results[field1]
#perform highest value check
if int(existing_field4) < int(field4):
#remove existing record from dict.
del results[existing_field1]
values = {}
values['F2'] = field2
values['F3'] = field3
values['F4'] = field4
results[field1] = values
else:
pass
else:
pass
else:
values = {}
values['F2'] = field2
values['F3'] = field3
values['F4'] = field4
results[field1] = values
else:
pass
for line in open("file.csv"):
field1, field2, field3, field4 = line.split(',')
createResults(field1, field2, field3, int(field4))
#because this is quick and dirty I had to get rid of the \n in the csv
for i in results.keys():
print i, '\t', results[i]
################
contents file.csv
abc, def123, ghi123, 120
abc, def123, ghi123, 120
bcd, def123, jkl125, 154
efg, def123, jkl125, 175
hij, ghi345, jkl125, 175
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor