I am working on a reducer that needs to produce a list of files sorted by their overall bandwidth use. I build a dictionary with the file name as the key (it is always unique), and each value is a list holding two numbers: bytes and bytes sent.
Each entry looks like {filename: [bytes, bytes_sent]}. How would I sort on bytes sent? How would I make this more efficient?

Code:

# Expect as input:
# URI,1,return_code,bytes,referer,ip,time_taken,bytes_sent,ref_dom
# index 0 1 2 3 4 5 6 7 8
import sys

dict = {}

def update_dict(filename, bytes, bytes_sent):
    # Build and update our dictionary adding total bytes sent.
    if dict.has_key(filename):
        bytes_sent += dict[filename][1]
        dict[filename] = [bytes, bytes_sent]
    else:
        dict[filename] = [bytes, bytes_sent]

# input comes from STDIN
for line in sys.stdin:
    # remove trailing whitespace and split on tab
    words = line.rstrip().split('\t')
    file = words[0]
    bytes = words[3]
    bytes_sent = int(words[7])
    update_dict(file, bytes, bytes_sent)
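One possible way to do the sort, as a minimal sketch rather than a definitive answer: `sorted()` with a `key` function that picks the second element of each value list. The function names `total_bandwidth` and `sort_by_bytes_sent` are made up for illustration, and the sketch assumes the tab-separated layout described above, with lines of fewer than eight fields being safe to skip. It uses `in` rather than `has_key`, and avoids reusing the built-in names `dict`, `bytes` and `file`.

```python
def total_bandwidth(lines):
    """Build {filename: [bytes, bytes_sent]} from tab-separated log lines,
    summing bytes_sent over every line that names the same file."""
    totals = {}
    for line in lines:
        fields = line.rstrip().split('\t')
        if len(fields) < 8:
            # Assumption: short or malformed lines can simply be skipped.
            continue
        filename = fields[0]
        size = fields[3]          # kept as a string, as in the original code
        sent = int(fields[7])
        # Running total so far for this file (0 if not seen yet).
        previous = totals.get(filename, [size, 0])[1]
        totals[filename] = [size, previous + sent]
    return totals


def sort_by_bytes_sent(totals):
    """Return (filename, [bytes, bytes_sent]) pairs, largest bytes_sent first."""
    return sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
```

In the reducer this could be called as `sort_by_bytes_sent(total_bandwidth(sys.stdin))`. On efficiency, each dictionary update is already a constant-time operation on average, so the single sort at the end is likely to be the main cost, and it only runs once over the unique file names.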
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor