[EMAIL PROTECTED] wrote:
> Hi,
>
> I have a load of files I need to process. Each line of a file looks
> something like this:
>
> eYAL001C1 Spar 81 3419 4518 4519 2 1
>
> So basically it's a table, separated with tabs. What I need to do is make a
> new file where all the entries in the table are those where the values in
> columns 1 and 5 were present as a pair more than once in the original file.
>
> I really have very little idea how to achieve this. So far I read in the
> file to a list, where each item in the list is a list of the entries on a
> line.

I would do this with two passes over the data. The first pass would accumulate lines and count pairs of (col1, col5); the second pass would output the lines whose count is > 1. Something like this (untested):

lines = []
counts = {}

# Build a list of split lines and count the (col1, col5) pairs
for line in open('input.txt'):
    line = line.split()  # break line on whitespace (tabs included)
    key = (line[1], line[5])  # or (line[0], line[4]) depending on what you mean by col 1
    counts[key] = counts.get(key, 0) + 1  # count the key pair
    lines.append(line)

# Output the lines whose pairs appear more than once
f = open('output.txt', 'w')
for line in lines:
    if counts[(line[1], line[5])] > 1:
        f.write('\t'.join(line))
        f.write('\n')
f.close()

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
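
For comparison, the same two-pass idea can be sketched with collections.Counter and wrapped in a function so it can be tried on a few lines before touching real files. This is only a sketch under assumptions: the function name rows_with_repeated_pairs and its col_a/col_b parameters are made up for illustration, the defaults 0 and 4 assume "columns 1 and 5" are counted from 1 (the code above uses indices 1 and 5 instead), and only the first sample row comes from the original question; the other two rows are invented test data.

```python
from collections import Counter


def rows_with_repeated_pairs(lines, col_a=0, col_b=4):
    """Return the rows whose (col_a, col_b) pair occurs more than once.

    col_a and col_b are 0-based indices; the defaults are an assumption
    about which columns the question means.
    """
    # Pass 1: split every non-blank line on tabs and count the key pairs.
    rows = [line.rstrip('\n').split('\t') for line in lines if line.strip()]
    counts = Counter((row[col_a], row[col_b]) for row in rows)
    # Pass 2: keep only the rows whose pair was seen more than once.
    return [row for row in rows if counts[(row[col_a], row[col_b])] > 1]


# First row is the sample from the question; the other two are invented.
sample = [
    "eYAL001C1\tSpar\t81\t3419\t4518\t4519\t2\t1\n",
    "eYAL001C1\tSmik\t79\t3300\t4518\t4400\t2\t1\n",
    "eYAL002W1\tSpar\t90\t100\t200\t201\t1\t1\n",
]

kept = rows_with_repeated_pairs(sample)
for row in kept:
    print('\t'.join(row))
```

To run it on a real file, passing an open file object as lines should work, since iterating over a file yields its lines one at a time.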