[EMAIL PROTECTED] wrote:
> Hi,
> 
> I have a load of files I need to process. Each line of a file looks 
> something like this:
> 
> eYAL001C1     Spar    81      3419    4518    4519    2       1       
> 
> So basically it's a table, separated with tabs. What I need to do is make a 
> new file where all the entries in the table are those where the values in 
> columns 1 and 5 were present as a pair more than once in the original file.
> 
> I really have very little idea how to achieve this. So far I read the 
> file into a list, where each item in the list is a list of the entries on a 
> line.

I would do this with two passes over the data. The first pass would accumulate
lines and count pairs of (col1, col5); the second pass would output the lines
whose count is > 1. Something like this (untested):

lines = []
counts = {}

# First pass: build a list of split lines and count the (col1, col5) pairs
with open('input.txt') as infile:
    for line in infile:
        fields = line.split()  # split on whitespace, which includes tabs
        if len(fields) < 6:
            continue  # skip blank or short lines that lack column 5
        # use (fields[0], fields[4]) instead if "col 1" means the first column
        key = (fields[1], fields[5])
        counts[key] = counts.get(key, 0) + 1  # count the key pair
        lines.append(fields)

# Second pass: output the lines whose pairs appear more than once
with open('output.txt', 'w') as outfile:
    for fields in lines:
        if counts[(fields[1], fields[5])] > 1:
            outfile.write('\t'.join(fields) + '\n')
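For comparison, here is a sketch of the same two-pass idea wrapped in a function, using collections.Counter for the counting step instead of a plain dict. The function name rows_with_repeated_pairs, its key_cols parameter, and the sample rows are made up for illustration. key_cols holds 0-based indexes, so pass (0, 4) if "column 1" means the first column.

```python
from collections import Counter


def rows_with_repeated_pairs(rows, key_cols=(1, 5)):
    """Return the rows whose pair of values at key_cols occurs more than once.

    rows is a list of lists of strings, e.g. the result of line.split('\t').
    key_cols are 0-based indexes (hypothetical default matching the code above).
    """
    i, j = key_cols
    # First pass: count how often each (col i, col j) pair occurs
    counts = Counter((row[i], row[j]) for row in rows)
    # Second pass: keep rows whose pair was seen more than once, in original order
    return [row for row in rows if counts[(row[i], row[j])] > 1]


if __name__ == '__main__':
    # Invented sample rows; the first two share the pair ('Spar', '4519')
    sample = [
        'eYAL001C1\tSpar\t81\t3419\t4518\t4519\t2\t1',
        'eYAL001C1\tSpar\t90\t3000\t4000\t4519\t3\t1',
        'eYAL002W\tSpar\t12\t100\t200\t300\t1\t1',
    ]
    rows = [line.split('\t') for line in sample]
    for row in rows_with_repeated_pairs(rows):
        print('\t'.join(row))
```

Both versions keep every row in memory. For very large files you could instead read the file twice, counting pairs on the first read and writing matching lines on the second.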

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
