On 1 Jun 2005 [EMAIL PROTECTED] wrote:

> eYAL001C1 Spar 81 3419 4518 4519 2 1
>
> So basically it's a table, separated with tabs. What I need to do is make
> a new file where all the entries in the table are those where the values
> in columns 1 and 5 were present as a pair more than once in the original
> file.
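The question counts columns from 1. Once a line is split on tab characters, column 1 is therefore index 0 and column 5 is index 4. Here is a small illustration, assuming single-tab separators. `sample` is simply the quoted row rebuilt with tabs:

```python
# The quoted sample row, rebuilt with tab separators (assumed file format).
sample = "\t".join(["eYAL001C1", "Spar", "81", "3419", "4518", "4519", "2", "1"])

fields = sample.split("\t")
col1 = fields[0]   # column 1 -> index 0
col5 = fields[4]   # column 5 -> index 4
print(col1, col5)  # prints: eYAL001C1 4518
```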
This is half-baked, but I toss it out in case anyone can build on it.

Create a dictionary, keyed on column 1. Read a line and split it into its columns. For each line, create a dictionary entry that is itself a dictionary keyed on column 5. Its value is a list of lists, and each inner list holds the line's remaining columns (2, 3, 4, and 6 onward). When a dupe is found, append an additional inner list. After processing the line above, you have a dictionary d:

    {'eYAL001C1': {'4518': [['Spar', '81', '3419', '4519', '2', '1']]}}

As you process each new line, one of three things is true:

1) Col 1 is already used as a key, but col 5 is not yet used as an inner key;
2) Col 1 is already used as a key, and col 5 is already used as an inner key;
3) Col 1 is not yet used as a key.

So, reading the file line by line:

    d = {}
    for line in infile:                  # infile: the open input file
        fields = line.rstrip('\n').split('\t')
        col1, col5 = fields[0], fields[4]
        rest = fields[1:4] + fields[5:]  # columns 2-4 and 6 onward
        if col1 in d:
            if col5 in d[col1]:          # case 2: repeated col1/col5 pair
                d[col1][col5].append(rest)
            else:                        # case 1: new col5 for this col1
                d[col1][col5] = [rest]
        else:                            # case 3: first time we see col1
            d[col1] = {col5: [rest]}

The end result is that you'll have all your data from the file in the form of a dictionary indexed by column 1. Each entry in the top-level dictionary is a second-level dictionary indexed by column 5. Each entry in that second-level dictionary is a list of lists, and each list in that list of lists holds the remaining columns of one input line.

If the list of lists has a length of 1, the col1/col5 combination appears only once in the input file. If it has a length > 1, it occurred more than once and satisfies your condition of "columns 1 and 5 were present as a pair more than once".

So to get at these (putting column 5 back between columns 4 and 6):

    for key1, inner in d.items():
        for key2, rows in inner.items():
            if len(rows) > 1:
                for rest in rows:
                    print('\t'.join([key1] + rest[:3] + [key2] + rest[3:]))

I haven't tested this approach (or syntax) but I think the approach is basically sound.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
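Since the reply above is untested, here is a minimal Python 3 sketch of the same nested-dictionary idea. It assumes tab-separated lines with at least five columns. `find_repeated_pairs` is a hypothetical name, and the sample rows after the first are invented for illustration. It differs from the reply in two ways. It uses `collections.defaultdict` in place of the explicit three-way key check, and it stores each whole line rather than the leftover columns, so column 5 never has to be reinserted.

```python
from collections import defaultdict


def find_repeated_pairs(lines):
    """Return the lines whose (column 1, column 5) pair occurs more than once.

    Hypothetical helper: assumes tab-separated text with at least five columns.
    Rows are grouped by pair; within a pair they keep their input order.
    """
    # d[col1][col5] -> list of the original lines carrying that pair
    d = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # blank or malformed line: no column 5 to pair on
        d[fields[0]][fields[4]].append(line)

    repeated = []
    for inner in d.values():
        for rows in inner.values():
            if len(rows) > 1:  # the pair appeared more than once
                repeated.extend(rows)
    return repeated


# Made-up rows for illustration (only the first comes from the question).
sample = [
    "eYAL001C1\tSpar\t81\t3419\t4518\t4519\t2\t1\n",
    "eYAL001C1\tSpar\t90\t3500\t4518\t4600\t1\t1\n",
    "eYAL001C1\tSpar\t81\t3419\t9999\t4519\t2\t1\n",
    "eYAL002C1\tSpar\t81\t3419\t4518\t4519\t2\t1\n",
]
for row in find_repeated_pairs(sample):
    print(row)  # prints the first two rows, which share eYAL001C1/4518
```

To produce the new file the question asks for, you can pass an open input file to `find_repeated_pairs` in place of `sample`, since iterating a file yields its lines. Then write each returned row plus `"\n"` to an output file opened with mode `"w"`. On Python 3.7 and later, the groups come out in the order each pair was first seen.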