I'd use a database if I was you. Install for instance MYSQL or MudBase or something like that and (if need be use Python) to insert the lines into the database. Only storing unique lines would be failry easy.
Other sollution (with the usage of Python): If you must use Python I'd suggest making new smaller files. How about making files that are named with the 1st letter of each line you find and split your file up into as many parts and your lines start with unique characters. You end up with lots of smaller files that just need to be merged together (could be done with 'cat' I think?) or you could read all those files and toss them back into 1 big file. Something like this: Read the 2,5G masterfile Read lines 1 by 1 Make a new file named "1st char of line founnd".txt if it doesn't exist and add the new line otherwise scan this file and see if the line is not there yet and if not there add it. when done: merge all files. Can't be too hard to pull off ;) _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor