> My data set the below is taken from is over 2.4 gb so speed and memory > considerations come into play.
To be honest, if this were my problem, I'd proably dump all the data into a database and use SQL to extract what I needed. Thats a much more effective tool for this kind of thing. You can do it with Python, but I think we need more understanding of the problem. For example what the various fields represent, how much of a comparison (ie which fields, case sensitivity etc) leads to "equality" etc. Alan G. _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor