>My data set the below is taken from is over 2.4 gb so speed and memory
>considerations come into play.
To be honest, if this were my problem, I'd probably dump all the data into a database and use SQL to extract what I needed. That's a much more effective tool for this kind of thing.
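A minimal sketch of the "dump it into a database, then query" idea, using Python's built-in sqlite3 module. Since we don't know what the real fields are, the tab-separated layout, the table name `records`, the columns (`ident`, `name`, `amount`) and the file names `data.txt`/`data.db` are all hypothetical. Rows are streamed from the file, so the 2.4 GB never has to fit in memory as a Python list.

```python
import csv
import io
import sqlite3


def load_rows(conn, lines):
    """Stream tab-separated (ident, name, amount) rows into a hypothetical 'records' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (ident TEXT, name TEXT, amount REAL)"
    )
    reader = csv.reader(lines, delimiter="\t")
    # executemany pulls rows from the reader one at a time, so the whole
    # file is never held in memory as a Python list
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", reader)
    conn.commit()


def idents_seen_more_than_once(conn):
    """Example extraction: idents that occur on more than one row, with counts."""
    cur = conn.execute(
        "SELECT ident, COUNT(*) FROM records "
        "GROUP BY ident HAVING COUNT(*) > 1 ORDER BY ident"
    )
    return cur.fetchall()


if __name__ == "__main__":
    # For the real file, use an on-disk database such as sqlite3.connect("data.db")
    # and open("data.txt", newline="") in place of this small in-memory sample.
    sample = io.StringIO("a\tfoo\t1\nb\tbar\t2\na\tFOO\t3\n")
    conn = sqlite3.connect(":memory:")
    load_rows(conn, sample)
    print(idents_seen_more_than_once(conn))  # [('a', 2)]
```

Once the data is loaded, each new question is just another SELECT rather than another pass over the raw file in Python.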
You can do it with Python, but I think we need a better understanding of the problem: for example, what the various fields represent, and how much of a comparison (i.e. which fields, case sensitivity, etc.) leads to "equality".
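To show why those questions matter, here is a sketch where "equality" is made explicit: a key function built from a chosen list of fields and a case-sensitivity flag. The record layout and field names are hypothetical; the point is only that the same data groups differently depending on those choices.

```python
from collections import defaultdict


def make_key(fields, case_sensitive=True):
    """Build a function mapping a record (dict) to the tuple used for equality."""
    def key(record):
        values = (record[f] for f in fields)
        if case_sensitive:
            return tuple(values)
        # casefold() is a stronger lower() intended for caseless matching
        return tuple(v.casefold() if isinstance(v, str) else v for v in values)
    return key


def group_equal(records, key):
    """Group records whose keys compare equal; returns {key: [records, ...]}."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


records = [
    {"ident": "a", "name": "foo", "amount": 1},
    {"ident": "b", "name": "bar", "amount": 2},
    {"ident": "a", "name": "FOO", "amount": 3},
]
strict = group_equal(records, make_key(["ident", "name"]))
loose = group_equal(records, make_key(["ident", "name"], case_sensitive=False))
print(len(strict), len(loose))  # 3 2
```

Note that group_equal keeps every record in memory, so on the full 2.4 GB data set it is only suitable for trying out definitions on a sample.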
And if setting up a full-blown SQL server for the problem seems like a lot of work, you might try prototyping the sorting and the solution with sqlite, and only migrate to (full-fledged RDBMS of your choice) if the prototype works as you want it to and sqlite seems too slow for your needs.
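One way to decide whether sqlite "seems too slow" is to time the actual queries on the prototype before migrating anything. A minimal sketch, assuming a synthetic `records` table stands in for the real data (table, columns and row counts are hypothetical); it times the same lookup before and after adding an index on the comparison column.

```python
import sqlite3
import time


def time_query(conn, sql, params=(), repeat=5):
    """Return the best wall-clock time (seconds) over several runs of a query."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


def build_sample(conn, n=100_000):
    """Fill a hypothetical 'records' table with n synthetic rows."""
    conn.execute("CREATE TABLE records (ident TEXT, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        ((f"id{i % 50_000}", f"name{i}", float(i)) for i in range(n)),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sample(conn)
    query = "SELECT * FROM records WHERE ident = ?"
    before = time_query(conn, query, ("id123",))
    conn.execute("CREATE INDEX idx_records_ident ON records (ident)")
    after = time_query(conn, query, ("id123",))
    print(f"without index: {before:.4f}s, with index: {after:.4f}s")
```

If the numbers are still unacceptable with sensible indexes, that is a reasonable signal to move to a bigger RDBMS; SQLite's EXPLAIN QUERY PLAN can also confirm whether a query is actually using the index.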
Andy
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor