Alan Gauld wrote:
My data set the below is taken from is over 2.4 GB so speed and memory
considerations come into play.


To be honest, if this were my problem, I'd probably dump all the data
into a database and use SQL to extract what I needed. That's a much
more effective tool for this kind of thing.

You can do it with Python, but I think we need more understanding
of the problem. For example, what the various fields represent, how
much of a comparison (i.e. which fields, case sensitivity, etc.) leads
to "equality", etc.

And if the idea of setting up a full-blown SQL server for the problem seems like a lot of work, you might try prototyping the sort and solutions with sqlite, and only migrate to (full-fledged RDBMS of your choice) if the prototype works as you want it to and sqlite seems too slow for your needs.
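
For what it's worth, a minimal sketch of such a prototype, using the sqlite3 module from Python's standard library, might look something like this. The table name, the column names and the sample rows are all made up for illustration, since the thread doesn't say what the real fields are, and the case-insensitive match on two columns is only one guess at what "equality" should mean here.

```python
import sqlite3

# Hypothetical sample rows; the real field layout is not given in the thread.
rows = [
    ("Smith", "John", "Boston"),
    ("smith", "john", "Boston"),
    ("Jones", "Mary", "Denver"),
]

# In-memory database for the prototype; for the real data set you would
# pass a file name instead so the data lives on disk rather than in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (last TEXT, first TEXT, city TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
conn.commit()

# Let the database do the sorting, ignoring case on the name columns.
sorted_rows = conn.execute(
    "SELECT last, first, city FROM records "
    "ORDER BY last COLLATE NOCASE, first COLLATE NOCASE"
).fetchall()

# Assumed notion of "equality": same last and first name, ignoring case.
dupes = conn.execute(
    "SELECT lower(last), lower(first), COUNT(*) FROM records "
    "GROUP BY lower(last), lower(first) "
    "HAVING COUNT(*) > 1"
).fetchall()

for last, first, count in dupes:
    print(last, first, count)

conn.close()
```

If that does what you want on a slice of the data, the same SQL should carry over to a bigger RDBMS with little change.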


Andy
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
