I'll just "Me Too" on Alan's Advice. I had a similar sized project only it was binary data in an ISAM file instead of flat ASCII. I tried several "pure" python methods and all took forever. Finally I used Python to read-modify-input source data into a mysql database. Then I pulled the data out via python and wrote it to a new ISAM file. The whole thing took longer to code that way but boy it sure scaled MUCH better and was much quicker in the end.
John Purser

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alan Gauld
Sent: Tuesday, January 25, 2005 05:09
To: Scott Melnyk; tutor@python.org
Subject: Re: [Tutor] sorting a 2 gb file

> My data set the below is taken from is over 2.4 gb so speed and memory
> considerations come into play.

To be honest, if this were my problem, I'd probably dump all the data into a database and use SQL to extract what I needed. That's a much more effective tool for this kind of thing.

You can do it with Python, but I think we need more understanding of the problem. For example, what the various fields represent, and how much of a comparison (i.e. which fields, case sensitivity, etc.) leads to "equality".

Alan G.

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
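To make Alan's point about "what counts as equal" concrete, here is a small follow-on sketch. It reuses the invented records(id, seq) table from the earlier example (an assumption, not anything from the original thread); LOWER() normalises case so the database, rather than Python, decides which rows match.

# Sketch of letting SQL define "equality", reusing the invented
# records(id, seq) table from the example above.  LOWER() makes the
# comparison case-insensitive, one of the questions Alan raises about
# what should count as a duplicate.
import sqlite3

conn = sqlite3.connect("work.db")
query = (
    "SELECT LOWER(id) AS key, COUNT(*) AS copies "
    "FROM records GROUP BY LOWER(id) HAVING COUNT(*) > 1"
)
for key, copies in conn.execute(query):
    print(key, copies)
conn.close()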