Re: [Tutor] sorting a 2 gb file- i shrunk it and turned it around

2005-01-26 Thread Kent Johnson
My guess is that your file is small enough that Danny's two-pass approach will work. You might even be able to do it in one pass. If you have enough RAM, here is a sketch of a one-pass solution:
# This will map each result to a list of queries that contain that result
results = {}
# Iterate the fi
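The one-pass idea described above (map each result to the list of queries that contain it) might be written out roughly as follows. This is only a sketch, not Kent's actual code, which is cut off in this listing. The input layout is an assumption: one tab-separated `query`/`result` pair per line. The function names and the `ENS0001`-style IDs in the usage note are made up for illustration. As the post says, it only works if the whole mapping fits in RAM.

```python
from collections import defaultdict


def map_results_to_queries(lines):
    """Build {result: [queries containing it]} in a single pass.

    ASSUMPTION: each line is "query<TAB>result"; the real file layout
    is not shown in the thread excerpt.
    """
    results = defaultdict(list)
    for line in lines:
        query, sep, result = line.strip().partition("\t")
        if not sep:
            # Skip blank lines and lines without a tab separator.
            continue
        results[result].append(query)
    return dict(results)


def shared_results(results):
    """Keep only the results that appear under more than one query."""
    return {res: qs for res, qs in results.items() if len(qs) > 1}
```

Because any iterable of lines will do, an open file object can be passed directly, for example `map_results_to_queries(open(path))`, so the file is read line by line rather than loaded in one piece.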

Re: [Tutor] sorting a 2 gb file

2005-01-25 Thread Danny Yoo
On Tue, 25 Jan 2005, Max Noel wrote: > >> My data set the below is taken from is over 2.4 gb so speed and > >> memory considerations come into play. > >> > >> Are sets more effective than lists for this? > > > > Sets or dictionaries make the act of "lookup" of a key fairly cheap. > > In the two-

Re: [Tutor] sorting a 2 gb file

2005-01-25 Thread Max Noel
On Jan 25, 2005, at 23:40, Danny Yoo wrote: In pseudocode, this will look something like:
###
hints = identifyDuplicateRecords(filename)
displayDuplicateRecords(filename, hints)
###
My data set the below is taken from is over 2.4 gb so speed and memory considerations come into play. Are sets more
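A possible Python rendering of the two-pass pseudocode quoted above, offered as a sketch rather than as Danny's implementation. It assumes one record per line, with the first whitespace-separated field as the key that decides whether two records are duplicates; the thread excerpt does not define either. The function names adapt the pseudocode names, and the second one yields lines instead of printing them. Sets are used because, as noted in the thread, they make key lookup cheap.

```python
def identify_duplicate_records(filename):
    """First pass: return the set of keys that occur more than once.

    ASSUMPTION: one record per line, keyed by its first field.
    """
    seen = set()
    duplicates = set()
    with open(filename) as infile:
        for line in infile:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            key = fields[0]
            if key in seen:
                duplicates.add(key)
            else:
                seen.add(key)
    return duplicates


def find_duplicate_records(filename, hints):
    """Second pass: yield every line whose key is in `hints`."""
    with open(filename) as infile:
        for line in infile:
            fields = line.split()
            if fields and fields[0] in hints:
                yield line.rstrip("\n")
```

Only the keys are held in memory, not the records themselves, which is the point of reading the file twice. A caller would print the output with something like `for record in find_duplicate_records(filename, hints): print(record)`.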

Re: [Tutor] sorting a 2 gb file

2005-01-25 Thread Danny Yoo
On Tue, 25 Jan 2005, Scott Melnyk wrote: > I have a file in the form shown at the end (please forgive any > wraparounds due to the width of the screen here- the lines starting > with ENS end with the e-12 or what have you on the same line.) > > What I would like is to generate an output file of

RE: [Tutor] sorting a 2 gb file- i shrunk it and turned it around

2005-01-25 Thread Scott Melnyk
Thanks for the thoughts so far. After posting I have been thinking about how to pare down the file (much of the info in the big file was not relevant to the question at hand). After the first couple of responses I was even more motivated to shrink the file so as not to have to set up a db. This test w

Re: [Tutor] sorting a 2 gb file

2005-01-25 Thread Andrew D. Fant
Alan Gauld wrote: My data set the below is taken from is over 2.4 gb so speed and memory considerations come into play. To be honest, if this were my problem, I'd probably dump all the data into a database and use SQL to extract what I needed. That's a much more effective tool for this kind of thing

RE: [Tutor] sorting a 2 gb file

2005-01-25 Thread John Purser
auld Sent: Tuesday, January 25, 2005 05:09 To: Scott Melnyk; tutor@python.org Subject: Re: [Tutor] sorting a 2 gb file > My data set the below is taken from is over 2.4 gb so speed and memory > considerations come into play. To be honest, if this were my problem, I'd probably dump al

Re: [Tutor] sorting a 2 gb file

2005-01-25 Thread Alan Gauld
> My data set the below is taken from is over 2.4 gb so speed and memory > considerations come into play. To be honest, if this were my problem, I'd probably dump all the data into a database and use SQL to extract what I needed. That's a much more effective tool for this kind of thing. You can do
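The database route suggested above could be tried with Python's built-in `sqlite3` module. This is a minimal sketch and not something taken from the thread. The table name `hits`, its `query`/`result` columns, and both function names are hypothetical, and the real file would first have to be parsed into `(query, result)` pairs.

```python
import sqlite3


def load_hits(conn, pairs):
    """Store (query, result) pairs in a hypothetical `hits` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS hits (query TEXT, result TEXT)")
    conn.executemany("INSERT INTO hits (query, result) VALUES (?, ?)", pairs)
    conn.commit()


def results_shared_by_queries(conn):
    """Return (result, number of distinct queries) for shared results."""
    cursor = conn.execute(
        "SELECT result, COUNT(DISTINCT query) "
        "FROM hits "
        "GROUP BY result "
        "HAVING COUNT(DISTINCT query) > 1 "
        "ORDER BY result"
    )
    return cursor.fetchall()
```

Passing a file path to `sqlite3.connect()` keeps the data on disk rather than in memory, which matters for a file of this size; `":memory:"` is convenient for trying the queries on a small sample.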

[Tutor] sorting a 2 gb file

2005-01-25 Thread Scott Melnyk
Hello! I am wondering about the best way to handle sorting some data from some of my results. I have a file in the form shown at the end (please forgive any wraparounds due to the width of the screen here- the lines starting with ENS end with the e-12 or what have you on the same line.) What I w