My guess is that your file is small enough that Danny's two-pass approach will work. You might even
be able to do it in one pass.
If you have enough RAM, here is a sketch of a one-pass solution:
# This will map each result to a list of queries that contain that result
results = {}
# Iterate the file
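Filled out, that one-pass sketch might look like the following. This is only a guess at the file format: it assumes each line carries a query id and a result id as its first two whitespace-separated fields, which the thread doesn't confirm.

```python
# Tiny stand-in for the real 2.4 GB file; the query/result column
# layout here is an assumption about the data, not a known format.
with open("results.txt", "w") as f:
    f.write("q1 ENS1\nq2 ENS1\nq3 ENS2\n")

# One pass: map each result to the list of queries that produced it.
results = {}
with open("results.txt") as f:
    for line in f:
        fields = line.split()
        if len(fields) < 2:
            continue                      # skip blank or malformed lines
        query, result = fields[0], fields[1]
        results.setdefault(result, []).append(query)

# A result reached by more than one query is a duplicate.
for result, queries in sorted(results.items()):
    if len(queries) > 1:
        print(result, queries)
```

The dictionary holds one entry per distinct result, so this only fits in RAM if the number of distinct results (not the file size) is modest.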
On Tue, 25 Jan 2005, Max Noel wrote:
> >> My data set, which the sample below is taken from, is over 2.4 GB,
> >> so speed and memory considerations come into play.
> >>
> >> Are sets more effective than lists for this?
> >
> > Sets or dictionaries make key lookup fairly cheap.
> > In the two-
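To make the lookup point concrete, here is a toy comparison of my own (not from the thread): membership tests on a set are hash-based and effectively constant time, while a list has to be scanned element by element.

```python
n = 100_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)

# Same question, very different cost: the list walks its elements
# one by one, the set jumps straight to the hash bucket.
assert n - 1 in haystack_set      # O(1) on average
assert n - 1 in haystack_list     # O(n): scans the whole list
assert -1 not in haystack_set     # misses are also cheap for the set
```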
On Jan 25, 2005, at 23:40, Danny Yoo wrote:
In pseudocode, this will look something like:
###
hints = identifyDuplicateRecords(filename)
displayDuplicateRecords(filename, hints)
###
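One way to realize that pseudocode, as a sketch under assumptions: the duplicate "key" is taken to be the first whitespace-separated field of each line, and `hints` is the set of keys seen more than once; neither is specified in the thread.

```python
def identifyDuplicateRecords(filename):
    """First pass: count each key; return the keys seen more than once."""
    counts = {}
    with open(filename) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue                  # skip blank lines
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return {key for key, n in counts.items() if n > 1}

def displayDuplicateRecords(filename, hints):
    """Second pass: show only the lines whose key is in hints."""
    with open(filename) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] in hints:
                print(line.rstrip())

# Example run on a small made-up file.
with open("sample.txt", "w") as f:
    f.write("ENS1 q1 1e-12\nENS2 q2 1e-9\nENS1 q3 3e-12\n")

hints = identifyDuplicateRecords("sample.txt")
displayDuplicateRecords("sample.txt", hints)
```

The advantage over the one-pass version is memory: the first pass only keeps a counter per key, not the lists of queries, and the second pass streams the file again.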
My data set, which the sample below is taken from, is over 2.4 GB, so
speed and memory considerations come into play.
Are sets more
On Tue, 25 Jan 2005, Scott Melnyk wrote:
> I have a file in the form shown at the end (please forgive any
> wraparounds due to the width of the screen here; the lines starting
> with ENS end with the e-12 or what have you on the same line).
>
> What I would like is to generate an output file of
Thanks for the thoughts so far. After posting I have been thinking
about how to pare down the file (much of the info in the big file was
not relevant to the question at hand).
After the first couple of responses I was even more motivated to
shrink the file so as not to have to set up a db. This test w
Alan Gauld wrote:
> My data set, which the sample below is taken from, is over 2.4 GB,
> so speed and memory considerations come into play.
To be honest, if this were my problem, I'd probably dump all the data
into a database and use SQL to extract what I needed. That's a much
more effective tool for this kind of thing.
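As a sketch of the database route Alan describes, SQLite (bundled with Python as the `sqlite3` module) is enough to try it without setting up a server; the table and column names below are invented for illustration.

```python
import sqlite3

# In-memory database standing in for a real on-disk one; the schema
# (query, result, e-value) is a guess at the data, not Alan's design.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (query TEXT, result TEXT, evalue REAL)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [("q1", "ENS1", 1e-12), ("q2", "ENS1", 3e-12), ("q3", "ENS2", 1e-9)],
)

# SQL does the duplicate detection: results hit by more than one query.
dupes = conn.execute(
    "SELECT result, COUNT(*) FROM hits"
    " GROUP BY result HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # -> [('ENS1', 2)]
```

For the real 2.4 GB file the same query would run against an on-disk database (`sqlite3.connect("hits.db")`), with an index on `result` to keep the GROUP BY from rescanning everything.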
From: Alan Gauld
Sent: Tuesday, January 25, 2005 05:09
To: Scott Melnyk; tutor@python.org
Subject: Re: [Tutor] sorting a 2 gb file
You can do
Hello!
I am wondering about the best way to handle sorting some data from
some of my results.
I have a file in the form shown at the end (please forgive any
wraparounds due to the width of the screen here; the lines starting
with ENS end with the e-12 or what have you on the same line).
What I w