Re: optimizing large dictionaries

2009-01-16 Thread Luis M . González
On Jan 15, 6:39 pm, Per Freem wrote:
> hello
>
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
>
> i create a dictionary that maps elements of the file that

Re: optimizing large dictionaries

2009-01-16 Thread Matthias Julius
Per Freem writes:
> the only 'twist' is that my elt is an instance of a class (MyClass)
> with 3 fields, all numeric. the class is hashable, and so
> my_dict[elt] works well. the __repr__ and __hash__ methods of my
> class simply return the str() representation of self, which just
> calls __str__().
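[Matthias's reply is cut off in the preview, but the quoted design -- routing __hash__ through a str() round-trip -- is the likely target. A common alternative, offered here only as a sketch and not confirmed by the truncated post, is to hash the tuple of fields directly:]

    class MyClass(object):
        def __init__(self, field1, field2, field3):
            self.field1 = field1
            self.field2 = field2
            self.field3 = field3

        def __hash__(self):
            # Hash the three numeric fields directly instead of
            # building a "%s-%s-%s" string on every dict operation.
            return hash((self.field1, self.field2, self.field3))

        def __eq__(self, other):
            # Equal objects must hash equal, so compare the same tuple.
            return (self.field1, self.field2, self.field3) == \
                   (other.field1, other.field2, other.field3)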

Re: optimizing large dictionaries

2009-01-16 Thread pruebauno
On Jan 15, 4:39 pm, Per Freem wrote:
> hello
>
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
>
> i create a dictionary that maps elements of the file that

Re: optimizing large dictionaries

2009-01-15 Thread Scott David Daniels
Paul Rubin wrote:
> Per Freem writes:
>> 2] is there an easy way to have nested defaultdicts? ie i want to say
>> that my_dict = defaultdict(defaultdict(int)) -- to reflect the fact
>> that my_dict is a dictionary, whose values are dictionaries that map
>> to ints. but that syntax is not valid.

my_dict = def

Re: optimizing large dictionaries

2009-01-15 Thread Paul McGuire
On Jan 15, 5:31 pm, Per Freem wrote:
> ...the aKeys are very small (less than 100) whereas
> the bKeys are the ones that are in the millions.  so in that case,
> doing a Try-Except on aKey should be very efficient, since often it
> will not fail, ...

Do you know the aKeys in advance? If so, the
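[Paul's suggestion is truncated, but it plausibly continues toward pre-building the outer layer. A sketch of that idea, assuming the outer keys really are known up front -- the key names below are invented for illustration:]

    from collections import defaultdict

    # Hypothetical: the fewer-than-100 outer keys are known in advance.
    known_akeys = ['alpha', 'beta', 'gamma']

    # Build the outer layer once; each inner defaultdict(int) then
    # counts bKey occurrences with no per-lookup exception handling.
    counts = dict((a, defaultdict(int)) for a in known_akeys)

    counts['alpha']['some-bkey'] += 1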

Re: optimizing large dictionaries

2009-01-15 Thread Christian Heimes
Per Freem wrote:
> 1] is Try-Except really slower? my dict actually has two layers, so
> my_dict[aKey][bKey]. the aKeys are very small (less than 100) whereas
> the bKeys are the ones that are in the millions. so in that case,
> doing a Try-Except on aKey should be very efficient, since often

Re: optimizing large dictionaries

2009-01-15 Thread Steven D'Aprano
On Thu, 15 Jan 2009 14:49:29 -0800, bearophileHUGS wrote:
> Matimus, your suggestions are all good.
>
> Try-except is slower than:
> if x in adict: ... else: ...

Not according to my tests.

>>> def tryexcept(D, key):
...     try:
...         return D[key]
...     except KeyError:
...
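[The preview cuts Steven's session short. A self-contained version of the kind of comparison he describes -- everything past the truncation point is an assumption -- might look like this:]

    import timeit

    def tryexcept(D, key):
        try:
            return D[key]
        except KeyError:
            return None

    def ifin(D, key):
        if key in D:
            return D[key]
        return None

    D = dict((i, i) for i in range(100))

    # When the key is usually present, try/except skips the extra
    # membership test and tends to win.
    print(timeit.timeit(lambda: tryexcept(D, 50), number=100000))
    print(timeit.timeit(lambda: ifin(D, 50), number=100000))

    # When the key is usually missing, raising and catching KeyError
    # is comparatively expensive and the `in` test tends to win.
    print(timeit.timeit(lambda: tryexcept(D, 999), number=100000))
    print(timeit.timeit(lambda: ifin(D, 999), number=100000))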

Re: optimizing large dictionaries

2009-01-15 Thread Paul Rubin
Per Freem writes:
> 2] is there an easy way to have nested defaultdicts? ie i want to say
> that my_dict = defaultdict(defaultdict(int)) -- to reflect the fact
> that my_dict is a dictionary, whose values are dictionaries that map
> to ints. but that syntax is not valid.

my_dict = defaultdict(lambd
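[Paul's reply is truncated mid-expression; the standard completion of this idiom -- an assumption here, since the preview cuts off -- wraps the inner factory in a lambda:]

    from collections import defaultdict

    # defaultdict wants a zero-argument factory. defaultdict(int) is a
    # call, not a factory, so wrap the inner constructor in a lambda.
    my_dict = defaultdict(lambda: defaultdict(int))

    my_dict['aKey']['bKey'] += 1  # both levels spring into existence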

Re: optimizing large dictionaries

2009-01-15 Thread Per Freem
thanks to everyone for the excellent suggestions. a few follow-up q's:

1] is Try-Except really slower? my dict actually has two layers, so my_dict[aKey][bKey]. the aKeys are very small (less than 100) whereas the bKeys are the ones that are in the millions. so in that case, doing a Try-Except o
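[For the two-layer case Per describes, a purely illustrative sketch -- not code from the thread -- of the try/except variant, where the outer lookup almost never fails once the few aKeys exist:]

    counts = {}

    def add(akey, bkey):
        # With fewer than 100 aKeys, this try fails at most ~100 times
        # over tens of millions of updates, so the except path is rare.
        try:
            inner = counts[akey]
        except KeyError:
            inner = counts[akey] = {}
        inner[bkey] = inner.get(bkey, 0) + 1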

Re: optimizing large dictionaries

2009-01-15 Thread Steven D'Aprano
On Thu, 15 Jan 2009 23:22:48 +0100, Christian Heimes wrote:
>> is there anything that can be done to speed up this simple code? right
>> now it is taking well over 15 minutes to process, on a 3 GHz machine
>> with lots of RAM (though this is all taking CPU power, not RAM at this
>> point.)
>
> cl

Re: optimizing large dictionaries

2009-01-15 Thread bearophileHUGS
Matimus, your suggestions are all good.

Try-except is slower than:
if x in adict: ... else: ...

A defaultdict is generally faster (there are some conditions when it's not faster, but they aren't very common. I think it's when the ratio of duplicates is really low), creating just a tuple instead of
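[bearophile's preview cuts off at "a tuple instead of"; presumably the point is to key the dict on a plain tuple of the three numeric fields rather than a class instance. A sketch of that idea, with invented field values:]

    from collections import defaultdict

    counts = defaultdict(int)

    # Instead of hashing a MyClass instance (which builds a string via
    # __str__ on every lookup), key on a tuple of the three numeric
    # fields; tuples hash their contents directly and cheaply.
    for field1, field2, field3 in [(1, 2, 3), (1, 2, 3), (4, 5, 6)]:
        counts[(field1, field2, field3)] += 1

    print(counts[(1, 2, 3)])  # -> 2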

Re: optimizing large dictionaries

2009-01-15 Thread Christian Heimes
> class MyClass:
>
>     def __str__(self):
>         return "%s-%s-%s" % (self.field1, self.field2, self.field3)
>
>     def __repr__(self):
>         return str(self)
>
>     def __hash__(self):
>         return hash(str(self))
>
> is there anything that can be done to speed up this simple code? right
> now i

Re: optimizing large dictionaries

2009-01-15 Thread Jervis Whitley
On Fri, Jan 16, 2009 at 8:39 AM, Per Freem wrote:
> hello
>
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
>
> for line in file:
>     try:
>         elt = MyCla

Re: optimizing large dictionaries

2009-01-15 Thread Matimus
On Jan 15, 1:39 pm, Per Freem wrote:
> hello
>
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
>
> i create a dictionary that maps elements of the file that

optimizing large dictionaries

2009-01-15 Thread Per Freem
hello

i have an optimization question about python. i am iterating through a file and counting the number of repeated elements. the file has on the order of tens of millions of elements...

i create a dictionary that maps elements of the file that i want to count to their number of occurrences. so i iter
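[Pulling the thread's suggestions together, a minimal sketch of the counting loop -- the file name, line format, and parsing below are assumptions, since the original post is truncated before its loop appears in full:]

    from collections import defaultdict

    counts = defaultdict(int)

    with open('elements.txt') as f:        # hypothetical input file
        for line in f:
            # Hypothetical parse: three whitespace-separated fields.
            field1, field2, field3 = line.split()
            # Key on a plain tuple rather than a class instance,
            # per the suggestions upthread.
            counts[(field1, field2, field3)] += 1

    # counts now maps each distinct element to its occurrence count.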