Re: Optimizing size of very large dictionaries

2008-07-31 Thread Terry Reedy
Raymond Hettinger wrote: Background: I'm trying to identify duplicate records in very large text-based transaction logs. I'm detecting duplicate records by creating a SHA1 checksum of each record and using this checksum as a dictionary key. This works great except for several files whose size …
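The approach described in the thread can be sketched in a few lines. This is a minimal illustration, not the poster's actual code: the record encoding and the line-number bookkeeping are my own assumptions.

```python
import hashlib

def find_duplicates(records):
    """Yield (dup_line, first_line, record) for records whose
    SHA1 digest has already been seen."""
    seen = {}  # SHA1 digest -> line number of first occurrence
    for lineno, record in enumerate(records):
        digest = hashlib.sha1(record.encode("utf-8")).digest()
        if digest in seen:
            yield lineno, seen[digest], record
        else:
            seen[digest] = lineno

log = ["alpha", "beta", "alpha", "gamma", "beta"]
dupes = list(find_duplicates(log))
# each entry: (duplicate line, original line, record text)
```

Keying on the 20-byte digest rather than the record itself is what keeps the dictionary small relative to the log, which is the whole point of the technique, but as the thread notes, the dictionary itself can still outgrow RAM for very large inputs.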

Re: Optimizing size of very large dictionaries

2008-07-31 Thread M.-A. Lemburg
On 2008-07-31 02:29, [EMAIL PROTECTED] wrote: Are there any techniques I can use to strip a dictionary data structure down to the smallest memory overhead possible? I'm working on a project where my available RAM is limited to 2G and I would like to use very large dictionaries vs. a traditional database …

Re: Optimizing size of very large dictionaries

2008-07-31 Thread Raymond Hettinger
> > Are there any techniques I can use to strip a dictionary data > > structure down to the smallest memory overhead possible? Sure. You can build your own version of a dict using UserDict.DictMixin. The underlying structure can be as space efficient as you want. FWIW, dictionaries automatically …
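UserDict.DictMixin is the Python 2 API named in the reply; its Python 3 successor is collections.abc.MutableMapping, which likewise fills in the full dict interface from a handful of methods. Below is one hedged sketch of the idea: the sorted-parallel-lists backing store is my own illustrative choice, not necessarily what the poster had in mind, trading O(log n) lookups and O(n) inserts for much lower per-entry overhead than a hash table.

```python
import bisect
from collections.abc import MutableMapping  # Python 3 equivalent of UserDict.DictMixin

class SortedArrayDict(MutableMapping):
    """Mapping backed by two parallel sorted lists instead of a hash table."""

    def __init__(self):
        self._keys = []
        self._values = []

    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def __setitem__(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def __delitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            del self._values[i]
        else:
            raise KeyError(key)

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

d = SortedArrayDict()
d["b"] = 2
d["a"] = 1
# MutableMapping supplies get(), items(), update(), __contains__, etc. for free
```

Because MutableMapping derives the rest of the dict protocol from these five methods, the class behaves like a dict (`"a" in d`, `d.items()`, and so on) while the memory layout stays entirely under your control.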

Re: Optimizing size of very large dictionaries

2008-07-30 Thread Miles
On Wed, Jul 30, 2008 at 8:29 PM, <[EMAIL PROTECTED]> wrote: > Background: I'm trying to identify duplicate records in very large text-based > transaction logs. I'm detecting duplicate records by creating a SHA1 > checksum of each record and using this checksum as a dictionary key. This > works great except for several files whose size …

Re: Optimizing size of very large dictionaries

2008-07-30 Thread Gabriel Genellina
On Wed, 30 Jul 2008 21:29:39 -0300, <[EMAIL PROTECTED]> wrote: Are there any techniques I can use to strip a dictionary data structure down to the smallest memory overhead possible? I'm working on a project where my available RAM is limited to 2G and I would like to use very large dictionaries vs. a traditional database …
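One memory-stripping technique consistent with the thread's duplicate-detection use case (though not quoted in any reply above, so treat it as an assumption): if you never need the values, a set of raw 20-byte digests is markedly smaller than a dict keyed by 40-character hex strings, since a set has no value slots and the raw digest is half the length of its hex form.

```python
import hashlib

# Two representations of the same checksum:
hex_key = hashlib.sha1(b"some record").hexdigest()  # 40-char str
raw_key = hashlib.sha1(b"some record").digest()     # 20-byte bytes

seen = set()  # membership only; no per-entry value storage

def is_duplicate(record: bytes) -> bool:
    """Return True if this record's SHA1 digest was seen before."""
    digest = hashlib.sha1(record).digest()
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

Exact savings depend on the Python version and platform, but dropping the hex encoding and the dict's value pointers typically cuts per-record overhead by more than half.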

Optimizing size of very large dictionaries

2008-07-30 Thread python
Are there any techniques I can use to strip a dictionary data structure down to the smallest memory overhead possible? I'm working on a project where my available RAM is limited to 2G and I would like to use very large dictionaries vs. a traditional database. Background: I'm trying to identify duplicate records in very large text-based transaction logs …