Thanks a lot for your replies. Using a dbm seems to be a very good solution in some cases.

But most of my dictionaries are nested, and since both keys and values in the dbm 'dictionaries' have to be strings, I can't immediately see how I could get it to work.
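
One possible way around the strings-only restriction, sketched here under some assumptions: flatten each nested path p1, p2, p3, p4 into a single string key and store the number as text. The separator, the helper names make_key and split_key, and the file name "probs" are all made up for illustration, and the code assumes Python 3's dbm module.

```python
import dbm
import os
import tempfile

SEP = "\t"  # assumed separator; it must never occur inside a parameter


def make_key(*params):
    """Join the parameters into one flat string key (hypothetical helper)."""
    return SEP.join(str(p) for p in params)


def split_key(key):
    """Recover the parameters from a stored key; they all come back as strings."""
    return key.decode("utf-8").split(SEP)


def demo():
    # Write two values to a throwaway database file, then read one back.
    path = os.path.join(tempfile.mkdtemp(), "probs")
    with dbm.open(path, "c") as db:
        db[make_key("the", "cat", 3, "sat")] = repr(0.25)
        db[make_key("the", "cat", 3, "ran")] = repr(0.75)
    with dbm.open(path, "r") as db:
        return float(db[make_key("the", "cat", 3, "sat")])
```

Note that numeric parameters lose their type on the way through (3 comes back as "3"), so they would need converting again after split_key.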


A bit more detail: I deal with conditional probabilities, with up to 4 parameters. These parameters are numbers or words and determine the value (which is always a number). E.g. I have a dictionary {p1:{p2: {p3:{p4:value}}}}, where the p's are different parameters. I sometimes need to sum over one or more of the parameters – for now I have managed to structure the dictionaries so that I only need to sum over the innermost parameter, although this has been a bit cumbersome.
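
To make the summing concrete, here is a minimal sketch of an alternative layout: one flat dictionary keyed by (p1, p2, p3, p4) tuples, so that any parameter can be summed out, not only the innermost one. The function name marginalize and the sample values are invented for illustration, and nothing here is claimed about memory use compared with the nested layout.

```python
from collections import defaultdict


def marginalize(table, keep):
    """Sum out every parameter position that is not listed in keep."""
    out = defaultdict(float)
    for key, value in table.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


# Made-up sample data: keys are (p1, p2, p3, p4) tuples.
table = {
    ("a", "x", 1, "u"): 0.1,
    ("a", "x", 2, "u"): 0.2,
    ("a", "y", 1, "v"): 0.3,
    ("b", "x", 1, "u"): 0.4,
}

# Sum over p3 (position 2), keeping p1, p2 and p4.
summed_over_p3 = marginalize(table, keep=(0, 1, 3))

# Sum over everything except p1.
by_p1 = marginalize(table, keep=(0,))
```

The cost is that every lookup and every sum goes through the whole key tuple, and a sum is a full pass over the table.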

regards,
Arild Næss


Forwarded message:
From: "Arild B. Næss" <[EMAIL PROTECTED]>
Date: 23 February 2007 18:30:40 GMT+01:00
To: tutor@python.org
Subject: [Tutor] dictionaries and memory handling
Delivered-To: [EMAIL PROTECTED]

Hi,

I'm working on a Python script for a task in statistical language
processing. Briefly put, it all boils down to counting different
things in very large text files, doing simple computations on these
counts and storing the results. I have been using Python's dictionary
type as my basic data structure for storing the counts. This has been
a nice and simple solution, but turns out to be a bad idea in the
long run, since the dictionaries become _very_ large, and create
MemoryErrors when I try to run my script on texts of a certain size.

It seems that an SQL database would probably be the way to go, but I
am a bit concerned about speed issues (even though running time is
not all that crucial here). In any case it would probably take me a
while to get a database up and running and I need to hand in some
preliminary results pretty soon, so for now I think I'll postpone the
SQL and try to tweak my current script to be able to run it on
slightly longer texts than it can handle now.

So, enough beating around the bush, my questions are:

- Will the dictionaries take up less memory if I use numbers rather
than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory
than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:
Slightly less, or substantially less memory?

- What are common methods to monitor the memory usage of a script?
Can I add a snippet to the code that prints out how many MBs of
memory a certain dictionary takes up at that particular time?
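
Both questions can be explored with a rough measurement like the sketch below. It assumes a Python version that provides sys.getsizeof. Because that call only counts the container itself, the hypothetical helper deep_size also walks the keys and values. The result is an estimate only: it is specific to the interpreter in use and ignores sharing of objects with the rest of the program.

```python
import sys


def deep_size(obj, seen=None):
    """Estimate the bytes used by obj plus the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count each object only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


numbers = {3: 45, 6: 77, 9: 33}
words = {"eloquent": 45, "helpless": 77, "samaritan": 33}

for name, d in (("numbers", numbers), ("words", words)):
    print("%s: about %.6f MB" % (name, deep_size(d) / (1024.0 * 1024.0)))
```

Running this on dictionaries of realistic size, rather than on three entries, would give a more useful comparison.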

regards,
Arild Næss
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

