Arild B. Næss wrote:
> Hi,
>
> I'm working on a python script for a task in statistical language
> processing. Briefly put it all boils down to counting different
> things in very large text files, doing simple computations on these
> counts and storing the results. I have been using python's dictionary
> type as my basic data structure of storing the counts. This has been
> a nice and simple solution, but turns out to be a bad idea in the
> long run, since the dictionaries become _very_ large, and create
> MemoryErrors when I try to run my script on texts of a certain size.
>
> It seems that an SQL database would probably be the way to go, but I
> am a bit concerned about speed issues (even though running time is
> not all that crucial here). In any case it would probably take me a
> while to get a database up and running and I need to hand in some
> preliminary results pretty soon, so for now I think I'll postpone the
> SQL and try to tweak my current script to be able to run it on
> slightly longer texts than it can handle now.
>
> So, enough beating around the bush, my questions are:
>
> - Will the dictionaries take up less memory if I use numbers rather
> than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory
> than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:
> Slightly less, or substantially less memory?

I'm going to guess here. I think the number will take up 4 bytes plus
the overhead of an object and the string will take about the number of
bytes in the string plus the same overhead. But I am guessing and there
are optimizations in the Python interpreter for both strings and ints
that may affect this.

> - What are common methods to monitor the memory usage of a script?
> Can I add a snippet to the code that prints out how many MBs of
> memory a certain dictionary takes up at that particular time?

See various discussions on comp.lang.python:
http://tinyurl.com/ysrocc

Kent

> regards,
> Arild Næss
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
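
Rather than guess, both questions in this thread can be checked by measuring. The sketch below is only a rough illustration and rests on some assumptions. It needs a much newer interpreter than the one this thread was written for (sys.getsizeof arrived in Python 2.6, tracemalloc in 3.4, and the code is written for Python 3). The helper names approx_dict_bytes, build_counts and traced_bytes are made up for this example, and build_counts is a stand-in for the real counting script. Note that sys.getsizeof on a dictionary reports only the table itself, so the keys and values are added in by hand, and the result is an estimate rather than an exact figure.

```python
import sys
import tracemalloc


def approx_dict_bytes(d):
    """Rough size of a dict: the table itself plus each key and value object.

    Each distinct object is counted once (tracked by id), so a shared
    object such as a cached small int is not counted repeatedly.
    """
    seen = set()
    total = sys.getsizeof(d)  # the hash table only, not its contents
    for key, value in d.items():
        for obj in (key, value):
            if id(obj) not in seen:
                seen.add(id(obj))
                total += sys.getsizeof(obj)
    return total


def build_counts(n, use_int_keys):
    """Hypothetical stand-in for the counting script: n distinct keys."""
    if use_int_keys:
        return {i: i for i in range(n)}
    return {"word%d" % i: i for i in range(n)}


def traced_bytes(n, use_int_keys):
    """Bytes allocated while building the dict, as reported by tracemalloc."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    counts = build_counts(n, use_int_keys)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, counts


if __name__ == "__main__":
    n = 100000
    for use_int_keys in (True, False):
        traced, counts = traced_bytes(n, use_int_keys)
        label = "int keys" if use_int_keys else "str keys"
        print("%s: ~%.1f MB summed, ~%.1f MB traced" % (
            label, approx_dict_bytes(counts) / 1e6, traced / 1e6))
```

Calling approx_dict_bytes(my_counts) at any point in a script gives a quick "how big is this dictionary right now" figure. The tracemalloc number is usually the more trustworthy of the two, since it reflects what the interpreter actually allocated. The exact values depend on the Python version and platform, so run it on the machine in question before deciding whether switching to integer keys is worth the trouble.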