On Tue, Oct 16, 2012 at 12:57 PM, Abhishek Pratap <abhishek....@gmail.com> wrote: > Hi Guys > > For my problem I need to store 400-800 million 20 characters keys in a > dictionary and do counting. This data structure takes about 60-100 Gb > of RAM. > I am wondering if there are slick ways to map the dictionary to a file > on disk and not store it in memory but still access it as dictionary > object. Speed is not the main concern in this problem and persistence > is not needed as the counting will only be done once on the data. We > want the script to run on smaller memory machines if possible. > > I did think about databases for this but intuitively it looks like a > overkill coz for each key you have to first check whether it is > already present and increase the count by 1 and if not then insert > the key into dbase. > > Just want to take your opinion on this. > > Thanks! > -Abhi > _______________________________________________ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor
My inexperienced advice would be to begin with the storage areas available. I would begin by eliminating certain things such as: x = {'one_entry' : 1} into x = {'one_entry':1} To map, you would want maybe different db files that contain certain info within a certain range. 0-1000 entries in the first file, etc. os.walk a directory, and find the mapped file in a particular range file, then go straight to the entry needed. Make the dict one long line, and you could eliminate any /n newline chars. I could do better with more time, but that seems like a good solution at this point. Best Regards, David Hutto CEO: http://www.hitwebdevelopment.com _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor