Danny Yoo wrote:

> There are several out there; one that comes standard in Python 3 is
> the "dbm" module:
>
>     https://docs.python.org/3.5/library/dbm.html
>
> Instead of doing:
>
>     diz5 = {}
>     ...
>
> we'd do something like this:
>
>     with dbm.open('diz5', 'c') as diz5:
>         ...
>
> And otherwise, your code will look very similar!  This dictionary-like
> object will store its data on disk, rather than in-memory, so that it
> can grow fairly large.  The other nice thing is that you can do the
> dbm creation up front.  If you run your program again, you might add a
> bit of logic to *reuse* the dbm that's already on disk, so that you
> don't have to process your input files all over again.

dbm operates on byte strings for both keys and values, so a few changes
are needed. Fortunately there's a wrapper around dbm called shelve that
uses string keys and allows any object that can be pickled as a value:

https://docs.python.org/dev/library/shelve.html

With that your code may become

    import shelve

    with shelve.open("diz5") as db:
        with open("tmp1.txt") as instream:
            for line in instream:
                # Each line must be exactly "key<TAB>value".
                assert line.count("\t") == 1
                key, _tab, value = line.rstrip("\n").partition("\t")
                values = db.get(key) or set()
                values.add(value)
                # Assign back so the updated set is pickled to disk.
                db[key] = values

Note that while shelve has a setdefault() method, it will only work as
expected when you set writeback=True. That option keeps every accessed
entry cached in memory until the shelf is synced or closed, so it may
require arbitrary amounts of memory.
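
To make the setdefault() caveat concrete, here is a small sketch rather than anything definitive. The function names, the key "k", and the values "a" and "b" are made up for illustration; only shelve.open() and its writeback flag come from the standard library. It stores two values under one key in three ways and reads back what actually reached the disk.

```python
import os
import shelve
import tempfile


def add_with_setdefault(path, writeback):
    """Add two values under one key via setdefault(); return what was stored."""
    with shelve.open(path, writeback=writeback) as db:
        # Without writeback, .add() mutates an in-memory object only;
        # the pickled copy on disk is never updated.
        db.setdefault("k", set()).add("a")
        db.setdefault("k", set()).add("b")
    with shelve.open(path) as db:
        return db["k"]


def add_with_reassign(path):
    """Add two values by fetching, mutating and assigning back."""
    with shelve.open(path) as db:
        for value in ("a", "b"):
            values = db.get(key := "k") or set()
            values.add(value)
            db[key] = values  # the assignment is what writes to disk
    with shelve.open(path) as db:
        return db["k"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(add_with_setdefault(os.path.join(folder, "plain"), False))
        print(add_with_setdefault(os.path.join(folder, "cached"), True))
        print(add_with_reassign(os.path.join(folder, "reassign")))
```

With writeback=False the first call ends up storing an empty set, because the additions never reach the disk; the other two variants store both values. The reassign variant is the one that stays cheap on memory, at the price of unpickling and repickling the whole set on every update.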