On Tue, 31 Aug 2010 12:44:08 am Knacktus wrote:
> Hey everyone,
>
> I have a huge number of data items coming from a database.

Huge? Later in this thread, you mentioned 200,000 items overall. That
might be "huge" to you, but it isn't to Python. Here's an example:

    class K(object):
        def __init__(self):
            self.info = {"id": id(self),
                         "name": "root " + str(id(self)),
                         "children_ids": [2*id(self), 3*id(self)+1]}

And the size:

    >>> import sys
    >>> k = K()
    >>> sys.getsizeof(k)
    28
    >>> sys.getsizeof(k.info)
    136
    >>> L = [K() for _ in xrange(200000)]
    >>> sys.getsizeof(L)
    835896

The sizes given are in bytes. So 200,000 instances of this class, plus
the list to hold them, would take approximately 34 megabytes: that is
200,000 * (28 + 136) + 835,896 bytes, not counting the objects stored
inside each dict, which sys.getsizeof doesn't include. An entry-level PC
these days has 1000 megabytes of memory. "Huge"? Not even close.

Optimizing with __slots__ is premature. Perhaps if you had 1000 times
that many instances, then it might be worthwhile.

> So far
> there're no restrictions about how to model the items. They can be
> dicts, objects of a custom class (preferable with __slots__) or
> namedTuple.
>
> Those items have references to each other using ids.

That approach sounds slow and ponderous to me. Why don't you just give
items direct references to each other, instead of indirect ones using
ids?

I presume you're doing something like this:

    ids = {0: None}  # Map IDs to objects.
    a = Part(0)
    ids[1] = a
    b = Part(1)  # b is linked to a via its ID 1.
    ids[2] = b
    c = Part(2)  # c is linked to b via its ID 2.
    ids[3] = c

(only presumably less painfully).

If that's what you're doing, you should dump the ids and just do this:

    a = Part(None)
    b = Part(a)
    c = Part(b)

Storing references to objects in Python is cheap -- it's only a pointer.
Using indirection via an ID you manage yourself is a pessimation, not an
optimization: it takes more code, runs slower, and uses more memory too
(because the integer IDs themselves are pointers to 12-byte objects, not
4-byte ints).
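
To make the difference concrete, here is a minimal sketch of the
direct-reference version. The Part class below is hypothetical (the
thread doesn't show the real one), and its name and parent attributes
and its ancestors() method are assumptions chosen only for illustration:

```python
class Part(object):
    """Hypothetical item that points straight at its parent object."""

    def __init__(self, name, parent=None):
        self.name = name
        # A plain reference to another Part (or None for the root);
        # no ID and no lookup table are involved.
        self.parent = parent

    def ancestors(self):
        """Return the names of all ancestors, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


a = Part("a")
b = Part("b", a)
c = Part("c", b)
print(c.ancestors())  # ['b', 'a']
```

Walking up the chain is ordinary attribute access, and there is no
separate dict of IDs to keep in sync with the objects.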

If you *need* indirection, say because you are keeping the data in a
database and you want to load it lazily, only when needed, rather than
all at once, then the right approach is probably a proxy object:

    class PartProxy(object):
        def __init__(self, database_id):
            self._info = None  # Not loaded yet.
            self.database_id = database_id

        @property
        def info(self):
            # get_from_database is a placeholder for whatever function
            # fetches a record; it is called once, on first access.
            if self._info is None:
                self._info = get_from_database(self.database_id)
            return self._info


-- 
Steven D'Aprano