On Tue, Feb 03, 2015 at 10:12:09PM +0100, Jugurtha Hadjar wrote:
> Hello,
>
> I was writing something and thought: Since the class had some
> 'constants', and multiple instances would be created, I assume that each
> instance would have its own data. So this would mean duplication of the
> same constants?

Not necessarily. Consider:

    class A(object):
        spam = 23
        def __init__(self):
            self.eggs = 42

In this case, the "spam" attribute is on the class, not the instance, and
so it doesn't matter how many A instances you have, there is only one
reference to 23 and a single copy of 23.

The "eggs" attribute is on the instance. That means that each instance
has its own separate reference to 42.

Does that mean a separate copy of 42? Maybe, maybe not. In general, yes:
if eggs was a mutable object like a list, or a dict, say:

    self.eggs = []

then naturally it would need to be a separate list for each instance.
(If you wanted a single list shared between all instances, put it on the
class.) But with immutable objects like ints, strings and floats, there
is an optimization available to the Python compiler: it could reuse the
same object. There would be a separate reference to that object per
instance, but only one copy of the object itself.

Think of references as being rather like C pointers. References are
cheap, while objects themselves could be arbitrarily large.

With current versions of Python, the compiler will intern and re-use
small integers and strings which look like identifiers ("alpha" is an
identifier, "hello world!" is not). But that is subject to change: it is
not a language promise, it is an implementation optimization.

However, starting with Python 3.3 (key-sharing dictionaries, PEP 412),
Python optimizes even more! Instances can share parts of their
dictionaries, which saves even more memory. Each instance has a dict,
which points to a hash table of (key, value) records:

    <instance a of A>
        __dict__ ----> [ UNUSED
                         UNUSED
                         (ptr to key, ptr to value)
                         UNUSED
                         ...
                       ]

    <instance b of A>
        __dict__ ----> [ UNUSED
                         UNUSED
                         (ptr to key, ptr to value)
                         UNUSED
                         ...
                       ]

For most classes, the instances a and b will have the same set of keys,
even though the values will be different. That means the pointers to
keys are all the same.
So the new implementation of dict optimizes that case to save memory and
speed up dictionary access.

> If so, I thought why not put the constants in memory
> once, for every instance to access (to reduce memory usage).
>
> Correct me if I'm wrong in my assumptions (i.e: If instances share stuff).

In general, Python will share stuff if it can, although maybe not
*everything* it can.

> So I investigated further..
>
> >>> import sys
> >>> sys.getsizeof(5)
> 12
>
>
> So an integer on my machine is 12 bytes.

A *small* integer is 12 bytes. A large integer can be more:

py> sys.getsizeof(2**100)
26
py> sys.getsizeof(2**10000)
1346
py> sys.getsizeof(2**10000000)
1333346

> Now:
>
> >>> class foo(object):
> ...     def __init__(self):
> ...         pass
>
> >>> sys.getsizeof(foo)
> 448
>
> >>> sys.getsizeof(foo())
> 28
>
> >>> foo
> <class '__main__.foo'>
> >>> foo()
> <__main__.foo object at 0xXXXXXXX

The *class* Foo is a fairly large object. It has space for a name, a
dictionary of methods and attributes, a tuple of base classes, a table
of weak references, a docstring, and more:

py> class Foo(object):
...     pass
...
py> dir(Foo)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__']
py> vars(Foo)
mappingproxy({'__qualname__': 'Foo', '__module__': '__main__',
'__doc__': None, '__weakref__': <attribute '__weakref__' of 'Foo'
objects>, '__dict__': <attribute '__dict__' of 'Foo' objects>})
py> Foo.__base__
<class 'object'>
py> Foo.__bases__
(<class 'object'>,)

The instance may be quite small, but of course that depends on how many
attributes it has. Typically, all the methods live in the class, and are
shared, while data attributes are per-instance.

> - Second weird thing:
>
> >>> class bar(object):
> ...     def __init__(self):
> ...         self.w = 5
> ...         self.x = 6
> ...         self.y = 7
> ...         self.z = 8
>
> >>> sys.getsizeof(bar)
> 448
> >>> sys.getsizeof(foo)
> 448

Nothing weird here. Both your Foo and Bar classes contain the same
attributes. The only difference is that the Foo.__init__ method does
nothing, while Bar.__init__ has some code in it. If you call

    sys.getsizeof(foo.__init__.__code__)

and compare it to the same for bar, you should see a difference.

> >>> sys.getsizeof(bar())
> 28
> >>> sys.getsizeof(foo())
> 28

In this case, the Foo and Bar instances both have the same size. They
both have a __dict__, and the Foo instance's __dict__ is empty, while
the Bar instance's __dict__ has 4 items. Print them:

    print(foo().__dict__)
    print(bar().__dict__)

to see the difference. But with only 4 items, Bar's items will fit in
the default sized hash table. No resize will be triggered and the sizes
are the same.

Run this little snippet of code to see what happens:

    import sys

    d = {}
    for c in "abcdefghijklm":
        # Show the number of items and the dict's own size before each insert.
        print(len(d), sys.getsizeof(d))
        d[c] = None

> Summary questions:
>
> 1 - Why are foo's and bar's class sizes the same? (foo's just a nop)

Foo is a class, it certainly isn't a NOP. Just because you haven't given
it state or behaviour doesn't mean it doesn't have any. It has the
default state and behaviour that all classes start off with.

> 2 - Why are foo() and bar() the same size, even with bar()'s 4 integers?

Because hash tables (dicts) contain empty slots. In CPython, a resize is
only triggered once the hash table is about two-thirds full. (And
sys.getsizeof on the instance doesn't count the contents of its __dict__
anyway; see the next answer.)

> 3 - Why's bar()'s size smaller than the sum of the sizes of 4 integers?

Because sys.getsizeof tells you the size of the object, not the objects
referred to by the object. Here is a recipe for a recursive getsizeof:

http://code.activestate.com/recipes/577504


-- 
Steve
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
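
Addendum: the points above about class versus instance attributes, and
about sys.getsizeof being shallow, can be pulled together in a rough
sketch. This is not the ActiveState recipe. The helper total_size and
the names a, b and bar are made up for illustration, and the helper only
follows __dict__, dicts, lists, tuples and sets. Actual byte counts
depend on the Python version and platform, so none are shown.

```python
import sys


class A(object):
    spam = 23              # class attribute: stored once, on the class

    def __init__(self):
        self.eggs = []     # instance attribute: a fresh list per instance


class Bar(object):
    def __init__(self):
        self.w = 5
        self.x = 6
        self.y = 7
        self.z = 8


a, b = A(), A()

# "spam" lives in the class namespace, not in either instance's __dict__.
spam_on_class_only = "spam" not in vars(a) and "spam" in vars(A)
# Attribute lookup falls back to the class, so everyone sees the same object.
shared_spam = a.spam is b.spam is A.spam

# Each instance has its own list, so mutating one leaves the other alone.
a.eggs.append(1)
separate_eggs = a.eggs is not b.eggs and b.eggs == []


def total_size(obj, seen=None):
    """Hypothetical helper: size of obj plus what it refers to.

    Follows instance __dict__, dicts, lists, tuples and sets only, and
    counts each object once even if it is referenced several times.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, "__dict__") and not isinstance(obj, type):
        size += total_size(vars(obj), seen)
    return size


bar = Bar()
# Shallow size of the instance versus the size including its attributes.
print(sys.getsizeof(bar), total_size(bar))
```

The shallow figure is what the original session was printing; the second
figure also counts the instance's __dict__ and the four integers it
refers to, which is why it is larger.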