Re: [Tutor] Why is an instance smaller than the sum of its components?

Jugurtha Hadjar Tue, 03 Feb 2015 17:59:58 -0800

On 02/04/2015 12:18 AM, Steven D'Aprano wrote:


Not necessarily. Consider:

class A(object):
     spam = 23
     def __init__(self):
         self.eggs = 42

In this case, the "spam" attribute is on the class, not the instance,
and so it doesn't matter how many A instances you have, there is only
one reference to 23 and a single copy of 23.

The "eggs" attribute is on the instance. That means that each instance
has its own separate reference to 42.
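
A minimal sketch of the difference, reusing the class A from above (the names a and b are only for illustration): the instance's __dict__ holds eggs, while spam is found on the class.

```python
class A(object):
    spam = 23                    # class attribute: stored once, on the class

    def __init__(self):
        self.eggs = 42           # instance attribute: one reference per instance


a, b = A(), A()

print(a.__dict__)                # only "eggs" is stored on the instance
print("spam" in A.__dict__)      # "spam" is stored on the class
print(a.spam is b.spam)          # both lookups end up at the single class attribute
```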



Hmm.. Here are the first few lines of my class:

class Sender(object):
        """
                Redacted
        """

        SENDER_DB = 'sender.db'

        def __init__(self, phone, balance=0.0):
                self.phone = phone
                self.balance = balance

I gave the (bad) examples that way because I thought what mattered is how much data was inside. I put SENDER_DB there because it made sense to put constants way on top, not because I had any idea it'd make the difference you mentioned (class attributes vs instance attributes).

And also because it's a common piece of data to all the methods... (because after I started with each method opening and closing the database, I eliminated the code and made a method that returns a connection and a cursor, and the others just call it when they need to do stuff on the database. I'll ask another question later on how to refine it)

But now that you, Dave, and Peter pointed this out, I'm thinking of putting the methods' constants up there (mainly patterns for regular expressions, and queries (SQL)).
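
A rough sketch of what that layout might look like. Everything beyond SENDER_DB, phone and balance is hypothetical: the pattern, the queries, the table layout and the method names _connect and save are made up for illustration, and the database is switched to SQLite's in-memory mode so the sketch leaves no file behind.

```python
import re
import sqlite3


class Sender(object):
    # Class-level constants: one copy shared by every instance.
    SENDER_DB = ":memory:"                          # assumption: in-memory DB instead of 'sender.db'
    PHONE_PATTERN = re.compile(r"^\+?\d{6,15}$")    # hypothetical pattern
    CREATE_TABLE = ("CREATE TABLE IF NOT EXISTS senders "
                    "(phone TEXT PRIMARY KEY, balance REAL)")            # hypothetical query
    INSERT_SENDER = "INSERT INTO senders (phone, balance) VALUES (?, ?)"  # hypothetical query

    def __init__(self, phone, balance=0.0):
        self.phone = phone
        self.balance = balance

    def _connect(self):
        """Return a (connection, cursor) pair; the other methods call this."""
        conn = sqlite3.connect(self.SENDER_DB)
        return conn, conn.cursor()

    def save(self):
        """Insert this sender and return the number of rows in the table."""
        conn, cur = self._connect()
        try:
            cur.execute(self.CREATE_TABLE)
            cur.execute(self.INSERT_SENDER, (self.phone, self.balance))
            conn.commit()
            cur.execute("SELECT COUNT(*) FROM senders")
            return cur.fetchone()[0]
        finally:
            conn.close()


sender = Sender("+213555000111", 10.0)
print(sender.__dict__)      # only phone and balance live on the instance
print(sender.save())
```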

Does that mean a separate copy of 42? Maybe, maybe not. In general, yes:
if eggs was a mutable object like a list, or a dict, say:

         self.eggs = []

then naturally it would need to be a separate list for each instance.
(If you wanted a single list shared between all instances, put it on the
class.) But with immutable objects like ints, strings and floats, there
is an optimization available to the Python compiler: it could reuse the
same object. There would be a separate reference to that object per
instance, but only one copy of the object itself.
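
A small sketch of the mutable case. The class names Shared and Separate are invented for the example: a list on the class is one object seen by every instance, while a list created in __init__ is a new object each time.

```python
class Shared(object):
    items = []                   # one list, stored on the class

class Separate(object):
    def __init__(self):
        self.items = []          # a fresh list for every instance


s1, s2 = Shared(), Shared()
s1.items.append("x")
print(s2.items)                  # the append is visible through s2 as well

p1, p2 = Separate(), Separate()
p1.items.append("x")
print(p2.items)                  # p2 has its own, still empty, list
```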

Okay.. I think that even if Python does optimize that, this belongs to the "good practice" category, so it's better that I'm the one who does it instead of relying on what the compiler might do. I'm a beginner (that's the first thing I write that does something useful) and would like to reinforce good habits.

Think of references as being rather like C pointers. References are
cheap, while objects themselves could be arbitrarily large.

That's the analogy I made, but I'm careful with those. I don't want to end up like the "English As She Is Spoke" book..

With current versions of Python, the compiler will intern and re-use
small integers and strings which look like identifiers ("alpha" is an
identifier, "hello world!" is not).

...

In general, Python will share stuff if it can, although maybe not
*everything* it can.
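
A hedged sketch of the sharing idea, assuming Python 3, where the function is sys.intern. Whether two equal strings happen to be the same object is an implementation detail, so the example only relies on what interning explicitly promises. The strings are built at run time so the compiler cannot simply merge two identical literals.

```python
import sys

# Two equal strings built at run time.
s1 = " ".join(["hello", "world!"])
s2 = " ".join(["hello", "world!"])

print(s1 == s2)        # same value
print(s1 is s2)        # same object? depends on the implementation

# Explicit interning returns one shared object for equal strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)
```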


That's interesting. I'll try to read up on this without being sidetracked.

In this case, the Foo and Bar instances both have the same size. They
both have a __dict__, and the Foo instance's __dict__ is empty, while
the Bar instance's __dict__ has 4 items. Print:

print(foo().__dict__)
print(bar().__dict__)

to see the difference. But with only 4 items, Bar's items will fit in
the default sized hash table. No resize will be triggered and the sizes
are the same.
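
For anyone reading without the earlier messages, here is a sketch of the comparison. The definitions of Foo and Bar are assumptions (an empty class, and one that sets w, x, y, z), and the sizes reported by sys.getsizeof vary between Python versions and builds, so none are shown.

```python
import sys


class Foo(object):
    pass                         # no instance attributes at all


class Bar(object):
    def __init__(self):
        # Four integer attributes, stored in the instance's __dict__.
        self.w = 1
        self.x = 2
        self.y = 3
        self.z = 4


foo_instance, bar_instance = Foo(), Bar()

print(foo_instance.__dict__)     # empty
print(bar_instance.__dict__)     # four items

# getsizeof reports the instance object only, not its __dict__ or the ints.
print(sys.getsizeof(foo_instance), sys.getsizeof(bar_instance))
```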

I thought that there was a default size allocated even for an "empty" class (which is correct), and then if I added w, x, y, z, their size would be *added* to the default size (which is incorrect).

Somehow, I didn't think of the analogy of 8dec being (1000b) (4 bits) and incrementing, it's still 4 bits through 15dec (1111b).

So that's: default class size + data = default class size until it "overflows" (or until 50% of default class size is reached, as you mentioned later).

Run this little snippet of code to see what happens:
import sys

d = {}
for c in "abcdefghijklm":
    print(len(d), sys.getsizeof(d))
    d[c] = None


For memo:

(0, 136)
(1, 136)
(2, 136)
(3, 136)
(4, 136)
(5, 136)
(6, 520)
(7, 520)
(8, 520)
(9, 520)
(10, 520)
(11, 520)
(12, 520)
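
In the output above the size stays at 136 and then jumps to 520 once the sixth key is in. A variation on the same loop, as a sketch, that records only the points where the reported size changes. The function name growth_points is made up, and the numbers it prints depend on the Python version and platform.

```python
import sys


def growth_points(keys):
    """Return (number_of_items, size) pairs for each insertion that changed the dict's size."""
    d = {}
    points = []
    last = sys.getsizeof(d)
    for k in keys:
        d[k] = None
        size = sys.getsizeof(d)
        if size != last:         # the dict's storage was resized
            points.append((len(d), size))
            last = size
    return points


resize_points = growth_points("abcdefghijklm")
print(resize_points)
```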

Summary questions:

1 - Why are foo's and bar's class sizes the same? (foo's just a nop)


Foo is a class, it certainly isn't a NOP. Just because you haven't given
it state or behaviour doesn't mean it doesn't have any. It has the
default state and behaviour that all classes start off with.

2 - Why are foo() and bar() the same size, even with bar()'s 4 integers?


Because hash tables (dicts) contain empty slots. Once the hash table
reaches 50% full, a resize is triggered.

3 - Why's bar()'s size smaller than the sum of the sizes of 4 integers?


Because sys.getsizeof tells you the size of the object, not the objects
referred to by the object. Here is a recipe for a recursive getsizeof:

http://code.activestate.com/recipes/577504
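
A much-reduced sketch of the same idea, not the recipe itself: the function name total_size and the Bar class here are assumptions for illustration. It only follows dicts, lists, tuples, sets and instance __dict__s, and counts each object once.

```python
import sys


def total_size(obj, seen=None):
    """Approximate size of obj plus the objects it refers to."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # shared objects are counted only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    if hasattr(obj, "__dict__"):     # follow an instance's attributes
        size += total_size(vars(obj), seen)
    return size


class Bar(object):
    def __init__(self):
        self.w, self.x, self.y, self.z = 1000, 2000, 3000, 4000


bar = Bar()
print(sys.getsizeof(bar), total_size(bar))
```

The second number includes the instance's __dict__, its keys and the four integers, which is why it comes out larger than what sys.getsizeof alone reports.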

This is cool. Thanks a lot (and Dave, too) for the great explanations. I'll post some code about the database stuff in a new thread.


--
~Jugurtha Hadjar,