Hello all, I am working under the umbrella of the Python Software Foundation for the Google Summer of Code and am keeping a blog about the work. Part of my work is refactoring and extending some existing code. This code makes use of Python's super, and so I am trying to understand the ins and outs of super. I recently summarized my understanding of super on my blog, and I was wondering if anyone would care to comment on my write up. I am particularly worried about my handling of my multiple inheritance example and explanation of class F.
I apologize for the length and if this is an inappropriate request for this list, but I read the discussions here almost daily, and often the points that come up add to my Python understanding very much. Everyone seems to have a great grasp of Python and programming and that rare ability (and willingness) to *explain* topics. The post is here <http://scipystats.blogspot.com/2009/06/design-issues-understanding-pythons.html> But if it's easier for the ML, comments can be inlined below. The In/Out prompts are the iPython interpreter if anyone is wondering, and I tried to preserve indents. Cheers, Skipper -------------------------------------------------------------------------------------------------------------- The current statistical models package is housed in the NiPy, Neuroimaging in Python, project. Right now, it is designed to rely on Python's built-in super to handle class inheritance. This post will dig a little more into the super function and what it means for the design of the project and future extensions. Note that there are plenty of good places to learn about super and that this post is to help me as much as anyone else. You can find the documentation for super here. If this is a bit confusing, it will, I hope, become clearer after I demonstrate the usage. First, let's take a look at how super actually works for the simple case of single inheritance (right now, we are not planning on using multiple inheritance in the project) and an __init__ chain (note that super can call any of its parent class's methods, but using __init__ is my current use case). The following examples were adapted from some code provided by mentors (thank you!). class A(object): def __init__(self, a): self.a = a print 'executing A().__init__' class B(A): def __init__(self, a): self.ab = a*2 print 'executing B().__init__' super(B,self).__init__(a) class C(B): def __init__(self, a): self.ac = a*3 print 'executing C().__init__' super(C,self).__init__(a) Now let's have a look at creating an instance of C. In [2]: cinst = C(10) executing C().__init__ executing B().__init__ executing A().__init__ In [3]: vars(cinst) Out[3]: {'a': 10, 'ab': 20, 'ac': 30} That seems simple enough. Creating an instance of C with a = 10 will also give C the attributes of B(10) and A(10). This means our one instance of C has three attributes: cinst.ac, cinst.ab, cinst.a. The latter two were created by its parent classes (or superclasses) __init__ method. Note that A is also a new-style class. It subclasses the 'object' type. The actual calls to super pass the generic class 'C' and a handle to that class 'self', which is 'cinst' in our case. Super returns the literal parent of the class instance C since we passed it 'self'. It should be noted that A and B were created when we initialized cinst and are, therefore, 'bound' class objects (bound to cinst in memory through the actual instance of class C) and not referring to the class A and class B instructions defined at the interpreter (assuming you are typing along at the interpreter). Okay now let's define a few more classes to look briefly at multiple inheritance. class D(A): def __init__(self, a): self.ad = a*4 print 'executing D().__init__' # if super is commented out then __init__ chain ends #super(D,self).__init__(a) class E(D): def __init__(self, a): self.ae = a*5 print 'executing E().__init__' super(E,self).__init__(a) Note that the call to super in D is commented out. This breaks the __init__ chain. In [4]: einst = E(10) executing E().__init__ executing D().__init__ In [5]: vars(einst) Out[5]: {'ad': 40, 'ae': 50} If we uncomment the super in D, we get as we would expect In [6]: einst = E(10) executing E().__init__ executing D().__init__ executing A().__init__ In [7]: vars(einst) Out[7]: {'a': 10, 'ad': 40, 'ae': 50} Ok that's pretty straightforward. In this way super is used to pass off something to its parent class. For instance, say we have a little more realistic example and the instance of C takes some timeseries data that exhibits serial correlation. Then we can have C correct for the covariance structure of the data and "pass it up" to B where B can then perform OLS on our data now that it meets the assumptions of OLS. Further B can pass this data to A and return some descriptive statistics for our data. But remember these are 'bound' class objects, so they're all attributes to our original instance of C. Neat huh? Okay, now let's look at a pretty simple example of multiple inheritance. class F(C,E): def __init__(self, a): self.af = a*6 print 'executing F().__init__' super(F,self).__init__(a) For this example we are using the class of D, that has super commented out. In [8]: finst = F(10) executing F().__init__ executing C().__init__ executing B().__init__ executing E().__init__ executing D().__init__ In [8]: vars(finst) Out[8]: {'ab': 20, 'ac': 30, 'ad': 40, 'ae': 50, 'af': 60} The first time I saw this gave me pause. Why isn't there an finst.a? I was expecting the MRO to go F -> C -> B -> A -> E -> D -> A. Let's take a closer look. The F class has multiple inheritance. It inherits from both C and E. We can see F's method resolution order by doing In [9]: F.__mro__ Out[9]: (<class '__main__.F'>, <class '__main__.C'>, <class '__main__.B'>, <class '__main__.E'>, <class '__main__.D'>, <class '__main__.A'>, <type 'object'>) Okay, so we can see that for F A is a subclass of D but not B. But why? In [10]: A.__subclasses__() Out[10]: [<class '__main__.B'>, <class '__main__.D'>] The reason is that A does not have a call to super, so the chain doesn't exist here. When you instantiate F, the hierarchy goes F -> C -> B -> E -> D -> A. The reason that it goes from B -> E is because A does not have a call to super, so it can't pass anything to E (It couldn't pass anything to E because the object.__init__ doesn't take a parameter "a" and because you cannot have a MRO F -> C -> B -> A -> E -> D -> A as this is inconsistent and will give an error!), so A does not cause a problem and the chain ends after D (remember that D's super is commented out, but if it were not then there would be finst.a = 10 as expected). Whew. I'm sure you're thinking "Oh that's (relatively) easy. I'm ready to go crazy with super." But there are a number of things must keep in mind when using super, which makes it necessary for the users of super to proceed carefully. 1. super() only works with new-style classes. You can read more about classic/old-style vs new-style classes here. From there you can click through or just go here for more information on new-style classes. Therefore, you must know that the base classes are new-style. This isn't a problem for our project right now, because I have access to all of the base classes. 2. Subclasses must use super if their superclasses do. This is why the user of super must be well-documented. If we have to classes A and B that both use super and a class C that inherits from them, but does not know about super then we will have a problem. Consider the slightly different case class A(object): def __init__(self): print "executing A().__init__" super(A, self).__init__() class B(object): def __init__(self): print "executing B().__init__" super(B, self).__init__() class C(A,B): def __init__(self): print "executing C().__init__" A.__init__(self) B.__init__(self) # super(C, self).__init__() Say class C was defined by someone who couldn't see class A and B, then they wouldn't know about super. Now if we do In [11]: C.__mro__ Out[11]: (<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <type 'object'>) In [12]: c = C() executing C().__init__ executing A().__init__ executing B().__init__ executing B().__init__ B got called twice, but by now this should be expected. A's super calls __init__ on the next object in the MRO which is B (it works this time unlike above because there is no parameter passed with __init__), then C explicitly calls B again. If we uncomment super and comment out the calls to the parent __init__ methods in C then this works as expected. 3. Superclasses probably should use super if their subclasses do. We saw this earlier with class D's super call commented out. Note also that A does not have a call to super. The last class in the MRO does not need super *if* there is only one such class at the end. 4. Classes must have the exact same call signature. This should be obvious but is important for people to be able to subclass. It is possible however for subclasses to add additional arguments so *args and **kwargs should be probably always be included in the methods that are accessible to subclasses. 5. Because of these last three points, the use of super must be explicitly documented, as it has become a part of the interface to our classes. _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor