Re: [Tutor] subtyping builtin type

spir Wed, 01 Jan 2014 05:50:46 -0800

On 01/01/2014 01:26 AM, Steven D'Aprano wrote:

On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:

Hello,


I don't remember exactly how to do that. As an example:

class Source (str):
     __slots__ = ['i', 'n']
     def __init__ (self, string):
         self.i = 0                  # current matching index in source
         self.n = len(string)        # number of ucodes (Unicode code points)
         #~ str.__init__(self, string)


The easiest way to do that is:

class Source(str):
     def __init__(self, *args, **kwargs):
         self.i = 0
         self.n = len(self)


Thank you Steven for your help.

Well, I don't really get everything you say below about possible alternatives, so I'll give a bit more detail. The only point of Source is to have a string storing its current index, somewhat like a file (being read) on the filesystem. I take the opportunity to add a few features, but would do without Source altogether if it were not for 'i'. The reason is: it is for a parsing library, or hand-made parsers. Every matching func, representing a pattern (or "rule"), advances in the source whenever the match is ok, right? Thus, in addition to returning the form (of what was matched), they must return the new match index:

        return (form, i)

Symmetrically, every match func using another (meaning nearly all) receives this pair. (Less annoyingly, every match func also takes i as input, in addition to the src str.) (There are also a handful of other annoying points, consequences of those ones.)

If I have a string that stores its index, all of this mess is gone. It makes for clean and simple interfaces everywhere. Also (one of the consequences), I can directly provide match funcs to the user, instead of having to wrap them inside a func whose only utility is to hide the additional index (in both input & output).
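
To make that concrete, here is a rough sketch of what such a match func could look like once the source carries its own index. The name match_digits and its exact behaviour are invented for illustration only; they are not taken from the actual library.

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0                  # current matching index in source


def match_digits(src):
    """Hypothetical pattern: match a run of digits starting at src.i."""
    start = src.i
    end = start
    while end < len(src) and src[end].isdigit():
        end += 1
    if end == start:
        return None                 # no match: the index is left unchanged
    src.i = end                     # advance the source in place
    return src[start:end]           # the form is a plain str, no (form, i) pair


src = Source("123 abc")
print(match_digits(src), src.i)     # 123 3
print(match_digits(src), src.i)     # None 3
```

The caller neither passes an index in nor unpacks one from the result; a failed match simply leaves src.i where it was.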

As a (premature) memory optimization, you can use __slots__ to reduce
the amount of memory per instance.

You're right! (I did it in fact for 'Form' subtypes, representing match results which are constantly instantiated, possibly millions of times in a single parse; but on the way I did it to Source as well, which is stupid ;-)

But this (probably) is the wrong way
to solve this problem. Your design makes Source a kind of string:

issubclass(Source, str)
=> True

I expect that it should not be. (Obviously I'm making some assumptions
about the design here.)

Actually, it doesn't matter whether issubclass or isinstance are true. But it must be a subtype to use string methods (including magic ones like slicing), as you say below.

 To decide whether you should use subclassing
here, ask yourself a few questions:

* Does it make sense to call string methods on Source objects? In
   Python 3.3, there are over 40 public string methods. If *just one*
   of them makes no sense for a Source object, then Source should not
   be a subclass of str.
   e.g. source.isnumeric(), source.isidentifier()

Do you really mean "If *just one* of them makes no sense for a Source object, then Source should not be a subclass of str."? Or should I understand "If *only one* of them does make sense for a Source object, then Source should not be a subclass of str."?

Also, why? Or rather, why not make it a subtype if I only use one method?

Actually, a handful of them are intensely used (indexing, slicing, the series of is* [e.g. isalnum], a few more as the project moves on). This is far enough for me to make it a subtype. Also, it fits semantically (conceptually): a src is a str that just happens to store a current index.

* Do you expect to pass Source objects to arbitrary functions which
   expect strings, and have the result be meaningful?


No, apart from string methods themselves. It's all internal to the lib.

* Does it make sense for Source methods to return plain strings?
   source.upper() returns a str, not a Source object.

Doesn't matter (it's parsing). The result Forms, when they hold snippets, hold plain strings, not Sources, thus all is fine.
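
A quick check of that point, as a sketch using the simple Source definition from the top of this message: inherited str methods and slicing return plain str objects, which carry no index.

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0


src = Source("abc")
upper = src.upper()
snippet = src[0:2]

print(type(src).__name__)       # Source
print(type(upper).__name__)     # str
print(type(snippet).__name__)   # str
print(hasattr(upper, "i"))      # False
```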

* Is a Source thing a kind of string? If so, what's the difference
   between a Source and a str? Why not just use a str?


see above

   If all you want is to decorate a string with a couple of extra
   pieces of information, then a limitation of Python is that you
   can only do so by subclassing.

That's it. But I don't know of any other solution in other langs, apart from composition, which in my view is clearly inferior:

* it does not fit semantics (conception)
* it's annoying syntactically (constant attribute access)

* Or does a Source thing *include* a string as a component part of
   it? If that is the case -- and I think it is -- then composition
   is the right approach.

No, a source is conceptually like a string, not a kind of composite object with a string among other fields. (Again, think of a file.)

The difference between has-a and is-a relationships is critical. I
expect that the right relationship should be:

     a Source object has a string

rather than "is a string". That makes composition a better design than
inheritance. Here's a lightweight mutable solution, where all three
attributes are public and free to be varied after initialisation:


No, see above.

class Source:
     def __init__(self, string, i=0, n=None):
         if n is None:
             n = len(string)
         self.i = i
         self.n = n
         self.string = string


Wrong solution for my case.

An immutable solution is nearly as easy:

from collections import namedtuple

class Source(namedtuple("Source", "string i n")):
     def __new__(cls, string, i=0, n=None):
         if n is None:
             n = len(string)
         return super(Source, cls).__new__(cls, string, i, n)

An immutable version is fine. But what does this version bring me? A Source's code-string is immutable already. 'i' does change.

Here's a version which makes the string attribute immutable, and the i
and n attributes mutable:

class Source:
     def __init__(self, string, i=0, n=None):
         if n is None:
             n = len(string)
         self.i = i
         self.n = n
         self._string = string
     @property
     def string(self):
         return self._string

Again, what is better here than a plain subtyping of type 'str'? (And I dislike the principle of properties; I want to know whether it's a func call or plain attr access, on the user side. Bertrand Meyer's "uniform access principle" for Eiffel is what I dislike most in this lang ;-) [which has otherwise much to offer].)


Seems I have more to learn ;-) great!

Side-note: after reflection, I guess I'll get rid of 'n'. 'n' is used each time I need, in match funcs, to check for end-of-source (meaning, in every low-level, lexical pattern, the ones that actually "eat" portions of source). I defined 'n' to have it at hand, but now I wonder whether it's not in fact less efficient than just writing len(src) instead of src.n, everywhere. (Since indeed Python strings hold their length: it's certainly not an actual func call! Python lies ;-)
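
One way to settle this rather than guess is to time both forms. The snippet below is only a sketch: the string length and iteration count are arbitrary, and the result will depend on the interpreter and version, so no expected timings are shown.

```python
import timeit


class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0
        self.n = len(self)      # cached length, as in the original Source


src = Source("x" * 1000)
assert len(src) == src.n        # both spellings give the same value

# Time the two ways of asking for the length.
t_len = timeit.timeit("len(src)", globals={"src": src}, number=100_000)
t_attr = timeit.timeit("src.n", globals={"src": src}, number=100_000)
print(f"len(src): {t_len:.4f}s   src.n: {t_attr:.4f}s")
```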


Denis