On 01/01/2014 01:26 AM, Steven D'Aprano wrote:
On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
Hello,

I don't remember exactly how to do that. As an example:

class Source (str):
     __slots__ = ['i', 'n']
     def __init__ (self, string):
         self.i = 0                  # current matching index in source
         self.n = len(string)        # number of ucodes (Unicode code points)
         #~ str.__init__(self, string)

The easiest way to do that is:

class Source(str):
     def __init__(self, *args, **kwargs):
         self.i = 0
         self.n = len(self)

Thank you Steven for your help.

Well, I don't really get everything you say below about possible alternatives, so I'll give a bit more detail. The only point of Source is to have a string that stores a current index, somewhat like a file (being read) on the filesystem. I take the opportunity to add a few features, but would do without Source altogether if it were not for 'i'. The reason is that it is for a parsing library, or hand-made parsers. Every matching func, representing a pattern (or "rule"), advances in the source whenever a match is ok, right? Thus, in addition to returning the form (of what was matched), they must return the new match index:
        return (form, i)
Symmetrically, every match func using another one (meaning nearly all) receives this pair. (Less annoyingly, every match func also takes i as input, in addition to the src str.) (There are also a handful of other annoying points, consequences of those ones.)
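To make the annoyance concrete, here is a minimal sketch of what the explicit-index style could look like. The names match_digits and match_sum are hypothetical illustrations, not functions from the actual library:

```python
# Hypothetical sketch: match funcs that thread the index explicitly.
def match_digits(src, i):
    """Match one or more digits starting at src[i]; return (form, new_i) or None."""
    j = i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == i:
        return None                    # no digit found, nothing consumed
    return (src[i:j], j)

def match_sum(src, i):
    """Match 'digits+digits'; every inner call both takes and returns the index."""
    left = match_digits(src, i)
    if left is None:
        return None
    form1, i = left                    # unpack the pair, again and again
    if i >= len(src) or src[i] != '+':
        return None
    right = match_digits(src, i + 1)
    if right is None:
        return None
    form2, i = right
    return ((form1, '+', form2), i)

result = match_sum("12+345", 0)
print(result)
```

Every composite pattern has to unpack and repack the (form, i) pair, which is the bookkeeping I would like to get rid of.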

If I have a string that stores its index, all of this mess is gone. It makes for clean and simple interfaces everywhere. Also (one of the consequences), I can directly provide match funcs to the user, instead of having to wrap them inside a func whose only utility is to hide the additional index (in both input & output).
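As a rough sketch of that cleaner interface, assuming a Source that stores only 'i' (and leaving 'n' out), match funcs can take the source alone and return the form alone. The functions match_digits and match_sum are again hypothetical:

```python
class Source(str):
    """A str that also stores the current matching index."""
    def __init__(self, *args, **kwargs):
        self.i = 0

def match_digits(src):
    """Consume one or more digits at src.i; return the matched str or None."""
    start = src.i
    while src.i < len(src) and src[src.i].isdigit():
        src.i += 1
    if src.i == start:
        return None
    return src[start:src.i]            # slicing yields a plain str

def match_sum(src):
    """Match 'digits+digits'; on failure, rewind the index to where we started."""
    start = src.i
    left = match_digits(src)
    if left is not None and src.i < len(src) and src[src.i] == '+':
        src.i += 1
        right = match_digits(src)
        if right is not None:
            return (left, '+', right)
    src.i = start                      # assumption: a failed match must not consume input
    return None

src = Source("12+345")
form = match_sum(src)
print(form, src.i)
```

The price of the simpler signatures is that the index is now shared mutable state, so a failing pattern has to restore it itself.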

As a (premature) memory optimization, you can use __slots__ to reduce
the amount of memory per instance.

You're right! (I did it in fact for 'Form' subtypes, representing match results, which are constantly instantiated, possibly millions of times in a single parse; but on the way I did it to Source as well, which is stupid ;-)
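For the record, a minimal sketch of what __slots__ does on a frequently instantiated class. The field names snippet, start and end are hypothetical, chosen only for illustration:

```python
class Form:
    """A match result; __slots__ removes the per-instance __dict__."""
    __slots__ = ('snippet', 'start', 'end')

    def __init__(self, snippet, start, end):
        self.snippet = snippet         # matched text, a plain str
        self.start = start             # index where the match began
        self.end = end                 # index just past the match

form = Form("12", 0, 2)
print(hasattr(form, '__dict__'))       # no instance dict is allocated

try:
    form.extra = 1                     # attributes outside __slots__ are rejected
except AttributeError as exc:
    print("rejected:", exc)
```

The saving only matters when there are very many instances, which is the case for Forms but not for a single Source.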

But this (probably) is the wrong way
to solve this problem. Your design makes Source a kind of string:

issubclass(Source, str)
=> True

I expect that it should not be. (Obviously I'm making some assumptions
about the design here.)

Actually, it doesn't matter whether issubclass or isinstance is true. But it must be a subtype to use string methods (including magic ones like slicing), as you say below.

 To decide whether you should use subclassing
here, ask yourself a few questions:

* Does it make sense to call string methods on Source objects? In
   Python 3.3, there are over 40 public string methods. If *just one*
   of them makes no sense for a Source object, then Source should not
   be a subclass of str.
   e.g. source.isnumeric(), source.isidentifier()

Do you really mean "If *just one* of them makes no sense for a Source object, then Source should not be a subclass of str."? Or should I understand "If *only one* of them does make sense for a Source object, then Source should not be a subclass of str."?
Also, why? Or rather, why not make it a subtype if I only use one method?

Actually, a handful of them are intensely used (indexing, slicing, the series of is* [eg isalnum], a few more as the project moves on). This is far enough for me to make it a subtype. Also, it fits semantically (conceptually): a src is a str that just happens to store a current index.

* Do you expect to pass Source objects to arbitrary functions which
   expect strings, and have the result be meaningful?

No, apart from string methods themselves. It's all internal to the lib.

* Does it make sense for Source methods to return plain strings?
   source.upper() returns a str, not a Source object.

Doesn't matter (it's parsing). The result Forms, when they hold snippets, hold plain strings, not Source's, thus all is fine.
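A quick check of that behaviour, as a sketch using the simple subclass from above (without 'n'): string methods and slicing on a Source return plain str objects, which carry no index.

```python
class Source(str):
    """A str that also stores the current matching index."""
    def __init__(self, *args, **kwargs):
        self.i = 0

src = Source("let x = 42")

word = src[0:3]                        # slicing a str subclass gives a plain str
upper = src.upper()                    # so do the inherited string methods

print(type(src).__name__, type(word).__name__, type(upper).__name__)
print(hasattr(word, 'i'))              # the snippet has no index attribute
print(src[4].isalpha())                # is* methods work on single characters as usual
```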

* Is a Source thing a kind of string? If so, what's the difference
   between a Source and a str? Why not just use a str?

see above

   If all you want is to decorate a string with a couple of extra
   pieces of information, then a limitation of Python is that you
   can only do so by subclassing.

That's it. But I don't know of any other solution in other langs, apart from composition, which in my view is clearly inferior:
* it does not fit semantics (conception)
* it's annoying syntactically (constant attribute access)

* Or does a Source thing *include* a string as a component part of
   it? If that is the case -- and I think it is -- then composition
   is the right approach.

No, a source is conceptually like a string, not a kind of composite object with a string among other fields. (Again, think of a file.)

The difference between has-a and is-a relationships is critical. I
expect that the right relationship should be:

     a Source object has a string

rather than "is a string". That makes composition a better design than
inheritance. Here's a lightweight mutable solution, where all three
attributes are public and free to be varied after initialisation:

No, see above.

class Source:
     def __init__(self, string, i=0, n=None):
         if n is None:
             n = len(string)
         self.i = i
         self.n = n
         self.string = string

Wrong solution for my case.

An immutable solution is nearly as easy:

from collections import namedtuple

class Source(namedtuple("Source", "string i n")):
     def __new__(cls, string, i=0, n=None):
         if n is None:
             n = len(string)
         return super(Source, cls).__new__(cls, string, i, n)

An immutable version is fine. But what does this version bring me? a Source's code-string is immutable already. 'i' does change.
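To illustrate my doubt, here is a sketch of what advancing the index would look like with the namedtuple version: since 'i' cannot be assigned, each advance has to build a new Source via _replace (a standard namedtuple method).

```python
from collections import namedtuple

class Source(namedtuple("Source", "string i n")):
    def __new__(cls, string, i=0, n=None):
        if n is None:
            n = len(string)
        return super(Source, cls).__new__(cls, string, i, n)

src = Source("12+345")

try:
    src.i = 2                          # fields of a namedtuple are read-only
except AttributeError as exc:
    print("cannot assign:", exc)

advanced = src._replace(i=2)           # a new object; the original is untouched
print(src.i, advanced.i)
```

So with this design every match func is back to returning a new object that carries the index, which is what I wanted to avoid.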

Here's a version which makes the string attribute immutable, and the i
and n attributes mutable:

class Source:
     def __init__(self, string, i=0, n=None):
         if n is None:
             n = len(string)
         self.i = i
         self.n = n
         self._string = string
     @property
     def string(self):
         return self._string

Again, what is better here than plain subtyping of type 'str'? (And I dislike the principle of properties; I want to know whether it's a func call or plain attr access, on the user side. Bertrand Meyer's "uniform access principle" for Eiffel is what I dislike most in this lang ;-) [which has otherwise much to offer].)

Seems I have more to learn ;-) great!

Side-note: after reflection, I guess I'll get rid of 'n'. 'n' is used each time I need to check for end-of-source in match funcs (meaning, in every low-level, lexical pattern, the ones that actually "eat" portions of source). I defined 'n' to have it at hand, but now I wonder whether it's not in fact less efficient than just writing len(src) instead of src.n, everywhere. (Since indeed Python strings hold their length: it's certainly not an actual func call! Python lies ;-)
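Rather than guess, this could simply be measured. A small sketch with timeit follows; the string size and repeat count are arbitrary choices, and the numbers will vary by machine and Python version, so I make no claim here about which one wins:

```python
import timeit

class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0
        self.n = len(self)             # cached length, as in the original design

src = Source("x" * 1000)

# Time the two ways of getting the length; both lambdas add the same call overhead.
t_len = timeit.timeit(lambda: len(src), number=100000)
t_attr = timeit.timeit(lambda: src.n, number=100000)

print("len(src): %.4f s" % t_len)
print("src.n:    %.4f s" % t_attr)
```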

Denis