On 01/01/2014 01:26 AM, Steven D'Aprano wrote:
On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
Hello,
I don't remember exactly how to do that. As an example:
    class Source (str):
        __slots__ = ['i', 'n']
        def __init__ (self, string):
            self.i = 0 # current matching index in source
            self.n = len(string) # number of ucodes (Unicode code points)
            #~ str.__init__(self, string)
The easiest way to do that is:
    class Source(str):
        def __init__(self, *args, **kwargs):
            self.i = 0
            self.n = len(self)
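A quick check of that approach, as a minimal sketch: str is immutable, so its value is already set by str.__new__, and __init__ only has to attach the extra attributes. The sample text is made up for illustration.

```python
class Source(str):
    """A str that also carries a current matching index."""

    def __init__(self, *args, **kwargs):
        # The string value itself was already set by str.__new__.
        self.i = 0          # current matching index in source
        self.n = len(self)  # number of code points


src = Source("spam eggs")  # sample text, purely illustrative
print(src.i, src.n)        # 0 9
print(src[src.i:4])        # spam
print(src.isalnum())       # False (the text contains a space)
```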
Thank you Steven for your help.
Well, I don't really get everything you say below about possible alternatives,
so I'll give a bit more detail. The only point of Source is to have a string
that stores its current index, somewhat like a file (being read) on the filesystem.
I take the opportunity to add a few features, but would do without Source
altogether if it were not for 'i'.
The reason is that it is for a parsing library, or hand-made parsers. Every matching
func, representing a pattern (or "rule"), advances in the source whenever a match is
ok, right? Thus, in addition to returning the form (of what was matched), they must
return the new match index:
    return (form, i)
Symmetrically, every match func using another one (meaning nearly all) receives this
pair. (Less annoyingly, every match func also takes i as input, in addition to
the src str.) (There are also a handful of other annoying points, consequences
of those ones.)
If I have a string that stores its index, all of this mess is gone. It makes for
clean and simple interfaces everywhere. Also (one of the consequences) I can
directly provide match funcs to the user, instead of having to wrap them inside
a func whose only utility is to hide the additional index (in both input & output).
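To make the two interfaces concrete, here is a rough sketch of both styles. The names match_digits and match_digits_src are hypothetical, invented for this illustration, and the "form" is simply the matched substring; a real library would return something richer.

```python
# Style 1: plain str, the index is threaded through every call.
def match_digits(src, i):
    """Return (form, new_i), or None if there is no digit at i."""
    j = i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == i:
        return None
    return (src[i:j], j)


# Style 2: the source carries its own index.
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0  # current matching index


def match_digits_src(src):
    """Return the form and advance src.i, or None if nothing matches."""
    start = src.i
    while src.i < len(src) and src[src.i].isdigit():
        src.i += 1
    if src.i == start:
        return None
    return src[start:src.i]


print(match_digits("123abc", 0))  # ('123', 3)
s = Source("123abc")
print(match_digits_src(s), s.i)   # 123 3
```

In the second style the caller never sees the index, so the match func can be handed to the user without a wrapper.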
As a (premature) memory optimization, you can use __slots__ to reduce
the amount of memory per instance.
You're right! (I did it in fact for 'Form' subtypes, representing match results
which are constantly instantiated, possibly millions of times in a single parse;
but on the way I did it to Source as well, which is stupid ;-)
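For what __slots__ buys on such small result objects, a minimal sketch follows. The Form class here and its field names (snippet, start, end) are assumptions made up for the example, not the actual classes of the library.

```python
class Form:
    """Hypothetical match result; the field names are assumptions."""
    __slots__ = ("snippet", "start", "end")

    def __init__(self, snippet, start, end):
        self.snippet = snippet
        self.start = start
        self.end = end


form = Form("123", 0, 3)
print(hasattr(form, "__dict__"))  # False: no per-instance dict

try:
    form.extra = 1                # not listed in __slots__
except AttributeError as exc:
    print("rejected:", exc)
```

The saving is per instance, so it only matters for types created in very large numbers; a single Source per parse gains nothing noticeable.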
But this (probably) is the wrong way
to solve this problem. Your design makes Source a kind of string:
    issubclass(Source, str)
    => True
I expect that it should not be. (Obviously I'm making some assumptions
about the design here.)
Actually, it doesn't matter whether issubclass or isinstance is true. But it must
be a subtype to use string methods (including magic ones like slicing), as you
say below.
To decide whether you should use subclassing
here, ask yourself a few questions:
* Does it make sense to call string methods on Source objects? In
Python 3.3, there are over 40 public string methods. If *just one*
of them makes no sense for a Source object, then Source should not
be a subclass of str.
e.g. source.isnumeric(), source.isidentifier()
Do you really mean "If *just one* of them makes no sense for a Source object,
then Source should not be a subclass of str." ? Or should I understand "If *only
one* of them does make sense for a Source object, then Source should not be a
subclass of str." ?
Also, why? Or rather, why not make it a subtype if I only use one method?
Actually, a handful of them are used intensely (indexing, slicing, the series of
is* [eg isalnum], a few more as the project moves on). This is far enough for me
to make it a subtype.
Also, it fits semantically (conceptually): a src is a str that just happens to
store a current index.
* Do you expect to pass Source objects to arbitrary functions which
expect strings, and have the result be meaningful?
No, apart from string methods themselves. It's all internal to the lib.
* Does it make sense for Source methods to return plain strings?
source.upper() returns a str, not a Source object.
Doesn't matter (it's parsing). The result Forms, when they hold snippets, hold
plain strings, not Source's, thus all is fine.
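A small check of that behaviour, as a sketch with a made-up sample line: slicing and str methods on a str subclass hand back plain str objects, which carry no index.

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0  # current matching index


src = Source("let x = 1")          # sample text, purely illustrative
snippet = src[4:5]
print(snippet)                     # x
print(type(snippet).__name__)      # str
print(type(src.upper()).__name__)  # str
print(hasattr(snippet, "i"))       # False
```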
* Is a Source thing a kind of string? If so, what's the difference
between a Source and a str? Why not just use a str?
see above
If all you want is to decorate a string with a couple of extra
pieces of information, then a limitation of Python is that you
can only do so by subclassing.
That's it. But I don't know of any other solution in other langs, apart from
composition, which in my view is clearly inferior:
* it does not fit semantics (conception)
* it's annoying syntactically (constant attribute access)
* Or does a Source thing *include* a string as a component part of
it? If that is the case -- and I think it is -- then composition
is the right approach.
No, a source is conceptually like a string, not a kind of composite object with
a string among other fields. (Again, think of a file.)
The difference between has-a and is-a relationships is critical. I
expect that the right relationship should be:
a Source object has a string
rather than "is a string". That makes composition a better design than
inheritance. Here's a lightweight mutable solution, where all three
attributes are public and free to be varied after initialisation:
No, see above.
    class Source:
        def __init__(self, string, i=0, n=None):
            if n is None:
                n = len(string)
            self.i = i
            self.n = n
            self.string = string
Wrong solution for my case.
An immutable solution is nearly as easy:
    from collections import namedtuple

    class Source(namedtuple("Source", "string i n")):
        def __new__(cls, string, i=0, n=None):
            if n is None:
                n = len(string)
            return super(Source, cls).__new__(cls, string, i, n)
An immutable version is fine. But what does this version bring me? A Source's
code-string is immutable already. 'i' does change.
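To illustrate the point about 'i' changing, a short sketch of what advancing the index looks like with the namedtuple version: fields cannot be assigned, so each advance has to build a new Source with _replace. The sample text is made up.

```python
from collections import namedtuple


class Source(namedtuple("Source", "string i n")):
    def __new__(cls, string, i=0, n=None):
        if n is None:
            n = len(string)
        return super(Source, cls).__new__(cls, string, i, n)


src = Source("123abc")
try:
    src.i = 3                 # namedtuple fields are read-only
except AttributeError:
    print("cannot assign to a namedtuple field")

advanced = src._replace(i=3)  # a new object; src is left untouched
print(src.i, advanced.i)      # 0 3
```

So with this design every match func would again have to return the new source object, much like returning the pair (form, i).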
Here's a version which makes the string attribute immutable, and the i
and n attributes mutable:
    class Source:
        def __init__(self, string, i=0, n=None):
            if n is None:
                n = len(string)
            self.i = i
            self.n = n
            self._string = string

        @property
        def string(self):
            return self._string
Again, what is better here than plain subtyping of type 'str'? (And I dislike
the principle of properties; I want to know whether it's a func call or plain
attr access, on the user side. Bertrand Meyer's "uniform access principle" for
Eiffel is what I dislike most in this lang ;-) [which has otherwise much to offer].)
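For reference, this is how the property version behaves from the caller's side, as a minimal sketch: 'i' is a plain attribute and can be reassigned, while 'string' looks like an attribute but is read through a method and has no setter. The sample text is made up.

```python
class Source:
    def __init__(self, string, i=0, n=None):
        if n is None:
            n = len(string)
        self.i = i
        self.n = n
        self._string = string

    @property
    def string(self):
        return self._string


src = Source("123abc")
src.i = 3                  # plain attribute: assignment is allowed
try:
    src.string = "other"   # property without a setter
except AttributeError:
    print("string is read-only")
print(src.string[src.i:])  # abc
```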
Seems I have more to learn ;-) great!
Side-note: after reflection, I guess I'll get rid of 'n'. 'n' is used each time I
need to check for end-of-source in match funcs (meaning, in every low-level,
lexical pattern, the ones that actually "eat" portions of source). I defined 'n'
to have it at hand, but now I wonder whether it's not in fact less efficient
than just writing len(src) instead of src.n, everywhere. (Since indeed Python
strings hold their length: it's certainly not an actual func call! Python lies ;-)
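One way to settle this is to measure, for instance with timeit as in the sketch below. Note that len() is still a call to a built-in, although for a str it just reads the stored length in constant time. The string size and repetition count are arbitrary choices, and the timings will vary by machine and Python version.

```python
import timeit


class Source(str):
    def __init__(self, *args, **kwargs):
        self.n = len(self)  # cached length


src = Source("x" * 1000)

# Time the cached attribute against a fresh len() call.
t_attr = timeit.timeit(lambda: src.n, number=100_000)
t_len = timeit.timeit(lambda: len(src), number=100_000)
print(f"src.n: {t_attr:.4f}s   len(src): {t_len:.4f}s")
```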
Denis