[back to the list after a rather long break]
Hello,
I ran into a unicode issue ;-) (one more)
Here is an illustration:
===============================
# -*- coding: utf-8 -*-
class U(unicode):
    def __str__(self):
        return self

# In case the string below does not display properly:
# all three characters have ordinals between 128 and 255.
c0 = "¶ÿµ"
c1 = U("¶ÿµ","utf8")
c2 = unicode("¶ÿµ","utf8")
for c in (c0,c1,c2):
    try:
        print "%s" %c,
    except UnicodeEncodeError:
        print "***",
    try:
        print c.__str__(),
    except UnicodeEncodeError:
        print "***",
    try:
        print str(c)
    except UnicodeEncodeError:
        print "***"
==>
¶ÿµ ¶ÿµ ¶ÿµ
¶ÿµ ¶ÿµ ***
¶ÿµ *** ***
================================
The last line shows that a regular unicode object can neither be passed to
str() (more or less OK) nor have its __str__() called (not OK at all).
Maybe I am overlooking some obvious point (again). If not, then there are in
fact two issues:
-1- The old ambiguity of str() meaning both "create an instance of type str
from the given data" and "build a textual representation of the given object,
through __str__", which has always been a semantic flaw for me, becomes
concretely problematic when we have text that is not str.
Well, I'm very surprised by this. How come this point doesn't seem to be well
known? How is it even possible to use unicode without running into this
problem? I guess this breaks years or even decades of habit for coders who are
used to writing str() when they mean __str__().
-2- How is it possible that __str__ does not work on a unicode object? It seems
that the method is simply not implemented on the unicode type, and neither is
__repr__, so that the call falls back to str().
Strangely enough, % interpolation works, which suggests that for both types of
text a shortcut is used, namely returning the text itself as is. I would have
bet my last cent that % simply delegates to __str__, or even that the two were
in fact the same function, but obviously I was wrong!
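To probe point -2- a little, here is a minimal sketch. It assumes that str()
on a unicode object ends up encoding the text with the default codec, which is
ASCII on Python 2. If that is right, the same failure should show up when
encoding to ASCII explicitly, with no str() or __str__ involved, which would
point at an encoding step rather than at a missing method. The variable names
are mine; the snippet is written so that it also runs on Python 3.

```python
# Sketch only: the three characters from the example, written as escapes
# so that the source file itself stays pure ASCII.
text = u"\xb6\xff\xb5"

try:
    # None of the three characters exists in ASCII, so this should fail
    # the same way str(c2) does in the illustration above.
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming a codec that can represent the characters works.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)         # False
print(len(utf8_bytes))  # 6 (two bytes per character in UTF-8)
```

If this reading is correct, calling c.encode("utf-8") explicitly is one way to
get a byte string out of a unicode object without going through str().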
Looking for workarounds, I first tried to override (or rather create) __str__,
as in the U type above. But this solution is far from ideal, because we still
cannot use str() (and my fingers will type it anyway while my head is
who-knows-where). It is also unusable in practice, for the following reason:
===================================
print c1.__class__
print c1[1].__class__
c3 = c1
print (c1+c3).__class__
==>
<class '__main__.U'>
<type 'unicode'>
<type 'unicode'>
====================================
Any operation returns a plain unicode instead of the original type. So the
type would have to override every possible operation on text just to convert
the results back, which is a lot of work. And I am not even talking about
performance issues.
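To give an idea of what this overriding would look like, here is a minimal
sketch for two operations only (concatenation and indexing by position). The
alias text_type is my own addition so that the sketch also runs on Python 3;
on Python 2 it is simply unicode.

```python
# Sketch only: re-wrap results so they keep the subclass type.
text_type = type(u"")  # unicode on Python 2, str on Python 3


class U(text_type):
    def __str__(self):
        return self

    def __add__(self, other):
        # The inherited method returns the plain base type: wrap it again.
        result = text_type.__add__(self, other)
        if result is NotImplemented:
            return result
        return U(result)

    def __getitem__(self, index):
        return U(text_type.__getitem__(self, index))


c1 = U(u"\xb6\xff\xb5")  # the same three characters as above
print(type(c1 + c1).__name__)     # U
print(type(c1[1]).__name__)       # U
print(type(c1.upper()).__name__)  # not overridden: the base type comes back
```

Every other method (upper, replace, join, slicing, and so on) would need the
same treatment, which is exactly the problem.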
So the only solution, it seems to me, is to use % everywhere and to hunt down
every str, __str__, __repr__ and the like throughout the code.
I hope I'm wrong about this. Please give me a better solution ;-)
------
la vita e estrany