>>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>>> Again, I think this can be simplified to
>>> raw = u"125° 15' 5.55''"
>> It does, but it's getting confusing when I compare the following:
>>
>> >>> raw = u"125° 15' 5.55''"
>> 125° 15' 5.55''
>
> Where does that output come from?

Sorry, my bad: an over-hasty copy of nonexistent output.

>> >>> print u"125° 15' 5.55''"
>> UnicodeEncodeError: 'ascii' codec can't encode characters in
>> position 3-4: ordinal not in range(128)
>
> print must encode unicode strings. It tries to encode them using
> the default encoding which doesn't work because the source is not
> ascii.
>
>> >>> print u"125° 15' 5.55''".encode('utf-8')
>> 125° 15' 5.55''
>
> That is the way to get it to work.
>
>> >>> print unicode("125° 15' 5.55''")
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in
>> position 3: ordinal not in range(128)
>
> Here the problem is trying to create the unicode string using the
> default encoding, again it doesn't work because the source contains
> non-ascii characters.
>
>> >>> print unicode("125° 15' 5.55''", 'utf-8')
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0'
>> in position 3: ordinal not in range(128)
>
> This is the same as the first encode error.

This is the thing I don't get, or only partly: I'm sending a utf-8
encoded string to print. print apparently ignores that, and still tries
to print things using ascii encoding. If I'm correct in that assessment,
then why would print ignore that?

>> So apart from the errors all being slightly different, is there
>> perhaps some difference between the str() and repr() functions
>> (looks like repr uses escape backslashes)?
>
> Right.
>
>> And checking the default encoding inside the python cmdline, I
>> see that my sys module doesn't actually have a setdefaultencoding()
>> method; was that something that should have been properly
>> configured at compile time? The documentation mentions something
>> about the site module, but I can't find it there either.
>
> The setdefaultencoding() function (it's not a method, it is a
> module-level function)

Yes, sorry, got my terminology wrong there.

> is removed from the sys module as part of startup (I think by the
> site module).
> That is why you have to call it from
> sitecustomize.py. You can also
> reload(sys)
> to restore it but it's better to write your app so it doesn't
> require the default encoding to be changed.

I.e., use encode('utf-8') where necessary? But I did see some examples
pass by using

    import sys
    sys.setdefaultencoding('utf-8')

??

Oh well, in general I tend to play with things like this long enough
that 1) I get the script working, and 2) I have a decent feeling (90%)
that I actually understand what is going on, and why other things
failed. Which is roughly where I am now ;-).

Evert
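
P.S. For reference, here is a minimal sketch of the same round trip with every codec named explicitly, so nothing depends on the default encoding. As far as I can tell, unicode(s, 'utf-8') decodes the bytes, so what print receives is a unicode object rather than a utf-8 encoded string, and print then has to encode it again for output. The names text, utf8_bytes, codec_error and out are made up for this illustration, the degree sign is written as an escape so the snippet does not depend on the source file or terminal encoding, and the in-memory buffer is only a stand-in for a real file or stdout.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; the names below are made up for this example.
# Intended to run unchanged on Python 2 and Python 3.
import codecs
import io

# Unicode text; \u00b0 is the degree sign.
text = u"125\u00b0 15' 5.55''"

# encode: unicode text -> bytes. In UTF-8 the degree sign takes two bytes.
utf8_bytes = text.encode('utf-8')

# decode: bytes -> unicode text. In Python 2 this is what
# unicode(utf8_bytes, 'utf-8') does; the result is text again,
# not a "utf-8 encoded" string.
decoded = utf8_bytes.decode('utf-8')


def codec_error(action):
    """Call action() and return the UnicodeError it raises, or None."""
    try:
        action()
    except UnicodeError as exc:
        return exc
    return None


# The two kinds of failure from the session, with the ascii codec named
# explicitly instead of being picked up as a default.
encode_error = codec_error(lambda: text.encode('ascii'))
decode_error = codec_error(lambda: utf8_bytes.decode('ascii'))

# Encoding at the output boundary: wrap a byte stream in a UTF-8 writer.
# An in-memory buffer stands in for a file or stdout here.
out = io.BytesIO()
writer = codecs.getwriter('utf-8')(out)
writer.write(text)

print(type(encode_error).__name__)    # UnicodeEncodeError
print(type(decode_error).__name__)    # UnicodeDecodeError
print(out.getvalue() == utf8_bytes)   # True
```

Wrapping the output stream once like this is one way to keep the encode('utf-8') calls out of the rest of the script without touching sys.setdefaultencoding().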