Timmie wrote:
> I am totally lost:
> * python has ascii as default encoding
> * my linux uses UTF-8 (therefore all files created on linux are UTF-8)
> * windows uses cp1250
> * IPython something else: on the machine where I am currently on,
>   stdin is set to cp850
>
> So what encoding do I use to display and process characters that exceed
> the standard English alphabet?
>
> My initial question was:
>
> 1) get a coordinate (DEG° MIN' SEC'') as input from user via easygui
> 2) split that string into its subscripts: degrees, minutes and seconds
> 3) do some processing of the 3 variables
> 4) print the output with easygui.
>
> I am not really interested which is the best encoding. I want to know:
> * how I do this that I don't get an encoding error?
> * how do I code it that the code runs on linux and windows
>   from file and in IPython

I realise that I am running the risk of confusing you further, but
I'm afraid that your attitude of "This isn't my problem; it's Python's"
isn't really going to wash. If you're going to be using characters
which fall outside the realm of 7-bit ASCII, you're going to have to
get some understanding of how the various input, output and language
mechanisms deal with them. And all the more so if you're trying to do
this cross-platform.

Maybe there's some kind of sealed environment in some other language
or operating system which takes care of all of this for you
transparently. I wouldn't know. What I do know is that, if you're
using the Python interpreter under Windows and Linux and whatever
else, then you're at the mercy of those operating systems at a
certain level.

There are at least two points you have to understand:

1) Python needs to know what encoding was used to save a text file
which it is compiling to bytecode: usually a .py file. It has a
default which you can override in a couple of ways. If whatever
encoding you've specified turns out not to match the text in, say,
a literal string with a degree symbol, then Python will not know
what to do and will stop with an exception. Of course, how you
encoded the file in question is between you and your editor.

2) When you are reading or writing text to or from a console or GUI
window or database or PDF or whatever, you also need to know what
encoding to use.
If you're writing out, then whatever you're writing to has to be able
to make sense of the encoding you're supplying -- and you may need to
say which one it was. If you're reading in, you are at the mercy of
libraries: some will always return unicode (BeautifulSoup springs to
mind), others will return raw bytes leaving it up to you to decode,
others will return an encoded string. This is pretty much an
historical artefact (or, sadly in some cases, a case of ignorance)
and you're going to have to cope with it.

On my Windows box, easygui handles unicode perfectly well, and the
console running cp437 displays the degree sign. If it didn't, I'd
have to compromise on the display (or use chcp to switch code pages
first).

To illustrate, the following program works:

<code>
import easygui

sample = u"DEG\u00b0 MIN' SEC\""
from_user = easygui.enterbox(u"Enter " + sample)
#
# Paste in values from your email since I can't
# be bothered to work out how to get the degree
# sign
#
print(from_user)
</code>

and from_user is a perfectly good unicode string. Now, if you want to
write that out to a file, or a database or what-have-you which can't
store unicode natively, then you'll have to encode it, probably as
UTF-8, which can encode anything.

For this email, I've used the unicode-escape, but if -- as you did --
you wanted to put the degree sign itself in the string literal, then
you'd need to save the .py file in a certain encoding and to place a
line at the top of the file indicating what that encoding was (for
example, # -*- coding: utf-8 -*-). If you're happy using
unicode-escapes then that saves a bit of finicking about.

TJG
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
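
P.S. For what it's worth, here is a minimal sketch of steps 2 to 4
from the original question, keeping to the rule of thumb above: decode
to unicode as soon as text comes in, work in unicode, and encode only
when writing out. The helper names (to_text, parse_dms,
to_decimal_degrees) are made up for illustration and are not part of
easygui or the standard library. The sketch assumes that any byte
input is UTF-8 and that the coordinate looks like DEG° MIN' SEC''
(or with a double quote for the seconds). The GUI calls are left out
so that it runs anywhere.

```python
import re

# The degree sign is written as an escape, so this file is pure ASCII
# and needs no encoding declaration.
DEGREE = u"\u00b0"

# Hypothetical pattern for "DEG<degree sign> MIN' SEC''"; the seconds
# may end in two apostrophes or one double quote.
DMS_PATTERN = re.compile(
    u"^\\s*(\\d+)\\s*" + DEGREE +
    u"\\s*(\\d+)\\s*'" +
    u"\\s*(\\d+(?:\\.\\d+)?)\\s*(?:''|\")\\s*$"
)


def to_text(value, encoding="utf-8"):
    """Return unicode text, decoding first if raw bytes were given."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def parse_dms(value):
    """Split a coordinate into (degrees, minutes, seconds)."""
    match = DMS_PATTERN.match(to_text(value))
    if match is None:
        raise ValueError("not a DEG/MIN/SEC coordinate: %r" % (value,))
    degrees, minutes, seconds = match.groups()
    return int(degrees), int(minutes), float(seconds)


def to_decimal_degrees(degrees, minutes, seconds):
    """Example processing step: convert to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0


# Simulate input arriving as raw UTF-8 bytes (an assumption; a library
# such as easygui may already hand back unicode, which also works).
raw = u"48\u00b0 12' 30''".encode("utf-8")

d, m, s = parse_dms(raw)
decimal = to_decimal_degrees(d, m, s)

# Build the result as unicode, and encode only at the output boundary.
report = u"%d%s %d' %g'' = %.4f%s" % (d, DEGREE, m, s, decimal, DEGREE)
encoded = report.encode("utf-8")
```

With easygui, you would pass the value returned by enterbox straight
to parse_dms and hand the unicode report to msgbox; the encoded bytes
are only needed for a file or anything else which can't take unicode.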