Dear Cam,

Python 3 is so much better at dealing with Unicode than Python 2.
But, that said: your file is probably not in latin-1. (Latin-1, also known as iso-8859-1, does cover French accents, so if decoding with it gives the wrong characters, the file is most likely in some other encoding.)

Solution:

1. Open your text file in a browser.
2. If the file displays OK in the browser, see what encoding the browser used to decode the file: there is usually an "Encoding" option in the menu somewhere, e.g. in Chrome, under the View menu. Assume for this example that it is utf-8.
3. Change your file opening to:

F = codecs.open('temp.txt', encoding='utf-8')

That should fix it. You can then read from the file directly as a unicode string.

Mick

On 22 March 2014 03:26, Cam Farnell <ms...@bitflipper.ca> wrote:
> Technically this is a Python question, not a Tkinter question, but it's in
> the context of a Tkinter application so I don't feel *too* guilty about
> posting it here.
>
> OK. I've got a Tkinter application (running with Python 2.7.2 on Ubuntu
> 12.04.4 LTS) that needs to handle French accented characters. And it does
> handle accented characters just fine. I can type an accented character into
> an Entry and it shows up correctly. I can display it on a Text. I can
> cPickle it to disk and read it back. For example, if I enter e-circumflex
> (in a Tkinter Entry) and then print it using repr I get:
>
> u'\xEA'
>
> If I look in the cPickled file there are 0xEA's where the e-circumflex
> characters are. So far so good.
>
> The problem comes when I need to read into my Tkinter application a file
> which has accented characters and which was prepared using a text editor
> like, for example, gedit. The file to be read also has 0xEA's to represent
> e-circumflex. However, when I read such a file the resulting string then
> contains u'\cd\xaa' where the e-circumflexes belong. I don't know who is
> doing the unwanted conversion or how to make it go away.
> I've tried reading in binary mode, I've tried opening the file using:
>
> F = codecs.open('temp.txt', encoding='latin-1')
>
> I've tried putting:
>
> # -*- coding: latin-1 -*-
>
> as the second line of my program. I've tried reading Python/unicode
> documentation till my eyes went blurry. All to no avail.
>
> There is probably some really simple solution to this, but so far I've
> failed to find it.
>
> Thus, if anyone out there in Tkinter land knows the simple solution or could
> point me to a good source of information I would greatly appreciate it.
>
> Thanks
>
> Cam Farnell
>
> _______________________________________________
> Tkinter-discuss mailing list
> Tkinter-discuss@python.org
> https://mail.python.org/mailman/listinfo/tkinter-discuss
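
For what it's worth, here is a minimal sketch of the "find the right encoding, then pass it when opening the file" advice in the reply above. It is only an illustration under some assumptions: read_text is a hypothetical helper (not part of any library), the candidate list ('utf-8', then 'latin-1') is a guess at what a gedit-saved file might use, and io.open is swapped in for codecs.open because it takes the same encoding argument on both Python 2.7 and Python 3. Latin-1 goes last because it accepts any byte sequence and so never raises a decode error.

```python
import io


def read_text(path, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper: return (text, encoding) for the first
    candidate encoding that decodes the whole file without error."""
    for enc in encodings:
        try:
            with io.open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    raise ValueError('none of the candidate encodings fit: %r' % (encodings,))


# The same two bytes decoded two ways, to show how a wrong guess
# turns one accented character into two unrelated ones.
raw = b'\xc3\xaa'                   # UTF-8 bytes for e-circumflex
as_latin1 = raw.decode('latin-1')   # two characters: U+00C3 and U+00AA
as_utf8 = raw.decode('utf-8')       # one character: U+00EA
```

Calling read_text('temp.txt') would then give back a unicode string plus the name of the encoding that worked. Trying utf-8 first is a design choice: bytes that happen to be valid UTF-8 are rarely anything else, whereas a latin-1 decode always "succeeds" even when it is wrong.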