Jorge Louis de Castro wrote:
> Hi, thanks for the reply.
>
> However, I get strange behavior when I try to feed text that must be
> unicode to altavista for translation.
> Just before sending, I've got the following on the log using
>
> print "RECV DATA: ", repr(data)
>
> and after entering "então" ("so" in Portuguese)
>
> RECV DATA: 'right: ent\xc3\xa3o?'

OK, the data from Tkinter seems to be in utf-8 already; it is not a unicode string (no u' in the repr) and \xc3\xa3 is the utf-8 representation of a-tilde.

> Now right before sending the data to be translated by altavista I print
> out from the CONTENT[1] which yields:
>
> Translating: então?

You have done an HTML entity escape on the data somewhere maybe? I don't know where this might be coming from, it's pretty mangled. There must be another text transformation in there somewhere.

> Which I find odd. Obviously, feeding this into babelfish results in a
> failed translation. So before sending I try to encode it like you suggest.
>
> try:
>     print "Translating: ", content[1]
>     decoded = content[1].encode('utf8')
>     print "Decoding Prior to Translating: ", decoded
> except Exception, e:
>     print "EXCEPTION ENCODING ", e
>
> The Exception thrown is:
>
> EXCEPTION ENCODING 'ascii' codec can't decode byte 0xc3 in position 4:
> ordinal not in range(128)
>
> I was dealing w/ an Ascii string and was asking it to be encoded in UTF,
> whereas Python is telling me he can't encode it in UTF?? Makes little
> sense to me.

This is a confusing error. What happens is, if you have a non-unicode string and you try to encode it, Python first converts it to a unicode string using the default codec, which is ascii. This conversion fails because the string has non-ascii characters in it. Since you already have utf-8, the encode step is not needed.

Kent
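
PS A minimal sketch of the failure described above. It is written for Python 3, where the implicit ascii decode that Python 2 performs behind .encode() has to be spelled out by hand, so it illustrates the mechanism rather than reproducing the original script. The variable names are illustrative only.

```python
# Bytes as they arrived from Tkinter: UTF-8 encoded text, not a unicode string.
data = b'right: ent\xc3\xa3o?'

# Python 2 does this step implicitly when .encode() is called on a byte
# string: it decodes with the default codec (ascii) first, and that fails
# on the non-ascii byte 0xc3.
try:
    data.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as e:
    ascii_error = e
    print("EXCEPTION DECODING", e)

# Decoding with the codec the data is actually in works.
text = data.decode('utf-8')
print(repr(text))

# Encoding the text back to UTF-8 gives the original bytes, which is why
# no encode step was needed on data that was already UTF-8.
roundtrip = text.encode('utf-8')
print(roundtrip == data)
```

In Python 2 the same data can simply be sent on unchanged, or turned into a unicode string with data.decode('utf-8') if one is needed.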