OK this is actually starting to make sense :-) Here is what I think is happening:
You get different results in the IDE and the console because they are using different encodings. The IDE is using utf-8, so the params are encoded in utf-8. The console is using latin-1, so you get latin-1 encoded params.

When you use babelfish from the browser, it gets a page in utf-8 and sends the parameters back the same way, probably with a header saying it is utf-8. When you use urllib you don't tell the server the encoding, so it is assuming latin-1; that's why the interpreter version works.

So in your GUI version, if you get utf-8 from the GUI, you can convert it to latin-1 with
    phrase.decode('utf-8').encode('latin-1')
as long as your text can be expressed in latin-1. If you need utf-8, then you have to figure out how to tell babelfish that you are sending utf-8.

Kent

PS Please reply to the list, not to me personally.

Jorge Louis de Castro wrote:
> Thanks again,
>
> I'm sorry to be such a PITB but this is driving me insane! The code
> below easily connects to babelfish and returns a translated string.
>
> import re
> import string
> import sys
> import urllib
>
> __where = [ re.compile(r'name=\"q\">([^<]*)'),
>             re.compile(r'td bgcolor=white>([^<]*)'),
>             re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
>             re.compile(r'<\/strong><br>([^<]*)') ]
>
> def clean(text):
>     # collapse newlines and runs of whitespace into single spaces
>     return ' '.join(string.replace(text.strip(), "\n", ' ').split())
>
> def translateByCode(phrase, from_code, to_code):
>     phrase = clean(phrase)
>     params = urllib.urlencode( { 'BabelFishFrontPage' : 'yes',
>                                  'doit' : 'done',
>                                  'urltext' : phrase,
>                                  'lp' : from_code + '_' + to_code } )
>     print "URL encoding ", params
>     try:
>         response = urllib.urlopen('http://world.altavista.com/babelfish/tr', params)
>     except IOError, what:
>         print "ERROR TRANSLATING ", what
>         return None   # no response to parse
>     except:
>         print "Unexpected error:", sys.exc_info()[0]
>         return None
>
>     html = response.read()
>     for regex in __where:
>         match = regex.search(html)
>         if match: break
>     if not match:
>         print "ERROR MATCHING"
>         return None
>     return clean(match.group(1))
>
> if __name__ == '__main__':
>     print translateByCode('então', 'pt', 'en')
>
> If I run this through the Run option on the IDE I get the following output:
>
> URL encoding doit=done&urltext=ent%C3%A3o&BabelFishFrontPage=yes&lp=pt_en
> então
> então
>
> If I import this module on the interpreter and then call
>
> print translateByCode('então', 'pt', 'en')
>
> I get:
>
> URL encoding doit=done&urltext=ent%E3o&BabelFishFrontPage=yes&lp=pt_en
> then
> then
>
> Now the urllib encoding of the urltext IS different ("ent%C3%A3o" VS
> "ent%E3o") even though I'm passing the same stuff!
> And this works fine except when I use special characters, and I don't
> know how to use the utf-8 encoding to get this working - I know altavista
> uses utf-8 because they also translate Chinese.
>
> Thanks again and sorry for the blurb but I ran out of solutions for this
> one.
>
>
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
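
For anyone who wants to see the encoding difference from the reply at the top in isolation, here is a small sketch. It is written for Python 3, where the thread's Python 2 calls no longer exist; in Python 2 the conversion is the phrase.decode('utf-8').encode('latin-1') expression already given. The names utf8_to_latin1 and gui_bytes are made up for illustration, and the idea that babelfish assumes latin-1 is only the guess from the reply, not something this code verifies.

```python
from urllib.parse import quote, urlencode

phrase = 'ent\u00e3o'  # the word from the thread, with a-tilde written as an escape

# The same text percent-encodes differently depending on the byte encoding,
# which matches the two "URL encoding" lines quoted in the thread.
utf8_form = quote(phrase.encode('utf-8'))      # what the IDE run produced
latin1_form = quote(phrase.encode('latin-1'))  # what the console run produced


def utf8_to_latin1(raw):
    """Re-encode UTF-8 bytes as Latin-1 bytes (hypothetical helper).

    Raises UnicodeEncodeError if the text has characters outside Latin-1.
    """
    return raw.decode('utf-8').encode('latin-1')


# gui_bytes stands in for UTF-8 bytes handed over by a GUI toolkit.
gui_bytes = phrase.encode('utf-8')
params = urlencode({'urltext': utf8_to_latin1(gui_bytes), 'lp': 'pt_en'})

# Python 3's urlencode can also be told which encoding to use for str values.
params_direct = urlencode({'urltext': phrase, 'lp': 'pt_en'}, encoding='latin-1')

print(utf8_form)
print(latin1_form)
print(params)
```

Text that cannot be expressed in latin-1 (Chinese, for example) makes utf8_to_latin1 raise UnicodeEncodeError, which is the limitation mentioned in the reply.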