OK this is actually starting to make sense :-) Here is what I think is 
happening:

You get different results in the IDE and the console because they are using 
different encodings. The IDE is using utf-8, so the params are encoded in utf-8 
(ent%C3%A3o). The console is using latin-1, so you get latin-1 encoded params 
(ent%E3o).
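
A small sketch of that difference, in case it helps to see it in isolation. It 
assumes Python 3, where urllib.parse.quote takes an explicit encoding (the code 
in this thread is Python 2, where the bytes you pass in decide the result):

```python
from urllib.parse import quote

phrase = "ent\u00e3o"  # the word 'então', written with an escape for the a-tilde

# Percent-encode the same text under each of the two encodings.
as_utf8 = quote(phrase, encoding="utf-8")      # a-tilde becomes two bytes
as_latin1 = quote(phrase, encoding="latin-1")  # a-tilde becomes one byte

print(as_utf8)    # ent%C3%A3o
print(as_latin1)  # ent%E3o
```

Same text, different bytes, so different percent-escapes.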

When you use babelfish from the browser it gets a page in utf-8 and sends the 
parameters back the same way, but probably with a header saying it is utf-8. 
When you use urllib you don't tell babelfish the encoding, so it is probably 
assuming latin-1. That's why the interpreter version works.
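
If that guess is right, it would also explain the garbage in the IDE run. A 
sketch, assuming Python 3 and assuming the receiving side really does read the 
bytes as latin-1 (I have not confirmed that babelfish does):

```python
phrase = "ent\u00e3o"  # the word 'então'

# Bytes that go out when the text is encoded as utf-8.
sent = phrase.encode("utf-8")

# What a reader gets if it wrongly treats those bytes as latin-1:
# each of the two utf-8 bytes for a-tilde turns into its own character.
misread = sent.decode("latin-1")

print(repr(sent))
print(misread)
```

The misread string has two characters where the a-tilde used to be, which is 
the same pattern as in the IDE output.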

So in your GUI version, if you get utf-8 from the GUI, you can convert it to 
latin-1 with

    phrase.decode('utf-8').encode('latin-1')

as long as your text can be expressed in latin-1. If you need utf-8 then you 
have to figure out how to tell babelfish that you are sending utf-8.
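
Here is a sketch of that conversion with the "can be expressed in latin-1" 
check made explicit. It assumes Python 3, where the utf-8 input is a bytes 
object; utf8_to_latin1 is just a name I made up for illustration:

```python
def utf8_to_latin1(raw):
    """Re-encode utf-8 bytes as latin-1 bytes.

    Returns None when some character has no latin-1 form,
    instead of letting UnicodeEncodeError propagate.
    """
    text = raw.decode("utf-8")
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


print(utf8_to_latin1(b"ent\xc3\xa3o"))            # Portuguese fits in latin-1
print(utf8_to_latin1("\u4e2d".encode("utf-8")))   # a Chinese character does not
```

Returning None is only one way to handle the failure; you might prefer to let 
the exception through so the caller sees exactly which character was the problem.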

Kent

PS please reply to the list not to me personally.

Jorge Louis de Castro wrote:
> Thanks again,
> 
> I'm sorry to be such a PITB but this is driving me insane! the code 
> below easily connects to babelfish and returns a translated string.
> 
> __where = [ re.compile(r'name=\"q\">([^<]*)'),
>            re.compile(r'td bgcolor=white>([^<]*)'),
>            re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
>            re.compile(r'<\/strong><br>([^<]*)') ]
> 
> def clean(text):
>    return ' '.join(string.replace(text.strip(), "\n", ' ').split())
> 
> def translateByCode(phrase, from_code, to_code):
>    phrase = clean(phrase)
>    params = urllib.urlencode( { 'BabelFishFrontPage' : 'yes',
>                                 'doit' : 'done',
>                                 'urltext' : phrase,
>                                 'lp' : from_code + '_' + to_code } )
>    print "URL encoding ", params
>    try:
>        response = urllib.urlopen('http://world.altavista.com/babelfish/tr', params)
>    except IOError, what:
>        print "ERROR TRANSLATING ", what
>    except:
>        print "Unexpected error:", sys.exc_info()[0]
> 
>    html = response.read()
>    for regex in __where:
>        match = regex.search(html)
>        if match: break
>    if not match: print "ERROR MATCHING"
>    return clean(match.group(1))
> 
> if __name__ == '__main__':
>    print translateByCode('então', 'pt', 'en')
> 
> If I run this through the Run option on the IDE I get the following output:
> 
> URL encoding  doit=done&urltext=ent%C3%A3o&BabelFishFrontPage=yes&lp=pt_en
> então
> então
> 
> If I import this module in the interpreter and then call
> 
> print translateByCode('então', 'pt', 'en')
> 
> I get:
> 
> URL encoding  doit=done&urltext=ent%E3o&BabelFishFrontPage=yes&lp=pt_en
> then
> then
> 
> Now the urllib encoding of the urltext IS different ("ent%C3%A3o" VS 
> "ent%E3o") even though I'm passing the same stuff!
> And this works fine except when I use special characters, and I don't 
> know how to use the utf-8 encoding to get this working. I know altavista 
> uses utf-8 because they also translate Chinese.
> 
> Thanks again and sorry for the blurb but i ran out of solutions for this 
> one.
> 
> 
> 

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
