1.-editor and shell use different types of encoding: sure

2.-both editor and shell give the same answer: yes
>>> import sys
>>> sys.stdin.encoding
'cp850'

3.-both editor and shell give the same answer: yes
>>> import sys
>>> sys.getdefaultencoding()
'ascii'

4.-due to my lack of knowledge, I must be practical, forget the
environment encoding and go straight to the point

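>>> import urllib2
>>> response=urllib2.urlopen(url)  # keep the response: the headers live on it, not on the string
>>> html_txt=response.read()
>>> print response.headers.getheader('Content-Type')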
text/html; charset=iso-latin-1
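
To pick the charset out of a header value like that one instead of
reading it by eye, something along these lines might do. It is only a
sketch: charset_from_content_type is a name I made up, not part of
urllib2, and it does plain string handling on the header text.

```python
def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type value, or default.

    Hypothetical helper for illustration; not a urllib2 function.
    """
    # Parameters follow the media type, separated by semicolons.
    for param in content_type.split(';')[1:]:
        name, sep, value = param.strip().partition('=')
        if sep and name.strip().lower() == 'charset':
            # Drop optional quotes around the value and normalise case.
            return value.strip().strip('"\'').lower()
    return default


charset = charset_from_content_type('text/html; charset=iso-latin-1')
```

Note that the name the server sends is not guaranteed to be one that
Python's codecs recognise, so it may still need mapping by hand to a
known name such as iso-8859-1 before decoding.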

ok, its encoding is iso-latin-1 (same as iso-8859-1), so if I want my
documents in UTF-8:
>>> html_inPreferredEncoding=unicode(html_txt,'iso-8859-1').encode('utf-8')

the same as (a str has decode(), not unicode()):
>>> html_inPreferredEncoding=html_txt.decode('iso-8859-1').encode('utf-8')
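
A quick way to convince oneself the conversion does what is wanted is
to run it on a few known bytes rather than a whole page. This is a
sketch under assumptions: to_utf8 is a hypothetical helper name, and
the sample bytes stand in for a downloaded document.

```python
def to_utf8(raw_bytes, source_encoding):
    """Re-encode a byte string from source_encoding to UTF-8.

    Hypothetical helper for illustration only.
    """
    # Step 1: bytes -> unicode text, using the encoding the server declared.
    text = raw_bytes.decode(source_encoding)
    # Step 2: unicode text -> bytes in the encoding we prefer.
    return text.encode('utf-8')


# "caf" + e-acute: one byte (0xE9) in iso-8859-1, two bytes in UTF-8.
sample = b'caf\xe9'
converted = to_utf8(sample, 'iso-8859-1')
```

The e-acute goes in as the single byte 0xE9 and comes out as the two
bytes 0xC3 0xA9, which is its UTF-8 form.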

