I "view sourced" the original web page in IE7, and it does specify:
So it sounds like the encoding is gb2312...
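
The same check can be done on the downloaded bytes instead of in the browser. The following is only a rough sketch: the helper name `sniff_charset`, the 4096-byte search window, and the sample markup are assumptions made for illustration, not the actual markup of the page discussed here. It is written in Python 3 syntax.

```python
import re
from typing import Optional

# Looks for charset=... inside a <meta> tag. This covers the older
# http-equiv form and the shorter <meta charset="..."> form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*charset=["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_charset(page: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, lowercased, or None."""
    # Only the start of the document is searched; 4096 is an arbitrary limit.
    match = _META_CHARSET.search(page[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical sample document, standing in for the bytes from .read().
sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=gb2312"></head></html>'
)
declared = sniff_charset(sample)
print(declared)
```

A declaration in the markup is only a claim by the page author; the HTTP Content-Type header may say something different, so the result is best treated as a hint.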
--
http://mail.python.org/mailman/listinfo/python-list
> .read() returns the bytes exactly how it downloads them. It doesn't
> interpret them. If those bytes are GB-2312-encoded text, that's what
> they are. There's no need to reencode them. Just .write(page) (of
> course, this way you don't verify that it's correct).
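
A minimal sketch of the advice quoted above, in Python 3 syntax rather than the urllib2 code of the original post. The helper name `save_raw`, the sample bytes, and the temporary file path are assumptions for illustration; in practice `page` would be whatever `.read()` returned. The optional decode step is one possible way to add the verification the quoted text says is otherwise skipped.

```python
import os
import tempfile
from typing import Optional


def save_raw(page: bytes, path: str, verify_encoding: Optional[str] = None) -> None:
    """Write downloaded bytes to disk unchanged.

    If verify_encoding is given, the bytes are decoded first purely as a
    sanity check; a UnicodeDecodeError means they are not valid in that
    encoding. The decoded text itself is thrown away.
    """
    if verify_encoding is not None:
        page.decode(verify_encoding)
    # Binary mode: no newline translation and no re-encoding.
    with open(path, "wb") as f:
        f.write(page)


# Stand-in for the result of .read(): two Chinese characters in GB2312.
sample = "中文".encode("gb2312")
out_path = os.path.join(tempfile.mkdtemp(), "page.html")
save_raw(sample, out_path, verify_encoding="gb2312")

# Read the file back to confirm the bytes on disk match what was downloaded.
with open(out_path, "rb") as f:
    saved = f.read()
```

Opening the file in binary mode is the key design choice here: the bytes on disk are identical to the bytes received, so the page's own charset declaration stays accurate.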
Alternatively, if the page is *no
Peter Pei wrote:
> You must be right, since I tried one page and it worked. But there is
> something wrong with this particular page:
> http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with
> IE7), it is all messed up.
>
> url = 'http://overseas.btchina.net/?categoryi
You must be right, since I tried one page and it worked. But there is
something wrong with this particular page:
http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with
IE7), it is all messed up.
url = 'http://overseas.btchina.net/?categoryid=-1'
headers = { 'User-A
Peter Pei wrote:
> I am trying to read a web page and save it in a .html file. The problem is
> that the web page is GB-2312 encoded, and I want to save it to the file with
> the same encoding or unicode. I have some code like this:
> url = 'http://blah/'
> headers = { 'User-Agent' : 'Moz
I am trying to read a web page and save it in a .html file. The problem is
that the web page is GB-2312 encoded, and I want to save it to the file with
the same encoding or unicode. I have some code like this:
url = 'http://blah/'
headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE