Re: save gb-2312 web page in a .html file

2007-12-26 Thread Peter Pei
I "view sourced" the original web page in IE7, and it does specify: So it sounds like the encoding is gb2312...
-- http://mail.python.org/mailman/listinfo/python-list
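The declaration Peter refers to is the page's `<meta>` charset, whose exact text did not survive in this archive. The following is a minimal Python 3 sketch of pulling that declared encoding out of the raw bytes. It assumes the tag uses the common `charset=` form. The helper name `declared_charset` and the 4096-byte search window are hypothetical choices, not a library API. For a live response, the HTTP header's charset is also available through `response.headers.get_content_charset()`.

```python
import re

# Hypothetical helper: matches charset=... as found in
#   <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
# or
#   <meta charset="gb2312">
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(raw, default=None):
    """Return the charset declared near the top of raw HTML bytes, lowercased."""
    # Only the first few KB are searched, since <meta> belongs in <head>.
    match = _CHARSET_RE.search(raw[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

Because the sketch works on bytes, it can run before anything is decoded. That is the point in the thread: the encoding has to be known before the text can be interpreted.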

Re: save gb-2312 web page in a .html file

2007-12-26 Thread Martin v. Löwis
> .read() returns the bytes exactly how it downloads them. It doesn't
> interpret them. If those bytes are GB-2312-encoded text, that's what
> they are. There's no need to reencode them. Just .write(page) (of
> course, this way you don't verify that it's correct).

Alternatively, if the page is *no
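The advice quoted above, sketched in Python 3 terms (the thread predates Python 3): the response body is already a `bytes` object in the page's own encoding, so saving it is a plain binary write with no decode or encode step. The function name `save_raw` and the demo file name are hypothetical. A short GB2312 sample stands in for a real download.

```python
import os
import tempfile


def save_raw(page_bytes, path):
    """Write downloaded bytes to disk exactly as received, with no re-encoding."""
    # Binary mode: Python writes the bytes untouched, so GB2312 stays GB2312.
    with open(path, "wb") as f:
        f.write(page_bytes)


# Demonstration with a short GB2312-encoded sample instead of a real download.
sample = "中文网页".encode("gb2312")
demo_path = os.path.join(tempfile.mkdtemp(), "page.html")
save_raw(sample, demo_path)
```

As the quoted reply notes, this does not check that the bytes really are valid GB2312. It only guarantees the file matches what the server sent.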

Re: save gb-2312 web page in a .html file

2007-12-26 Thread Matt Nordhoff
Peter Pei wrote:
> You must be right, since I tried one page and it worked. But there is
> something wrong with this particular page:
> http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with
> IE7), it is all messed up.
>
> url = 'http://overseas.btchina.net/?categoryi

Re: save gb-2312 web page in a .html file

2007-12-26 Thread Peter Pei
You must be right, since I tried one page and it worked. But there is something wrong with this particular page: http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with IE7), it is all messed up.

url = 'http://overseas.btchina.net/?categoryid=-1'
headers = { 'User-A
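Peter's snippet is cut off in this archive, so the following is not his code. It is a minimal Python 3 sketch (the thread used Python 2's `urllib2`) of sending a browser-like `User-Agent` header and keeping the response body as undecoded bytes. The header value and the helper names `build_request` and `fetch_bytes` are hypothetical.

```python
import urllib.request

# Hypothetical User-Agent value; the one in the original post is truncated.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-fetcher)"}


def build_request(url, headers=HEADERS):
    """Create a GET request that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_bytes(url):
    """Download a page and return the raw body, without decoding it."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

The result of `fetch_bytes(url)` can then be written with a binary-mode `open(path, "wb")` so the page's bytes reach the file unchanged.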

Re: save gb-2312 web page in a .html file

2007-12-26 Thread Matt Nordhoff
Peter Pei wrote:
> I am trying to read a web page and save it in a .html file. The problem is
> that the web page is GB-2312 encoded, and I want to save it to the file with
> the same encoding or unicode. I have some code like this:
> url = 'http://blah/'
> headers = { 'User-Agent' : 'Moz

save gb-2312 web page in a .html file

2007-12-26 Thread Peter Pei
I am trying to read a web page and save it in a .html file. The problem is that the web page is GB-2312 encoded, and I want to save it to the file with the same encoding or unicode. I have some code like this:

url = 'http://blah/'
headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE
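For the "or unicode" option, the following is a minimal Python 3 sketch that decodes the downloaded GB2312 bytes and saves the text as UTF-8. The helper name `save_as_utf8` and the `errors="replace"` policy are hypothetical choices, not something from the thread. Pages labelled gb2312 often contain GBK characters, so passing `"gbk"` or `"gb18030"` as `source_encoding` may decode more of them cleanly.

```python
import os
import tempfile


def save_as_utf8(page_bytes, path, source_encoding="gb2312"):
    """Decode raw page bytes from source_encoding and write them as UTF-8 text."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    text = page_bytes.decode(source_encoding, errors="replace")
    # Text mode with an explicit encoding does the UTF-8 encoding on write;
    # newline="" leaves the page's line endings untranslated.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Demonstration with a short GB2312-encoded sample instead of a real download.
utf8_sample_text = "<p>中文网页</p>"
out_path = os.path.join(tempfile.mkdtemp(), "page-utf8.html")
save_as_utf8(utf8_sample_text.encode("gb2312"), out_path)
```

One caveat applies: if the saved HTML still contains a `<meta ... charset=gb2312>` declaration, a browser opening the local file may decode the UTF-8 bytes as GB2312. Either keep the original bytes or update that declaration as well.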