[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2021-12-10 Thread Daniel Diniz
Daniel Diniz added the comment: As Victor notes, this is a controversial issue. And I'll add that the need for this feature seems not to have been brought up in over a decade. So I'm closing this. -- resolution: -> rejected stage: patch review -> resolved status: open -> closed

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2019-07-29 Thread STINNER Victor
STINNER Victor added the comment: This feature request seems to be controversial: there is no clear consensus on which encoding should be used. I suggest simply closing the issue. In the meantime, since this issue is far from being "newcomer friendly", I am removing the "Easy" label.

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2014-08-31 Thread Martin Panter
Changes by Martin Panter vadmium...@gmail.com: -- nosy: +vadmium ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4733 ___ ___ Python-bugs-list

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2012-09-26 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- versions: +Python 3.4 -Python 3.3

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2012-09-26 Thread Lino Mastrodomenico
Lino Mastrodomenico added the comment: FYI, the exact algorithm for determining the encoding of HTML documents is http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding There are lots of different algorithms documented all over the

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2012-05-28 Thread Serhiy Storchaka
Serhiy Storchaka storch...@gmail.com added the comment: If you add the encoding parameter, you should also add at least the errors and newline parameters. And why not just use io.TextIOWrapper? page.decode_content() is bad in that it compels reading and decoding all of the data at once, while
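
The io.TextIOWrapper suggestion above can be sketched roughly as follows. This is a minimal illustration, not the patch attached to the issue: the helper names declared_charset and text_stream are hypothetical, the UTF-8 fallback is an assumption, and an in-memory io.BytesIO stands in for the network response so the example runs offline.

```python
import io
from email.message import Message


def declared_charset(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`.

    The UTF-8 fallback is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def text_stream(binary_stream, content_type, errors="strict", newline=None):
    """Wrap a binary stream so that reads yield str, decoded incrementally."""
    encoding = declared_charset(content_type)
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, newline=newline
    )


# Stand-in for a response body served with a declared Latin-1 charset.
body = io.BytesIO("café\nnaïve\n".encode("iso-8859-1"))
stream = text_stream(body, "text/html; charset=ISO-8859-1")

# Iterating decodes line by line instead of reading everything at once.
lines = [line.rstrip("\n") for line in stream]
```

With urllib itself, the object returned by urllib.request.urlopen() for an HTTP URL is a buffered binary stream, so it could presumably be passed as binary_stream together with its Content-Type header value. The errors and newline arguments are forwarded unchanged, which covers the extra parameters mentioned in the comment.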

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2011-10-21 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: I’m not sure real HTML (i.e. sent as text/html) should have an XML prolog honored. For XML, there’s http://tools.ietf.org/html/rfc3023

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti -BreamoreBoy

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Christian Heimes wrote: There is no generic and simple way to detect the encoding of a remote site. Sometimes the encoding is mentioned in the HTTP header, sometimes it's embedded in the head section of the HTML document. FWIW for
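
The two places named in the quoted remark (the HTTP header and the head section of the HTML document) can be checked in order, as in the rough sketch below. It is a deliberate simplification and not the WHATWG algorithm linked elsewhere in this thread: the function name guess_encoding, the 1024-byte scan window, the regular expression, and the UTF-8 default are all assumptions made for illustration.

```python
import re
from email.message import Message

# Simplified pattern: matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=..."> form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def guess_encoding(content_type, body, default="utf-8"):
    """Guess an encoding from a Content-Type value, then from the markup."""
    # 1. Prefer the charset declared in the HTTP header, if any.
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"
    charset = msg.get_content_charset()
    if charset:
        return charset
    # 2. Otherwise look for a <meta> declaration near the start of the body.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Fall back to an assumed default.
    return default
```

A guess made this way can still be wrong, for instance when the header and the markup disagree or when neither declares anything, which is part of why the thread found no consensus on a single correct behaviour.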

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: page.decode_content() might be a better name, and would avoid confusion with the bytes.decode() method.

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2011-10-19 Thread Senthil Kumaran
Senthil Kumaran sent...@uthcode.com added the comment: - page.encoding is a good idea. - page.decode_content definitely sounds better than page.decode, which can be confusing because page is not a bytes object but a file-like object. I wonder whether an attribute to urlopen would be better? Not

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-11-17 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: I think the patch should be updated to benefit from new facilities in the io module instead of monkey-patching methods. The doc and tests are still good. -- nosy: +eric.araujo

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-11-17 Thread Éric Araujo
Changes by Éric Araujo mer...@netwok.org: -- dependencies: -urllib(2) should allow automatic decoding by charset

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-10-18 Thread Senthil Kumaran
Changes by Senthil Kumaran orsent...@gmail.com: -- assignee: - orsenthil

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-09-12 Thread Mark Lawrence
Mark Lawrence breamore...@yahoo.co.uk added the comment: Senthil: could you review the attached patch please?

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-08-08 Thread Terry J. Reedy
Changes by Terry J. Reedy tjre...@udel.edu: -- versions: +Python 3.2 -Python 3.1

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-07-19 Thread Mark Lawrence
Mark Lawrence breamore...@yahoo.co.uk added the comment: Christian, Daniel, I take it that you're both still interested in this? -- nosy: +BreamoreBoy

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2010-01-27 Thread Lino Mastrodomenico
Changes by Lino Mastrodomenico l.mastrodomen...@gmail.com: -- nosy: +mastrodomenico

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2009-04-22 Thread Daniel Diniz
Changes by Daniel Diniz aja...@gmail.com: -- keywords: +easy

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2009-02-12 Thread Daniel Diniz
Changes by Daniel Diniz aja...@gmail.com: -- nosy: +orsenthil

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2009-02-12 Thread Daniel Diniz
Changes by Daniel Diniz aja...@gmail.com: -- dependencies: +urllib(2) should allow automatic decoding by charset

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2008-12-23 Thread Daniel Diniz
New submission from Daniel Diniz aja...@gmail.com: This patch adds a version of urlopen that uses available encoding information to return strings instead of bytes. The main goal is to provide a shortcut for users that don't want to handle the decoding in the easy cases[1]. One added benefit it

[issue4733] Add a decode to declared encoding version of urlopen to urllib

2008-12-23 Thread Christian Heimes
Christian Heimes li...@cheimes.de added the comment: Thx, I'll review the patch after Christmas. -- nosy: +christian.heimes priority: - normal stage: - patch review