[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2021-12-10 Thread Daniel Diniz
Daniel Diniz added the comment: As Victor notes, this is a controversial issue. And I'll add that the need for this feature seems not to have been brought up in over a decade. So I'm closing this. -- resolution: -> rejected stage: patch review -> resolved status: open -> closed

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2019-07-29 Thread STINNER Victor
STINNER Victor added the comment: This feature request seems to be controversial: there is no clear consensus on which encoding should be used. I suggest simply closing the issue. In the meantime, since this issue is far from being "newcomer friendly", I am removing the "Easy" label.

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2014-08-31 Thread Martin Panter
Changes by Martin Panter: -- nosy: +vadmium

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2012-09-26 Thread Lino Mastrodomenico
Lino Mastrodomenico added the comment: FYI, the exact algorithm for determining the encoding of HTML documents is http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding There are lots of different algorithms documented all over the intertubes

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2012-09-26 Thread Ezio Melotti
Changes by Ezio Melotti: -- versions: +Python 3.4 -Python 3.3

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2012-05-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: If you add the encoding parameter, you should also add at least the errors and newline parameters. And why not just use io.TextIOWrapper? page.decode_content() is bad in that it forces the caller to read and decode all of the data at once, while io.TextIOWrapper returns a file
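The io.TextIOWrapper approach mentioned above can be sketched roughly as follows. This is only an illustration, not the patch under review: the helper name open_text is hypothetical, and the sketch assumes the response object behaves like a buffered binary stream (io.BytesIO stands in for it here so the example needs no network access).

```python
import io


def open_text(binary_stream, encoding="utf-8", errors="strict", newline=None):
    """Wrap a binary file-like object in a text layer.

    open_text is a hypothetical helper for illustration only. It takes
    the same encoding, errors and newline arguments that
    io.TextIOWrapper accepts, so the data is decoded lazily as it is
    read rather than all at once.
    """
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, newline=newline
    )


# io.BytesIO stands in for an HTTP response body (an assumption made
# so that the example runs offline).
raw = io.BytesIO("café\nnaïve\n".encode("iso-8859-1"))
page = open_text(raw, encoding="iso-8859-1")

# Lines are decoded one at a time as they are consumed.
for line in page:
    print(line.rstrip("\n"))
```

Because the wrapper is itself a file object, callers can iterate over lines or call read(n) instead of holding the whole decoded document in memory.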

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2011-10-21 Thread Éric Araujo
Éric Araujo added the comment: I’m not sure real HTML (i.e. sent as text/html) should have an XML prolog honored. For XML, there’s http://tools.ietf.org/html/rfc3023

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2011-10-19 Thread Senthil Kumaran
Senthil Kumaran added the comment: - page.encoding is a good idea. - page.decode_content definitely sounds better than page.decode, which can be confusing because page is not a bytes object but a file-like object. I wonder whether an attribute to urlopen would be better? Not exactly the mode like

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Ezio Melotti added the comment: page.decode_content() might be a better name, and would avoid confusion with the bytes.decode() method.

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Ezio Melotti added the comment:
> Christian Heimes wrote:
> There is no generic and simple way to detect the encoding of a
> remote site. Sometimes the encoding is mentioned in the HTTP header,
> sometimes it's embedded in the <head> section of the HTML document.
FWIW for HTML pages the encodin
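For the HTTP header case only, a minimal sketch of reading the declared charset might look like this. The function name declared_charset is hypothetical, and the sketch deliberately ignores encodings declared inside the document itself; it relies on the standard library's email.message.Message.get_content_charset(), which returns the charset parameter in lower case.

```python
from email.message import Message


def declared_charset(content_type, default=None):
    """Return the charset named in a Content-Type value, or default.

    declared_charset is a hypothetical helper for illustration. It
    covers only the HTTP header case and does not inspect the
    document body for a declaration.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the charset parameter and
    # returns the fallback when no charset is present.
    return msg.get_content_charset(default)


print(declared_charset("text/html; charset=ISO-8859-1"))
print(declared_charset("text/html", default="utf-8"))
```

When the header carries no charset, the caller still has to choose a fallback, which is the point on which the thread found no consensus.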

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2011-10-19 Thread Ezio Melotti
Changes by Ezio Melotti: -- nosy: +ezio.melotti -BreamoreBoy

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-11-17 Thread Éric Araujo
Changes by Éric Araujo: -- dependencies: -urllib(2) should allow automatic decoding by charset

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-11-17 Thread Éric Araujo
Éric Araujo added the comment: I think the patch should be updated to benefit from new facilities in the io module instead of monkey-patching methods. The doc and tests are still good. -- nosy: +eric.araujo

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-10-18 Thread Senthil Kumaran
Changes by Senthil Kumaran: -- assignee: -> orsenthil

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-09-12 Thread Mark Lawrence
Mark Lawrence added the comment: Senthil: could you review the attached patch please?

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-08-08 Thread Terry J. Reedy
Changes by Terry J. Reedy: -- versions: +Python 3.2 -Python 3.1

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-07-19 Thread Mark Lawrence
Mark Lawrence added the comment: Christian, Daniel, I take it that you're both still interested in this? -- nosy: +BreamoreBoy

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2010-01-27 Thread Lino Mastrodomenico
Changes by Lino Mastrodomenico: -- nosy: +mastrodomenico

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2009-04-22 Thread Daniel Diniz
Changes by Daniel Diniz: -- keywords: +easy

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2009-02-12 Thread Daniel Diniz
Changes by Daniel Diniz: -- dependencies: +urllib(2) should allow automatic decoding by charset

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2009-02-12 Thread Daniel Diniz
Changes by Daniel Diniz: -- nosy: +orsenthil

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2008-12-23 Thread Christian Heimes
Christian Heimes added the comment: Thx, I'll review the patch after Christmas. -- nosy: +christian.heimes priority: -> normal stage: -> patch review

[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

2008-12-23 Thread Daniel Diniz
New submission from Daniel Diniz: This patch adds a version of urlopen that uses available encoding information to return strings instead of bytes. The main goal is to provide a shortcut for users who don't want to handle the decoding in the easy cases[1]. One added benefit is that the failure