Daniel Diniz added the comment:
As Victor notes, this is a controversial issue. I'll add that the need for
this feature seems not to have been brought up in over a decade, so I'm
closing this.
--
resolution: -> rejected
stage: patch review -> resolved
status: open -> closed
STINNER Victor added the comment:
This feature request seems to be controversial: there is no clear consensus on
which encoding should be used. I suggest simply closing the issue.
In the meantime, since this issue is far from being "newcomer friendly", I am
removing the "Easy" label.
Changes by Martin Panter vadmium...@gmail.com:
--
nosy: +vadmium
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4733
___
___
Python-bugs-list
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
versions: +Python 3.4 -Python 3.3
Lino Mastrodomenico added the comment:
FYI, the exact algorithm for determining the encoding of HTML documents is
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding
There are lots of different algorithms documented all over the
Serhiy Storchaka storch...@gmail.com added the comment:
If you add the encoding parameter, you should also add at least errors and
newline parameters. And why not just use io.TextIOWrapper?
page.decode_content() is bad in that it compels you to read and decode all of
the data at once, while
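
A minimal sketch of the io.TextIOWrapper approach mentioned above, assuming the
response object behaves like a readable binary stream. An io.BytesIO stands in
for the HTTP response so the example runs without network access, and the
helper name open_text is hypothetical, not part of the patch under review.

```python
import io


def open_text(binary_stream, encoding="utf-8", errors="strict", newline=None):
    """Wrap a readable binary stream so that it yields str incrementally.

    Hypothetical helper: it only forwards the encoding, errors and newline
    parameters to io.TextIOWrapper.
    """
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, newline=newline
    )


# Stand-in for a response body; a real caller would pass the urlopen() result.
raw = io.BytesIO("caf\u00e9\nna\u00efve\n".encode("latin-1"))

# Lines are decoded as they are read, not all at once.
lines = [line for line in open_text(raw, encoding="latin-1")]
```

Because the wrapper decodes lazily, the caller can iterate over lines without
first reading the whole body into memory.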
Éric Araujo mer...@netwok.org added the comment:
I’m not sure real HTML (i.e. sent as text/html) should have an XML prolog
honored. For XML, there’s http://tools.ietf.org/html/rfc3023
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +ezio.melotti -BreamoreBoy
Ezio Melotti ezio.melo...@gmail.com added the comment:
Christian Heimes wrote:
There is no generic and simple way to detect the encoding of a
remote site. Sometimes the encoding is mentioned in the HTTP header,
sometimes it's embedded in the head section of the HTML document.
FWIW for
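
To illustrate the two places mentioned in the quote, here is a rough sketch
that looks for a charset first in the Content-Type header value and then in a
meta tag near the start of the body. It is a simplification for illustration
only and does not implement the WHATWG algorithm; the function name
guess_charset, the 1024-byte scan window and the utf-8 default are assumptions.

```python
import re
from email.message import Message

# Simplified pattern: matches both <meta charset="..."> and the
# http-equiv form where charset= appears inside the content attribute.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def guess_charset(content_type, body, default="utf-8"):
    """Return a charset name from the header, else from a meta tag, else default."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset
    match = _META_CHARSET.search(body[:1024])  # assumed scan window
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

Neither source is reliable on its own, which is part of why the thread found
no consensus on a single default behaviour.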
Ezio Melotti ezio.melo...@gmail.com added the comment:
page.decode_content() might be a better name, and would avoid confusion with
the bytes.decode() method.
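
For readers following the naming discussion, a hypothetical sketch of what a
decode_content() method on the returned page object could look like. The Page
class below is a stand-in written for this example, not the object urlopen
returns, and the encoding property, the fallback argument and the errors
argument are assumptions rather than the patch's actual interface.

```python
import io
from email.message import Message


class Page:
    """Hypothetical stand-in for the file-like object returned by urlopen()."""

    def __init__(self, body, content_type):
        self._stream = io.BytesIO(body)
        self.headers = Message()
        self.headers["Content-Type"] = content_type

    @property
    def encoding(self):
        # Charset declared in the Content-Type header, or None if absent.
        return self.headers.get_content_charset()

    def read(self, size=-1):
        return self._stream.read(size)

    def decode_content(self, fallback="utf-8", errors="strict"):
        # Reads the remaining body in one go and decodes it to str.
        return self.read().decode(self.encoding or fallback, errors)


page = Page("Gr\u00fc\u00dfe".encode("iso-8859-1"), "text/html; charset=iso-8859-1")
text = page.decode_content()
```

The distinct name makes it clear that the method lives on a file-like object
and consumes the stream, unlike bytes.decode().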
Senthil Kumaran sent...@uthcode.com added the comment:
- page.encoding is a good idea.
- page.decode_content definitely sounds better than page.decode, which can be
confusing because page is not a bytes object but a file-like object.
I am wondering whether an attribute to urlopen would be better? Not
Éric Araujo mer...@netwok.org added the comment:
I think the patch should be updated to benefit from new facilities in the io
module instead of monkey-patching methods. The doc and tests are still good.
--
nosy: +eric.araujo
Changes by Éric Araujo mer...@netwok.org:
--
dependencies: -urllib(2) should allow automatic decoding by charset
Changes by Senthil Kumaran orsent...@gmail.com:
--
assignee: - orsenthil
Mark Lawrence breamore...@yahoo.co.uk added the comment:
Senthil: could you review the attached patch please?
Changes by Terry J. Reedy tjre...@udel.edu:
--
versions: +Python 3.2 -Python 3.1
Mark Lawrence breamore...@yahoo.co.uk added the comment:
Christian, Daniel, I take it that you're both still interested in this?
--
nosy: +BreamoreBoy
Changes by Lino Mastrodomenico l.mastrodomen...@gmail.com:
--
nosy: +mastrodomenico
Changes by Daniel Diniz aja...@gmail.com:
--
keywords: +easy
Changes by Daniel Diniz aja...@gmail.com:
--
nosy: +orsenthil
Changes by Daniel Diniz aja...@gmail.com:
--
dependencies: +urllib(2) should allow automatic decoding by charset
New submission from Daniel Diniz aja...@gmail.com:
This patch adds a version of urlopen that uses available encoding
information to return strings instead of bytes.
The main goal is to provide a shortcut for users who don't want to
handle the decoding in the easy cases[1]. One added benefit it
Christian Heimes li...@cheimes.de added the comment:
Thx, I'll review the patch after Christmas.
--
nosy: +christian.heimes
priority: - normal
stage: - patch review