On 09.01.10 01:47, Glenn Linderman wrote:
On approximately 1/8/2010 3:59 PM, came the following characters from
the keyboard of Victor Stinner:
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
One concern I have with this implementation encoding=BOM is that if there
Victor Stinner wrote:
On Friday 08 January 2010 10:10:23, Martin v. Löwis wrote:
The builtin open() function is unable to open a UTF-16/32 file starting with
a BOM if the encoding is not specified (it raises a Unicode error). For a
UTF-8 file starting with a BOM, read()/readline() also returns
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote:
While I would support combining BOM detection in the case where a file
is opened for reading and no encoding is specified, I see two problems:
a) if a seek operation is performed before having looked at the BOM,
no determination
Victor Stinner wrote:
(2) Check for a BOM while reading or detect it before?
Everybody agrees that checking the BOM is an interesting option and should not be
limited to open().
Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file
name or a binary file-like object: it
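
A minimal sketch of what such a helper might look like, assuming it only sniffs the first four bytes for a BOM and returns None when there is none. The name guess_file_encoding() here is a hypothetical stand-alone function, not something that exists in the codecs module, and the returned codec names are my own choice:

```python
import codecs
import io

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(source):
    """Return a codec name if source starts with a known BOM, else None.

    source is a file name or a seekable binary file-like object
    (hypothetical signature, for illustration only).
    """
    if isinstance(source, (str, bytes)):
        with open(source, "rb") as f:
            head = f.read(4)
    else:
        pos = source.tell()
        head = source.read(4)
        source.seek(pos)  # leave the caller's position untouched
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


sample = io.BytesIO(codecs.BOM_UTF8 + b"hello")
print(guess_file_encoding(sample))  # utf-8-sig
```

Note that a UTF-16-LE file whose first character is U+0000 is indistinguishable from UTF-32-LE with this approach.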
On Saturday 09 January 2010 01:47:38, you wrote:
One concern I have with this implementation encoding=BOM is that if
there is no BOM it assumes UTF-8.
If no BOM is found, it falls back to the current heuristic: os.device_encoding()
or the system locale.
(...) Hence, it might be that someone
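
For illustration, the fallback described above could be sketched roughly as follows. This is an assumption about the order of the two steps, not a copy of the actual TextIOWrapper code, and default_text_encoding() is a hypothetical name:

```python
import locale
import os


def default_text_encoding(fd):
    """Sketch of the no-BOM fallback: device encoding, then locale."""
    encoding = None
    try:
        # Returns None unless fd is connected to a terminal.
        encoding = os.device_encoding(fd)
    except OSError:
        pass
    # Passing False avoids temporarily changing the process locale.
    return encoding or locale.getpreferredencoding(False)
```

For a regular file on disk the first step normally yields None, so the result is whatever the locale reports.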
On Saturday 09 January 2010 02:12:28, MRAB wrote:
What about listing the possible encodings? It would try each in turn
until it found one where the BOM matched or had no BOM:
my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
or is that taking it too far?
Yes, you're
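
To make the '|' idea concrete, here is a rough sketch of how such a candidate list might be resolved outside of open(). pick_encoding() is a hypothetical helper; open() accepts no such syntax, and the rule "a codec without a BOM always matches" is my reading of the proposal:

```python
import codecs

# BOMs accepted by each BOM-aware codec, keyed by normalised codec name.
_CODEC_BOMS = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(head, spec):
    """Return the first candidate in spec whose BOM matches head.

    head is the first few bytes of the file; spec is a '|'-separated
    list of codec names (hypothetical syntax).
    """
    for name in spec.split("|"):
        boms = _CODEC_BOMS.get(codecs.lookup(name).name)
        if boms is None:
            return name  # codec has no BOM: accept it as the fallback
        if head.startswith(boms):
            return name
    raise LookupError("no candidate in %r matched" % spec)


spec = "UTF-8-sig|UTF-16|UTF-8"
print(pick_encoding(b"\xff\xfeh\x00i\x00", spec))  # UTF-16
print(pick_encoding(b"hello", spec))               # UTF-8
```

The order of the candidates matters: with UTF-16 listed before UTF-32, UTF-32-LE data would be taken for UTF-16, because its BOM starts with the same two bytes.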
On Saturday 09 January 2010 12:18:33, Walter Dörwald wrote:
Good idea, I chose open(filename, encoding=BOM).
On the surface this looks like there's an encoding named BOM, but
looking at your patch I found that the check is still done in
TextIOWrapper. IMHO the best approach would be to
Hi,
On Saturday 09 January 2010 13:45:58, you wrote:
Note: I implemented the BOM check in TextIOWrapper; so it's already
usable for any file-like object.
Yes, but the implementation is limited to just BOM checking
and thus only supports UTF-8-SIG, UTF-16 and UTF-32.
Sure, but that's
Walter Dörwald walter at livinglogic.de writes:
On the surface this looks like there's an encoding named BOM, but
looking at your patch I found that the check is still done in
TextIOWrapper. IMHO the best approach would be to implement a *real*
codec named BOM (or sniff). This doesn't
On behalf of the Python development team, I'm gleeful to announce the second
alpha release of Python 2.7.
Python 2.7 is scheduled to be the last major version in the 2.x series. It
includes many features that were first released in Python 3.1. The faster io
module, the new nested with statement
On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson benja...@python.org wrote:
On behalf of the Python development team, I'm gleeful to announce the second
alpha release of Python 2.7.
Well yay. Django's test suite (1242 tests) runs with just one failure on
the 2.7 alpha 2 level, and that
2010/1/9 Karen Tracey kmtra...@gmail.com:
On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson benja...@python.org wrote:
On behalf of the Python development team, I'm gleeful to announce the second
alpha release of Python 2.7.
Well yay. Django's test suite (1242 tests) runs with just one
How much of the Unladen Swallow cPickle speedups have been incorporated into
2.7 and 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the
two versions I currently care about at work - we will skip 2.5 entirely).
It appears some of their speedups may have already been merged to trunk,
Philip They've documented their upstream patches here:
Philip http://code.google.com/p/unladen-swallow/wiki/UpstreamPatches
Thanks. That will help immensely.
Skip
skip at pobox.com writes:
If a patch to merge this to 2.7 is already under
consideration I won't look at it,
Why won't you look at it? :)
Actually, if these patches are to be merged someone should certainly look at
them, and do the (possibly) remaining work.
http://bugs.python.org/issue5683
On Jan 9, 2010, at 12:00 PM, s...@pobox.com wrote:
How much of the Unladen Swallow cPickle speedups have been incorporated into
2.7 and 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the
two versions I currently care about at work - we will skip 2.5 entirely).
It appears some
Antoine Pitrou wrote:
Walter Dörwald walter at livinglogic.de writes:
On the surface this looks like there's an encoding named BOM, but
looking at your patch I found that the check is still done in
TextIOWrapper. IMHO the best approach would be to implement a *real*
codec named BOM (or
Martin v. Löwis martin at v.loewis.de writes:
Sorry but this is missing the point. The point here is to improve the open()
function. I'm sure people who know about encodings are able to install the
chardet library or even whip up their own BOM detection routine...
How does the
Antoine == Antoine Pitrou solip...@pitrou.net writes:
Antoine skip at pobox.com writes:
If a patch to merge this to 2.7 is already under
consideration I won't look at it,
Antoine Why won't you look at it? :)
I meant I wouldn't look at developing one.
Skip
On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou solip...@pitrou.net wrote:
If we want it to be the default, it must be able to fall back on the current
locale-based algorithm if no BOM is found. I don't think it would be easy for a
codec to do that.
Right. It seems like encoding=None is the
On 09/01/2010 22:14, Lennart Regebro wrote:
On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou solip...@pitrou.net wrote:
If we want it to be the default, it must be able to fall back on the current
locale-based algorithm if no BOM is found. I don't think it would be easy for a
codec to do that.
How does the requirement that it be implemented as a codec miss the
point?
If we want it to be the default, it must be able to fall back on the current
locale-based algorithm if no BOM is found. I don't think it would be easy for a
codec to do that.
Yes - however, Victor currently