Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Walter Dörwald
On 09.01.10 01:47, Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding=BOM is that if there

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Walter Dörwald
Victor Stinner wrote: On Friday 08 January 2010 10:10:23, Martin v. Löwis wrote: The builtin open() function is unable to open a UTF-16/32 file starting with a BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8 file starting with a BOM, read()/readline() also returns
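
The behaviour under discussion can be reproduced at the codec level. The following is a minimal sketch, not taken from the thread: it assumes UTF-8 as the stand-in default encoding and works on byte strings rather than real files.

```python
import codecs

# UTF-8 data that starts with a BOM.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
with_bom = data.decode("utf-8")
# The "utf-8-sig" codec strips a leading BOM when one is present.
without_bom = data.decode("utf-8-sig")

# The "utf-16" encoder writes a BOM in front of the text.
utf16 = "hello".encode("utf-16")
try:
    utf16.decode("utf-8")  # the BOM bytes 0xFF/0xFE are never valid UTF-8
    utf16_as_utf8_failed = False
except UnicodeDecodeError:
    utf16_as_utf8_failed = True

# With the right codec the BOM is consumed and the text comes back intact.
utf16_text = utf16.decode("utf-16")
```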

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote: While I would support combining BOM detection in the case where a file is opened for reading and no encoding is specified, I see two problems: a) if a seek operation is performed before having looked at the BOM, no determination

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread M.-A. Lemburg
Victor Stinner wrote: (2) Check for a BOM while reading or detect it before? Everybody agrees that checking BOM is an interesting option and should not be limited to open(). Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file name or a binary file-like object: it
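
One way such a helper could look is sketched below. This is an illustration only: the name guess_file_encoding comes from the proposal, but no such function exists in the codecs module, and the signature, the default parameter and the return values here are assumptions.

```python
import codecs
import io

# UTF-32 is listed before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# starts with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(source, default=None):
    """Hypothetical helper: return a codec name based on a leading BOM.

    `source` is a file name or a seekable binary file-like object.
    `default` is returned when no BOM is found.
    """
    if isinstance(source, (str, bytes)):
        with open(source, "rb") as f:
            head = f.read(4)
    else:
        pos = source.tell()
        head = source.read(4)
        source.seek(pos)  # leave the stream where we found it
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Usage with in-memory binary streams.
utf8_stream = io.BytesIO(codecs.BOM_UTF8 + b"abc")
utf8_guess = guess_file_encoding(utf8_stream)
plain_guess = guess_file_encoding(io.BytesIO(b"plain"))
```

A UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE by its first four bytes, so a BOM check alone remains a heuristic.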

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 01:47:38, you wrote: One concern I have with this implementation encoding=BOM is that if there is no BOM it assumes UTF-8. If no BOM is found, it falls back to the current heuristic: os.device_encoding() or system locale. (...) Hence, it might be that someone

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:12:28, MRAB wrote: What about listing the possible encodings? It would try each in turn until it found one where the BOM matched or had no BOM: my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8') or is that taking it too far? Yes, you're
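
The "try each in turn" idea can be sketched outside open() as follows. The helper name pick_encoding and the "|"-separated candidate string are hypothetical; open() does not accept such an encoding argument.

```python
import codecs

# BOMs accepted by each BOM-aware codec. Codecs missing from this table
# (plain "utf-8", "latin-1", ...) are treated as having no BOM.
_BOMS = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(head, candidates):
    """Return the first candidate whose BOM matches `head` (the first
    bytes of the file), or the first candidate that has no BOM at all.
    Return None if nothing fits."""
    for name in candidates.split("|"):
        boms = _BOMS.get(name.lower())
        if boms is None or head.startswith(boms):
            return name
    return None


candidates = "UTF-8-sig|UTF-16|UTF-8"
picked_utf16 = pick_encoding(codecs.BOM_UTF16_LE + b"h\x00", candidates)
picked_plain = pick_encoding(b"hello", candidates)
```

Order matters in this sketch: a candidate without a BOM matches anything, so it only makes sense as the last entry, and listing UTF-16 before UTF-32 would classify a UTF-32 LE file as UTF-16.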

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 12:18:33, Walter Dörwald wrote: Good idea, I chose open(filename, encoding=BOM). On the surface this looks like there's an encoding named BOM, but looking at your patch I found that the check is still done in TextIOWrapper. IMHO the best approach would be to

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
Hi, On Saturday 09 January 2010 13:45:58, you wrote: Note: I implemented the BOM check in TextIOWrapper; so it's already usable for any file-like object. Yes, but the implementation is limited to just BOM checking and thus only supports UTF-8-SIG, UTF-16 and UTF-32. Sure, but that's

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Antoine Pitrou
Walter Dörwald walter at livinglogic.de writes: On the surface this looks like there's an encoding named BOM, but looking at your patch I found that the check is still done in TextIOWrapper. IMHO the best approach would be to implement a *real* codec named BOM (or sniff). This doesn't

[Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Benjamin Peterson
On behalf of the Python development team, I'm gleeful to announce the second alpha release of Python 2.7. Python 2.7 is scheduled to be the last major version in the 2.x series. It includes many features that were first released in Python 3.1. The faster io module, the new nested with statement

Re: [Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Karen Tracey
On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson benja...@python.org wrote: On behalf of the Python development team, I'm gleeful to announce the second alpha release of Python 2.7. Well yay. Django's test suite (1242 tests) runs with just one failure on the 2.7 alpha 2 level, and that

Re: [Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Benjamin Peterson
2010/1/9 Karen Tracey kmtra...@gmail.com: On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson benja...@python.org wrote: On behalf of the Python development team, I'm gleeful to announce the second alpha release of Python 2.7. Well yay. Django's test suite (1242 tests) runs with just one

[Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
How much of the Unladen Swallow cPickle speedups have been incorporated into 2.7 & 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the two versions I currently care about at work - we will skip 2.5 entirely). It appears some of their speedups may have already been merged to trunk,

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
Philip> They've documented their upstream patches here: Philip> http://code.google.com/p/unladen-swallow/wiki/UpstreamPatches Thanks. That will help immensely. Skip

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread Antoine Pitrou
skip at pobox.com writes: If a patch to merge this to 2.7 is already under consideration I won't look at it, Why won't you look at it? :) Actually, if these patches are to be merged someone should certainly look at them, and do the (possibly) remaining work. http://bugs.python.org/issue5683

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread Philip Jenvey
On Jan 9, 2010, at 12:00 PM, s...@pobox.com wrote: How much of the Unladen Swallow cPickle speedups have been incorporated into 2.7 & 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the two versions I currently care about at work - we will skip 2.5 entirely). It appears some

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Martin v. Löwis
Antoine Pitrou wrote: Walter Dörwald walter at livinglogic.de writes: On the surface this looks like there's an encoding named BOM, but looking at your patch I found that the check is still done in TextIOWrapper. IMHO the best approach would be to implement a *real* codec named BOM (or

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Antoine Pitrou
Martin v. Löwis martin at v.loewis.de writes: Sorry but this is missing the point. The point here is to improve the open() function. I'm sure people who know about encodings are able to install the chardet library or even whip up their own BOM detection routine... How does the

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
"Antoine" == Antoine Pitrou solip...@pitrou.net writes: Antoine> skip at pobox.com writes: If a patch to merge this to 2.7 is already under consideration I won't look at it, Antoine> Why won't you look at it? :) I meant I wouldn't look at developing one. Skip

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Lennart Regebro
On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou solip...@pitrou.net wrote: If we want it to be the default, it must be able to fall back on the current locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that. Right. It seems like encoding=None is the

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Michael Foord
On 09/01/2010 22:14, Lennart Regebro wrote: On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou solip...@pitrou.net wrote: If we want it to be the default, it must be able to fall back on the current locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that.

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Martin v. Löwis
How does the requirement that it be implemented as a codec miss the point? If we want it to be the default, it must be able to fall back on the current locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that. Yes - however, Victor currently