[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-11 Thread STINNER Victor
Changes by STINNER Victor: resolution: -> fixed; status: open -> closed

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-11 Thread STINNER Victor
STINNER Victor added the comment: Ok. I committed the patch to set the default encoding to utf-8 on Windows: r81925.
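For readers who want to see what the archive encoding changes in practice, here is a minimal sketch using only the standard `tarfile` API. The helper names `write_archive` and `read_names` and the sample file name are made up for illustration, and GNU format is chosen here only because it stores the name as raw bytes in the header; none of this is taken from the committed patch.

```python
import io
import tarfile


def write_archive(name, encoding):
    """Build an in-memory tar holding one empty member (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))  # size 0, so no file object needed
    return buf.getvalue()


def read_names(data, encoding):
    """List member names, decoding header bytes with the given encoding."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r",
                      encoding=encoding) as tar:
        return tar.getnames()


# "\u00e9.txt" is "é.txt"; written with cp1252 the name becomes b"\xe9.txt".
data = write_archive("\u00e9.txt", "cp1252")

same = read_names(data, "cp1252")   # reader agrees with the writer
other = read_names(data, "utf-8")   # reader assumes UTF-8 instead

print(same, [ascii(n) for n in other])
```

Passing `encoding=` explicitly on both sides avoids depending on whatever default the platform and Python version pick.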

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread STINNER Victor
STINNER Victor added the comment: Updated version of the utf-8 patch:
- Use UTF-8 for Windows CE as well
- Update the documentation
- Prepare the NEWS entry

Added file: http://bugs.python.org/file17609/tarfile_windows_utf8-2.patch

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: FWIW, I agree with Lars: the main use of tar files under Windows is when they come from other systems. Windows users almost never generate tar files by themselves; they will generate zip, rar or 7z files instead. -- nosy: +pitrou

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread STINNER Victor
STINNER Victor added the comment:
> 2. Create backups for personal use.
What? Really? I'm sure that all Windows users will use ZIP or maybe RAR, but never the geek choice.
> 1. Download tar archives from a webpage (when no zip is supplied) for viewing or extracting.
Tarballs come from UNI

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread Lars Gustäbel
Lars Gustäbel added the comment: Maybe I'm going out on a limb here, but I think we should again consider what tarfile users on Windows(!) actually use it for under which circumstances. The following list is probably not exhaustive, but IMHO covers 90%: 1. Download tar archives from a webpage

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread Martin v . Löwis
Martin v. Löwis added the comment:
>> 7-zip encodes "à" (U+00e0) as 0x85 (1 byte), and "é" (U+00e9) as 0x82 (1 byte). I don't know this encoding.
> That's an old DOS code page used in Europe: CP850
There is a good chance that they use it because it is the OEM code page on the system. I
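The byte values quoted in this thread can be checked against Python's built-in codecs. This is only a small sketch; the variable names are illustrative, and `cp850` / `cp1252` stand in for the OEM and ANSI code pages of a Western European Windows system.

```python
# "\u00e0" is "à" and "\u00e9" is "é", written as escapes to keep the source ASCII.
samples = ["\u00e0", "\u00e9"]

# OEM code page (what 7-zip was observed to write in this thread).
cp850 = {ch: ch.encode("cp850") for ch in samples}
# ANSI code page (what "mbcs" resolves to when the preferred encoding is cp1252).
cp1252 = {ch: ch.encode("cp1252") for ch in samples}

for ch in samples:
    print(ascii(ch), "cp850:", cp850[ch].hex(), "cp1252:", cp1252[ch].hex())

# Reading an OEM-encoded byte with the ANSI code page yields a different character.
misread = b"\x82".decode("cp1252")
print(ascii(misread))
```

The mismatch between the two code pages is one way the same archive can show correct names in one tool and wrong names in another.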

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote:
> STINNER Victor added the comment:
>
> My tests with 7-zip and WinRAR convinced me that it's not a good idea to use utf-8 *by default* on Windows. But since mbcs doesn't support surrogateescape error handler, we should restor

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread STINNER Victor
STINNER Victor added the comment: My tests with 7-zip and WinRAR convinced me that it's not a good idea to use utf-8 *by default* on Windows. But since mbcs doesn't support the surrogateescape error handler, we should restore the previous behaviour just for this encoding. tarfile_mbcs_errors.patch

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-10 Thread STINNER Victor
STINNER Victor added the comment: I created a tarball (.tar.gz) on Windows with Python 3.1 (which uses the "mbcs" encoding). With locale.getpreferredencoding() == 'cp1252', "é" (U+00e9) is encoded as 0xe9 (1 byte) and "à" (U+00e0) as 0xe0 (1 byte). WinRAR displays the file names correctly, but 7-zip

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Marc-Andre Lemburg wrote:
> Marc-Andre Lemburg added the comment:
>
> STINNER Victor wrote:
>> STINNER Victor added the comment:
>>
>> I created a TAR archive with the 7-zip archiver of files with diacritics in their name (eg. "é" and "à"). Then

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote:
> STINNER Victor added the comment:
>
> I created a TAR archive with the 7-zip archiver of files with diacritics in their name (eg. "é" and "à"). Then I opened the archive with WinRAR: the file names were not displayed correct

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-06-09 Thread STINNER Victor
STINNER Victor added the comment: I created a TAR archive with the 7-zip archiver of files with diacritics in their name (eg. "é" and "à"). Then I opened the archive with WinRAR: the file names were not displayed correctly :-/ 7-zip encodes "à" (U+00e0) as 0x85 (1 byte), and "é" (U+00e9) as 0x

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-05-30 Thread Lars Gustäbel
Lars Gustäbel added the comment: My expertise on Windows is rather limited, but as far as I understand the issue, I consider this a reasonable idea. I think it is impossible to find a perfect default encoding, and utf-8 seems to be the best bet with regard to portability. IIRC most of the arch

[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

2010-05-21 Thread STINNER Victor
New submission from STINNER Victor: The mbcs encoding replaces non-encodable characters (losing information) and doesn't support the surrogateescape error handler. It ignores the error handler argument (see #850997), and tarfile now uses the surrogateescape error handler by default (#8390). This encoding is
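To make the difference between the two behaviours concrete, here is a small sketch. It assumes nothing beyond the standard codecs: `utf-8` with `surrogateescape` shows the lossless round trip, and `cp1252` with `errors="replace"` is used only as a portable stand-in for the lossy replacement described above, since `mbcs` itself exists only on Windows.

```python
# A file name as raw bytes: valid cp1252, but not valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF),
# so the original bytes can be recovered exactly when encoding again.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")

# Stand-in for a codec that replaces what it cannot encode: the character is lost.
lossy = "\u4e2d".encode("cp1252", "replace")

print(ascii(name), roundtrip == raw, lossy)
```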