Re: git archive --format zip utf-8 issues

2012-09-24 René Scharfe
Hi, I found a way to make unzip respect the UTF-8 flag in ZIP files: apparently (from looking at the source), an extra field needs to be present for it to even look at general purpose flag bit 11. I sent a patch that adds an extended timestamp field, which fits the bill. Here are new
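Below is a minimal sketch of the kind of extra field René describes. It assumes the Info-ZIP "extended timestamp" layout: header ID 0x5455, a flags byte, then a 32-bit little-endian Unix mtime. The helper names write_ext_timestamp(), put_le16() and put_le32() are hypothetical and not taken from the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* All multi-byte ZIP fields are stored little-endian. */
static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/*
 * Hypothetical helper: write an Info-ZIP "extended timestamp" extra field
 * (header ID 0x5455, "UT") carrying only the modification time.
 * Layout: ID (2) | data size (2) | flags (1, bit 0 = mtime present) | mtime (4).
 * Returns the number of bytes written, which is always 9.
 */
static size_t write_ext_timestamp(unsigned char *buf, size_t cap, time_t mtime)
{
    assert(buf != NULL && cap >= 9);
    put_le16(buf, 0x5455);              /* header ID "UT" */
    put_le16(buf + 2, 5);               /* size of the data that follows */
    buf[4] = 0x01;                      /* flags: modification time present */
    put_le32(buf + 5, (uint32_t)mtime); /* Unix mtime, truncated to 32 bits */
    return 9;
}
```

With only the modification time stored, the same 9 bytes can be appended to both the local file header and the central directory entry. In each case the header's extra-field length has to grow by 9.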

Re: git archive --format zip utf-8 issues

2012-09-20 René Scharfe
On 18.09.2012 23:12, Junio C Hamano wrote: René Scharfe rene.scha...@lsrfire.ath.cx writes: WindowsInfo-ZIP unzip 7-Zip PeaZip builtin Linux msysgit Windows 7-Zip 9.20 0 0 4626

Re: git archive --format zip utf-8 issues

2012-09-18 René Scharfe
Hello again, so two weeks have passed, and I've moved at a glacial pace towards a method for measuring the compatibility of our generated ZIP files. Sorry, I just keep getting distracted. Anyway, the idea is to have a bunch of files with names using different scripts, zip them with several

Re: git archive --format zip utf-8 issues

2012-09-05 René Scharfe
On 04.09.2012 23:03, Junio C Hamano wrote: René Scharfe rene.scha...@lsrfire.ath.cx writes: + if (has_non_ascii(path)) { Do we want to treat \033 as ASCII in this codepath? The function is primarily used by the log formatter to see if we need 8-bit CTE when writing out in the e-mail
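Junio's concern is that has_non_ascii() was written for the log formatter's 8-bit CTE decision, so its idea of "non-ASCII" may not be what the ZIP writer needs. A separate predicate for the UTF-8 flag could look like the sketch below. The name path_needs_utf8_flag() is hypothetical, and treating ESC (\033) as plain ASCII is an assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether a path needs the ZIP UTF-8 flag.
 * Only bytes >= 0x80 count as non-ASCII, so control characters such as
 * ESC (\033) are treated as plain ASCII. This is an assumption; a check
 * written for e-mail transfer encodings may reasonably flag ESC as well.
 */
static int path_needs_utf8_flag(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;

    assert(path != NULL);
    for (; *p; p++)
        if (*p >= 0x80)
            return 1;
    return 0;
}
```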

Re: git archive --format zip utf-8 issues

2012-09-04 René Scharfe
On 31.08.2012 00:26, Jeff King wrote: Ping on this stalled discussion. Sorry, I got distracted by other stuff again. I did some experiments, though, and here's a preliminary result. It seems like there are two separate issues here: 1. Knowing the encoding of pathnames in the

Re: git archive --format zip utf-8 issues

2012-09-04 Junio C Hamano
René Scharfe rene.scha...@lsrfire.ath.cx writes: But now for the patch, which is a bit confusing as well. I'm curious to hear about results for more platforms, extractors and character classes. Based on that, we can see whether we need to generate the extra fields instead of relying on the new
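The "generate the extra fields" alternative could look roughly like the sketch below. It assumes the Info-ZIP Unicode Path extra field (header ID 0x7075), which APPNOTE documents as a way to carry a UTF-8 name alongside a name stored in a legacy encoding. The snippet does not say which extra field was meant, and write_unicode_path() and crc32_bytes() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum ZIP uses. */
static uint32_t crc32_bytes(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

static void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (v >> (8 * i)) & 0xff;
}

/*
 * Hypothetical helper: build an Info-ZIP "Unicode Path" extra field.
 * Layout: ID 0x7075 (2) | data size (2) | version (1, always 1)
 *         | CRC-32 of the name in the regular header (4) | UTF-8 name (no NUL).
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t write_unicode_path(unsigned char *buf, size_t cap,
                                 const char *header_name, const char *utf8_name)
{
    size_t hlen, ulen, total;

    assert(buf && header_name && utf8_name);
    hlen = strlen(header_name);
    ulen = strlen(utf8_name);
    total = 9 + ulen;
    if (total > cap || 5 + ulen > 0xFFFF)
        return 0;
    put_le16(buf, 0x7075);                   /* header ID "up" */
    put_le16(buf + 2, (uint16_t)(5 + ulen)); /* size of the data that follows */
    buf[4] = 1;                              /* field version */
    put_le32(buf + 5, crc32_bytes((const unsigned char *)header_name, hlen));
    memcpy(buf + 9, utf8_name, ulen);
    return total;
}
```

A reader that understands 0x7075 is expected to trust the UTF-8 name only if the stored CRC still matches the regular header name. This allows the regular name to stay in whatever encoding older extractors expect.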

Re: git archive --format zip utf-8 issues

2012-08-30 Jeff King
On Sat, Aug 11, 2012 at 11:37:05PM +0200, Sven Strickroth wrote: On 11.08.2012 22:53, René Scharfe wrote: The standard says we need to convert to CP437, or to UTF-8, or provide both versions. A more interesting question is: What's supported by which programs? The ZIP functionality

Re: git archive --format zip utf-8 issues

2012-08-11 René Scharfe
On 11.08.2012 00:47, Junio C Hamano wrote: Sven Strickroth sven.strickr...@tu-clausthal.de writes: when I create a Git repository, add a file whose name contains UTF-8 characters or umlauts (like öäü.txt), commit, and then export the HEAD revision to a ZIP archive using git archive --format zip -o

Re: git archive --format zip utf-8 issues

2012-08-11 René Scharfe
On 11.08.2012 01:53, Sven Strickroth wrote: On 11.08.2012 00:47, Junio C Hamano wrote: Do you know in what encoding the pathnames are _expected_ to be stored in zip archives? Re-encoding to Latin-1 does not always work and may completely break double-byte names (e.g. Chinese or Japanese). PKZIP

Re: git archive --format zip utf-8 issues

2012-08-11 Sven Strickroth
On 11.08.2012 22:53, René Scharfe wrote: The standard says we need to convert to CP437, or to UTF-8, or provide both versions. A more interesting question is: What's supported by which programs? The ZIP functionality built into Windows 7 doesn't seem to work with UTF-8 encoded filenames

Re: git archive --format zip utf-8 issues

2012-08-11 Junio C Hamano
René Scharfe rene.scha...@lsrfire.ath.cx writes: PKZIP APPNOTE seems to be the ZIP standard, and it specifies a UTF-8 flag: http://www.pkware.com/documents/casestudies/APPNOTE.TXT A. Local file header: general purpose bit flag: (2 bytes) Bit 11: Language encoding flag (EFS). If this bit is

Re: git archive --format zip utf-8 issues

2012-08-11 Junio C Hamano
René Scharfe rene.scha...@lsrfire.ath.cx writes: ... A more interesting question is: What's supported by which programs? Yes, that is the most interesting question. Of course, git archive --format=zip --path-reencode=utf8-to-latin1 would be the most generic way to do this. I really hope
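The --path-reencode option appears to be a proposal in this message rather than an existing git archive option. The core of a utf8-to-latin1 mode fits in a few lines, and so does the reason it cannot be a complete answer: any code point above U+00FF, such as a CJK character, has no Latin-1 equivalent. The sketch below is hand-rolled and uses a hypothetical name, utf8_to_latin1(). A real implementation would more likely go through iconv(3).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical converter: re-encode a UTF-8 path as ISO-8859-1 (Latin-1).
 * Only 1-byte sequences and 2-byte sequences decoding to U+0080..U+00FF map
 * to Latin-1. Everything else (3/4-byte sequences such as CJK, overlong or
 * malformed input) is rejected. Returns the output length, or -1 on failure
 * or if `out` (capacity `cap`, including the terminating NUL) is too small.
 */
static long utf8_to_latin1(const char *in, char *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL && cap > 0);
    while (*p) {
        unsigned int cp;

        if (*p < 0x80) {
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
            p += 2;
            if (cp < 0x80 || cp > 0xFF)
                return -1; /* overlong, or not representable in Latin-1 */
        } else {
            return -1;     /* 3/4-byte sequence or malformed UTF-8 */
        }
        if (n + 1 >= cap)
            return -1;     /* no room for this byte plus the terminator */
        out[n++] = (char)cp;
    }
    out[n] = '\0';
    return (long)n;
}
```

For example, "öäü.txt" becomes 7 Latin-1 bytes, while a name containing 中 (U+4E2D) makes the function fail. That failure is exactly the case Sven raises for Chinese and Japanese filenames.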

git archive --format zip utf-8 issues

2012-08-10 Sven Strickroth
Hi, when I create a Git repository, add a file whose name contains UTF-8 characters or umlauts (like öäü.txt), commit, and then export the HEAD revision to a ZIP archive using git archive --format zip -o 1.zip HEAD, the ZIP file contains incorrect filenames:

    $ unzip -l 1.zip
    Archive:  1.zip
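You can check what actually ended up in 1.zip by reading the local file header directly, which avoids depending on any extractor's interpretation. The minimal parser sketch below assumes the APPNOTE layout: signature 0x04034b50, flags at offset 6, name and extra-field lengths at offsets 26 and 28, and the name starting at offset 30. parse_local_header() and struct zip_local_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct zip_local_name {
    const unsigned char *name; /* points into the buffer, not NUL-terminated */
    size_t name_len;
    size_t extra_len;
    int utf8_flag;             /* general purpose bit 11 (EFS) */
};

/*
 * Hypothetical helper: parse the fixed 30-byte local file header at `buf`.
 * Returns 0 on success, -1 if the data is truncated or is not a local header.
 */
static int parse_local_header(const unsigned char *buf, size_t len,
                              struct zip_local_name *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 30 || get_le32(buf) != 0x04034b50u)
        return -1;
    out->utf8_flag = (get_le16(buf + 6) >> 11) & 1;
    out->name_len = get_le16(buf + 26);
    out->extra_len = get_le16(buf + 28);
    if (30 + out->name_len + out->extra_len > len)
        return -1;
    out->name = buf + 30;
    return 0;
}
```

For the first entry of an archive read into memory, call it at offset 0. The utf8_flag result shows whether bit 11 was set, and the raw name bytes show which encoding was actually written.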

Re: git archive --format zip utf-8 issues

2012-08-10 Junio C Hamano
Sven Strickroth sven.strickr...@tu-clausthal.de writes: when I create a Git repository, add a file whose name contains UTF-8 characters or umlauts (like öäü.txt), commit, and then export the HEAD revision to a ZIP archive using git archive --format zip -o 1.zip HEAD, the ZIP file contains incorrect

Re: git archive --format zip utf-8 issues

2012-08-10 Sven Strickroth
On 11.08.2012 00:47, Junio C Hamano wrote: Do you know in what encoding the pathnames are _expected_ to be stored in zip archives? Re-encoding to Latin-1 does not always work and may completely break double-byte names (e.g. Chinese or Japanese). PKZIP APPNOTE seems to be the ZIP standard, and it