Re: [PATCH v8 5/7] convert: add 'working-tree-encoding' attribute
On Sat, Feb 24, 2018 at 11:27 AM,wrote: > Git recognizes files encoded with ASCII or one of its supersets (e.g. > UTF-8 or ISO-8859-1) as text files. All other encodings are usually > interpreted as binary and consequently built-in Git text processing > tools (e.g. 'git diff') as well as most Git web front ends do not > visualize the content. > > Add an attribute to tell Git what encoding the user has defined for a > given file. If the content is added to the index, then Git converts the > content to a canonical UTF-8 representation. On checkout Git will > reverse the conversion. > > Signed-off-by: Lars Schneider > --- > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > @@ -272,6 +272,84 @@ few exceptions. Even though... > +Please note that using the `working-tree-encoding` attribute may have a > +number of pitfalls: > + > +- Alternative Git implementations (e.g. JGit or libgit2) and older Git > + versions (as of March 2018) do not support the `working-tree-encoding` > + attribute. If you decide to use the `working-tree-encoding` attribute > + in your repository, then it is strongly recommended to ensure that all > + clients working with the repository support it. > + > + If you declare `*.proj` files as UTF-16 and you add `foo.proj` with an > + `working-tree-encoding` enabled Git client, then `foo.proj` will be > + stored as UTF-8 internally. A client without `working-tree-encoding` > + support will checkout `foo.proj` as UTF-8 encoded file. This will > + typically cause trouble for the users of this file. The above paragraph is giving an example of the scenario described in the paragraph above it. It might, therefore, be clearer to start this paragraph with "For example,...". Also, as an aid to non-Windows developers, it might be helpful to introduce ".proj" files as "Microsoft Visual Studio `*.proj` files...". > diff --git a/convert.c b/convert.c > @@ -265,6 +266,110 @@ static int will_convert_lf_to_crlf(size_t len, struct > text_stat *stats, > +static struct encoding { > + const char *name; > + struct encoding *next; > +} *encoding, **encoding_tail; Why does this struct need to be a linked-list since it contains only a name? In fact, why do the struct and linked list exist at all? Back in v1 when the struct held more members, it made sense to place them in a collection so they could be re-used, but today it just seems an unnecessary artifact which buys you nothing. Is the plan that some future patch series will add members to the struct? If not, then it seems that it should be retired altogether. > +static const char *default_encoding = "UTF-8"; > + > +static int encode_to_git(const char *path, const char *src, size_t src_len, > +struct strbuf *buf, struct encoding *enc, int > conv_flags) > +{ > + [...] > + if (has_prohibited_utf_bom(enc->name, src, src_len)) { > + const char *error_msg = _( > + "BOM is prohibited in '%s' if encoded as %s"); > + /* > +* This advise is shown for UTF-??BE and UTF-??LE encodings. s/advise/advice/ > +* We truncate the encoding name to 6 chars with %.6s to cut > +* off the last two "byte order" characters. > +*/ > + const char *advise_msg = _( > + "The file '%s' contains a byte order mark (BOM). " > + "Please use %.6s as working-tree-encoding."); > + advise(advise_msg, path, enc->name); > + if (conv_flags & CONV_WRITE_OBJECT) > + die(error_msg, path, enc->name); > + else > + error(error_msg, path, enc->name); What is the intended behavior in the non-die() case? As implemented, this is still going to attempt the conversion even though it threw an error. Should it be returning 0 here instead? Same question for the couple additional cases below. > + > + } else if (is_missing_required_utf_bom(enc->name, src, src_len)) { > + const char *error_msg = _( > + "BOM is required in '%s' if encoded as %s"); > + const char *advise_msg = _( > + "The file '%s' is missing a byte order mark (BOM). " > + "Please use %sBE or %sLE (depending on the byte > order) " > + "as working-tree-encoding."); > + advise(advise_msg, path, enc->name, enc->name); > + if (conv_flags & CONV_WRITE_OBJECT) > + die(error_msg, path, enc->name); > + else > + error(error_msg, path, enc->name); > + } For a function which presumably is agnostic about the working-tree encoding (supporting whatever iconv does), it nevertheless has unusually intimate knowledge about UTF BOMs, in general, and the
Re: [PATCH v8 4/7] utf8: add function to detect a missing UTF-16/32 BOM
On Sat, Feb 24, 2018 at 11:27 AM,wrote: > If the endianness is not defined in the encoding name, then let's > be strict and require a BOM to avoid any encoding confusion. The > is_missing_required_utf_bom() function returns true if a required BOM > is missing. > > The Unicode standard instructs to assume big-endian if there in no BOM > for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used > in HTML5 recommends to assume little-endian to "deal with deployed > content" [3]. Strictly requiring a BOM seems to be the safest option > for content in Git. > > Signed-off-by: Lars Schneider > --- > diff --git a/utf8.h b/utf8.h > @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type > position, unsigned int wid > +/* > + * If the endianness is not defined in the encoding name, then we > + * require a BOM. The function returns true if a required BOM is missing. > + * > + * The Unicode standard instructs to assume big-endian if there > + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG > + * encoding standard used in HTML5 recommends to assume > + * little-endian to "deal with deployed content" [3]. Perhaps you could tack on to the comment here the final bit of explanation from the commit message which ties these conflicting recommendations together. In particular: Therefore, strictly requiring a BOM seems to be the safest option for content in Git. > + */ > +int is_missing_required_utf_bom(const char *enc, const char *data, size_t > len);
Re: [PATCH] t/known-leaky: add list of known-leaky test scripts
On Wednesday 21 February 2018 02:14 AM, Martin Ågren wrote: > To sum up: I probably won't be looking into Travis-ing such a blacklist > in the near future. > Just thinking out loud, how about having a white-list instead of a black-list and using it to run only those tests in the white list. Something like, t/white_list t t0001 To run -- cd t/ for test in $(cat white_list) do white_list_tests="$white_list_tests $test*" done make SANITIZE=leak $white_list_tests -- Kaartic QUOTE --- “The most valuable person on any team is the person who makes everyone else on the team more valuable, not the person who knows the most.” - Joel Spolsky signature.asc Description: OpenPGP digital signature
Re: [PATCH v8 3/7] utf8: add function to detect prohibited UTF-16/32 BOM
On Sat, Feb 24, 2018 at 11:27 AM,wrote: > Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE > or UTF-32LE a BOM must not be used [1]. The function returns true if > this is the case. > > [1] http://unicode.org/faq/utf_bom.html#bom10 > > Signed-off-by: Lars Schneider > --- > diff --git a/utf8.c b/utf8.c > @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz, > +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) > +{ > + return ( > + (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) && > + (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || > + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) > + ) || ( > + (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) && > + (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || > + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) > + ); > +} Is this interpretation correct? When I read [1], I interpret it as saying that no BOM _of any sort_ should be present when the encoding is declared as one of UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. This code, on the other hand, only checks for BOMs corresponding to the declared size (16 or 32 bits). I suppose the intention of [1] is to detect a mismatch between the declared encoding and how the stream is actually encoded. The check implemented here will fail to detect a mismatch between, say, declared encoding UTF-16BE and actual encoding UTF-32BE.
Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"
On Sun, Feb 25, 2018 at 5:58 AM, brian m. carlsonwrote: >> diff --git a/repository.c b/repository.c >> index 4ffbe9bc94..0d715f4fdb 100644 >> --- a/repository.c >> +++ b/repository.c >> @@ -5,7 +5,7 @@ >> >> /* The main repository */ >> static struct repository the_repo = { >> - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, >> _algos[GIT_HASH_SHA1], 0, 0 >> + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, >> NULL, 0, 0 > > I'm wondering, now that you have the name field for the unknown value, > if that might be a better choice here than NULL. I don't have a strong > preference either way, so whatever you decide here is fine. I did try that first, but for the purpose of catching uninitialized algo use, NULL is better. When I set unknown hash algo here, I think some test failed mysteriously because it used rawsz field (which has value zero), it didn't match some expectation, the code went to the error handling path, which eventually failed with some error message, but it's not obvious that the problem was rawsz being zero and back tracking that took me some time. With NULL hash_algo, any dereferencing fails immediately with a nice stack trace. Another reason to push me towards NULL hash algo is, even if we prefer nice messages over segmentation faults, we can't avoid it completely anyway (empty_tree and empty_blob are still NULL in unknown hash algo and will cause segfaults). Might as well make things consistent and always segfault. -- Duy
Re: [PATCH] revision.c: reduce object database queries
On 2/24/2018 8:34 PM, Derrick Stolee wrote: In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit. Modify the condition to only check has_object_file() if the result would change the parsed bit. When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting(). On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%. I do want to add that the exact same "git status" performance test with the commit-graph feature goes from 0.29 seconds to 0.21 seconds for a relative change of -27.5%. Thanks, -Stolee
[PATCH] revision.c: reduce object database queries
In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit. Modify the condition to only check has_object_file() if the result would change the parsed bit. When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting(). On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%. Signed-off-by: Derrick Stolee--- revision.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/revision.c b/revision.c index 5ce9b93..bc7def5 100644 --- a/revision.c +++ b/revision.c @@ -113,7 +113,8 @@ void mark_parents_uninteresting(struct commit *commit) * it is popped next time around, we won't be trying * to parse it and get an error. */ - if (!has_object_file(>object.oid)) + if (!commit->object.parsed && + !has_object_file(>object.oid)) commit->object.parsed = 1; if (commit->object.flags & UNINTERESTING) -- 2.7.4
Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"
On Sat, Feb 24, 2018 at 10:34:29AM +0700, Nguyễn Thái Ngọc Duy wrote: > This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root > problem, git clone not setting up the_hash_algo, has been fixed in the > previous patch. > > Since this is a dangerous move and could potentially break stuff after > release (and leads to workaround like the reverted commit), the > workaround technically remains, but is hidden behind a new environment > variable GIT_HASH_FIXUP. This should let the users continue to use git > while we fix the problem. This variable can be deleted after one or two > releases. > > Signed-off-by: Nguyễn Thái Ngọc Duy> --- > common-main.c| 10 ++ > repository.c | 2 +- > t/helper/test-dump-split-index.c | 2 ++ > 3 files changed, 13 insertions(+), 1 deletion(-) > > diff --git a/common-main.c b/common-main.c > index 6a689007e7..fbfa98c3b8 100644 > --- a/common-main.c > +++ b/common-main.c > @@ -1,6 +1,7 @@ > #include "cache.h" > #include "exec_cmd.h" > #include "attr.h" > +#include "repository.h" > > /* > * Many parts of Git have subprograms communicate via pipe, expect the > @@ -40,5 +41,14 @@ int main(int argc, const char **argv) > > restore_sigpipe_to_default(); > > + /* > + * Temporary recovery measure while hashing code is being > + * refactored. This variable should be gone after everybody > + * has used the_hash_algo in one or two releases and nobody > + * complains anything. > + */ > + if (getenv("GIT_HASH_FIXUP")) > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + > return cmd_main(argc, argv); > } > diff --git a/repository.c b/repository.c > index 4ffbe9bc94..0d715f4fdb 100644 > --- a/repository.c > +++ b/repository.c > @@ -5,7 +5,7 @@ > > /* The main repository */ > static struct repository the_repo = { > - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, > _algos[GIT_HASH_SHA1], 0, 0 > + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, NULL, > 0, 0 I'm wondering, now that you have the name field for the unknown value, if that might be a better choice here than NULL. I don't have a strong preference either way, so whatever you decide here is fine. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 3/5] index-pack: check (and optionally set) hash algo based on input file
On Sat, Feb 24, 2018 at 10:34:27AM +0700, Nguyễn Thái Ngọc Duy wrote: > After 454253f059 (builtin/index-pack: improve hash function abstraction > - 2018-02-01), index-pack uses the_hash_algo for hashing. If "git > index-pack" is executed without a repository, we do not know what hash > algorithm to be used and the_hash_algo in theory could be undefined. > > Since there should be some information about the hash algorithm in the > input pack file, we can initialize the correct hash algorithm with that > if the_hash_algo is not yet initialized. This assumes that pack files > with new hash algorithm MUST step up pack version. > > While at there, make sure the hash algorithm requested by the pack file > and configured by the repository (if we're running with a repo) are > consistent. > > Signed-off-by: Nguyễn Thái Ngọc Duy> --- > builtin/index-pack.c | 26 +- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > index 7e3e1a461c..1144458140 100644 > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -326,10 +326,31 @@ static const char *open_pack_file(const char *pack_name) > output_fd = -1; > nothread_data.pack_fd = input_fd; > } > - the_hash_algo->init_fn(_ctx); > return pack_name; > } > > +static void prepare_hash_algo(uint32_t pack_version) > +{ > + const struct git_hash_algo *pack_algo; > + > + switch (pack_version) { > + case 2: > + case 3: > + pack_algo = _algos[GIT_HASH_SHA1]; > + break; > + default: > + die("BUG: how to determine hash algo for new version?"); > + } > + > + if (!the_hash_algo) /* running without repo */ > + the_hash_algo = pack_algo; > + > + if (the_hash_algo != pack_algo) > + die(_("incompatible hash algorithm, " > + "configured for %s but the pack file needs %s"), > + the_hash_algo->name, pack_algo->name); I like this. It's a nice improvement and it should be easy for us to pass additional information into the function when our pack format understands multiple algorithms. I might have done the comparison using the format_id members instead of the pointers themselves, but that's more a personal preference than anything. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 2/5] sha1_file.c: keep a good name for "unknown" hash_algos[UNKNOWN]
On Sat, Feb 24, 2018 at 10:34:26AM +0700, Nguyễn Thái Ngọc Duy wrote: > This is mostly for displaying the hash algorithm name when we report > errors. Printing "unknown" with '%s' is much better than '(null)' in > glibc printf version (and probably could crash if other implementations > do not check for NULL pointer) This is a good change. Using a NULL pointer with %s on the BSDs definitely did cause a crash, last I checked. I looked at the comment I wrote in hash.h, and based on that, it is indeed very likely that someone would accidentally write code this way. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 4/5] diff.c: initialize hash algo when running in --no-index mode
On Sat, Feb 24, 2018 at 09:36:03PM +0700, Duy Nguyen wrote: > On Sat, Feb 24, 2018 at 10:34 AM, Nguyễn Thái Ngọc Duy >wrote: > > @@ -3995,6 +3995,18 @@ static void run_diff(struct diff_filepair *p, struct > > diff_options *o) > > return; > > } > > > > + /* > > +* NEEDSWORK: When running in no-index mode (and no repo is > > +* found, thus no hash algo conifugred), fall back to SHA-1 > > +* hashing (which is used by diff_fill_oid_info below) to > > +* avoid regression in diff output. > > +* > > +* In future, perhaps we can allow the user to specify their > > +* hash algorithm from command line in this mode. > > +*/ > > + if (o->flags.no_index && !the_hash_algo) > > + the_hash_algo = _algos[GIT_HASH_SHA1]; > > Brian, are we supposed to use the_hash_algo this way (i.e. as a > writable var)? Or should I stick to something like > > repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > which allows us to notify other parts inside struct repository about > the hash algorithm change, if we ever need to? I would definitely recommend using the function. As you pointed out, it makes our code future-proof against needing to more work when setting the value. > If the_hash_algo is supposed to be read-only, maybe I should convert > that macro to an inline function to prevent people from accidentally > reassigning it? You could if you want to, although I don't really see a need to, since we can just catch it in review. If you wanted to, I'd make it an inline function for performance reasons. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH] Call timegm and timelocal with 4-digit year
On Fri, Feb 23 2018, Bernhard M. Wiedemann jotted: > amazingly timegm(gmtime(0)) is only 0 before 2020 > because perl's timegm deviates from GNU timegm(3) in how it handles years. > > man Time::Local says > > Whenever possible, use an absolute four digit year instead. > > with a detailed explanation about ambiguity of 2-digit years above that. > > Even though this ambiguity is error-prone with >50% of users getting it > wrong, it has been like this for 20+ years, so we just use 4-digit years > everywhere to be on the safe side. > > We add some extra logic to cvsimport because it allows 2-digit year > input and interpreting an 18 as 1918 can be avoided easily and safely. > > Signed-off-by: Bernhard M. Wiedemann> --- > contrib/examples/git-svnimport.perl | 2 +- > git-cvsimport.perl | 4 +++- > perl/Git.pm | 4 +++- > perl/Git/SVN.pm | 2 +- > 4 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/contrib/examples/git-svnimport.perl > b/contrib/examples/git-svnimport.perl > index c414f0d9c..75a43e23b 100755 > --- a/contrib/examples/git-svnimport.perl > +++ b/contrib/examples/git-svnimport.perl > @@ -238,7 +238,7 @@ sub pdate($) { > my($d) = @_; > $d =~ m#(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)# > or die "Unparseable date: $d\n"; > - my $y=$1; $y-=1900 if $y>1900; > + my $y=$1; $y+=1900 if $y<1000; > return timegm($6||0,$5,$4,$3,$2-1,$y); I wonder if this whole thing was just cargo-culted to begin with. We need to match (\d\d\d\d) here, so did SVN's format ever have years like "0098" (just "98" wouldn't match), so I suspect the whole munging could be dropped, but this change seems harmless. Just something that jumped out at me reviewing this. > diff --git a/git-cvsimport.perl b/git-cvsimport.perl > index 2d8df8317..b31613cb8 100755 > --- a/git-cvsimport.perl > +++ b/git-cvsimport.perl > @@ -601,7 +601,9 @@ sub pdate($) { > my ($d) = @_; > m#(\d{2,4})/(\d\d)/(\d\d)\s(\d\d):(\d\d)(?::(\d\d))?# > or die "Unparseable date: $d\n"; > - my $y=$1; $y-=1900 if $y>1900; > + my $y=$1; > + $y+=100 if $y<70; > + $y+=1900 if $y<1000; > return timegm($6||0,$5,$4,$3,$2-1,$y); > } My Time::Local 1.2300 on perl 5.024001 currently interprets "69" here as 1969, but after this it'll be 2069. Now I doubt anyone's going to be importing CVS history of projects a little over 20 years before CVS was created in 1990 (although I suppose old imports...), but just wanted to note it since it seems odd for code that's auto-interpreting double digit years for the purposes of importing existing data to end up in an edge case where it returns dates more than 50 years in the future. > diff --git a/perl/Git.pm b/perl/Git.pm > index ffa09ace9..df62518c7 100644 > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -534,7 +534,9 @@ If TIME is not supplied, the current local time is used. > sub get_tz_offset { > # some systems don't handle or mishandle %z, so be creative. > my $t = shift || time; > - my $gm = timegm(localtime($t)); > + my @t = localtime($t); > + $t[5] += 1900; > + my $gm = timegm(@t); > Nice. Just using the 4-digit date is always more correct and won't ever be buggy. > my $sign = qw( + + - )[ $gm <=> $t ]; > return sprintf("%s%02d%02d", $sign, (gmtime(abs($t - $gm)))[2,1]); > } > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm > index bc4eed3d7..991a5885e 100644 > --- a/perl/Git/SVN.pm > +++ b/perl/Git/SVN.pm > @@ -1405,7 +1405,7 @@ sub parse_svn_date { > $ENV{TZ} = 'UTC'; > > my $epoch_in_UTC = > - Time::Local::timelocal($S, $M, $H, $d, $m - 1, $Y - 1900); > + Time::Local::timelocal($S, $M, $H, $d, $m - 1, $Y); Ditto. Nicely caught. > > # Determine our local timezone (including DST) at the > # time of $epoch_in_UTC. $Git::SVN::Log::TZ stored the Anyway, this all looks good to me as-is. That CVS edge case is obscure and not worth focusing on, and the SVN one could be fixed up in another commit if anyone cared. I just spent a bit more time than I should have wondering what this timegm() edge case was about and whether it might impact other (unrelated to git) code I had.
Re: [PATCH v5 00/17] document & test fetch pruning & add fetch.pruneTags
On Fri, Feb 23 2018, Junio C. Hamano jotted: > Ævar Arnfjörð Bjarmasonwrites: > >>> Let's see how others find it useful and/or if the changed code gets >>> in the way of others (I am not absolutely sure if the changes are >>> free of regression to existing users who do not use the new >>> feature). >> >> I think if you're on the fence about merging it down (and others don't >> chime in saying the want it / like it) it makes sense to merge down >> 1-14/17 and we could discard 15-17/17 for now for a later re-submission >> and discussion once the earlier part of the series lands in master. > > Or we merge everything and let people discover glitches and > complain. Reverting the last pieces can wait until then ;-). Thanks, that obviously works for me.
[PATCH] git-gui: search for all current SSH key types
OpenSSH has supported Ed25519 keys since version 6.4 (2014-01-30), and ECDSA keys since version 5.7 (2011-01-24). git-gui fails to find these key types in its Help/Show SSH Key dialog. Teach git-gui to show Ed25519 and ECDSA keys as well. This was originally reported in https://github.com/git-for-windows/git/issues/1487 and subseqently in https://public-inbox.org/git/f65780f29e48994380e2bce87c6f071101146...@deerlm99ex2msx.ww931.my-it-solutions.net/ Signed-off-by: Beat Bolli--- Cc: Alexander Gavrilov Cc: Pat Thoyts git-gui/lib/sshkey.tcl | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/git-gui/lib/sshkey.tcl b/git-gui/lib/sshkey.tcl index aa6457bbb..589ff8f78 100644 --- a/git-gui/lib/sshkey.tcl +++ b/git-gui/lib/sshkey.tcl @@ -2,7 +2,10 @@ # Copyright (C) 2006, 2007 Shawn Pearce proc find_ssh_key {} { - foreach name {~/.ssh/id_dsa.pub ~/.ssh/id_rsa.pub ~/.ssh/identity.pub} { + foreach name { + ~/.ssh/id_dsa.pub ~/.ssh/id_ecdsa.pub ~/.ssh/id_ed25519.pub + ~/.ssh/id_rsa.pub ~/.ssh/identity.pub + } { if {[file exists $name]} { set fh[open $name r] set cont [read $fh] -- 2.15.0.rc1.299.gda03b47c3
[PATCH v8 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding'
From: Lars SchneiderUTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1]. Add 'core.checkRoundtripEncoding', which contains a comma separated list of encodings, to define for what encodings Git should check the conversion round trip if they are used in the 'working-tree-encoding' attribute. Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'. [1] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode Signed-off-by: Lars Schneider --- Documentation/config.txt | 6 Documentation/gitattributes.txt | 8 + config.c | 5 +++ convert.c| 74 convert.h| 1 + environment.c| 1 + t/t0028-working-tree-encoding.sh | 41 ++ 7 files changed, 136 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 0e25b2c92b..d7a56054a5 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -530,6 +530,12 @@ core.autocrlf:: This variable can be set to 'input', in which case no output conversion is performed. +core.checkRoundtripEncoding:: + A comma separated list of encodings that Git performs UTF-8 round + trip checks on if they are used in an `working-tree-encoding` + attribute (see linkgit:gitattributes[5]). The default value is + `SHIFT-JIS`. + core.symlinks:: If false, symbolic links are checked out as small plain files that contain the link text. linkgit:git-update-index[1] and diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index eddeaee1f7..11315054f4 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -310,6 +310,14 @@ number of pitfalls: internal contents as UTF-8 and try to convert it to UTF-16 on checkout. That operation will fail and cause an error. +- Reencoding content to non-UTF encodings can cause errors as the + conversion might not be UTF-8 round trip safe. If you suspect your + encoding to not be round trip safe, then add it to + `core.checkRoundtripEncoding` to make Git check the round trip + encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character + set) is known to have round trip issues with UTF-8 and is checked by + default. + - Reencoding content requires resources that might slow down certain Git operations (e.g 'git checkout' or 'git add'). diff --git a/config.c b/config.c index 1f003fbb90..d0ada9fcd4 100644 --- a/config.c +++ b/config.c @@ -1172,6 +1172,11 @@ static int git_default_core_config(const char *var, const char *value) return 0; } + if (!strcmp(var, "core.checkroundtripencoding")) { + check_roundtrip_encoding = xstrdup(value); + return 0; + } + if (!strcmp(var, "core.notesref")) { notes_ref_name = xstrdup(value); return 0; diff --git a/convert.c b/convert.c index c4e2fd5fa5..398cd9cf7b 100644 --- a/convert.c +++ b/convert.c @@ -289,6 +289,39 @@ static void trace_encoding(const char *context, const char *path, strbuf_release(); } +static int check_roundtrip(const char* enc_name) +{ + /* +* check_roundtrip_encoding contains a string of space and/or +* comma separated encodings (eg. "UTF-16, ASCII, CP1125"). +* Search for the given encoding in that string. +*/ + const char *found = strcasestr(check_roundtrip_encoding, enc_name); + const char *next = found + strlen(enc_name); + int len = strlen(check_roundtrip_encoding); + return (found && ( + /* +* check that the found encoding is at the +* beginning of check_roundtrip_encoding or +* that it is prefixed with a space or comma +*/ + found == check_roundtrip_encoding || ( + found > check_roundtrip_encoding && + (*(found-1) == ' ' || *(found-1) == ',') + ) + ) && ( + /* +* check that the found encoding is at the +* end of check_roundtrip_encoding or +* that it is suffixed with a space or comma +*/ + next == check_roundtrip_encoding + len || ( + next < check_roundtrip_encoding + len && + (*next == ' ' || *next == ',') +
[PATCH v8 4/7] utf8: add function to detect a missing UTF-16/32 BOM
From: Lars SchneiderIf the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if there in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used in HTML5 recommends to assume little-endian to "deal with deployed content" [3]. Strictly requiring a BOM seems to be the safest option for content in Git. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#gen6 [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf Section 3.10, D98, page 132 [3] https://encoding.spec.whatwg.org/#utf-16le Signed-off-by: Lars Schneider --- utf8.c | 13 + utf8.h | 16 2 files changed, 29 insertions(+) diff --git a/utf8.c b/utf8.c index 914881cd1f..5113d26e56 100644 --- a/utf8.c +++ b/utf8.c @@ -562,6 +562,19 @@ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) ); } +int is_missing_required_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + !strcmp(enc, "UTF-16") && + !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || +has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + !strcmp(enc, "UTF-32") && + !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || +has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 4711429af9..62f86fba64 100644 --- a/utf8.h +++ b/utf8.h @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid */ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); +/* + * If the endianness is not defined in the encoding name, then we + * require a BOM. The function returns true if a required BOM is missing. + * + * The Unicode standard instructs to assume big-endian if there + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG + * encoding standard used in HTML5 recommends to assume + * little-endian to "deal with deployed content" [3]. + * + * [1] http://unicode.org/faq/utf_bom.html#gen6 + * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf + * Section 3.10, D98, page 132 + * [3] https://encoding.spec.whatwg.org/#utf-16le + */ +int is_missing_required_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.1
[PATCH v8 6/7] convert: add tracing for 'working-tree-encoding' attribute
From: Lars SchneiderAdd the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider --- convert.c| 25 + t/t0028-working-tree-encoding.sh | 2 ++ 2 files changed, 27 insertions(+) diff --git a/convert.c b/convert.c index d20c341b6d..c4e2fd5fa5 100644 --- a/convert.c +++ b/convert.c @@ -266,6 +266,29 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } +static void trace_encoding(const char *context, const char *path, + const char *encoding, const char *buf, size_t len) +{ + static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING); + struct strbuf trace = STRBUF_INIT; + int i; + + strbuf_addf(, "%s (%s, considered %s):\n", context, path, encoding); + for (i = 0; i < len && buf; ++i) { + strbuf_addf( + ,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c", + i, + (unsigned char) buf[i], + (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '), + ((i+1) % 8 && (i+1) < len ? ' ' : '\n') + ); + } + strbuf_addchars(, '\n', 1); + + trace_strbuf(, ); + strbuf_release(); +} + static struct encoding { const char *name; struct encoding *next; @@ -325,6 +348,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, error(error_msg, path, enc->name); } + trace_encoding("source", path, enc->name, src, src_len); dst = reencode_string_len(src, src_len, default_encoding, enc->name, _len); if (!dst) { @@ -340,6 +364,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, else error(msg, path, enc->name, default_encoding); } + trace_encoding("destination", path, default_encoding, dst, dst_len); strbuf_attach(buf, dst, dst_len, dst_len + 1); return 1; diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh index a55bbcb6d3..869b2d 100755 --- a/t/t0028-working-tree-encoding.sh +++ b/t/t0028-working-tree-encoding.sh @@ -4,6 +4,8 @@ test_description='working-tree-encoding conversion via gitattributes' . ./test-lib.sh +GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING + test_expect_success 'setup test repo' ' git config core.eol lf && -- 2.16.1
[PATCH v8 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
From: Lars SchneiderSince 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider --- strbuf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 1df674e919..55b7daeb35 100644 --- a/strbuf.c +++ b/strbuf.c @@ -781,7 +781,6 @@ char *xstrdup_tolower(const char *string) result = xmallocz(len); for (i = 0; i < len; i++) result[i] = tolower(string[i]); - result[i] = '\0'; return result; } -- 2.16.1
[PATCH v8 5/7] convert: add 'working-tree-encoding' attribute
From: Lars SchneiderGit recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front ends do not visualize the content. Add an attribute to tell Git what encoding the user has defined for a given file. If the content is added to the index, then Git converts the content to a canonical UTF-8 representation. On checkout Git will reverse the conversion. Signed-off-by: Lars Schneider --- Documentation/gitattributes.txt | 78 ++ convert.c| 157 ++- convert.h| 1 + sha1_file.c | 2 +- t/t0028-working-tree-encoding.sh | 226 +++ 5 files changed, 462 insertions(+), 2 deletions(-) create mode 100755 t/t0028-working-tree-encoding.sh diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81a..eddeaee1f7 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -272,6 +272,84 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`working-tree-encoding` +^^^ + +Git recognizes files encoded in ASCII or one of its supersets (e.g. +UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other +encodings (e.g. UTF-16) are interpreted as binary and consequently +built-in Git text processing tools (e.g. 'git diff') as well as most Git +web front ends do not visualize the contents of these files by default. + +In these cases you can tell Git the encoding of a file in the working +directory with the `working-tree-encoding` attribute. If a file with this +attribute is added to Git, then Git reencodes the content from the +specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded +content in its internal data structure (called "the index"). On checkout +the content is reencoded back to the specified encoding. + +Please note that using the `working-tree-encoding` attribute may have a +number of pitfalls: + +- Alternative Git implementations (e.g. JGit or libgit2) and older Git + versions (as of March 2018) do not support the `working-tree-encoding` + attribute. If you decide to use the `working-tree-encoding` attribute + in your repository, then it is strongly recommended to ensure that all + clients working with the repository support it. + + If you declare `*.proj` files as UTF-16 and you add `foo.proj` with an + `working-tree-encoding` enabled Git client, then `foo.proj` will be + stored as UTF-8 internally. A client without `working-tree-encoding` + support will checkout `foo.proj` as UTF-8 encoded file. This will + typically cause trouble for the users of this file. + + If a Git client, that does not support the `working-tree-encoding` + attribute, adds a new file `bar.proj`, then `bar.proj` will be + stored "as-is" internally (in this example probably as UTF-16). + A client with `working-tree-encoding` support will interpret the + internal contents as UTF-8 and try to convert it to UTF-16 on checkout. + That operation will fail and cause an error. + +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +Use the `working-tree-encoding` attribute only if you cannot store a file +in UTF-8 encoding and if you want Git to be able to process the content +as text. + +As an example, use the following attributes if your '*.proj' files are +UTF-16 encoded with byte order mark (BOM) and you want Git to perform +automatic line ending conversion based on your platform. + + +*.proj text working-tree-encoding=UTF-16 + + +Use the following attributes if your '*.proj' files are UTF-16 little +endian encoded without BOM and you want Git to use Windows line endings +in the working directory. Please note, it is highly recommended to +explicitly define the line endings with `eol` if the `working-tree-encoding` +attribute is used to avoid ambiguity. + + +*.proj text working-tree-encoding=UTF-16LE eol=CRLF + + +You can get a list of all available encodings on your platform with the +following command: + + +iconv --list + + +If you do not know the encoding of a file, then you can use the `file` +command to guess the encoding: + + +file foo.proj + + + `ident` ^^^ diff --git a/convert.c b/convert.c index b976eb968c..d20c341b6d 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when
[PATCH v8 3/7] utf8: add function to detect prohibited UTF-16/32 BOM
From: Lars SchneiderWhenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars Schneider --- utf8.c | 24 utf8.h | 9 + 2 files changed, 33 insertions(+) diff --git a/utf8.c b/utf8.c index 2c27ce0137..914881cd1f 100644 --- a/utf8.c +++ b/utf8.c @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz, } #endif +static int has_bom_prefix(const char *data, size_t len, + const char *bom, size_t bom_len) +{ + return (len >= bom_len) && !memcmp(data, bom, bom_len); +} + +static const char utf16_be_bom[] = {0xFE, 0xFF}; +static const char utf16_le_bom[] = {0xFF, 0xFE}; +static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF}; +static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00}; + +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) && + (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) && + (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 6bbcf31a83..4711429af9 100644 --- a/utf8.h +++ b/utf8.h @@ -70,4 +70,13 @@ typedef enum { void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int width, const char *s); +/* + * Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE + * or UTF-32LE a BOM must not be used [1]. The function returns true if + * this is the case. + * + * [1] http://unicode.org/faq/utf_bom.html#bom10 + */ +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.1
[PATCH v8 2/7] strbuf: add xstrdup_toupper()
From: Lars SchneiderCreate a copy of an existing string and make all characters upper case. Similar xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a/strbuf.c b/strbuf.c index 55b7daeb35..b635f0bdc4 100644 --- a/strbuf.c +++ b/strbuf.c @@ -784,6 +784,18 @@ char *xstrdup_tolower(const char *string) return result; } +char *xstrdup_toupper(const char *string) +{ + char *result; + size_t len, i; + + len = strlen(string); + result = xmallocz(len); + for (i = 0; i < len; i++) + result[i] = toupper(string[i]); + return result; +} + char *xstrvfmt(const char *fmt, va_list ap) { struct strbuf buf = STRBUF_INIT; diff --git a/strbuf.h b/strbuf.h index 14c8c10d66..df7ced53ed 100644 --- a/strbuf.h +++ b/strbuf.h @@ -607,6 +607,7 @@ __attribute__((format (printf,2,3))) extern int fprintf_ln(FILE *fp, const char *fmt, ...); char *xstrdup_tolower(const char *); +char *xstrdup_toupper(const char *); /** * Create a newly allocated string using printf format. You can do this easily -- 2.16.1
[PATCH v8 0/7] convert: add support for different encodings
From: Lars SchneiderHi, Patches 1-4, 6 are preparation and helper functions. Patch 5,7 are the actual change. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is already in master. Changes since v7: * make it clearer in the documentation that Git stores content "as-is" by default. Content is only stored in UTF-8 if w-t-e is used (Junio) * add test case for $GIT_DIR/info/attributes support (Junio) Thanks, Lars RFC: https://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.com/ v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/ v2: https://public-inbox.org/git/2017122915.39680-1-lars.schnei...@autodesk.com/ v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schnei...@autodesk.com/ v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schnei...@autodesk.com/ v5: https://public-inbox.org/git/20180129201855.9182-1-tbo...@web.de/ v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schnei...@autodesk.com/ v7: https://public-inbox.org/git/20180215152711.158-1-lars.schnei...@autodesk.com/ Base Ref: Web-Diff: https://github.com/larsxschneider/git/commit/2758a2da29 Checkout: git fetch https://github.com/larsxschneider/git encoding-v8 && git checkout 2758a2da29 ### Interdiff (v7..v8): diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 10cb37795d..11315054f4 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -275,11 +275,11 @@ few exceptions. Even though... `working-tree-encoding` ^^^ -Git recognizes files encoded with ASCII or one of its supersets (e.g. -UTF-8 or ISO-8859-1) as text files. All other encodings are usually -interpreted as binary and consequently built-in Git text processing -tools (e.g. 'git diff') as well as most Git web front ends do not -visualize the content. +Git recognizes files encoded in ASCII or one of its supersets (e.g. +UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other +encodings (e.g. UTF-16) are interpreted as binary and consequently +built-in Git text processing tools (e.g. 'git diff') as well as most Git +web front ends do not visualize the contents of these files by default. In these cases you can tell Git the encoding of a file in the working directory with the `working-tree-encoding` attribute. If a file with this @@ -291,12 +291,24 @@ the content is reencoded back to the specified encoding. Please note that using the `working-tree-encoding` attribute may have a number of pitfalls: -- Third party Git implementations that do not support the - `working-tree-encoding` attribute will checkout the respective files - UTF-8 encoded and not in the expected encoding. Consequently, these - files will appear different which typically causes trouble. This is - in particular the case for older Git versions and alternative Git - implementations such as JGit or libgit2 (as of February 2018). +- Alternative Git implementations (e.g. JGit or libgit2) and older Git + versions (as of March 2018) do not support the `working-tree-encoding` + attribute. If you decide to use the `working-tree-encoding` attribute + in your repository, then it is strongly recommended to ensure that all + clients working with the repository support it. + + If you declare `*.proj` files as UTF-16 and you add `foo.proj` with an + `working-tree-encoding` enabled Git client, then `foo.proj` will be + stored as UTF-8 internally. A client without `working-tree-encoding` + support will checkout `foo.proj` as UTF-8 encoded file. This will + typically cause trouble for the users of this file. + + If a Git client, that does not support the `working-tree-encoding` + attribute, adds a new file `bar.proj`, then `bar.proj` will be + stored "as-is" internally (in this example probably as UTF-16). + A client with `working-tree-encoding` support will interpret the + internal contents as UTF-8 and try to convert it to UTF-16 on checkout. + That operation will fail and cause an error. - Reencoding content to non-UTF encodings can cause errors as the conversion might not be UTF-8 round trip safe. If you suspect your diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh index e4717402a5..e34c21eb29 100755 --- a/t/t0028-working-tree-encoding.sh +++ b/t/t0028-working-tree-encoding.sh @@ -13,8 +13,11 @@ test_expect_success 'setup test repo' ' echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes && printf "$text" >test.utf8.raw && printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw && + printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw && cp test.utf16.raw test.utf16 && + cp test.utf32.raw test.utf32 && + # Add only UTF-16 file, we will add the UTF-32 file later git add
Re: [PATCH v7 0/7] convert: add support for different encodings
> On 23 Feb 2018, at 21:11, Junio C Hamanowrote: > > Junio C Hamano writes: > >> Lars Schneider writes: >> >>> I still think it would be nice to see diffs for arbitrary encodings. >>> Would it be an option to read the `encoding` attribute and use it in >>> `git diff`? >> >> Reusing that gitk-only thing and suddenly start doing so would break >> gitk users, no? The tool expects the diff to come out encoded in >> the encoding that is specified by that attribute (which is learned >> from get_path_encoding helper) and does its thing. >> >> I guess that gitk uses diff-tree plumbing and you won't be applying >> this change to the plumbing, perhaps? If so, it might not be too >> bad, but those who decided to postprocess "git diff" output (instead >> of "git diff-tree" output) mimicking how gitk does it by thinking >> that is the safe and sane thing to do will be broken by such a >> change. You could do "use the encoding only when a command line >> option says so", but then people will add a configuration variable >> to turn it always on and these existing scripts will be broken. >> >> I do not personally have much sympathy for the last case (i.e. those >> who scripted around 'git diff' instead of 'git diff-tree' to get >> broken), so making the new feature only work with the Porcelain "git >> diff" might be an option. I'll need a bit more time to formulate >> the rest of my thought ;-) > > So we are introducing in this series a way to say in what encoding > the things should be placed in the working tree files (i.e. the > w-t-e attribute attached to the paths). Currently there is no > mechanism to say what encoding the in-repo contents are and UTF-8 is > assumed when conversion from/to w-t-e is required, but there is no > fundamental reason why it shouldn't be customizable (if anything, as > a piece of fact on the in-repo data, in-repo-encoding is *more* > appropriate to be an attribute than w-t-e that can merely be project > preference at best, as I mentioned earlier in this thread). Correct. > We always use the in-repo contents when generating 'diff'. I think > by "attribute to be used in diff", what you are reallying after is > to convert the in-repo contents to that encoding _BEFORE_ running > 'diff' on it. E.g. in-repo UTF-16 that can have NUL bytes all over > the place will not diff well with the xdiff machinery, but if you > first convert it to UTF-8 and have xdiff work on it, you can get > reasonable result out of it. It is unclear what encoding you want > your final diff output in (it is equally valid in such a set-up to > desire your patch output in UTF-16 or UTF-8), but assuming that you > want UTF-8 in your patch output, perhaps we do not have to break > gitk users by hijacking the 'encoding' attribute. Instead what you > want is a single bit that says between in-repo or working tree which > representation should be given to the xdiff machinery. I fear that we could confuse users with an additional knob/bit that defines what we diff against. Git always diff'ed against in-repo content and I feel it should stay that way. However, I agree with your earlier emails that "working-tree-encoding" is just one half of the feature. I also think it would be nice to be able to define the "in-repo-encoding" as well. Then we could define something like that: *.foo text in-repo-encoding=UTF-16LE This tells Git that the file is stored as UTF-16LE. This would help Git generating a diff via UTF-8 conversion. I feel that the final patch should be in UTF-16LE again. Maybe over time we could then deprecate the "encoding" attribute as the "in-repo-encoding" attribute serves a similar purpose (maybe gitk can switch to it). In that case we could also do things like that: *.bar text working-tree-encoding=SHIFT-JIS in-repo-encoding=UTF-16LE SHIFT-JIS encoded files would be reencoded to UTF-16LE on checkin. On checkout the opposite would happen. This way we would lift the "UTF-8 is the only in-repo encoding" limitation of the current w-t-e implementation. Does this sound sensible to you? That being said, I think "in-repo-encoding" would deserve an own series. - Lars
Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)
I notice that there are a few global state (or configuration rather) left after this: packed_git_window_size, packed_git_limit and delta_base_cache_limit. These can be changed by $GIT_DIR/config, but since it's still global, a submodule repository will use the same settings of its super repository. If $SUBMODULE/config changes any of these, they are ignored. The natural thing to do is move these to raw_object_store too (and repo_submodule_init needs to check submodule's config file). But one may argue these should be per-process instead of per-repo though. I don't know. But I thought I should mention this. -- Duy
Re: [PATCH v2 4/5] diff.c: initialize hash algo when running in --no-index mode
On Sat, Feb 24, 2018 at 10:34 AM, Nguyễn Thái Ngọc Duywrote: > @@ -3995,6 +3995,18 @@ static void run_diff(struct diff_filepair *p, struct > diff_options *o) > return; > } > > + /* > +* NEEDSWORK: When running in no-index mode (and no repo is > +* found, thus no hash algo conifugred), fall back to SHA-1 > +* hashing (which is used by diff_fill_oid_info below) to > +* avoid regression in diff output. > +* > +* In future, perhaps we can allow the user to specify their > +* hash algorithm from command line in this mode. > +*/ > + if (o->flags.no_index && !the_hash_algo) > + the_hash_algo = _algos[GIT_HASH_SHA1]; Brian, are we supposed to use the_hash_algo this way (i.e. as a writable var)? Or should I stick to something like repo_set_hash_algo(the_repository, GIT_HASH_SHA1); which allows us to notify other parts inside struct repository about the hash algorithm change, if we ever need to? If the_hash_algo is supposed to be read-only, maybe I should convert that macro to an inline function to prevent people from accidentally reassigning it? > + > diff_fill_oid_info(one); > diff_fill_oid_info(two); > > -- > 2.16.1.435.g8f24da2e1a > -- Duy
Re: [GSoC][PATCH v2] ref-filter: Make "--contains " less chatty if is invalid
Hello, Your proposed solution makes a lot more sense. I have actually considered a solution similar to this (the third solution from [1]), but found it more complicated. I did not account for the fact that once a callback is called, the user is already aware of the available options and the user only supplied an invalid argument value. I also have to make sure that all parsers (all callbacks and standard ones, for integer, filename, etc.) are already printing errors appropriately. Otherwise, some commands may fail and the user will not be aware of it because nothing will be shown (no usage is shown and no errors either). I will be implementing this solution and come back with another patch. Thank you for your review. I really appreciate it! [1] https://public-inbox.org/git/20160118215433.gb24...@sigill.intra.pe ff.net/ Best regards, Paul Ungureanu
[PATCH v5 2/2] diff: add --compact-summary
Certain information is currently shown with --summary, but when used in combination with --stat it's a bit hard to read since info of the same file is in two places (--stat and --summary). On top of that, commits that add or remove files double the number of display lines, which could be a lot if you add or remove a lot of files. --compact-summary embeds most of --summary back in --stat in the little space between the file name part and the graph line, e.g. with commit 0433d533f1: Documentation/merge-config.txt | 4 + builtin/merge.c| 2 + ...-pull-verify-signatures.sh (new +x) | 81 ++ t/t7612-merge-verify-signatures.sh | 45 4 files changed, 132 insertions(+) It helps both condensing information and saving some text space. What's new in diffstat is: - A new 0644 file is shown as (new) - A new 0755 file is shown as (new +x) - A new symlink is shown as (new +l) - A deleted file is shown as (gone) - A mode change adding executable bit is shown as (mode +x) - A mode change removing it is shown as (mode -x) Note that --compact-summary does not contain all the information --summary provides. Rewrite percentage is not shown but it could be added later, like R50% or C20%. Signed-off-by: Nguyễn Thái Ngọc Duy--- Documentation/diff-options.txt| 8 diff.c| 37 +++ diff.h| 1 + t/t4013-diff-various.sh | 5 +++ ...ty_--root_--stat_--compact-summary_initial | 12 ++ ...-R_--root_--stat_--compact-summary_initial | 12 ++ ...tree_--stat_--compact-summary_initial_mode | 4 ++ ...e_-R_--stat_--compact-summary_initial_mode | 4 ++ 8 files changed, 83 insertions(+) create mode 100644 t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--stat_--compact-summary_initial_mode create mode 100644 t/t4013/diff.diff-tree_-R_--stat_--compact-summary_initial_mode diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index c330c01ff0..e3a44f03cd 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -128,6 +128,14 @@ have to use `--diff-algorithm=default` option. These parameters can also be set individually with `--stat-width=`, `--stat-name-width=` and `--stat-count=`. +--compact-summary:: + Output a condensed summary of extended header information such + as file creations or deletions ("new" or "gone", optionally "+l" + if it's a symlink) and mode changes ("+x" or "-x" for adding + or removing executable bit respectively) in diffstat. The + information is put betwen the filename part and the graph + part. Implies `--stat`. + --numstat:: Similar to `--stat`, but shows number of added and deleted lines in decimal notation and pathname without diff --git a/diff.c b/diff.c index e3f72de27d..62e413a80f 100644 --- a/diff.c +++ b/diff.c @@ -2129,6 +2129,7 @@ struct diffstat_t { char *from_name; char *name; char *print_name; + const char *comments; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -2205,6 +2206,9 @@ static void fill_print_name(struct diffstat_file *file) else quote_c_style(file->name, , NULL, 0); + if (file->comments) + strbuf_addf(, " (%s)", file->comments); + file->print_name = strbuf_detach(, NULL); } @@ -3239,6 +3243,32 @@ static void builtin_diff(const char *name_a, return; } +static char *get_compact_summary(const struct diff_filepair *p, int is_renamed) +{ + if (!is_renamed) { + if (p->status == DIFF_STATUS_ADDED) { + if (S_ISLNK(p->two->mode)) + return "new +l"; + else if ((p->two->mode & 0777) == 0755) + return "new +x"; + else + return "new"; + } else if (p->status == DIFF_STATUS_DELETED) + return "gone"; + } + if (S_ISLNK(p->one->mode) && !S_ISLNK(p->two->mode)) + return "mode -l"; + else if (!S_ISLNK(p->one->mode) && S_ISLNK(p->two->mode)) + return "mode +l"; + else if ((p->one->mode & 0777) == 0644 && +(p->two->mode & 0777) == 0755) + return "mode +x"; + else if ((p->one->mode & 0777) == 0755 && +(p->two->mode & 0777) == 0644) + return "mode -x"; + return NULL; +} + static void builtin_diffstat(const char *name_a, const char *name_b, struct
[PATCH v5 0/2] diff: add --compact-summary (aka nd/diff-stat-with-summary)
... and v5 fixes the commit message of 2/2 where in v4 it still mentions --stat-with-summary instead of --compact-summary. Sorry. Nguyễn Thái Ngọc Duy (2): diff.c: refactor pprint_rename() to use strbuf diff: add --compact-summary Documentation/diff-options.txt| 8 ++ diff.c| 96 --- diff.h| 1 + t/t4013-diff-various.sh | 5 + ...ty_--root_--stat_--compact-summary_initial | 12 +++ ...-R_--root_--stat_--compact-summary_initial | 12 +++ ...tree_--stat_--compact-summary_initial_mode | 4 + ...e_-R_--stat_--compact-summary_initial_mode | 4 + 8 files changed, 109 insertions(+), 33 deletions(-) create mode 100644 t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--stat_--compact-summary_initial_mode create mode 100644 t/t4013/diff.diff-tree_-R_--stat_--compact-summary_initial_mode -- 2.16.1.435.g8f24da2e1a
[PATCH v5 1/2] diff.c: refactor pprint_rename() to use strbuf
Instead of passing char* around, let function handle strbuf directly. All callers already use strbuf internally. This helps kill the "not free" exception in free_diffstat_info(). I don't think this code is so critical that we need to avoid some free() calls. The other benefit comes in the next patch, where we append something in pname before returning from fill_print_name(). With strbuf, it's very simple. With "char *" we may have to resort to explicit reallocation and stuff. Signed-off-by: Nguyễn Thái Ngọc Duy--- diff.c | 59 ++ 1 file changed, 26 insertions(+), 33 deletions(-) diff --git a/diff.c b/diff.c index 21c3838b25..e3f72de27d 100644 --- a/diff.c +++ b/diff.c @@ -2045,11 +2045,10 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) } } -static char *pprint_rename(const char *a, const char *b) +static void pprint_rename(struct strbuf *name, const char *a, const char *b) { const char *old = a; const char *new = b; - struct strbuf name = STRBUF_INIT; int pfx_length, sfx_length; int pfx_adjust_for_slash; int len_a = strlen(a); @@ -2059,10 +2058,10 @@ static char *pprint_rename(const char *a, const char *b) int qlen_b = quote_c_style(b, NULL, NULL, 0); if (qlen_a || qlen_b) { - quote_c_style(a, , NULL, 0); - strbuf_addstr(, " => "); - quote_c_style(b, , NULL, 0); - return strbuf_detach(, NULL); + quote_c_style(a, name, NULL, 0); + strbuf_addstr(name, " => "); + quote_c_style(b, name, NULL, 0); + return; } /* Find common prefix */ @@ -2109,19 +2108,18 @@ static char *pprint_rename(const char *a, const char *b) if (b_midlen < 0) b_midlen = 0; - strbuf_grow(, pfx_length + a_midlen + b_midlen + sfx_length + 7); + strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 7); if (pfx_length + sfx_length) { - strbuf_add(, a, pfx_length); - strbuf_addch(, '{'); + strbuf_add(name, a, pfx_length); + strbuf_addch(name, '{'); } - strbuf_add(, a + pfx_length, a_midlen); - strbuf_addstr(, " => "); - strbuf_add(, b + pfx_length, b_midlen); + strbuf_add(name, a + pfx_length, a_midlen); + strbuf_addstr(name, " => "); + strbuf_add(name, b + pfx_length, b_midlen); if (pfx_length + sfx_length) { - strbuf_addch(, '}'); - strbuf_add(, a + len_a - sfx_length, sfx_length); + strbuf_addch(name, '}'); + strbuf_add(name, a + len_a - sfx_length, sfx_length); } - return strbuf_detach(, NULL); } struct diffstat_t { @@ -2197,23 +2195,17 @@ static void show_graph(struct strbuf *out, char ch, int cnt, static void fill_print_name(struct diffstat_file *file) { - char *pname; + struct strbuf pname = STRBUF_INIT; if (file->print_name) return; - if (!file->is_renamed) { - struct strbuf buf = STRBUF_INIT; - if (quote_c_style(file->name, , NULL, 0)) { - pname = strbuf_detach(, NULL); - } else { - pname = file->name; - strbuf_release(); - } - } else { - pname = pprint_rename(file->from_name, file->name); - } - file->print_name = pname; + if (file->is_renamed) + pprint_rename(, file->from_name, file->name); + else + quote_c_style(file->name, , NULL, 0); + + file->print_name = strbuf_detach(, NULL); } static void print_stat_summary_inserts_deletes(struct diff_options *options, @@ -2797,8 +2789,7 @@ static void free_diffstat_info(struct diffstat_t *diffstat) int i; for (i = 0; i < diffstat->nr; i++) { struct diffstat_file *f = diffstat->files[i]; - if (f->name != f->print_name) - free(f->print_name); + free(f->print_name); free(f->name); free(f->from_name); free(f); @@ -5241,10 +5232,12 @@ static void show_rename_copy(struct diff_options *opt, const char *renamecopy, struct diff_filepair *p) { struct strbuf sb = STRBUF_INIT; - char *names = pprint_rename(p->one->path, p->two->path); + struct strbuf names = STRBUF_INIT; + + pprint_rename(, p->one->path, p->two->path); strbuf_addf(, " %s %s (%d%%)\n", - renamecopy, names, similarity_index(p)); - free(names); + renamecopy, names.buf, similarity_index(p)); + strbuf_release(); emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY, sb.buf, sb.len, 0);
[PATCH v4 1/2] diff.c: refactor pprint_rename() to use strbuf
Instead of passing char* around, let function handle strbuf directly. All callers already use strbuf internally. This helps kill the "not free" exception in free_diffstat_info(). I don't think this code is so critical that we need to avoid some free() calls. The other benefit comes in the next patch, where we append something in pname before returning from fill_print_name(). With strbuf, it's very simple. With "char *" we may have to resort to explicit reallocation and stuff. Signed-off-by: Nguyễn Thái Ngọc Duy--- diff.c | 59 ++ 1 file changed, 26 insertions(+), 33 deletions(-) diff --git a/diff.c b/diff.c index 21c3838b25..e3f72de27d 100644 --- a/diff.c +++ b/diff.c @@ -2045,11 +2045,10 @@ static void fn_out_consume(void *priv, char *line, unsigned long len) } } -static char *pprint_rename(const char *a, const char *b) +static void pprint_rename(struct strbuf *name, const char *a, const char *b) { const char *old = a; const char *new = b; - struct strbuf name = STRBUF_INIT; int pfx_length, sfx_length; int pfx_adjust_for_slash; int len_a = strlen(a); @@ -2059,10 +2058,10 @@ static char *pprint_rename(const char *a, const char *b) int qlen_b = quote_c_style(b, NULL, NULL, 0); if (qlen_a || qlen_b) { - quote_c_style(a, , NULL, 0); - strbuf_addstr(, " => "); - quote_c_style(b, , NULL, 0); - return strbuf_detach(, NULL); + quote_c_style(a, name, NULL, 0); + strbuf_addstr(name, " => "); + quote_c_style(b, name, NULL, 0); + return; } /* Find common prefix */ @@ -2109,19 +2108,18 @@ static char *pprint_rename(const char *a, const char *b) if (b_midlen < 0) b_midlen = 0; - strbuf_grow(, pfx_length + a_midlen + b_midlen + sfx_length + 7); + strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 7); if (pfx_length + sfx_length) { - strbuf_add(, a, pfx_length); - strbuf_addch(, '{'); + strbuf_add(name, a, pfx_length); + strbuf_addch(name, '{'); } - strbuf_add(, a + pfx_length, a_midlen); - strbuf_addstr(, " => "); - strbuf_add(, b + pfx_length, b_midlen); + strbuf_add(name, a + pfx_length, a_midlen); + strbuf_addstr(name, " => "); + strbuf_add(name, b + pfx_length, b_midlen); if (pfx_length + sfx_length) { - strbuf_addch(, '}'); - strbuf_add(, a + len_a - sfx_length, sfx_length); + strbuf_addch(name, '}'); + strbuf_add(name, a + len_a - sfx_length, sfx_length); } - return strbuf_detach(, NULL); } struct diffstat_t { @@ -2197,23 +2195,17 @@ static void show_graph(struct strbuf *out, char ch, int cnt, static void fill_print_name(struct diffstat_file *file) { - char *pname; + struct strbuf pname = STRBUF_INIT; if (file->print_name) return; - if (!file->is_renamed) { - struct strbuf buf = STRBUF_INIT; - if (quote_c_style(file->name, , NULL, 0)) { - pname = strbuf_detach(, NULL); - } else { - pname = file->name; - strbuf_release(); - } - } else { - pname = pprint_rename(file->from_name, file->name); - } - file->print_name = pname; + if (file->is_renamed) + pprint_rename(, file->from_name, file->name); + else + quote_c_style(file->name, , NULL, 0); + + file->print_name = strbuf_detach(, NULL); } static void print_stat_summary_inserts_deletes(struct diff_options *options, @@ -2797,8 +2789,7 @@ static void free_diffstat_info(struct diffstat_t *diffstat) int i; for (i = 0; i < diffstat->nr; i++) { struct diffstat_file *f = diffstat->files[i]; - if (f->name != f->print_name) - free(f->print_name); + free(f->print_name); free(f->name); free(f->from_name); free(f); @@ -5241,10 +5232,12 @@ static void show_rename_copy(struct diff_options *opt, const char *renamecopy, struct diff_filepair *p) { struct strbuf sb = STRBUF_INIT; - char *names = pprint_rename(p->one->path, p->two->path); + struct strbuf names = STRBUF_INIT; + + pprint_rename(, p->one->path, p->two->path); strbuf_addf(, " %s %s (%d%%)\n", - renamecopy, names, similarity_index(p)); - free(names); + renamecopy, names.buf, similarity_index(p)); + strbuf_release(); emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY, sb.buf, sb.len, 0);
[PATCH v4 2/2] diff: add --stat-with-summary
Certain information is currently shown with --summary, but when used in combination with --stat it's a bit hard to read since info of the same file is in two places (--stat and --summary). On top of that, commits that add or remove files double the number of display lines, which could be a lot if you add or remove a lot of files. --stat-with-summary embeds most of --summary back in --stat in the little space between the file name part and the graph line, e.g. with commit 0433d533f1: Documentation/merge-config.txt | 4 + builtin/merge.c| 2 + ...-pull-verify-signatures.sh (new +x) | 81 ++ t/t7612-merge-verify-signatures.sh | 45 4 files changed, 132 insertions(+) It helps both condensing information and saving some text space. What's new in diffstat is: - A new 0644 file is shown as (new) - A new 0755 file is shown as (new +x) - A new symlink is shown as (new +l) - A deleted file is shown as (gone) - A mode change adding executable bit is shown as (mode +x) - A mode change removing it is shown as (mode -x) Note that --stat-with-summary does not contain all the information --summary provides. Rewrite percentage is not shown but it could be added later, like R50% or C20%. Signed-off-by: Nguyễn Thái Ngọc Duy--- Documentation/diff-options.txt| 8 diff.c| 37 +++ diff.h| 1 + t/t4013-diff-various.sh | 5 +++ ...ty_--root_--stat_--compact-summary_initial | 12 ++ ...-R_--root_--stat_--compact-summary_initial | 12 ++ ...tree_--stat_--compact-summary_initial_mode | 4 ++ ...e_-R_--stat_--compact-summary_initial_mode | 4 ++ 8 files changed, 83 insertions(+) create mode 100644 t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial create mode 100644 t/t4013/diff.diff-tree_--stat_--compact-summary_initial_mode create mode 100644 t/t4013/diff.diff-tree_-R_--stat_--compact-summary_initial_mode diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index c330c01ff0..e3a44f03cd 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -128,6 +128,14 @@ have to use `--diff-algorithm=default` option. These parameters can also be set individually with `--stat-width=`, `--stat-name-width=` and `--stat-count=`. +--compact-summary:: + Output a condensed summary of extended header information such + as file creations or deletions ("new" or "gone", optionally "+l" + if it's a symlink) and mode changes ("+x" or "-x" for adding + or removing executable bit respectively) in diffstat. The + information is put betwen the filename part and the graph + part. Implies `--stat`. + --numstat:: Similar to `--stat`, but shows number of added and deleted lines in decimal notation and pathname without diff --git a/diff.c b/diff.c index e3f72de27d..62e413a80f 100644 --- a/diff.c +++ b/diff.c @@ -2129,6 +2129,7 @@ struct diffstat_t { char *from_name; char *name; char *print_name; + const char *comments; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -2205,6 +2206,9 @@ static void fill_print_name(struct diffstat_file *file) else quote_c_style(file->name, , NULL, 0); + if (file->comments) + strbuf_addf(, " (%s)", file->comments); + file->print_name = strbuf_detach(, NULL); } @@ -3239,6 +3243,32 @@ static void builtin_diff(const char *name_a, return; } +static char *get_compact_summary(const struct diff_filepair *p, int is_renamed) +{ + if (!is_renamed) { + if (p->status == DIFF_STATUS_ADDED) { + if (S_ISLNK(p->two->mode)) + return "new +l"; + else if ((p->two->mode & 0777) == 0755) + return "new +x"; + else + return "new"; + } else if (p->status == DIFF_STATUS_DELETED) + return "gone"; + } + if (S_ISLNK(p->one->mode) && !S_ISLNK(p->two->mode)) + return "mode -l"; + else if (!S_ISLNK(p->one->mode) && S_ISLNK(p->two->mode)) + return "mode +l"; + else if ((p->one->mode & 0777) == 0644 && +(p->two->mode & 0777) == 0755) + return "mode +x"; + else if ((p->one->mode & 0777) == 0755 && +(p->two->mode & 0777) == 0644) + return "mode -x"; + return NULL; +} + static void builtin_diffstat(const char *name_a, const char *name_b,
[PATCH v4 0/2] diff: add --compact-summary (aka nd/diff-stat-with-summary)
v4 renames the option back to --compact-summary. I can't think of any better name. Interdiff diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 595e4cd548..e3a44f03cd 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -128,7 +128,7 @@ have to use `--diff-algorithm=default` option. These parameters can also be set individually with `--stat-width=`, `--stat-name-width=` and `--stat-count=`. ---stat-with-summary:: +--compact-summary:: Output a condensed summary of extended header information such as file creations or deletions ("new" or "gone", optionally "+l" if it's a symlink) and mode changes ("+x" or "-x" for adding diff --git a/diff.c b/diff.c index e7ff7dceb7..62e413a80f 100644 --- a/diff.c +++ b/diff.c @@ -4332,7 +4332,6 @@ static int stat_opt(struct diff_options *options, const char **av) int graph_width = options->stat_graph_width; int count = options->stat_count; int argcount = 1; - unsigned with_summary = options->flags.stat_with_summary; if (!skip_prefix(arg, "--stat", )) die("BUG: stat option does not begin with --stat: %s", arg); @@ -4376,9 +4375,6 @@ static int stat_opt(struct diff_options *options, const char **av) count = strtoul(av[1], , 10); argcount = 2; } - } else if (skip_prefix(arg, "-with-summary", )) { - with_summary = 1; - end = (char*)arg; } break; case '=': @@ -4397,7 +4393,6 @@ static int stat_opt(struct diff_options *options, const char **av) options->stat_graph_width = graph_width; options->stat_width = width; options->stat_count = count; - options->flags.stat_with_summary = with_summary; return argcount; } @@ -4579,11 +4574,13 @@ int diff_opt_parse(struct diff_options *options, else if (!strcmp(arg, "-s") || !strcmp(arg, "--no-patch")) options->output_format |= DIFF_FORMAT_NO_OUTPUT; else if (starts_with(arg, "--stat")) - /* -* --stat, --stat-width, --stat-name-width, -* --stat-count or --stat-with-summary. -*/ + /* --stat, --stat-width, --stat-name-width, or --stat-count */ return stat_opt(options, av); + else if (!strcmp(arg, "--compact-summary")) { +options->flags.stat_with_summary = 1; +options->output_format |= DIFF_FORMAT_DIFFSTAT; + } else if (!strcmp(arg, "--no-compact-summary")) +options->flags.stat_with_summary = 0; /* renames options */ else if (starts_with(arg, "-B") || diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index aa6f5da21c..3f9a24fd56 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -362,10 +362,10 @@ diff --no-index --raw --abbrev=4 dir2 dir :noellipses diff --no-index --raw --abbrev=4 dir2 dir diff --no-index --raw --no-abbrev dir2 dir -diff-tree --pretty --root --stat-with-summary initial -diff-tree --pretty -R --root --stat-with-summary initial -diff-tree --stat-with-summary initial mode -diff-tree -R --stat-with-summary initial mode +diff-tree --pretty --root --stat --compact-summary initial +diff-tree --pretty -R --root --stat --compact-summary initial +diff-tree --stat --compact-summary initial mode +diff-tree -R --stat --compact-summary initial mode EOF test_expect_success 'log -S requires an argument' ' diff --git a/t/t4013/diff.diff-tree_--pretty_--root_--stat-with-summary_initial b/t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial similarity index 78% rename from t/t4013/diff.diff-tree_--pretty_--root_--stat-with-summary_initial rename to t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial index 105f29a92d..d6451ff7cc 100644 --- a/t/t4013/diff.diff-tree_--pretty_--root_--stat-with-summary_initial +++ b/t/t4013/diff.diff-tree_--pretty_--root_--stat_--compact-summary_initial @@ -1,4 +1,4 @@ -$ git diff-tree --pretty --root --stat-with-summary initial +$ git diff-tree --pretty --root --stat --compact-summary initial commit 444ac553ac7612cc88969031b02b3767fb8a353a Author: A U ThorDate: Mon Jun 26 00:00:00 2006 + diff --git a/t/t4013/diff.diff-tree_--pretty_-R_--root_--stat-with-summary_initial b/t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial similarity index 78% rename from t/t4013/diff.diff-tree_--pretty_-R_--root_--stat-with-summary_initial rename to t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial index 45008d09fc..1989e55cd0 100644 --- a/t/t4013/diff.diff-tree_--pretty_-R_--root_--stat-with-summary_initial +++ b/t/t4013/diff.diff-tree_--pretty_-R_--root_--stat_--compact-summary_initial @@ -1,4 +1,4
Re: [PATCH v3 06/42] completion: use __gitcomp_builtin in _git_am
On Sat, Feb 24, 2018 at 1:08 AM, Junio C Hamanowrote: > Duy Nguyen writes: > >> On Fri, Feb 23, 2018 at 1:19 AM, Junio C Hamano wrote: >>> Duy Nguyen writes: >>> Now that you mention it, the only command that completes --rerere-autoupdate is git-merge. Since this is "auto" I don't think people want to type manually. >>> >>> Sorry, but I do not quite get the connection between "since this is >>> 'auto'" and the rest of the sentence. Is it just it is so lengthy >>> that people do not want to type and are likely to use completion? >> >> Well, if it is to be done automatically, I should not need to tell it >> manually (by typing the option on command line). Granted it's a weak >> argument. > > Perhaps I am not reading what the option does correctly, but my > understanding is that merge operations: > > - always allow rerere to intervene and auto-resolve in the working >tree, as long as rerere is enabled; > > - by default they however do not add the rerere-resolved result to >the index, so that the combined diff output from "git diff" can >be used as a way to sanity check the result; but > > - if the user says --rerere-autoupdate, they add the >rerere-resolved result to the index, letting the user blindly >trust it, so that such a trusting user can even commit the result >without looking at it by merely making sure there is no path >remaining in the "git ls-files -u" output. > > The "autoupdate" could be somewhat dangerous depending on your > workflow, and that is why the user can selectively enable it via the > command line option (when known to be safe) even if the user does > not set rerere.autoupdate to true. > > So I am not sure I understand "an option to allow something to be > done automatically should not be given manually" as a valid argument > at all, whether it is weak or strong. > > Or am I grossly confused? Nope. It's just me not understanding rerere functionality. -- Duy
Re: [PATCH v2 4/5] diff.c: initialize hash algo when running in --no-index mode
On Sat, Feb 24, 2018 at 3:15 PM, Eric Sunshinewrote: > On Fri, Feb 23, 2018 at 10:34 PM, Nguyễn Thái Ngọc Duy > wrote: >> before, compared to the alternative that we simply do not hash). >> >> dòng được > > Accidental paste? Oops. I blame Emacs. -- Duy
Re: [PATCH 1/1] git-p4: add unshelve command
On Fri, Feb 23, 2018 at 6:22 PM, Luke Diamandwrote: > On 22 February 2018 at 22:28, Luke Diamand wrote: >> On 22 February 2018 at 21:39, Miguel Torroja >> wrote: >>> Hi Luke, >>> >>> I really like the idea of creating a branch based on a shelved CL (We >>> particularly use shelves all the time), I tested your change and I >>> have some comments. >>> >>> - I have some concerns about having the same "[git-p4...change = >>> .]" as if it were a real submitted CL. >>> One use case I foresee of the new implementation could be to >>> cherry-pick that change on another branch (or current branch) prior to >>> a git p4 submit. >> >> OK, I think we could just not add that in the case of an unshelved commit. >> >>> >>> - I see that the new p4/unshelve... branch is based on the tip of >>> p4/master by default. what if we set the default to the current HEAD? >> >> There's a "--origin" option you can use to set it to whatever you want. >> >> I started out with HEAD as the default, but then found that to get a >> sensible diff you have to both sync and rebase, which can be quite >> annoying. >> >> In my case, in my early testing, I ended up with a git commit which >> included both the creation of a file, and a subsequent change, even >> though I had only unshelved the subsequent change. That was because >> HEAD didn't include the file creation change (but p4/master _did_). > > Discussing this with some of my colleagues, and playing around with > it, it seems that what it really needs to do is to figure out the > parent commit of the shelved changelist, and use that as the basis for > the diff. > > Unfortunately, Perforce doesn't have any concept of a "parent commit". > One option that would be possible to implement though is to look at > the shelved changelist, and foreach file, find the original revision > number ("//depot/foo.c#97"). Then "p4 changes //depot/foo.c" would > give you the changelist number for that file. Find the most recent P4 > changelist, find the git commit corresponding to that, and do the diff > against that. > > It's pretty clunky, and I'm quite glad I didn't try to do that in the > initial revision, as I would surely have given up! > > To do it properly of course you need to handle the case where the > shelved changelist author had some files at one changelist, and others > at another. But I think that's just far too complicated to deal with. > > Luke The behavior of "p4 unshelve" and your "git p4 unshelve" approach I think are equivalent as p4 just copies the whole contents of the shelved file locally, regardless of the previous revision synced. In other words, you get the same result with "git p4 unshelve" than with "p4 unshelve" now. What about creating a new command in a future update to apply just the change to your local tree? One approach we took in the past was to create a diff patch and then apply it to the working tree. The command "p4 describe -S -du 12345" will output a patch but it has two problems: * the header for each file is not standard, it has to be parsed and converted, (it starts with " //depot/..." and it needs to be converted to "--- a/...") * The new files are not output as a patch let's say we call the command "git p4 cherry-pick" Miguel
Re: [PATCH 01/11] t: prevent '-x' tracing from interfering with test helpers' stderr
On Sat, Feb 24, 2018 at 12:39 AM, SZEDER Gáborwrote: > - Duplicate stderr of the tested command executed in the test helper > function from the function's fd 7 (see next point), to ensure that > the tested command's error messages go to a different fd than the > '-x' trace of the commands executed in the function. > > - Duplicate the test helper function's fd 7 from the function's > original stderr, meaning that, after taking a detour through fd 7, > the error messages of the tested command do end up on the > function's original stderr. > diff --git a/t/lib-terminal.sh b/t/lib-terminal.sh > index cd220e378e..b3acb4c6f8 100644 > --- a/t/lib-terminal.sh > +++ b/t/lib-terminal.sh > @@ -9,8 +9,8 @@ test_terminal () { > echo >&4 "test_terminal: need to declare TTY prerequisite" > return 127 > fi > - perl "$TEST_DIRECTORY"/test-terminal.perl "$@" > -} > + perl "$TEST_DIRECTORY"/test-terminal.perl "$@" 2>&9 > +} 9>&2 2>&4 Oops, these should duplicate from/to fd 7, not fd 9.
Re: [PATCH v2 4/5] diff.c: initialize hash algo when running in --no-index mode
On Fri, Feb 23, 2018 at 10:34 PM, Nguyễn Thái Ngọc Duywrote: > Our "git diff" command supports running as a standalone tool. In this > code path, we try to hash the file content but after > 18e2588e11 (sha1_file: switch uses of SHA-1 to the_hash_algo - > 2018-02-01), there is a chance that the_hash_algo (required by > index_path) may still be uninitialized if no repository is found. > > Executing index_path() when the_hash_algo is NULL (or points to unknown > algo) either crashes or dies. Let's make it a bit safer by explicitly > falling back to SHA-1 (so that the diff output remains the same as > before, compared to the alternative that we simply do not hash). > > dòng được Accidental paste? > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > diff --git a/diff.c b/diff.c > @@ -3995,6 +3995,18 @@ static void run_diff(struct diff_filepair *p, struct > diff_options *o) > + /* > +* NEEDSWORK: When running in no-index mode (and no repo is > +* found, thus no hash algo conifugred), fall back to SHA-1 s/conifugred/configured/ > +* hashing (which is used by diff_fill_oid_info below) to > +* avoid regression in diff output. > +* > +* In future, perhaps we can allow the user to specify their > +* hash algorithm from command line in this mode. > +*/ > + if (o->flags.no_index && !the_hash_algo) > + the_hash_algo = _algos[GIT_HASH_SHA1]; > + > diff_fill_oid_info(one); > diff_fill_oid_info(two);
Re: [PATCH 04/11] t3030-merge-recursive: don't check the stderr of a subshell
On Fri, Feb 23, 2018 at 6:39 PM, SZEDER Gáborwrote: > The two test checking 'git mmerge-recursive' in an empty worktree in s/mmerge/merge/, I guess. > 't3030-merge-recursive.sh' fail when the test script is run with '-x' > tracing (and using a shell other than a Bash version supporting > BASH_XTRACEFD). The reason for those failures is that the tests check > the emptiness of a subshell's stderr, which includes the trace of > commands executed in that subshell as well, throwing off the emptiness > check. > > Note that both subshells execute four git commands each, meaning that > checking the emptiness of the whole subshell implicitly ensures that > not only 'git merge-recursive' but none of the other three commands > outputs anything to their stderr. Note also that if one of those > commands were to output anything on its stderr, then the current > combined check would not tell us which one of those four commands the > unexpected output came from. > > Save the stderr of those four commands only instead of the whole > subshell, so it remains free from tracing output, and save and check > them individually, so they will show us from which command the > unexpected output came from. > > After this change t3030 passes with '-x', even when running with > /bin/sh. > > Signed-off-by: SZEDER Gábor