[RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches
From: Linus Torvalds <torva...@linux-foundation.org>
Date: Wed, 22 Jan 2014 12:32:30 -0800
Subject: [PATCH] Make 'git request-pull' more strict about matching local/remote branches

The current 'request-pull' will try to find matching commit on the given remote, and rewrite the "please pull" line to match that remote ref.

That may be very helpful if your local tree doesn't match the layout of the remote branches, but for the common case it's been a recurring disaster, when "request-pull" is done against a delayed remote update, and it rewrites the target branch randomly to some other branch name that happens to have the same expected SHA1 (or more commonly, leaves it blank).

To avoid that recurring problem, this changes "git request-pull" so that it matches the ref name to be pulled against the *local* repository, and then warns if the remote repository does not have that exact same branch or tag name and content.

This means that "git request-pull" will never rewrite the ref-name you gave it. If the local branch name is "xyzzy", that is the only branch name that request-pull will ask the other side to fetch.

If the remote has that branch under a different name, that's your problem and git request-pull will not try to fix it up (but git request-pull will warn about the fact that no exact matching branch is found, and you can edit the end result to then have the remote name you want if it doesn't match your local one).

The new "find local ref" code will also complain loudly if you give an ambiguous refname (eg you have both a tag and a branch with that same name, and you don't specify "heads/name" or "tags/name").
Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---

This should fix the problem we've had multiple times with kernel maintainers, where "git request-pull" ends up leaving the target branch name blank, because people either forgot to push it, or (more commonly) people pushed it just before doing the pull request, and it hadn't actually had time to mirror out to the public site.

Now, "git request-pull" will *warn* about the fact that the matching ref isn't found on the remote (and the new matching code is stricter at that), but it will never try to rewrite the branch name that it asks the other end to pull. So if the remote branch doesn't exist, you'll get a warning, but the pull request will still have the branch you specified.

The whole checking thing is both simplified (removing more lines than it adds) and made more strict.

Comments? It passes the tests I put it through locally, but I did *not* make it pass the test-suite, since it very much does change the rules. Some of the test suite code literally tests for the old completely broken case (at least t5150, subtests 4 and 5). Thus the RFC part. Because the current "git request-pull" behavior has been horrible.

 git-request-pull.sh | 110
 1 file changed, 43 insertions(+), 67 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index fe21d5db631c..659a412155d8 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -35,20 +35,7 @@ do
 	shift
 done
 
-base=$1 url=$2 head=${3-HEAD} status=0 branch_name=
-
-headref=$(git symbolic-ref -q "$head")
-if git show-ref -q --verify "$headref"
-then
-	branch_name=${headref#refs/heads/}
-	if test "z$branch_name" = "z$headref" ||
-		! git config "branch.$branch_name.description" >/dev/null
-	then
-		branch_name=
-	fi
-fi
-
-tag_name=$(git describe --exact "$head^0" 2>/dev/null)
+base=$1 url=$2 status=0
 
 test -n "$base" && test -n "$url" || usage
 
@@ -58,55 +45,68 @@ then
 	die "fatal: Not a valid revision: $base"
 fi
 
+#
+# $3 must be a symbolic ref, a unique ref, or
+# a SHA object expression
+#
+head=$(git symbolic-ref -q "${3-HEAD}")
+head=${head:-$(git show-ref "${3-HEAD}" | cut -d' ' -f2)}
+head=${head:-$(git rev-parse --quiet --verify "$3")}
+
+# None of the above? Bad.
+test -z "$head" && die "fatal: Not a valid revision: $3"
+
+# This also verifies that the resulting head is unique:
+# git show-ref could have shown multiple matching refs..
 headrev=$(git rev-parse --verify --quiet "$head"^0)
-if test -z "$headrev"
+test -z "$headrev" && die "fatal: Ambiguous revision: $3"
+
+# Was it a branch with a description?
+branch_name=${head#refs/heads/}
+if test "z$branch_name" = "z$headref" ||
+	! git config "branch.$branch_name.description" >/dev/null
 then
-	die "fatal: Not a valid revision: $head"
+	branch_name=
 fi
 
+prettyhead=${head#refs/}
+prettyhead=${prettyhead#heads/}
+
 merge_base=$(git merge-base $baserev $headrev) ||
 die "fatal: No commits in common between $base and $head"
 
-# $head is the token given from the command line, and $tag_name, if
-# exists, is the tag we are going to show the commit information for.
-# If that tag exists at the remote and it points at the commit, use it.
-# Otherwise, if a branch with the same name as $head exists at the remote
Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches
On Wed, Jan 22, 2014 at 1:46 PM, Junio C Hamano <gits...@pobox.com> wrote:
>>
>> The new "find local ref" code will also complain loudly if you give an ambiguous refname (eg you have both a tag and a branch with that same name, and you don't specify "heads/name" or "tags/name").
>
> But this part might be a bit problematic. $3=master will almost always have refs/heads/master and refs/remotes/origin/master listed because the call to show-ref comes before rev-parse --verify, no?

Hmm. Yes. It's done that way very much on purpose, to avoid the branch/tag ambiguity (which we have had problems with), but you're right, it also ends up being ambiguous wrt remote branches, which wasn't the intention, and you're right, that is not acceptable. Damn.

I very much want to get the full ref-name (ie "master" should become "refs/heads/master"), and I do want to avoid the branch/tag ambiguity, but you're right, show-ref plus the subsequent rev-parse --verify comes close but not quite close enough.

Any ideas? The hacky way is to do "| head -1" to take the first show-ref output, and then check if you get a different result if you re-do it using "show-ref --tags". But that sounds really excessively hacky. Is there a good way to do it?

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches
On Wed, Jan 22, 2014 at 2:03 PM, Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> Any ideas? The hacky way is to do "| head -1" to take the first show-ref output, and then check if you get a different result if you re-do it using "show-ref --tags". But that sounds really excessively hacky. Is there a good way to do it?

Using "git show-ref --tags --heads" would work for the common case (since that ignores remote branches), but would then disallow remote branches entirely. That might be ok in practice, but it's definitely wrong too.

I'm probably missing some obvious solution.

Linus
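To make Junio's objection concrete, here is a small C sketch of tail matching. It assumes the rule documented for `git show-ref <pattern>` (a pattern matches the end of a refname on a '/' boundary); the function names are hypothetical and this is not git's implementation. A bare `master` matches both the local branch and its remote-tracking copy, while restricting the search to refs/heads/ and refs/tags/, as `--heads --tags` does, leaves a single match at the cost of hiding remote branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True if `pattern` matches the tail of `ref` on a '/' boundary:
 * "master" matches "refs/heads/master" and "refs/remotes/origin/master",
 * but not "refs/heads/xmaster".
 */
static bool tail_matches(const char *ref, const char *pattern)
{
	size_t rlen = strlen(ref), plen = strlen(pattern);

	if (plen > rlen || strcmp(ref + rlen - plen, pattern) != 0)
		return false;
	/* The match must begin at the start of a path component. */
	return plen == rlen || ref[rlen - plen - 1] == '/';
}

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Count refs matching `pattern`.  With heads_and_tags_only set, only
 * refs/heads/ and refs/tags/ are considered, mimicking
 * "git show-ref --heads --tags <pattern>".
 */
static size_t count_matches(const char *const *refs, size_t nr,
			    const char *pattern, bool heads_and_tags_only)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++) {
		if (heads_and_tags_only &&
		    !starts_with(refs[i], "refs/heads/") &&
		    !starts_with(refs[i], "refs/tags/"))
			continue;
		if (tail_matches(refs[i], pattern))
			count++;
	}
	return count;
}
```

Spelling out `heads/master` avoids the remote-tracking match without filtering, which is why a fully qualified name is the unambiguous form.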
Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches
On Wed, Jan 22, 2014 at 2:14 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> I looked at 5150.4 and found that what it attempts to do is halfway sensible.

I agree that it is half-way sensible. The important bit being the HALF part.

The half part is why we have the semantics we have. There's no question about that.

The problem is, the *other* half is pure and utter crap. The half-way sensible solution then generates pure and utter garbage in the totally sensible case.

And that's why I think it needs to be fixed. Not because the existing behavior can never make sense in some circumstances, but because the existing behavior can screw up really really badly in other (arguably more common, and definitely real) circumstances.

For the kernel, the broken missing branch name situation has come up pretty regularly. This is definitely not a one-time event, it's more like almost every merge window somebody gets screwed by this and I have to guess what the branch name should have been.

I think that we could potentially do a "local:remote" syntax for that half-way sensible case, so that if you do

    git push .. master:for-linus

then you have to do

    git request-pull .. master:for-linus

to match the fact that you renamed your local branch on the remote.

Linus
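As a rough illustration, splitting such an argument could look like the C sketch below. It assumes the same rules as the shell expansions used in the follow-up patch (`${3%:*}` for the local part, defaulting to HEAD, and `${3#*:}` for the remote part); `split_refspec` is a hypothetical name, not part of git.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the shell expansions:
 *   local=${arg%:*}     drop the shortest ":..." suffix (from the last ':')
 *   local=${local:-HEAD}
 *   remote=${arg#*:}    drop the shortest "...:" prefix (through the first ':')
 * Without a colon, both halves are the whole argument.
 * Returns 0 on success, -1 if either buffer is too small.
 */
static int split_refspec(const char *arg, char *local, size_t local_sz,
			 char *remote, size_t remote_sz)
{
	const char *first = strchr(arg, ':');
	const char *last = strrchr(arg, ':');
	const char *local_src = arg;
	size_t local_len = last ? (size_t)(last - arg) : strlen(arg);
	const char *remote_src = first ? first + 1 : arg;

	if (local_len == 0) {		/* ${local:-HEAD} */
		local_src = "HEAD";
		local_len = strlen(local_src);
	}
	if (local_len >= local_sz || strlen(remote_src) >= remote_sz)
		return -1;
	memcpy(local, local_src, local_len);
	local[local_len] = '\0';
	strcpy(remote, remote_src);
	return 0;
}
```

Without a colon both halves are the whole argument, so a plain `for-linus` keeps asking for `for-linus`. The two rules only disagree for arguments containing more than one colon.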
[RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote
From: Linus Torvalds <torva...@linux-foundation.org>
Date: Wed, 22 Jan 2014 15:23:48 -0800
Subject: [PATCH] Make request-pull able to take a refspec of form local:remote

This allows a user to say that a local branch has a different name on the remote server, using the same syntax that "git push" uses to create that situation.

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---

So this relaxes the remote matching, and allows using the "local:remote" syntax to say that the local branch is differently named from the remote one.

It is probably worth folding it into the previous patch if you think this whole approach is workable.

 git-request-pull.sh | 50
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index 659a412155d8..c8ab0e912011 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -47,19 +47,23 @@ fi
 
 #
 # $3 must be a symbolic ref, a unique ref, or
-# a SHA object expression
+# a SHA object expression. It can also be of
+# the format 'local-name:remote-name'.
 #
-head=$(git symbolic-ref -q "${3-HEAD}")
-head=${head:-$(git show-ref "${3-HEAD}" | cut -d' ' -f2)}
-head=${head:-$(git rev-parse --quiet --verify "$3")}
+local=${3%:*}
+local=${local:-HEAD}
+remote=${3#*:}
+head=$(git symbolic-ref -q "$local")
+head=${head:-$(git show-ref --heads --tags "$local" | cut -d' ' -f2)}
+head=${head:-$(git rev-parse --quiet --verify "$local")}
 
 # None of the above? Bad.
-test -z "$head" && die "fatal: Not a valid revision: $3"
+test -z "$head" && die "fatal: Not a valid revision: $local"
 
 # This also verifies that the resulting head is unique:
 # git show-ref could have shown multiple matching refs..
 headrev=$(git rev-parse --verify --quiet "$head"^0)
-test -z "$headrev" && die "fatal: Ambiguous revision: $3"
+test -z "$headrev" && die "fatal: Ambiguous revision: $local"
 
 # Was it a branch with a description?
 branch_name=${head#refs/heads/}
@@ -69,9 +73,6 @@ then
 	branch_name=
 fi
 
-prettyhead=${head#refs/}
-prettyhead=${prettyhead#heads/}
-
 merge_base=$(git merge-base $baserev $headrev) ||
 die "fatal: No commits in common between $base and $head"
 
@@ -81,30 +82,37 @@ die "fatal: No commits in common between $base and $head"
 #
 # Otherwise find a random ref that matches $headrev.
 find_matching_ref='
-	my ($exact,$found);
+	my ($head,$headrev) = (@ARGV);
+	my ($found);
+
 	while (<STDIN>) {
+		chomp;
 		my ($sha1, $ref, $deref) = /^(\S+)\s+([^^]+)(\S*)$/;
-		next unless ($sha1 eq $ARGV[1]);
-		if ($ref eq $ARGV[0]) {
-			$exact = $ref;
+		my ($pattern);
+		next unless ($sha1 eq $headrev);
+
+		$pattern="/$head\$";
+		if ($ref eq $head) {
+			$found = $ref;
+		}
+		if ($ref =~ /$pattern/) {
+			$found = $ref;
 		}
-		if ($sha1 eq $ARGV[0]) {
+		if ($sha1 eq $head) {
 			$found = $sha1;
 		}
 	}
-	if ($exact) {
-		print "$exact\n";
-	} elsif ($found) {
+	if ($found) {
 		print "$found\n";
 	}
 '
 
-ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "$head" "$headrev")
+ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "${remote:-HEAD}" "$headrev")
 
 if test -z "$ref"
 then
-	echo "warn: No match for $prettyhead found at $url" >&2
-	echo "warn: Are you sure you pushed '$prettyhead' there?" >&2
+	echo "warn: No match for commit $headrev found at $url" >&2
+	echo "warn: Are you sure you pushed '${remote:-HEAD}' there?" >&2
 	status=1
 fi
 
@@ -116,7 +124,7 @@ git show -s --format='The following changes since commit %H:
 
 are available in the git repository at:
 ' $merge_base &&
-echo "  $url $prettyhead" &&
+echo "  $url $remote" &&
 git show -s --format='
 for you to fetch changes up to %H:
 
-- 
1.9.rc0.10.gf0799f9.dirty
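For readers less at home in Perl, the matching done by `find_matching_ref` in the patch above can be restated as a C sketch. It is an approximation under stated assumptions: refnames are compared literally (the Perl builds a regex from `$head`, so a character such as '.' would act as a wildcard there), the branch for a raw SHA-1 `$head` is omitted, and the function returns how many lines matched, which the Perl does not; that count is what a duplicate-match warning would need. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `ref` ends with "/" followed by `name` (the Perl /\/$head$/ test). */
static bool ends_with_component(const char *ref, const char *name)
{
	size_t rlen = strlen(ref), nlen = strlen(name);

	return rlen > nlen && ref[rlen - nlen - 1] == '/' &&
	       strcmp(ref + rlen - nlen, name) == 0;
}

/*
 * Scan "git ls-remote"-style lines ("<sha1><whitespace><ref>[^{}]") for
 * refs that point at `headrev` and are named `head` or end in "/<head>".
 * Like the Perl loop, a later match overwrites an earlier one in `found`.
 * Returns the number of matching lines.
 */
static int find_matching_ref(const char *const *lines, size_t nr,
			     const char *head, const char *headrev,
			     char *found, size_t found_sz)
{
	int matches = 0;

	found[0] = '\0';
	for (size_t i = 0; i < nr; i++) {
		char sha1[64], ref[256];

		/* The scanset stops at '^', dropping a peeled "^{}" suffix. */
		if (sscanf(lines[i], "%63s %255[^^ \t\n]", sha1, ref) != 2)
			continue;
		if (strcmp(sha1, headrev) != 0)
			continue;
		if (strcmp(ref, head) == 0 || ends_with_component(ref, head)) {
			snprintf(found, found_sz, "%s", ref);
			matches++;
		}
	}
	return matches;
}
```

With a branch and a tag both named `for-linus` at the same commit, this reports two matches and, like the Perl, keeps whichever comes last in the ls-remote output.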
Re: [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote
On Thu, Jan 23, 2014 at 11:43 AM, Junio C Hamano <gits...@pobox.com> wrote:
>
> I am not sure if it is a good idea to hand-craft "resulting head is unique" constraint here. We already have disambiguation rules (and warning mechanism) we use in other places---this part should use the same rule, I think.

If you can fix that, then yes, that would be lovely. As it is, I couldn't find any easily scriptable way to do that.

>>  #
>>  # Otherwise find a random ref that matches $headrev.
>>  find_matching_ref='
>> +	my ($head,$headrev) = (@ARGV);
>> +	my ($found);
>> +
>>  	while (<STDIN>) {
>> +		chomp;
>>  		my ($sha1, $ref, $deref) = /^(\S+)\s+([^^]+)(\S*)$/;
>> +		my ($pattern);
>> +		next unless ($sha1 eq $headrev);
>> +
>> +		$pattern="/$head\$";
>
> I think $head is constant inside the loop, so lift it outside?

Yes. I'm not really a perl person, and this came from me trying to make the code more readable (and it used to do that magic quoting thing inside the loop, I just used a helper "pattern" variable).

>> +		if ($sha1 eq $head) {
>
> I think this is $headrev ($head may be $remote or HEAD), but then anything that does not point at $headrev has already been rejected at the beginning of this loop, so...?

No, this is for when "head" ends up not being a ref, but a SHA1 expression. IOW, for when you do something odd like

    git request-pull HEAD^^ origin HEAD^

when hacking things together. It doesn't actually generate the right request-pull message (because there's no valid branch name), but it *works* in the sense that you can get the diffstat etc and edit things manually.

It's not a big deal - it has never really worked, and I actually broke that when I then used $remote that doesn't actually have the SHA1 any more.

>> +	if ($found) {
>>  		print "$found\n";
>>  	}
>>  '
>
> I somehow feel that this is inadequate to catch the delayed propagation error in the opposite direction. The publish repository may have an unrelated ref pointing at the $headrev and we may guess that is the ref to be fetched by the integrator based on that, but by the time the integrator fetches from the repository, the ref may have been updated to its new value that does not match $headrev. But I do not think of a way to solve that one.

Yes, so you'll get a warning (or, if you get a partial match, maybe not even that), but the important part about all these changes is that it DOESN'T MATTER.

Why?

Because it no longer re-writes the target branch name based on that match or non-match. So the pull request will be fine.

In other words, the really fundamental change here is that the "oops, I couldn't find things on the remote" no longer affects the output. It only affects the warning. And I think that's important. It used to be that the remote matching actually changed the output of the request-pull, and *THAT* was the fundamental problem.

> In any case, shouldn't we be catching duplicate matches here, if the real objective is to make it less likely for the users to make mistakes?

It would be good, yes. But my perl-fu is weak, and I really didn't want to worry about it. Also, as above: my primary issue was to not screw up the output, so the remote matching actually has become much less important, and now the warning about it is purely about being helpful, it no longer fundamentally alters any semantics.

So I agree that there is room for improvement, but that's kind of separate from the immediate problem I was trying to solve.

Linus
Re: [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote
On Thu, Jan 23, 2014 at 2:58 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> "Will be fine", provided if they always use "local:remote" syntax, I'd agree.

Why?

No sane user should actually need to use the local:remote syntax.

The normal situation should be that you create the correctly named branch or tag locally, and then push it out under that name.

So I don't actually think anybody should need to be retrained, or always use the local:remote syntax. The local:remote syntax exists only for that special insane case where you used (the same) local:remote syntax to push out a branch under a different name.

[ And yeah, maybe that behavior is more common than I think, but even if it is, such behavior would always be among people who are *very* aware of the whole "local branch vs remote branch name is different" situation. ]

Linus
Re: Re* [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote
On Wed, Jan 29, 2014 at 3:34 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> I am not yet doing the docs, but here is a minimal (and I think is the most sensible) fix to the "If I asked a tag to be pulled, I used to get the message from the tag in the output---the updated code no longer does so" problem.

That was a complete oversight/bug on my part, due to just removing the tag_name special cases, not thinking about the tag message.

Thinking some more about the tag_name issue, I realize that the other patch ("Make request-pull able to take a refspec of form local:remote") broke another thing.

The first patch pretty-printed the local branch-name, removing "refs/" and possibly "heads/" from the local refname. So for a branch, it would ask people to just pull from the branch-name, and for a tag it would ask people to pull from "tags/name", which is good policy. So if you had a tag called "for-linus", it would say so (using "tags/for-linus").

But the "local:remote" syntax thing ends up breaking that nice feature. The old "find_matching_refs" would actually cause us to show the "tags" part if it existed on the remote, but that had become pointless and counter-productive with the first patch. But with the second patch, maybe we should reinstate that logic..

Linus
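The pretty-printing described here (the `prettyhead=${head#refs/}` and `prettyhead=${prettyhead#heads/}` lines in the first patch) amounts to two prefix strips. A minimal C sketch, with a hypothetical function name, that returns a pointer into the input rather than copying it:

```c
#include <assert.h>
#include <string.h>

/*
 * Mirrors the shell:
 *   prettyhead=${head#refs/}
 *   prettyhead=${prettyhead#heads/}
 * Branches lose "refs/heads/", while tags keep their "tags/" part.
 */
static const char *pretty_refname(const char *head)
{
	static const char refs[] = "refs/", heads[] = "heads/";

	if (strncmp(head, refs, sizeof(refs) - 1) == 0)
		head += sizeof(refs) - 1;
	if (strncmp(head, heads, sizeof(heads) - 1) == 0)
		head += sizeof(heads) - 1;
	return head;
}
```

So `refs/heads/for-linus` is printed as `for-linus`, and `refs/tags/for-linus` as `tags/for-linus`, which is the policy described above.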
Odd git diff breakage
I hit this oddity when not remembering the right syntax for --color-words..

Try this (outside of a git repository):

    touch a b
    git diff -u --color=words a b

and watch it scroll (infinitely) printing out

    error: option `color' expects "always", "auto", or "never"

forever.

I haven't tried to root-cause it, since I'm supposed to be merging stuff..

Linus
Re: Odd git diff breakage
On Mon, Mar 31, 2014 at 11:30 AM, Junio C Hamano <gits...@pobox.com> wrote:
>
> Hmph, interesting. "outside a repository" is the key, it seems.

Well, you can do it inside a repository too, but then you need to use the "--no-index" flag to get the "diff two files" behavior.

It will result in the same infinite error messages.

Linus
Re: [PATCH] diff-no-index: correctly diagnose error return from diff_opt_parse()
On Mon, Mar 31, 2014 at 11:47 AM, Junio C Hamano <gits...@pobox.com> wrote:
>
> Instead, make it act like so:
>
>     $ git diff --no-index --color=words a b
>     error: option `color' expects "always", "auto", or "never"
>     fatal: invalid diff option/value: --color=words

Thanks,

Linus
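The shape of the bug can be sketched in C. This assumes, as the patch subject suggests, that `diff_opt_parse()` returns the number of arguments it consumed, 0 for an unrecognized option, and a negative value for a recognized option with a bad value; the parser below is a hypothetical stand-in, not git's code. If the caller only treats 0 as an error and then does `i += j`, a negative `j` moves `i` backwards, so the same arguments are parsed again and again, which would explain the endless stream of errors.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for an option parser: returns the number of
 * arguments consumed (> 0), 0 for an unknown option, or a negative
 * value for a known option with a bad value.
 */
static int parse_one_option(const char **argv)
{
	if (strcmp(argv[0], "-u") == 0)
		return 1;
	if (strncmp(argv[0], "--color=", 8) == 0) {
		const char *v = argv[0] + 8;

		if (!strcmp(v, "always") || !strcmp(v, "auto") || !strcmp(v, "never"))
			return 1;
		return -1;	/* known option, bad value */
	}
	return 0;		/* unknown option */
}

/*
 * Returns the index of the first rejected argument, or -1 if all were
 * accepted.  Testing "j <= 0" rather than "!j" is the point: adding a
 * negative j to i would step back and re-parse earlier arguments.
 */
static int parse_options_loop(int argc, const char **argv)
{
	for (int i = 0; i < argc; ) {
		int j = parse_one_option(argv + i);

		if (j <= 0)
			return i;	/* real code would die() here */
		i += j;
	}
	return -1;
}
```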
Re: Sources for 3.18-rc1 not uploaded
Junio, Brian,
 it seems that the stability of the git tar output is broken.

On Mon, Oct 20, 2014 at 4:59 AM, Konstantin Ryabitsev <konstan...@linuxfoundation.org> wrote:
>
> Looks like 3.18-rc1 upload didn't work:
>
> This is why the front page still lists 3.17 as the latest mainline. Want to try again?
>
> Ok, tried again, and failed again.
>
> If that still doesn't work, you may have to use version 1.7 of git when generating the tarball and signature -- I recall Greg having a similar problem in the past.

Ugh, yes, that seems to be it. Current git generates different tar-files than older releases do:

    tar-1.7.9.7 tar-cur differ: byte 107, line 1

and a quick bisection shows that it is due to commit 10f343ea814f ("archive: honor tar.umask even for pax headers") in the current git development version.

Junio, quite frankly, I don't think that that fix was a good idea. I'd suggest having a *separate* umask for the pax headers, so that we do not break this long-lasting stability of "git archive" output in ways that are unfixable and not compatible. kernel.org has relied (for a *long* time) on being able to just upload the signature of the resulting tar-file, because both sides can generate the same tar-file bit-for-bit.

So instead of using "tar_umask", please make it use "tar_pax_umask", and have that default to 000. Ok? Something like the attached patch.

Or just revert 10f343ea814f entirely.

Linus

From d5ca7ae0a34e31c48397f59b03ecabda7c5c40b2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torva...@linux-foundation.org>
Date: Mon, 20 Oct 2014 08:21:38 -0700
Subject: [PATCH] Don't use the default 'tar.umask' for pax headers

That wasn't the original behavior, and doing so breaks the fact that tar-files are bit-for-bit compatible across git versions.

If you really want to work around broken receiving tar implementations (dubious, we've not needed to do so before), use "[tar] paxumask" in the git config file. Or maybe we could expose some command line flag to do so. But don't break existing format compatibility for dubious gains.

Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
---
 archive-tar.c | 14
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index df2f4c8a6437..40139ea4ee4e 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -14,6 +14,7 @@ static char block[BLOCKSIZE];
 static unsigned long offset;
 
 static int tar_umask = 002;
+static int tar_pax_umask = 000;
 
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
@@ -192,7 +193,7 @@ static int write_extended_header(struct archiver_args *args,
 	unsigned int mode;
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_EXT_HEADER;
-	mode = 0100666 & ~tar_umask;
+	mode = 0100666 & ~tar_pax_umask;
 	sprintf(header.name, "%s.paxheader", sha1_to_hex(sha1));
 	prepare_header(args, &header, mode, size);
 	write_blocked(&header, sizeof(header));
@@ -300,7 +301,7 @@ static int write_global_extended_header(struct archiver_args *args)
 	strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
-	mode = 0100666 & ~tar_umask;
+	mode = 0100666 & ~tar_pax_umask;
 	strcpy(header.name, "pax_global_header");
 	prepare_header(args, &header, mode, ext_header.len);
 	write_blocked(&header, sizeof(header));
@@ -374,6 +375,15 @@ static int git_tar_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "tar.paxumask")) {
+		if (value && !strcmp(value, "user")) {
+			tar_pax_umask = umask(0);
+		} else {
+			tar_pax_umask = git_config_int(var, value);
+		}
+		return 0;
+	}
+
 	return tar_filter_config(var, value, cb);
 }
 
-- 
2.1.2.330.g565301e
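A small C sketch of why the output changes: it computes the pax header mode as the patch does (`0100666 & ~umask`) and formats it the way a ustar header stores permissions, as seven octal digits in the 8-byte field at offset 100. The `mode & 07777` masking and the helper names are assumptions for illustration. With the default `tar.umask` of 002 the last digit becomes 4 instead of 6; that digit sits at offset 106 (byte 107 when counted from 1, as `cmp` does), which is consistent with the `cmp` output quoted above if the first header written is the pax global header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mode used for pax header entries, as in the patch: 0100666 & ~umask. */
static unsigned int pax_header_mode(unsigned int umask_bits)
{
	return 0100666 & ~umask_bits;
}

/*
 * Format permission bits like a ustar mode field: seven octal digits
 * plus a terminating NUL, filling the 8-byte field at offset 100.
 */
static void format_mode_field(char field[8], unsigned int mode)
{
	snprintf(field, 8, "%07o", mode & 07777);
}
```

Only the last octal digit differs between the two settings, so everything before it in the header block stays byte-identical.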
Re: Sources for 3.18-rc1 not uploaded
On Mon, Oct 20, 2014 at 3:28 PM, brian m. carlson <sand...@crustytoothpaste.net> wrote:
>
> It doesn't appear that the stability of git archive --format=tar is documented anywhere. Given that, it doesn't seem reasonable to expect that any tar implementation produces bit-for-bit compatible output between versions.

The kernel has simple stability rules: if it breaks users, it gets fixed or reverted.

That is a damn good rule. I realize that some other projects are crap, and don't care about their users. I hope and believe that git is not in that sad group.

The whole "it's not documented" excuse is pure and utter bollocks. Users don't care. And stability of data should be *expected*, not need some random documentation entry to make it explicit.

Linus
Re: Sources for 3.18-rc1 not uploaded
On Tue, Oct 21, 2014 at 1:08 AM, Michael J Gruber <g...@drmicha.warpmail.net> wrote:
>
> Unfortunately, the git archive doc clearly says that the umask is applied to all archive entries. And that clearly wasn't the case (for extended metadata headers) before Brian's fix.

Hey, it's time for another round of the world-famous "Captain Obvious Quiz Game"! Yay!

The questions this week are:

 (1) If reality and documentation do not match, where is the bug?

     (a) Documentation is buggy
     (b) Reality is buggy

 (2) Where would you put the horse in relationship to a horse-drawn carriage?

     (a) in front
     (b) in the carriage

Now, if you answered (a) to both these questions, and had this been a real quiz show, you might have been a winner and the happy new owner of a remote-controlled four-slice toaster with a fancy digital timer.

Sadly, this was just a dry-run for the real thing, to give people a quick taste of the world-famous "Captain Obvious Quiz Game". I hope you tune in next week for our exciting all-new questions.

Linus
Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
On Mon, Aug 19, 2013 at 8:41 AM, Steffen Prohaska <proha...@zib.de> wrote:
>
> The reason was that read() immediately returns with EINVAL if nbyte >= 2GB. According to POSIX [1], if the value of nbyte passed to read() is greater than SSIZE_MAX, the result is implementation-defined.

Yeah, the OS X filesystem layer is an incredible piece of shit. Not only doesn't it follow POSIX, it fails *badly*. Because OS X kernel engineers apparently have the mental capacity of a retarded rodent on crack.

Linux also refuses to actually read more than a maximum value in one go (because quite frankly, doing more than 2GB at a time is just not reasonable, especially in unkillable disk wait), but at least Linux gives you the partial read, so that the usual "read until you're happy" works (which you have to do anyway with sockets, pipes, NFS intr mounts, etc etc). Returning EINVAL is a sign of a diseased mind.

I hate your patch for other reasons, though:

> The problem for read() is addressed in a similar way by introducing a wrapper function in compat that always reads less than 2GB.

Why do you do that? We already _have_ wrapper functions for read(), namely xread(). Exactly because you basically have to, in order to handle signals on interruptible filesystems (which aren't POSIX either, but at least sanely so) or from other random sources. And to handle the "you can't do reads that big" issue.

So why isn't the patch much more straightforward? Like the attached totally untested one that just limits the read/write size to 8MB (which is totally arbitrary, but small enough to not have any latency issues even on slow disks, and big enough that any reasonable IO subsystem will still get good throughput).

And by "totally untested" I mean that it actually passes the git test suite, but since I didn't apply your patch nor do I have OS X anywhere, I can't actually test that it fixes *your* problem. But it should.

Linus
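A sketch of the approach, in C: cap every request at 8MB and retry only on EINTR/EAGAIN, leaving the usual "read until you're happy" loop to the caller. `MAX_IO_SIZE`, `capped_read` and `read_fully` are hypothetical names; this is not the patch that was attached, and, like the simple retry described here, it spins on EAGAIN rather than waiting for a non-blocking descriptor to become readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Arbitrary cap: small enough for latency, big enough for throughput. */
#define MAX_IO_SIZE (8 * 1024 * 1024)

/*
 * read() wrapper in the spirit of xread(): never asks for more than
 * MAX_IO_SIZE bytes at once and retries on EINTR/EAGAIN.  Like read(),
 * it may return fewer bytes than requested.
 */
static ssize_t capped_read(int fd, void *buf, size_t len)
{
	if (len > MAX_IO_SIZE)
		len = MAX_IO_SIZE;
	for (;;) {
		ssize_t nr = read(fd, buf, len);

		if (nr < 0 && (errno == EINTR || errno == EAGAIN))
			continue;
		return nr;	/* data, 0 at EOF, or a real error */
	}
}

/* Read until `len` bytes, EOF, or an error; returns bytes read or -1. */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t total = 0;

	while (total < len) {
		ssize_t nr = capped_read(fd, p + total, len - total);

		if (nr < 0)
			return -1;
		if (nr == 0)
			break;		/* EOF */
		total += (size_t)nr;
	}
	return (ssize_t)total;
}
```

EINVAL is deliberately not retried, since repeating the same call with the same size would fail the same way.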
Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
On Mon, Aug 19, 2013 at 10:16 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Linus Torvalds <torva...@linux-foundation.org> writes:
>
> The same argument applies to xwrite(), but currently we explicitly catch EINTR and EAGAIN knowing that on sane systems these are the signs that we got interrupted.
>
> Do we catch EINVAL unconditionally in the same codepath?

No, and we shouldn't. If EINVAL happens, it will keep happening.

But with the size limiter, it doesn't matter, since we won't hit the OS X braindamage.

> Could EINVAL on saner systems mean completely different thing (like our caller is passing bogus parameters to underlying read/write, which is a program bug we would want to catch)?

Yes. Even on OS X, it means that - it's just that OS X notion of what is "bogus" is pure crap. But the thing is, looping on EINVAL would be wrong even on OS X, since unless you change the size, it will keep happening forever.

But with the "limit IO to 8MB" (or whatever) patch, the issue is moot. If you get an EINVAL, it will be due to something else being horribly horribly wrong.

Linus
Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
On Mon, Aug 19, 2013 at 2:56 PM, Kyle J. McKay <mack...@gmail.com> wrote:
>
> The fact that the entire file is read into memory when applying the filter does not seem like a good thing (see #7-#10 above).

Yeah, that's horrible. It's likely bad for performance too, because even if you have enough memory, it blows everything out of the L2/L3 caches, and if you don't have enough memory it obviously causes other problems.

So it would probably be a great idea to make the filtering code able to do things in smaller chunks, but I suspect that the patch to chunk up xread/xwrite is the right thing to do anyway.

Linus
Re: [PATCH/RFC] Developer's Certificate of Origin: default to COPYING
On Thu, Sep 12, 2013 at 3:30 PM, Junio C Hamano <gits...@pobox.com> wrote:
>
> Linus, this is not limited to us, so I am bothering you; sorry about that.
>
> My instinct tells me that some competent lawyers at linux-foundation helped you with the wording of DCO, and we amateurs shouldn't be mucking with the text like this patch does at all, but just in case you might find it interesting...

There were lawyers involved, yes.

I'm not sure there is any actual confusion, because the fact is, lawyers aren't robots or programmers, and they have the human qualities of understanding implications. So I'm actually inclined to not change legal text unless a lawyer actually tells me that it's needed.

Plus even if this change was needed, why would anybody point to COPYING? It's much better to just say "the copyright license of the file", knowing that different projects have different rules about this all, and some projects mix files from different sources, where parts of the tree may be under different licenses that may be explained elsewhere..

Linus
Re: [PATCH/RFC] Developer's Certificate of Origin: default to COPYING
On Thu, Sep 12, 2013 at 4:15 PM, Richard Hansen <rhan...@bbn.com> wrote:
>
> Is it worthwhile to poke a lawyer about this as a precaution? (If so, who?) Or do we wait for a motivating event?

I can poke the lawyer that was originally involved. If people know other lawyers, feel free to poke them too. Just ask them to be realistic, not go into some kind of super-anal lawyer mode where they go off on some "what if" thing.

Note that one issue is that this is kind of like a license change, even if it's arguably just a clarification. I'd expect that a lawyer who is so anal that they think this wording needs change would also think that the DCO version number needs change and then spend half an hour (and $500) talking about how this only affects new sign-offs and how you'd want to make it very obvious how things have changed, yadda yadda.

IOW, my personal opinion is that if you get a lawyer that is _that_ interested in irrelevant details, you have much bigger problems than this particular wording. Lawyers do tend to be particular about wording, but in the end, they tend to also agree that intent matters. At least the good ones who have a case. Once they start talking about the meaning of the word 'is', you know they are just weaselwording and don't actually have any real argument.

Linus
Re: [PATCH] git-compat-util: Avoid strcasecmp() being inlined
On Fri, Sep 13, 2013 at 12:53 PM, Sebastian Schuberth sschube...@gmail.com wrote:
> +#ifdef __MINGW32__
> +#ifdef __NO_INLINE__

Why do you want to push this insane workaround for a clear mingw bug?

Please have mingw just fix the nasty bug, and the git patch with the trivial wrapper looks much simpler than just saying "don't inline anything" and that crazy block of nasty mingw magic #defines.

And then document loudly that the wrapper is due to the mingw bug.

Linus
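For illustration only, here is a minimal C sketch of what a "trivial wrapper" could look like. It rests on a guess about the underlying MinGW problem: that git needs a real, addressable strcasecmp-like function (for example, one it can store in a comparison function pointer) rather than one that exists only as an inline definition in a system header. The name git_strcasecmp_wrapper is hypothetical and does not come from the actual patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <strings.h> /* POSIX strcasecmp() */

/*
 * Hypothetical out-of-line wrapper. It exists only to work around a MinGW
 * header bug (assumed: strcasecmp() available only as an inline function),
 * so that callers can always take the address of a real function.
 */
int git_strcasecmp_wrapper(const char *a, const char *b)
{
	return strcasecmp(a, b);
}
```

The wrapper is an ordinary external function, so its address can always be taken, whatever the system headers do with strcasecmp() itself.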
Re: [GIT PULL] sound fixes for 3.6-rc6
On Thu, Sep 13, 2012 at 7:43 PM, Takashi Iwai ti...@suse.de wrote:
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus

*PLEASE* don't do this.

You point to a branch, but then the pull request clearly implies there is a tag with extra information in it.

And indeed, the actual thing I should pull is not at all "for-linus"; it seems to be your "tags/sound-3.6" tag.

I don't know if this is the old "git request-pull" breakage where it stupidly "corrects" the remote branch when it verifies the branch name, or whether it's some other scripting problem. I think current git versions should not mess up the tag information, if that's the cause, but please verify.

Linus
mailinfo: don't require text mime type for attachments
Currently "git am" does insane things if the mbox it is given contains attachments with a MIME type that isn't "text/*".

In particular, it will still decode them, and pass them "one line at a time" to the mail body filter, but because it has determined that they aren't text (without actually looking at the contents, just at the MIME type) the "line" will be the encoding line (eg 'base64') rather than a line of *content*.

Which then will cause the text filtering to fail, because we won't correctly notice when the attachment text switches from the commit message to the actual patch. Resulting in a patch failure, even though the patch may be a perfectly well-formed attachment; it's just that the message type may be (for example) "application/octet-stream" instead of "text/plain".

Just remove all the bogus games with the message_type. The only difference that code creates is how the data is passed to the filter function (chunked per pre-decode line or per post-decode line), and that difference is *wrong*, since chunking things per pre-decode line can never be a sensible operation, and cannot possibly matter for binary data anyway.

This code goes all the way back to March of 2007, in commit 87ab79923463 ("builtin-mailinfo.c infrastrcture changes"), and apparently Don used to pass random mbox contents to git. However, the pre-decode vs post-decode logic really shouldn't matter even for that case, and more importantly, "I fed git am crap" is not a valid reason to break *real* patch attachments.

If somebody really cares, and determines that some attachment is binary data (by looking at the data, not the MIME type), the whole attachment should be dismissed, rather than fed in random-sized chunks to "handle_filter()".
Signed-off-by: Linus Torvalds torva...@linux-foundation.org
Cc: Don Zickus dzic...@redhat.com
---
 builtin/mailinfo.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index 2b3f4d955eaa..da231400b327 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -19,9 +19,6 @@ static struct strbuf email = STRBUF_INIT;
 static enum {
 	TE_DONTCARE, TE_QP, TE_BASE64
 } transfer_encoding;
-static enum {
-	TYPE_TEXT, TYPE_OTHER
-} message_type;
 
 static struct strbuf charset = STRBUF_INIT;
 static int patch_lines;
@@ -184,8 +181,6 @@ static void handle_content_type(struct strbuf *line)
 	struct strbuf *boundary = xmalloc(sizeof(struct strbuf));
 	strbuf_init(boundary, line->len);
 
-	if (!strcasestr(line->buf, "text/"))
-		message_type = TYPE_OTHER;
 	if (slurp_attr(line->buf, "boundary=", boundary)) {
 		strbuf_insert(boundary, 0, "--", 2);
 		if (++content_top > &content[MAX_BOUNDARIES]) {
@@ -657,7 +652,6 @@ again:
 	/* set some defaults */
 	transfer_encoding = TE_DONTCARE;
 	strbuf_reset(&charset);
-	message_type = TYPE_TEXT;
 
 	/* slurp in this section's info */
 	while (read_one_header_line(&line, fin))
@@ -871,11 +865,6 @@ static void handle_body(void)
 			strbuf_insert(&line, 0, prev.buf, prev.len);
 			strbuf_reset(&prev);
 
-		/* binary data most likely doesn't have newlines */
-		if (message_type != TYPE_TEXT) {
-			handle_filter(&line);
-			break;
-		}
 		/*
 		 * This is a decoded line that may contain
 		 * multiple new lines.  Pass only one chunk
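Here is a minimal sketch of the behavior the patch argues for. Once an attachment has been decoded, it goes to the filter one decoded line at a time, with no special case for non-text MIME types. The callback type, feed_decoded_lines() and note_patch_start() are hypothetical names for illustration, not git's mailinfo code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback that receives one decoded line (including its '\n', if any). */
typedef void (*line_filter_fn)(const char *line, size_t len, void *ctx);

/*
 * Split an already-decoded buffer into lines and hand each one to the
 * filter. There is deliberately no MIME-type check: text/plain and
 * application/octet-stream attachments are treated the same way.
 * Returns the number of lines produced (filter may be NULL).
 */
size_t feed_decoded_lines(const char *buf, size_t len,
			  line_filter_fn filter, void *ctx)
{
	size_t start = 0, count = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			if (filter)
				filter(buf + start, i - start + 1, ctx);
			start = i + 1;
			count++;
		}
	}
	/* A trailing chunk without a newline still counts as a line. */
	if (start < len) {
		if (filter)
			filter(buf + start, len - start, ctx);
		count++;
	}
	return count;
}

/* Example filter: note whether any decoded line starts a git patch. */
static void note_patch_start(const char *line, size_t len, void *ctx)
{
	int *seen = ctx;

	if (len >= 10 && !strncmp(line, "diff --git", 10))
		*seen = 1;
}
```

Because the filter always sees real post-decode lines, it can notice the switch from commit message to patch no matter what Content-Type the attachment claimed.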
Fix git diff --stat for interesting - but empty - file changes
The behavior of "git diff --stat" is rather odd for files that have zero lines of changes: it will discount them entirely unless they were renames.

Which means that the stat output will simply not show files that only had "other" changes: they were created or deleted, or their mode was changed.

Now, those changes do show up in the summary, but so do renames, so the diffstat logic is inconsistent. Why does it show renames with zero lines changed, but not mode changes or added files with zero lines changed?

So change the logic to not check for "is_renamed", but for "is_interesting" instead, where "interesting" is judged to be any action but a pure data change (because a pure data change with zero data changed really isn't worth showing, if we ever get one in our diffpairs).

So if you did

	chmod +x Makefile
	git diff --stat

before, it would show empty (" 0 files changed"), with this it shows

	 Makefile | 0
	 1 file changed, 0 insertions(+), 0 deletions(-)

which I think is a more correct diffstat (and then with "--summary" it shows *what* the metadata change to Makefile was - this is completely consistent with our handling of renamed files).

Side note: the old behavior was *really* odd. With no changes at all, "git diff --stat" output was empty. With just a chmod, it said "0 files changed". No way is our legacy behavior sane.

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---

This was triggered by kernel developers not noticing that they had added zero-sized files, because those additions never showed up in the diffstat.

NOTE! This does break two of our tests, so we clearly did this on purpose, or at least tested for it. I just uncommented the subtests that this makes irrelevant, and changed the output of another one.

Another test was simply buggy. It used "git diff --root cmit", and thought that would be the diff against root. It isn't, and never has been. It just happened to give the same (no file) output before. Fixing --stat to show new files showed how buggy the test was.
The --root thing matters for "git show" or "git log" (when showing a root commit) and for "git diff-tree" with a single tree.

Maybe we would *want* to make "git diff --root cmit" be the diff between root and cmit, but that's not what it actually is.

Comments?
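The visibility rule described in this patch fits in a few lines of C. This is a hypothetical sketch: struct file_change and its fields are invented for illustration and do not match git's internal diffstat structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file summary, loosely modelled on the description. */
struct file_change {
	int added_lines;
	int deleted_lines;
	bool is_new;       /* file was created */
	bool is_deleted;   /* file was removed */
	bool mode_changed; /* e.g. chmod +x */
	bool is_renamed;
};

/* "Interesting" means any action other than a pure data change. */
static bool is_interesting(const struct file_change *fc)
{
	return fc->is_new || fc->is_deleted || fc->mode_changed ||
	       fc->is_renamed;
}

/*
 * Old rule: zero-line entries were shown only when renamed.
 * New rule: zero-line entries are shown whenever the change is interesting.
 */
bool show_in_diffstat(const struct file_change *fc)
{
	if (fc->added_lines || fc->deleted_lines)
		return true;
	return is_interesting(fc);
}
```

A pure chmod, or a newly added empty file, now gets a " | 0" line, while a hypothetical data change of zero lines still stays hidden.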
Re: Fix git diff --stat for interesting - but empty - file changes
On Wed, Oct 17, 2012 at 11:28 AM, Junio C Hamano gits...@pobox.com wrote:
> I think listing a file whose content remain unchanged with 0 as the number of lines affected makes sense, and it will mesh well with Duy's
>
>   http://thread.gmane.org/gmane.comp.version-control.git/207749
>
> I first wondered if we would get a division-by-zero while scaling the graph, but we do not scale smaller numbers up to fill the columns, so we should be safe.

Note that we should be safe for a totally different - and more fundamental - reason: the zero line case is by no means new. We've always done it for the rename case.

> These days, we omit "0 insertions" and "0 deletions", so I am not sure what you should get for this case, though:
>
>    Makefile | 0
>    1 file changed, 0 insertions(+), 0 deletions(-)
>
> Should we just say "1 file changed"?

If that is what it does for the rename case, then yes. I think it should fall out naturally.

Linus
Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner t...@linutronix.de wrote:
> It is spelled:
>
>   git notes add -m comment SHA1
>
> Cool!

Don't use them for anything global.

Use them for local codeflow, but don't expect them to be distributed. It's a separate "flow", and while it *can* be distributed, it's not going to be for the kernel, for example.

So no, don't start using this to ack things, because the acks *will* get lost.

Linus
Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
On Wed, Oct 24, 2012 at 4:56 AM, Al Viro v...@zeniv.linux.org.uk wrote:
> How about git commit --allow-empty, with "belated ACK for commit"

Don't bother. It's not that important, and it's just distracting. It's not like this is vital information.

If you pushed it out without the ack, it's out without the ack. Big deal.

Linus
Re: first parent, commit graph layout, and pull merge direction
On Thu, May 23, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:
> If the proposal were to make pull.rebase the default at a major version bump and force all integrators and other people who are happy with how "pull = fetch + merge" (not "fetch + rebase") works to say "pull.rebase = false" in their configuration, I think I can see why some people may think it makes sense, though.
>
> But neither is an easy sell, I would imagine. It is not about passing me, but about not hurting users like kernel folks we accumulated over 7-8 years.

It would be a *horrible* mistake to make "rebase" the default, because it's so much easier to screw things up that way.

That said, making "no-ff" the default, and then if that fails, saying

  The pull was not a fast-forward pull, please say if you want to merge or rebase.
  Use either

        git pull --rebase
        git pull --merge

  You can also use "git config pull.merge true" or "git config pull.rebase true"
  to set this once for this project and forget about it.

That way, people who want the existing behavior could just do that "git config pull.merge true" once, and they'd not even notice.

Hmm?

Better yet, make it per-branch.

Linus
Re: first parent, commit graph layout, and pull merge direction
On Thu, May 23, 2013 at 5:21 PM, Junio C Hamano gits...@pobox.com wrote:
> I would assume that "no-ff" above was meant to be "--ff-only" from the first part of the message.

Yeah, I may need more coffee..

> I also would assume that I can rephrase that "setting pull.merge" (which does not exist) as "setting pull.rebase explicitly to false" instead (i.e. missing pull.rebase and pull.rebase that is explicitly set to false would mean two different things).

Yeah, sounds good to me, and doesn't really sound like it would confuse/annoy anybody as long as it was clearly documented.

Linus
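Here is one way to model the policy agreed on in this exchange, as a hedged sketch. A missing pull.rebase and an explicit pull.rebase=false mean different things, and with nothing configured a non-fast-forward pull is refused with the advice message. All enum and function names are hypothetical. This models the proposal; it is not how git implements pull.

```c
#include <assert.h>

/* Hypothetical names; none of these exist in git. */
enum rebase_setting { REBASE_UNSET, REBASE_TRUE, REBASE_FALSE };
enum pull_action { PULL_FAST_FORWARD, PULL_MERGE, PULL_REBASE, PULL_REFUSE };

/*
 * can_fast_forward: the fetched commit already contains our HEAD.
 * cmdline: REBASE_TRUE for --rebase, REBASE_FALSE for --merge,
 *          REBASE_UNSET if neither was given.
 * config:  value of pull.rebase, or REBASE_UNSET if it is missing.
 */
enum pull_action decide_pull(int can_fast_forward,
			     enum rebase_setting cmdline,
			     enum rebase_setting config)
{
	enum rebase_setting choice =
		cmdline != REBASE_UNSET ? cmdline : config;

	if (choice == REBASE_TRUE)
		return PULL_REBASE;
	if (choice == REBASE_FALSE)
		return can_fast_forward ? PULL_FAST_FORWARD : PULL_MERGE;
	/* Nothing configured: fast-forward silently, otherwise ask. */
	return can_fast_forward ? PULL_FAST_FORWARD : PULL_REFUSE;
}
```

Someone who sets pull.rebase=false once keeps today's fetch + merge behavior. Everyone else only ever gets fast-forwards until they make a choice.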
Re: New feature discussion: git rebase --status
On Tue, Jun 11, 2013 at 10:18 AM, Hilco Wijbenga hilco.wijbe...@gmail.com wrote:
> Having "git status" display (even more) context sensitive information during "git rebase" or "git merge" would be very welcome. Please, if at all possible, don't make that a separate command.

I agree. The rebase state etc is something that would be much better in "git status" output, and would avoid having people learn about another new flag to random commands.

Linus
Re: [PATCH] build: get rid of the notion of a git library
On Tue, Jun 11, 2013 at 11:06 AM, Felipe Contreras felipe.contre...@gmail.com wrote:
> Moreover, if you are going to argue that we shouldn't be closing the door [...]

Felipe, you saying "if you are going to argue ..." to anybody else is kind of ironic.

Why is it every thread I see you in, you're being a dick and arguing for some theoretical thing that nobody else cares about?

This whole thread has been one long argument about totally pointless things that wouldn't improve anything one way or the other. It's bikeshedding of the worst kind. Just let it go.

Linus
Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace
On Tue, Nov 13, 2012 at 11:15 AM, René Scharfe rene.scha...@lsrfire.ath.cx wrote:
> Linus, do you remember if you left them out on purpose?

Umm, no. I have to wonder why you care?

As far as I'm concerned, the only valid space is space, TAB and CR/LF. Anything else is *noise*, not space.

What's the reason for even caring?

Linus
Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace
On Tue, Nov 13, 2012 at 11:40 AM, Linus Torvalds torva...@linux-foundation.org wrote:
> I have to wonder why you care?
>
> As far as I'm concerned, the only valid space is space, TAB and CR/LF. Anything else is *noise*, not space.
>
> What's the reason for even caring?

Btw, expanding the whitespace selection may actually be very counter-productive. It is used primarily for things like removing extraneous space at the end of lines etc, and for that, the current selection of SPACE, TAB and LF/CR is the right thing to do.

Adding things like FF etc - that are *technically* whitespace, but aren't the normal kind of silent whitespace - is potentially going to change things too much. People might *want* a form-feed in their messages, for all we know.

So I really object to changing things "just because". There's a reason we do our own ctype.c: it avoids the crazy crap. It avoids the idiotic localization issues, and it avoids the ambiguous cases.

So just let it be, unless you have some major real reason to actually care about a real-world case. And if you do, please explain it. Don't change things "just because".

Linus
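To make the narrow definition concrete, here is a small C sketch of trailing-whitespace removal that treats only SPACE, TAB, LF and CR as whitespace, so a form feed survives. The helpers are hypothetical. git's real ctype.c is table-driven and is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical predicate matching the narrow definition argued for above:
 * only SPACE, TAB, LF and CR. Form feed, vertical tab and locale-specific
 * characters are deliberately *not* whitespace here (unlike isspace()).
 */
static int is_plain_space(unsigned char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Strip trailing whitespace in place; returns the new length. */
size_t strip_trailing_space(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && is_plain_space((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';
	return len;
}
```

With the C library's isspace(), the form feed in "page\f" would be stripped as well. The narrow predicate leaves it alone, which is the point being made about not changing behavior "just because".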
Re: [PATCH 2/2] fix clang -Wtautological-compare with unsigned enum
On Thu, Jan 17, 2013 at 3:00 AM, John Keeping j...@keeping.me.uk wrote:
> There's also a warning that triggers with clang 3.2 but not clang trunk, which I think is a legitimate warning - perhaps someone who understands integer type promotion better than me can explain why the code is OK (patch->score is declared as 'int'):
>
>     builtin/apply.c:1044:47: warning: comparison of constant 18446744073709551615 with expression of type 'int' is always false [-Wtautological-constant-out-of-range-compare]
>             if ((patch->score = strtoul(line, NULL, 10)) == ULONG_MAX)

The warning seems to be very very wrong, and implies that clang has some nasty bug in it.

Since patch->score is 'int', and ULONG_MAX is 'unsigned long', the conversion rules for the comparison are that the int result from the assignment is cast to unsigned long. And if you cast (int)-1 to unsigned long, you *do* get ULONG_MAX. That's true regardless of whether "long" has the same number of bits as "int" or is bigger. The implicit cast will be done as a sign-extension (unsigned long is not signed, but the source type of 'int' *is* signed, and that is what determines the sign extension on casting).

So the "is always false" is pure and utter crap. clang is wrong, and it is wrong in a way that implies that it actually generates incorrect code. It may well be worth making a clang bug report about this.

That said, clang is certainly understandably confused. The code depends on subtle conversion rules and bit patterns, and is clearly very confusingly written. So it would probably be good to rewrite it as

	unsigned long val = strtoul(line, NULL, 10);
	if (val == ULONG_MAX) ..
	patch->score = val;

instead. At which point you might as well make the comparison be ">= INT_MAX" instead, since anything bigger than that is going to be bogus.

So the git code is probably worth cleaning up, but for git it would be a cleanup. For clang, this implies a major bug and bad code generation.
Linus
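The conversion argument and the suggested cleanup can both be checked in a short C sketch. The C standard defines conversion of (int)-1 to unsigned long as wrapping modulo ULONG_MAX + 1, which yields ULONG_MAX on any conforming compiler. parse_score() is a hypothetical name for the rewritten parse. It uses the ">= INT_MAX" bound suggested above plus an errno check that the original code does not have.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical cleanup: keep strtoul()'s result in an unsigned long,
 * range-check it, and only then narrow it to int.
 * Returns 0 on success, -1 if the value is out of range.
 */
int parse_score(const char *line, int *score)
{
	unsigned long val;

	errno = 0;
	val = strtoul(line, NULL, 10);
	if (errno == ERANGE || val >= INT_MAX)
		return -1;
	*score = (int)val; /* safe: val < INT_MAX here */
	return 0;
}
```

Doing the comparison on the unsigned long value, before any narrowing, sidesteps both the confusing int/unsigned long comparison and the implementation-defined conversion of a too-large value to int.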
Re: [PATCH 2/2] fix clang -Wtautological-compare with unsigned enum
On Fri, Jan 18, 2013 at 9:15 AM, Phil Hord phil.h...@gmail.com wrote:
> Yes, I can tell by the wording of the error message that you are right and clang has a problem. But the git code it complained about does have a real problem, because the result of "signed int a = ULONG_MAX" is implementation-defined.

Only theoretically.

Git won't work on machines that don't have 8-bit bytes anyway, so worrying about the theoretical crazy architectures that aren't two's complement etc isn't something I'd care about.

There's a whole class of technically implementation-defined issues in C that simply aren't worth caring for. Yes, the standard is written so that it works on machines that aren't byte-addressable, or EBCDIC or have things like 18-bit words and 36-bit longwords. Or 16-bit "int" for microcontrollers etc.

That doesn't make those implementation-defined issues worth worrying about these days. A compiler writer could in theory make up some idiotic rules that are still valid by the C standard even on modern machines, but such a compiler should simply not be used, and the compiler writer in question should be called out for being an ass-hat.

Paper standards are only worth so much. And that "so much" really isn't very much.

Linus
Re: [PULL] Module fixes, and a virtio block fix.
On Sun, Jan 20, 2013 at 5:32 PM, Rusty Russell ru...@rustcorp.com.au wrote:
> Due to the delay on git.kernel.org, git request-pull fails. It *looks* like it succeeds, except the warning, but (as we learned last time I screwed up), it doesn't put the branchname because it can't know.

I think this should be fixed in modern git versions.

And it sure as hell knows the proper tag name, since you *gave* it the name and it used it for generating the actual contents. The fact that some versions then screw that up and re-write the tag-name to something randomly matching that isn't a tag was just a bug.

> For want of a better solution, I'll now resort to sending pull requests with the anti-social gitolite URL in it, like so:

That's even worse, fwiw. It means that the pull request address makes no sense to anybody who doesn't have a kernel.org address, and then I'm forced to just edit things by hand instead to not pollute the kernel changelog history with crap.

Junio, didn't "git request-pull" get fixed so that it *warns* about missing tagnames/branches, but never actually corrupts the pull request? Or did it just get fixed to be a hard error instead of corrupting things? Because this is annoying.

Linus
Re: [PULL] Module fixes, and a virtio block fix.
On Sun, Jan 20, 2013 at 6:00 PM, Junio C Hamano gits...@pobox.com wrote:
> What you mean by "corrupt" is not clear to me

Some versions would just silently change the actual name you were using. So if you said "for-linus", it might change it to "linus", just because that branch happened to have the same SHA1 commit ID. That's not right.

Other versions would replace the "for-linus" with "**missing-branch**" because "for-linus" hadn't mirrored out yet. That's not right either.

Basically, if "git request-pull" is given a branch/tag name, that is the only valid output (although going from branch to tag *might* be acceptable). The whole "verify that it actually exists on the remote side" must never *ever* actually change the message itself; it should just cause a warning outside of the message.

I can't say from the commit message whether that's the thing that fixed it or not, but at least some people stopped sending me broken pull requests after updating to git. I'm just not sure which of the two different failure cases they happened to have (Rusty seems to have hit both).

Linus
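The rule stated here is that the user's ref name is the only valid output and that remote verification may only warn. It can be sketched in C as follows. format_pull_line() and its warning text are hypothetical. "git request-pull" itself is a shell script, and its actual output and messages differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: the ref name the user gave is copied into the
 * message unchanged. Whether the remote has it only decides whether a
 * warning goes to 'warn' (e.g. stderr), outside the message itself.
 * Returns what snprintf() returns.
 */
int format_pull_line(char *msg, size_t size, const char *url,
		     const char *local_ref, int remote_has_ref, FILE *warn)
{
	if (!remote_has_ref && warn)
		fprintf(warn, "warning: %s not found at %s\n",
			local_ref, url);
	return snprintf(msg, size, "  %s %s\n", url, local_ref);
}
```

Whether the remote lookup succeeds or not, the message names "for-linus". A mirroring delay can produce a warning but can never turn the ref into "linus" or "**missing-branch**".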
Re: [PULL] Module fixes, and a virtio block fix.
On Sun, Jan 20, 2013 at 6:57 PM, Rusty Russell ru...@rustcorp.com.au wrote:
> I'm confused. The default argument is HEAD: what does it know about tag names?

Ugh. I actually thought that if you give it the tag name directly (as the "end") it will use that. But no. It figures it out with "git describe --exact" internally.

Regardless, if your HEAD is actually tagged, it *will* have the tag-name in git-request-pull. And it will have it based on your *local* repo, so the fact that it hasn't been mirrored out yet doesn't really matter. git request-pull knows that tag name regardless of mirroring issues.

The bug is that if it can't find that commit at the remote end, it still generates a valid-looking request (with a warning at the end), where it guesses you're talking about the "master" branch. It really shouldn't do that any more, but you seem to have the older version with the bug.

At least one of the annoying problems was fixed in the 1.7.11 series; you have 1.7.10.

The nice thing about git is that it is *really* easy to upgrade. Just fetch the sources, do "make; make install" all as a normal user, and you do not need to worry about package management or distro issues or any crap like that. It installs into your $(HOME)/bin, and as long as your PATH has that first, you'll get it.

I've long suggested that as the workaround for distros having old versions (some more so than others).

> Since I use a wrapper script now for your pull requests I can use sed to unscrew it:
>
> [alias]
> 	for-linus = !check-commits TAGNAME=`git symbolic-ref HEAD | cut -d/ -f3`-for-linus git tag -f -u D1ADB8F1 $TAGNAME HEAD git push korg tag $TAGNAME git request-pull master korg | sed s,gitol...@ra.kernel.org:/pub,git://git.kernel.org/pub, git log --stat --reverse master..$TAGNAME | emails-from-log | grep -v 'rusty@rustcorp' | grep -v 'sta...@kernel.org' | sed 's/^/Cc: /'

Heh. Ok. That will at least hide the breakage. But I suspect you could fix it by just updating git.
Linus
Re: linux-next: unneeded merge in the security tree
[ Added Junio and git to the recipients, and leaving a lot of stuff quoted due to that... ]

On Mon, Mar 11, 2013 at 9:16 PM, Theodore Ts'o ty...@mit.edu wrote:
> On Tue, Mar 12, 2013 at 03:10:53PM +1100, James Morris wrote:
>> On Tue, 12 Mar 2013, Stephen Rothwell wrote:
>>> The top commit in the security tree today is a merge of v3.9-rc2. This is a completely unnecessary merge as the tree before the merge was a subset of v3.9-rc1 and so if the merge had been done using anything but the tag, it would have just been a fast forward. I know that this is now deliberate behaviour on git's behalf, but isn't there some way we can make this easier on maintainers who are just really just trying to pick a new starting point for their trees after a release? (at least I assume that is what James was trying to do)
>>
>> Yes, and I was merging to a tag as required by Linus.

Now, quite frankly, I'd prefer people not merge -rc tags either, just real releases. -rc tags are certainly *much* better than merging random daily stuff, but the basic rule should be "don't back-merge AT ALL" rather than "back-merge tags".

That said, you didn't really want a merge at all, you just wanted to sync up and start development. Which is different (but should still prefer real releases, and only use rc tags if it's fixing stuff that happened in the merge window - which may be the case here).

> Why not just force the head of the security tree to be v3.9-rc2? Then you don't end up creating a completely unnecessary merge commit, and users who were at the previous head of the security tree will experience a fast forward when they pull your new head.
So I think that may *technically* be the right solution, but it's a rather annoying UI issue, partly because you can't just do it in a single operation (you can't do a "pull" of the tag to both fetch and fast-forward it), but partly because "git reset --hard" is also an operation that can lose history, so it's something that people should be nervous about, and shouldn't use as some kind of standard "let's just fast-forward to Linus' tree" thing.

At the same time, it's absolutely true that when *I* pull a signed tag from a downstream developer, I don't want a fast-forward, because then I'd lose the signature. So when a maintainer pulls a submaintainer tree, you want the signature to come upstream, but when a submaintainer wants to just sync up with upstream, you don't want to generate the pointless signed merge commit, because the signature is already upstream because it's a public tag.

So the behavior of "git pull" is fundamentally ambiguous. But git doesn't know the difference between "official public upstream tag" and "signed tag used to verify the pull request".

I'm adding the git list just to get this issue out there and see if people have any ideas. I've got a couple of workarounds, but they aren't wonderful..

One is simple:

	git config alias.sync "pull --ff-only"

which works fine, but forces submaintainers to be careful when doing things like this, and using a special command to do back-merges. And maybe that's the right thing to do? Back-merges *are* special, after all. But the above alias is particularly fragile, in that there's both "pull" and "merge" that people want to use this for, and it doesn't really handle both. And "--ff-only" will obviously fail if you actually have some work in your tree, and want to do a real merge, so then you have to do that differently. So I'm mentioning this as a better model than "git reset", but not really a *solution*.
That said, the fact that "--ff-only" errors out if you have local development may actually be a big bonus - because you really shouldn't do merges at all if you have local development, but syncing up to my tree if you don't have it (and are going to start it) may be something reasonable.

Now, the other approach - and perhaps preferable, but requiring actual changes to git itself - is to do the non-fast-forward merge *only* for FETCH_HEAD, which already has magic semantics in other ways. So if somebody does

	git fetch linus
	git merge v3.8

to sync with me, they would *not* get a merge commit with a signature, just a fast-forward. But if you do

	git pull linus v3.8

or a

	git fetch linus v3.8
	git merge FETCH_HEAD

it would look like a maintainer merge and stash the signature in the merge commit rather than fast-forward. It would probably work in practice.

The final approach might be to make it like the "merge summary" and simply make it configurable _and_ have a command line flag for it, defaulting to our current behavior or to the above suggested "default on for FETCH_HEAD, off for anything else".

Hmm?

Linus
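The FETCH_HEAD proposal boils down to a small decision table. This C sketch is hypothetical: choose_merge() and its inputs are invented names. It models the proposal, not git's merge implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum merge_result { MERGE_FAST_FORWARD, MERGE_COMMIT };

/*
 * Hypothetical model: a signed tag forces a real merge commit (so the
 * signature can be recorded) only when it arrives via FETCH_HEAD, as in
 * "git pull linus v3.8" or "git merge FETCH_HEAD". Merging an already
 * known tag by name ("git merge v3.8") fast-forwards when possible.
 */
enum merge_result choose_merge(bool can_fast_forward, bool is_signed_tag,
			       bool via_fetch_head)
{
	if (!can_fast_forward)
		return MERGE_COMMIT;
	if (is_signed_tag && via_fetch_head)
		return MERGE_COMMIT;
	return MERGE_FAST_FORWARD;
}
```

A submaintainer syncing to a public release tag gets the cheap fast-forward. A top-level maintainer pulling a signed tag straight from a submaintainer still gets the signature recorded in a merge commit.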
Re: linux-next: unneeded merge in the security tree
On Tue, Mar 12, 2013 at 2:20 PM, Theodore Ts'o ty...@mit.edu wrote:
> What if we added the ability to do something like this:
>
> [remote "origin"]
> 	url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> 	fetch = +refs/heads/master:refs/heads/master
> 	mergeoptions = --ff-only

Hmm. Something like this could be interesting for other things:

 - use "--rebase" when pulling (this is common for people who maintain a set of patches and do *not* export their git tree - I use it for projects like git and subsurface where there is an upstream maintainer and I usually send patches by email rather than git)

 - "--no-summary". As a maintainer, you probably do want to enable summaries for people you pull from, but *not* from upstream. So this might even make sense to do by default when you clone a new repository.

 - I do think that we might want a "--no-signatures" for the specific case of merging signed tags without actually taking the signature (because it's an upstream repo). The "--ff-only" thing is *too* strict. Sometimes you really do want to merge in new code, disallowing it entirely is tough.

Of course, I'm not really sure if we want to list the flags. Maybe it's better to just introduce the notion of "upstream" directly, and make that a flag, and make "origin" default to that when you clone. And then have git use different heuristics for pulling upstream (like warning by default when doing a back-merge, perhaps?)

Linus
Re: linux-next: unneeded merge in the security tree
On Tue, Mar 12, 2013 at 2:47 PM, Junio C Hamano gits...@pobox.com wrote:
> I agree that "--ff-only" thing is too strict and sometimes you would want to allow back-merges, but when you do allow such a back-merge, is there a reason you want it to be "--no-signatures" merge? When a subtree maintainer decides to merge a stable release point from you with a good reason, I do not see anything wrong in recording that the resulting commit _did_ merge what you released with a signature.

No, there's nothing really bad with adding the signature to the merge commit if you do make a merge. It's the fact that it currently makes a non-ff merge when that is pointless that hurts.

That said, adding the signature from an upstream tag doesn't really seem to be hugely useful. I'm not seeing much of an upside, in other words. I'd *expect* that people would pick up upstream tags regardless, no?

Linus
Re: git status takes 30 seconds on Windows 7. Why?
On Wed, Mar 27, 2013 at 12:04 PM, Jeff King p...@peff.net wrote:
> Yes, I think that's pretty much the case (though most of my Git-on-Windows experience is from cygwin long ago, where the stat performance was truly horrendous). Have you tried setting core.preloadindex, which should run the stats in parallel?

I wonder if preloadindex shouldn't be enabled by default..

It's a huge deal on NFS, and the only real downside is that it expects threading to work. It potentially slows things down a tiny bit for single-CPU cases with everything cached, but that isn't likely to be a relevant case.

Of course, it can trigger filesystem scalability issues, and as a result it will often not help very much if you have the bulk of your files in one (or a few) directories. But anybody who has so many files that performance is an issue is not likely to have them all in one place.

And apparently the Windows FS metadata caching sucks, and things fall out of the cache for large trees. Color me not-very-surprised. It's probably some size limit on the metadata that you can tweak. So I'm sure there's some registry setting or other that would make windows able to cache more than a few thousand filenames, and it would probably improve performance a lot, but I do think preloadindex has been around long enough that it could just be the default.

Of course, Jim should verify that preloadindex actually does solve his problem. With 20k+ files, it should max out the 20 IO threads for preloading, and assuming the filesystem IO scales reasonably well, it should fix the problem. But we do do a number of metadata ops synchronously even with preloadindex, so things won't scale perfectly. (In particular: we do open each directory and do the readdir stuff and try to open .gitignore whether it exists or not. So you'll get synchronous IO for each directory, but at least the per-file IO to check all the file stat data should scale).
Linus
Re: git status takes 30 seconds on Windows 7. Why?
On Wed, Mar 27, 2013 at 1:00 PM, Junio C Hamano gits...@pobox.com wrote: Given that we haven't tweaked the parallelism or thread-cost parameters since the inception of the mechanism in Nov 2008, I suspect that we would see praises from some and grievances from other corners of the user base for a while until we find acceptable values for them. Looking at the parameters again, I really think they are pretty sane, and I don't think the numbers are all that likely to have shifted from 2008. The maximum thread value is quite reasonable: twenty threads is sufficient to cover quite a bit of latency, and brings several seconds down to under half a second for any truly IO-limited load, while not being disastrous for the case where everything is in cache and we only have a limited number of CPU cores. And the "at least 500 files per thread" limit is eminently reasonable too - smaller projects like git won't have more than five or so threads. So I'd be very surprised if the values need much tweaking. Sure, there might be some extreme cases that might tune for some particular patterns, and maybe we should make the values be tunable rather than totally hardcoded, but I suspect there's limited up-side. It might be interesting for the people who really like tuning, though. So in addition to index.preload=true, maybe an extended config format like index_preload=50,200 to say "maximum of fifty threads, for every 200 files" could be done just so people could play around with the numbers and see how much (if at all) they actually matter. But I really don't think the original 20/500 rule is likely to be all that bad for anybody. Unless there is some *really* sucky thread library out there (ie fully user-space threads, so filename lookup isn't actually parallelised at all), but at least for that case the fix is to just say "ok, your threads aren't real threads, so just disable index preloading entirely".
Linus
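The 20/500 rule described above can be sketched in C. This is a minimal illustration, not git's preload-index.c: it assumes the two parameters are a thread cap and a minimum number of index entries per thread, and the helper names `preload_thread_count` and `preload_slice` are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of threads for nr_entries index entries,
 * given a thread cap (20 in the email) and a minimum number of entries
 * per thread (500 in the email). Returns 0 when there is not even two
 * threads' worth of work, meaning "just stat the files sequentially".
 */
int preload_thread_count(size_t nr_entries, int max_threads,
                         size_t files_per_thread)
{
    size_t threads = nr_entries / files_per_thread;

    if (threads > (size_t)max_threads)
        threads = (size_t)max_threads;
    if (threads < 2)
        return 0;
    return (int)threads;
}

/*
 * Hypothetical helper: the half-open range [*start, *end) of index
 * entries that thread idx (0-based) of nr_threads should lstat().
 * The first (nr_entries % nr_threads) threads get one extra entry.
 */
void preload_slice(size_t nr_entries, int nr_threads, int idx,
                   size_t *start, size_t *end)
{
    size_t base = nr_entries / (size_t)nr_threads;
    size_t extra = nr_entries % (size_t)nr_threads;
    size_t i = (size_t)idx;

    *start = i * base + (i < extra ? i : extra);
    *end = *start + base + (i < extra ? 1 : 0);
}
```

In this sketch, a 20,000-file tree hits the 20-thread cap, a git-sized tree of about 2,500 files gets five threads, and anything under 1,000 entries stays single-threaded.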
Re: Handling renames.
On Thu, 14 Apr 2005, David Woodhouse wrote: I've been looking at tracking file revisions. One proposed solution was to have a separate revision history for individual files, with a new kind of 'filecommit' object which parallels the existing 'commit', referencing a blob instead of a tree. Then trees would reference such objects instead of referencing blobs directly. Please don't. It goes fundamentally against the git notion of "content determines objects". It also has no relevance. A rename really doesn't exist in the git model. The git model really is about tracking data, not about tracking what happened to _create_ that data. The one exception is the commit log. That's where you put the explanations of _why_ the data changed. And git itself doesn't care what the format is, apart from the git header. So, you really need to think of git as a filesystem. You can then implement an SCM _on_top_of_it_, which means that your second suggestion is not only acceptable, it really is the _only_ way to handle this in git: So a commit involving a rename would look something like this...

    tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
    parent bb95843a5a0f397270819462812735ee29796fb4
    rename foo.c bar.c
    author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
    committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

    Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the free-form section. The rename part really isn't part of the git header. It's not what git tracks, it was tracked by an SCM system on top of git. So the git header is an inode in the git filesystem, and like an inode it has a ctime and an mtime, and pointers to the data. So as far as git is concerned, this part:

    tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
    parent bb95843a5a0f397270819462812735ee29796fb4
    author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
    committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

really is the filesystem inode.
The rest is whatever the filesystem user puts into it, and git won't care. Opinions? Dissent? We'd probably need to escape the filenames in some way -- handwave over that for now. The fact that git handles arbitrary filenames (stuff starting with '.' excepted) doesn't mean that the SCM above it needs to. Quite frankly, I think an SCM that handles newlines in filenames is being silly. But a _filesystem_ needs to not care. There are too many messy SCM's out there that do not have a philosophy. Dammit, I'm not interested in creating another one. This thing has a mental model, and we keep to that model. The reason UNIX is beautiful is that it has a mental model of processes and files. Git has a mental model of objects and certain very very limited relationships. The relationships git cares about are encoded in the C files, the extra crap (like rename info) is just that - stuff that random scripts wrote, and that is just informational and not central to the model. Linus
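The split described here (a fixed header that git interprets, then an empty line, then free-form text it ignores) can be illustrated with a small C sketch. It assumes the raw commit object is available as a NUL-terminated string; `commit_message_start` and `header_is_plain_git` are hypothetical helpers, and the "rename" line is only the example from the thread, not something git defines.

```c
#include <stddef.h>
#include <string.h>

/* Header keys the email treats as the git "inode" part of a commit. */
static const char *const known_keys[] = { "tree ", "parent ", "author ", "committer " };

/*
 * Return a pointer to the free-form part of a raw commit: everything after
 * the first empty line, or NULL if there is no empty line.
 */
const char *commit_message_start(const char *buf)
{
    const char *p = strstr(buf, "\n\n");
    return p ? p + 2 : NULL;
}

/*
 * Return 1 if every header line (the lines before the first empty line)
 * starts with one of known_keys, 0 otherwise. A "rename foo.c bar.c"
 * line in the header makes this return 0; the same line after the empty
 * line is free-form text and is never examined.
 */
int header_is_plain_git(const char *buf)
{
    const char *line = buf;

    while (*line && *line != '\n') {
        const char *nl;
        size_t i;
        int known = 0;

        for (i = 0; i < sizeof(known_keys) / sizeof(known_keys[0]); i++)
            if (strncmp(line, known_keys[i], strlen(known_keys[i])) == 0)
                known = 1;
        if (!known)
            return 0;
        nl = strchr(line, '\n');
        if (!nl)
            break;
        line = nl + 1;
    }
    return 1;
}
```

Moving the rename line below the empty line, as Linus suggests, keeps the header "plain" while still letting an SCM script read the rename back out of the message text.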
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Daniel Barkalow wrote: Is there some reason you don't commit before merging? All of the current merge theory seems to want to merge two commits, using the information git keeps about them. Note that the 3-way merge would _only_ merge the committed state. The thing is, 99% of all merges end up touching files that I never touch myself (ie other architectures), so me being able to merge them even when _I_ am in the middle of something is a good thing. So even when I have dirty state, the merge would only merge the clean state. And then before the merge information is put back into my working directory, I'd do a check-files on the result, making sure that nothing that got changed by the merge isn't up-to-date. How much do you care about the situation where there is no best common ancestor? I care. Even if the best common parent is 3 months ago, I care. I'd much rather get a big explicit conflict than a clean merge that ends up being debatable because people played games with per-file merging or something questionable like that. I think that the time spent on I/O will be overwhelmed by the time spent issuing the command at that rate. There is no time at all spent on IO. All my email is local, and if this all ends up working out well, I can track the other people's object trees in local subdirectories with some daily rsyncs. And I have enough memory in my machines that there is basically no disk IO - the only trees I normally touch are the kernel trees, they all stay in cache. Linus
Re: Re: Add clone support to lntree
On Sat, 16 Apr 2005, Petr Baudis wrote: I'm wondering, whether each tree should be fixed to a certain branch. I'm wondering why you talk about branches at all. No such thing should exist. There are no branches. There are just repositories. You can track somebody else's repository, but you should track it by location, not by any branch name. And you track it by just merging it. Yeah, we don't have really usable merges yet, but.. Linus
Re: [PATCH 3/2] merge-trees script for Linus git
On Fri, 15 Apr 2005, Junio C Hamano wrote: I'd take the hint, but I would say the current Perl version would be far more usable than the C version I would come up with by the end of this weekend because: Actually, it turns out that I have a cunning plan. I'm full of cunning plans, in fact. It turns out that I can do merges even more simply, if I just allow the notion of state into an index entry, and allow multiple index entries with the same name as long as they differ in state. And that means that I can do all the merging in the regular index tree, using very simple rules. Let's see how that works out. I'm writing the code now. Linus
Re: [PATCH 3/2] merge-trees script for Linus git
On Sat, 16 Apr 2005, Junio C Hamano wrote: LT NOTE NOTE NOTE! I could make read-tree do some of these nontrivial LT merges, but I ended up deciding that only the matches in all three LT states thing collapses by default. * Understood and agreed. Having slept on it, I think I'll merge all the trivial cases that don't involve a file going away or being added. Ie if the file is in all three trees, but it's the same in two of them, we know what to do. That way we'll leave things where the tree itself changed (files added or removed at any point) and/or cases where you actually need a 3-way merge. The userland merge policies need ways to extract the stage information and manipulate them. Am I correct to say that you mean by ls-files -l the extracting part? No, I meant show-files, since we need to show the index, not a tree (no valid tree can ever have the modes information, since (a) it doesn't have the space for it anyway and (b) we refuse to write out a dirty index file). LT I should make ls-files have a -l format, which shows the LT index and the mode for each file too. You probably meant ls-tree. You used the word "mode" but it already shows the mode so I take it to mean stage. Perhaps something like this?

    $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
    100644 blob fe2a4177a760fd110e78788734f167bd633be8de COPYING
    100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:1 man/frotz.6
    100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:2 man/frotz.6
    100664 blob eeed997e557fb079f38961354473113ca0d0b115:3 man/frotz.6

Apart from the fact that it would be show-files -l since there are no tree objects that can have anything but fully merged state, yes. Assuming that you would be working on that, I'd like to take the dircache manipulation part. Let's think about the minimally necessary set of operations: * The merge policy decides to take one of the existing stage. In this case we need a way to register a known mode/sha1 at a path.
We already have this as update-cache --cacheinfo. We just need to make sure that when update-cache puts things at stage 0 it clears other stages as well. * The merge policy comes up with a desired blob somewhere on the filesystem (perhaps by running an external merge program). It wants to register it as the result of the merge. We could do this today by first storing the desired blob in a temporary file somewhere in the path the dircache controls, update-cache --add the temporary file, ls-tree to find its mode/sha1, update-cache --remove the temporary file and finally update-cache --cacheinfo the mode/sha1. This is workable but clumsy. How about: $ update-cache --graft [--add] desired-blob path to say I want to register mode/sha1 from desired-blob, which may not be of verify_path() satisfying name, at path in the dircache? * The merge policy decides to delete the path. We could do this today by first stashing away the file at the path if it exists, update-cache --remove it, and restore if necessary. This is again workable but clumsy. How about: $ update-cache --force-remove path to mean I want to remove the path from dircache even though it may exist in my working tree? Yes. Am I on the right track? Exactly. You might want to go even lower level by letting them say something like: * update-cache --register-stage mode sha1 stage path Registers the mode/sha1 at stage for path. Does not look at the working tree. stage is [0-3] I'd prefer not. I'd avoid playing games with the stages at any other level than the full tree level until we show a real need for it. Let's go with the known-needed minimal cases that are high-level enough to make the scripting simple, and see if there is any reason to ever touch the tree any other way. Linus
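The rule "when update-cache puts things at stage 0 it clears other stages as well" can be sketched over a simplified in-memory index. This assumes a name-sorted array in which one path may appear once per stage; `struct index_entry` and `index_add_stage0` are hypothetical and far simpler than git's real cache entries.

```c
#include <string.h>

struct index_entry {
    char name[64];   /* path; fixed size only to keep the sketch short */
    int stage;       /* 0 = merged, 1-3 = base/ours/theirs */
};

/*
 * Record path at stage 0 in a name-sorted index of n entries (capacity
 * cap). Every existing entry for the same path, at any stage, is dropped
 * first, so the path ends up with exactly one fully merged entry.
 * Returns the new entry count, or -1 if there is no room.
 */
int index_add_stage0(struct index_entry *ix, int n, int cap, const char *path)
{
    int i, out = 0, pos = -1;

    /* Compact the array in place, skipping all entries for path. */
    for (i = 0; i < n; i++) {
        int cmp = strcmp(ix[i].name, path);
        if (cmp == 0)
            continue;
        if (cmp > 0 && pos < 0)
            pos = out;               /* first kept entry sorting after path */
        ix[out++] = ix[i];
    }
    if (pos < 0)
        pos = out;
    if (out >= cap)
        return -1;
    /* Open a gap at pos and store the stage-0 entry there. */
    memmove(&ix[pos + 1], &ix[pos], (size_t)(out - pos) * sizeof(*ix));
    strncpy(ix[pos].name, path, sizeof(ix[pos].name) - 1);
    ix[pos].name[sizeof(ix[pos].name) - 1] = '\0';
    ix[pos].stage = 0;
    return out + 1;
}
```

For example, an index holding man/frotz.6 at stages 1, 2 and 3 collapses to a single stage-0 man/frotz.6 entry, with neighbouring paths left in sorted order.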
Re: [PATCH 3/2] merge-trees script for Linus git
On Sat, 16 Apr 2005, Linus Torvalds wrote: Having slept on it, I think I'll merge all the trivial cases that don't involve a file going away or being added. Ie if the file is in all three trees, but it's the same in two of them, we know what to do. Junio, I pushed this out, along with the two patches from you. It's still more anal than my original tree-diff algorithm, in that it refuses to touch anything where the name isn't the same in all three versions (original, new1 and new2), but now it does the "if two of them match, just select the result directly" trivial merges. I really cannot see any sane case where user policy might dictate doing anything else, but if somebody can come up with an argument for a merge algorithm that wouldn't do what that trivial merge does, we can make a flag for "don't merge at all". The reason I do want to merge at all in read-tree is that I want to avoid having to write out a huge index-file (it's 1.6MB on the kernel, so if you don't do _any_ trivial merges, it would be 4.8MB after reading three trees) and then having people read it and parse it just to do stuff that is obvious. Touching 5MB of data isn't cheap, even if you don't do a whole lot to it. Anyway, with the modified read-tree, as far as I can tell it will now merge all the cases where one side has done something to a file, and the other side has left it alone (or where both sides have done the exact same modification). That should _really_ cut down the cases to just a few files for most of the kernel merges I can think of. Does it do the right thing for your tests? Linus
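The "if two of them match, just select the result directly" rule can be written as a small C function. This is a sketch, not read-tree's code: it assumes the path exists in all three trees (the only case the message says read-tree touches) and compares object IDs as hex strings; `trivial_merge` and its result names are hypothetical.

```c
#include <string.h>

/* Outcome of the trivial three-way merge for one path. */
enum trivial_result { TAKE_OURS, TAKE_THEIRS, NEEDS_MERGE };

/*
 * base, ours and theirs are the object IDs (hex strings) of one path in
 * the common ancestor and the two trees being merged.
 */
enum trivial_result trivial_merge(const char *base, const char *ours,
                                  const char *theirs)
{
    if (strcmp(ours, theirs) == 0)
        return TAKE_OURS;       /* both sides made the same change, or none */
    if (strcmp(base, ours) == 0)
        return TAKE_THEIRS;     /* only their side changed the file */
    if (strcmp(base, theirs) == 0)
        return TAKE_OURS;       /* only our side changed the file */
    return NEEDS_MERGE;         /* all three differ: needs a real 3-way merge */
}
```

Only the NEEDS_MERGE paths would be left at higher stages in the index, which is what keeps the written-out index small for typical merges.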
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Ingo Molnar wrote: i've converted the Linux kernel CVS tree into 'flat patchset' format, which gave a series of 28237 separate patches. (Each patch represents a changeset, in the order they were applied. I've used the cvsps utility.) the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a script that will apply all the patches in order and will create a pristine 2.6.12-rc2 tree. Hey, that's great. I got the CVS repo too, and I was looking at it, but the more I looked at it, the more I felt that the main reason I want to import it into git ends up being to validate that my size estimates are at all realistic. I see that Thomas Gleixner seems to have done that already, and come to a figure of 3.2GB for the last three years, which I'm very happy with, mainly because it seems to match my estimates to a tee. Which means that I just feel that much more confident about git actually being able to handle the kernel long-term, and not just as a stop-gap measure. But I wonder if we actually want to populate the whole history.. Now that my size estimates have been verified, I have little actual real reason to put the history into git. There are no visualization tools done for git yet, and no helpers to actually find problems, and by the time there will be, we'll have new history. So I'd _almost_ suggest just starting from a clean slate after all. Keeping the old history around, of course, but not necessarily putting it into git now. It would just force everybody who is getting used to git in the first place to work with a 3GB archive from day one, rather than getting into it a bit more gradually. What do people think? I'm not so much worried about the data itself: the git architecture is _so_ damn simple that now that the size estimate has been confirmed, I don't think it would be a problem per se to put 3.2GB into the archive.
But it will bog down rsync horribly, so it will actually hurt synchronization until somebody writes the rev-tree-like stuff to communicate changes more efficiently.. IOW, it smells to me like we don't have the infrastructure to really work with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can build up the infrastructure in parallel with starting to really need it. But it's _great_ to have the history in this format, especially since looking at CVS just reminded me how much I hated it. Comments? Linus
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Thomas Gleixner wrote: One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed ? I'd really rather not. Why don't you just use ls-tree for scripting? That's why it exists in the first place. It might make sense to have some simple selection capabilities built into ls-tree (ie ls-tree --match drivers/char/ -z treesha1 to get just a subtree out), but that depends entirely on how you end up using it. The fact is, there should _never_ be any reason to look at the objects themselves directly. cat-file is a debugging aid, it shouldn't be scripted (with the possible exception of cat-file blob to just extract the blob contents, since that object doesn't have any internal structure). That level of abstraction (we never look directly at the objects) is what allows us to change the object structure later. For example, we already changed the commit date thing once, and the tree object has obviously evolved a bit, and if we ever change the hash, the objects will change too, but if you always just script them using nice helper tools, you won't ever need to _care_. And that's how it should be. If there's a tool missing, holler. THAT is the part I've been trying to write: all the plumbing so that you _can_ script the thing sanely, and not worry about how objects are created and worked with. For example, that index file format likely _will_ change. I ended up doing the new stage flags in a way that kept the index file compatible with old ones, but I did that mainly because it also happened to be the easiest way to enforce the rule I wanted to enforce (ie the stage really _is_ a part of the filename from a "compare filenames" standpoint, in order to make sure that the stages are always ordered).
So if the index file change hadn't had that property, I'd have just said "I'll change the format", and anybody who tried to parse the index file would have been _broken_. Linus
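Treating the stage as part of the filename for comparison purposes can be sketched as a three-key compare: bytes of the name, then name length, then stage. This is an illustration under that assumption, not git's actual comparison function; `entry_compare` is a hypothetical name.

```c
#include <string.h>

/*
 * Compare two index entries by (name, stage). Names compare as raw bytes,
 * with the shorter name first on a common prefix; only when the names are
 * identical does the stage decide. All entries for one path therefore sit
 * next to each other, ordered 0, 1, 2, 3.
 */
int entry_compare(const char *name1, size_t len1, int stage1,
                  const char *name2, size_t len2, int stage2)
{
    size_t len = len1 < len2 ? len1 : len2;
    int cmp = memcmp(name1, name2, len);

    if (cmp)
        return cmp;
    if (len1 != len2)
        return len1 < len2 ? -1 : 1;
    if (stage1 != stage2)
        return stage1 < stage2 ? -1 : 1;
    return 0;
}
```

Because an unmerged index only ever holds stage-0 entries, sorting by this key gives exactly the old order, which is one way the format change could stay compatible with existing index files.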
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Thomas Gleixner wrote: For the export stuff its terrible slow. :( I don't really see your point. If you already know what the tree is like you say, you don't care about the tree object. And if you don't know what the tree is, what _are_ you doing? In other words, show us what you're complaining about. If you're looking into the trees yourself, then the binary representation of the sha1 is already what you want. That _is_ the hash. So why do you want it in ASCII? And if you're not looking into the tree directly, but using cat-file tree and you were hoping to see ASCII data, then that's certainly not going to be any faster than just doing ls-tree instead. In other words, I don't see your point. Either you want ascii output for scripting, or you don't. First you claimed that you did, and that you would want the tree object to change in order to do so. Now you claim that you can't use ls-tree because it's too slow. That just isn't making any sense. You're mixing two totally different levels, and complaining about performance when scripting things. Yet you're talking about a 20-byte data structure that is trivial to convert to any format you want. What kind of _strange_ scripting architecture is so fast that there's a difference between cat-file and ls-tree and can handle 17,000 files in 60,000 revisions, yet so slow that you can't trivially convert 20 bytes of data? Linus
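To make "trivial to convert 20 bytes of data" concrete, here is a minimal C sketch of turning a binary SHA-1 (as stored in tree objects) into its 40-character hex form. The function name `sha1_to_hex_buf` is hypothetical; the only assumption is the standard 20-byte SHA-1 and lowercase hex output.

```c
#include <stddef.h>

/*
 * Convert a 20-byte binary SHA-1 to 40 lowercase hex digits.
 * out must have room for 41 bytes (40 digits plus the terminating NUL).
 */
void sha1_to_hex_buf(const unsigned char sha1[20], char out[41])
{
    static const char hex[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < 20; i++) {
        out[2 * i] = hex[sha1[i] >> 4];       /* high nibble */
        out[2 * i + 1] = hex[sha1[i] & 0x0f]; /* low nibble */
    }
    out[40] = '\0';
}
```

Each byte becomes two table lookups, so the cost is negligible next to spawning a process per object in a script.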
Re: [PATCH] update-cache --refresh cache entry leak
On Sat, 16 Apr 2005, Junio C Hamano wrote: When update-cache --refresh replaces an existing cache entry with a new one, it forgets to free the original. I've seen this patch now three times, and it's been wrong every single time. Maybe we should add a comment? That active-cache entry you free()'d was not necessarily allocated with malloc(). Most cache-entries are just mmap'ed directly from the index file. Leaking is ok. We cannot leak too much. Linus
Re: [PATCH] Use libcurl to use HTTP to get repositories
On Sat, 16 Apr 2005, Paul Jackson wrote: Daniel wrote: I'm working off of Linus's tree when not working on scripts, and it doesn't have that section at all. Ah so - nevermind my README comments then. Well, actually, I suspect that something like this should go to Pasky. I really see my repo as purely internal git data structures, and when it gets to "how do we interact with other people's web-sites", I suspect Pasky's tree is better. Linus
Re: Storing permissions
On Sat, 16 Apr 2005, Paul Jackson wrote: Morten wrote: It makes some sense in principle, but without storing what they mean (i.e., group==?) it certainly makes no sense. There's no "they" there. I think Martin's proposal, to which I agreed, was to store a _single_ bit. If any of the execute permissions of the incoming file are set, then the bit is stored ON, else it is stored OFF. On 'checkout', if the bit is ON, then the file permission is set mode 0777 (modulo umask), else it is set mode 0666 (modulo umask). I think I agree. Anybody willing to send me a patch? One issue is that if done the obvious way it's an incompatible change, and old tree objects won't be valid any more. It might be ok to just change the "compare cache" check to only care about a few bits, though: S_IXUSR and S_IFDIR. And then always write new tree objects out with mode set to one of

 - 040000: we already do this for directories
 - 100644: normal files without S_IXUSR set
 - 100755: normal files _with_ S_IXUSR set

Then, at compare time, we only look at S_IXUSR matching for files (we never compare directory modes anyway). And at file create time, we create them with 0666 and 0777 respectively, and let the user's umask sort it out (and if the user has 0100 set in his umask, he can damn well blame himself). This would pretty much match the existing kernel tree, for example. We'd end up with some new trees there (and in git), but not a lot of incompatibility. And old trees would still work fine, they'd just get written out differently. Anybody want to send a patch to do this? Linus
Re: Issues with higher-order stages in dircache
On Sat, 16 Apr 2005, Junio C Hamano wrote: I am wondering if you have a particular reason not to do the same for the removing half. No. Except for me being silly. Please just make it so. Also do you have any comments on this one from the same message? * read-tree - When merging two trees, i.e. read-tree -m A B, shouldn't we collapse identical stage-1/2 into stage-0? How do you actually intend to merge two trees? That sounds like a total special case, and better done with diff-tree. But regardless, since I assume the result is the later tree, why do a read-tree -m A B, since what you really want is read-tree B? The real merge always needs the base tree, and I'd hate to complicate the real merge with some special-case that isn't relevant for that real case. Linus
Re: Storing permissions
On Sat, 16 Apr 2005, Linus Torvalds wrote: Anybody want to send a patch to do this? Actually, I just did it. Seems to work for the only test-case I tried, namely I just committed it, and checked that the permissions all ended up being recorded as 0644 in the tree (if it has the -x bit set, they get recorded as 0755). When checking out, we always check out with 0666 or 0777, and just let umask do its thing. We only test bit 0100 when checking for differences. Maybe I missed some case, but this does indeed seem saner than the "try to restore all bits" case. If somebody sees any problems, please holler. (Btw, you may or may not need to blow away your index file by just re-creating it with a read-tree after you've updated to this. I _tried_ to make sure that the compare just ignored the ce_mode bits, but the fact is, your index file may be corrupt in the sense that it has permission sets that sparse expects to never generate in an index file any more..) Linus
Re: Parseable commit header
On Sun, 17 Apr 2005, Stefan-W. Hahn wrote: after playing a while with git-pasky it is a crap to interpret the date of commit logs. Though it was a good idea to put the date in a parseable format (seconds since), but the format of the commit itself is not good parseable. Actually, it is. The commit stuff removes all special characters from the strings, so '<' and '>' around the email do indeed act as delimiters, and cannot exist anywhere else. Linus
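Because '<' and '>' cannot appear inside the name or address, a header line can be split on them without any quoting rules. A minimal C sketch follows, assuming the "author Name <email> seconds timezone" layout shown earlier in the thread; `parse_ident` is a hypothetical helper, and the returned name and email point into the original line rather than being copied.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "author Name <email> 1113499881 +0100" (or a committer line).
 * On success returns 0 and sets pointer/length pairs into line plus the
 * timestamp in seconds since the epoch; returns -1 if malformed.
 */
int parse_ident(const char *line, const char **name, size_t *name_len,
                const char **email, size_t *email_len, long *when)
{
    const char *lt = strchr(line, '<');
    const char *gt = lt ? strchr(lt, '>') : NULL;
    const char *sp = strchr(line, ' ');   /* ends the "author" key */

    if (!lt || !gt || !sp || sp > lt)
        return -1;
    *name = sp + 1;
    *name_len = (size_t)(lt - *name);
    while (*name_len && (*name)[*name_len - 1] == ' ')
        (*name_len)--;                    /* drop the space before '<' */
    *email = lt + 1;
    *email_len = (size_t)(gt - *email);
    *when = strtol(gt + 1, NULL, 10);     /* strtol skips the leading space */
    return 0;
}
```

The timezone field after the timestamp is left unparsed here; a real tool would also want it to render local dates.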
Re: Storing permissions
On Sun, 17 Apr 2005, David A. Wheeler wrote: There's a minor reason to write out ALL the perm bit data, but only care about a few bits coming back in: Some people use SCM systems as a generalized backup system. Yes. I was actually thinking about having system config files in a git repository when I started it, since I noticed how nicely it would do exactly that. However, since the mode bits also end up being part of the name of the tree object (ie they are most certainly part of the hash), it's really basically impossible to only care about one bit but writing out many bits: it's the same issue of having multiple identical blocks with different names. It's ok if it happens occasionally (it _will_ happen at the point of a tree conversion to the new format, for example), but it's not ok if it happens all the time - which it would, since some people have umask 002 (and individual groups) and others have umask 022 (and shared groups), and I can imagine that some anal people have umask 0077 (I don't want to play with others). The trees would constantly bounce between a million different combinations (since _some_ files would be checked out with the other mode). At least if you always honor umask or always totally ignore umask, you get a nice repeatable thing. We tried the "always ignore umask" thing, and the problem with that is that while _git_ ended up always doing a fchmod() to reset the whole permission mask, anybody who created files any other way and then checked them in would end up using umask. One solution is to tell git with a command line flag and/or config file entry that "for this repo, I want you to honor all bits". That should be easy enough to add at some point, and then you really get what you want. That said, git won't be really good at doing system backup. I actually _do_ save a full 32-bit of mode (hey, you could have immutable bits etc set), but anybody who does anything fancy at all with mtime would be screwed, for example.
Also, right now we don't actually save any other type of file than regular/directory, so you'd have to come up with a good save-format for symlinks (easy, I guess - just make a link blob) and device nodes (that one probably should be saved in the cache_entry itself, possibly encoded where the sha1 hash normally is). Also, I made a design decision that git only cares about non-dotfiles. Git literally never sees or looks at _anything_ that starts with a '.'. I think that's absolutely the right thing to do for an SCM (if you hide your files, I really don't think you should expect the SCM to see it), but it's obviously not the right thing for a backup thing. (It _might_ be the right thing for a system config file, though, eg tracking something like /etc with git might be ok, modulo the other issues). Linus
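The "never look at anything that starts with a '.'" rule can be sketched as a path check. Applying it to every '/'-separated component (so that arch/.config is hidden too, not just top-level dotfiles) is an interpretation of the message, and `path_is_visible` is a hypothetical name rather than git's own check.

```c
/*
 * Return 1 if path should be seen by git under the rule described here,
 * 0 if any '/'-separated component of the relative path starts with '.'.
 */
int path_is_visible(const char *path)
{
    int at_component_start = 1;

    for (; *path; path++) {
        if (at_component_start && *path == '.')
            return 0;
        at_component_start = (*path == '/');
    }
    return 1;
}
```

Dots later in a component, as in drivers/char/mem.c, are unaffected; only a leading dot hides the file or directory.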
Re: using git directory cache code in darcs?
On Sun, 17 Apr 2005, David Roundy wrote: That's all right. Darcs would only access the cached data through a git-caching layer, and we've already got an abstraction layer over the pristine cache. As long as the git layer can quickly retrieve the contents of a given file, we should be fine. Yes. In fact, one of my hopes was that other SCM's could just use the git plumbing. But then I'd really suggest that you use git itself, not any libgit. Ie you take _all_ the plumbing as real programs, and instead of trying to link against individual routines, you'd _script_ it. In other words, git would be an independent cache of the real SCM, and/or the old history (ie an SCM that uses git could decide that the git stuff is fine for archival, and really use git as the base: and then the SCM could entirely concentrate on _only_ the interesting parts, ie the actual merging etc). That was really what I always personally saw git as, just the plumbing beneath the surface. For example, something like arch, which is based on patches and tar-balls (I think darcs is similar in that respect), could use git as a _hell_ of a better history of tar-balls. The thing is, unless you take the git object database approach, using _just_ the index part doesn't really mean all that much. Sure, you could just keep the current objects in the object database, but quite frankly, there would probably not be a whole lot of point to that. You'd waste so much time pruning and synchronizing with your real database that I suspect you'd be better off not using it. (Or you could prune nightly or something, I guess). Linus
Re: Re-done kernel archive - real one?
On Sun, 17 Apr 2005, Russell King wrote: On Sat, Apr 16, 2005 at 04:01:45PM -0700, Linus Torvalds wrote: So I re-created the dang thing (hey, it takes just a few minutes), and pushed it out, and there's now an archive on kernel.org in my public personal directory called linux-2.6.git. I'll continue the tradition of naming git-archive directories as *.git, since that really ends up being the .git directory for the checked-out thing. We need to work out how we're going to manage to get our git changes to you. At the moment, I've very little idea how to do that. Ideas? To me, merging is my highest priority. I suspect that once I have a tree from you (or anybody else) that I actually _test_ merging with, I'll be motivated as hell to make sure that my plumbing actually works. After all, it's not just you who want to have to avoid the pain of merging: it's definitely in my own best interests to make merging as easy as possible. You're _the_ most obvious initial candidate, because your merges almost never have any conflicts at all, even on a file level (much less within a file). However, I've made a start to generate the necessary emails. How about this format? I'm not keen on the tree, parent, author and committer objects appearing in this - they appear to clutter it up. What're your thoughts? Indeed. I'd almost drop the whole header except for the author line. Oh, and you need a separator between commits, right now your Signed-off-by: line ends up butting up with the header of the next commit ;) I'd rather not have the FQDN of the machine where the commit happened appearing in the logs. That's fine. Our short-logs have always tried to have just the real name in them, and I do want an email-like thing for tracking the developer, but yes, if you remove the email, that's fine. It should be easy enough to do with a simple sed 's/<.*>//' or similar. And if you replace "author" with "From:" and do the date conversion, it might look more natural.
Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re-done kernel archive - real one?
On Sun, 17 Apr 2005, Russell King wrote:
> BTW, there appears to be errors in the history committed thus far. I'm not sure where this came from though. Some of them could be UTF8 vs ASCII issues, but there's a number which seem to have extra random crap in them (^M) and lots of blank lines).

Ah, yes. That is actually from the original emails from Andrew. I do not know why, but I see them there. It's his script that does something strange.

(Andrew: in case you care, the first one is "[patch 003/198] arm: fix SIGBUS handling", which has the email looking like

	...
	From: [EMAIL PROTECTED]
	Date: Tue, 12 Apr 2005 03:30:35 -0700
	Status:
	X-Status:
	X-Keywords:

	^M)
	From: Russell King [EMAIL PROTECTED]

	ARM wasn't raising a SIGBUS with a siginfo structure. Fix __do_user_fault() to allow us to use it for SIGBUS conditions, and arrange for the sigbus path to use this.
	...

> One thing which definitely needs to be considered is - what character encoding are the comments to be stored as?

To git, it's just a byte stream, and you can have binary comments if you want to. I personally would prefer to move towards UTF eventually, but I really don't think it matters a whole lot as long as 99.9% of everything we'd see there is still 7-bit ascii.

> ID: 75f86bac962b7609b0f3c21d25e10647ff8ed280
> [PATCH] intel8x0: AC'97 audio patch for Intel ESB2
>
> This patch adds the Intel ESB2 DID's to the intel8x0.c file for AC'97 audio support.
>
> Signed-off-by: A0Jason Gaston [EMAIL PROTECTED]

That A0 is also there in Andrew's original email. It's space with the high bit set, and I have no idea why.

Linus
Re: [patch] fork optional branch point normazilation
On Sun, 17 Apr 2005, Brad Roberts wrote:
> (ok, author looks better, but committer doesn't obey the AUTHOR_ vars yet)

They shouldn't, but maybe I should add COMMITTER_xxx overrides. I just do _not_ want people to think that they should claim to be somebody else: it's not a security issue (you could compile your own commit-tree.c after all), it's more of a "social rule" thing. I prefer seeing bad email addresses that at least match the system setup to seeing good email addresses that people made up just to make them look clean.

Mind showing what your /etc/passwd file looks like (just your own entry, and please just remove your password entry if you don't use shadow passwords)?

Maybe I should just remove _all_ strange characters when I do the name cleanup in "commit". Right now I just remove the ones that matter to parsing it unambiguously: '\n', '<' and '>'.

(The ',' character really is special: some people have "Torvalds, Linus", and maybe I should not just remove the commas, I should convert it to always be "Linus Torvalds". But your gecos entry is just _strange_. Why the extra commas, I wonder?)

Linus
Re: [1/5] Parsing code in revision.h
On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> @@ -24,6 +24,7 @@
>  	unsigned int flags;
>  	unsigned char sha1[20];
>  	unsigned long date;
> +	unsigned char tree[20];
>  	struct parent *parent;
>  };

I think this is really wrong.

The whole point of "revision.h" is that it's a generic framework for keeping track of relationships between different objects. And those objects are in no way just "commit" objects.

For example, fsck uses this "struct revision" to create a full tree of _all_ the object dependencies, which means that a "struct revision" can be any object at all - it's not in any way limited to commit objects, and there is no "tree" object that is associated with these things at all.

Besides, why do you want the tree? There's really nothing you can do with the tree to a first approximation - you need to _first_ do the reachability analysis entirely on the commit dependencies, and then when you've selected a set of commits, you can just output those.

Later phases will indeed look up what the tree is, but that's only after you've decided on the commit object. There's no point in looking up (or even trying to just remember) _all_ the tree objects.

Hmm?

Linus
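To illustrate the point that reachability can be decided on the dependency links alone, here is a minimal sketch. `struct node`, `MAX_PARENTS` and `mark_reachable()` are hypothetical simplifications, not the real revision.h: each node is just an object ID plus links to what it depends on, with deliberately no "tree" field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified generic node: any object, identified by its
 * SHA1, with links to the objects it depends on. No "tree" field. */
#define MAX_PARENTS 4
#define REACHABLE 1u

struct node {
    unsigned int flags;
    unsigned char sha1[20];
    size_t nr_parents;
    struct node *parent[MAX_PARENTS];
};

/* Mark everything reachable from 'start' using an iterative DFS with a
 * caller-supplied stack. Nodes are marked when pushed, so each is pushed
 * at most once. Returns the number of newly marked nodes, or -1 if the
 * stack is too small. */
static int mark_reachable(struct node *start, struct node **stack,
                          size_t stack_size)
{
    size_t top = 0;
    int marked = 0;

    if (start->flags & REACHABLE)
        return 0;
    if (stack_size == 0)
        return -1;
    start->flags |= REACHABLE;
    marked++;
    stack[top++] = start;

    while (top > 0) {
        struct node *n = stack[--top];
        for (size_t i = 0; i < n->nr_parents; i++) {
            struct node *p = n->parent[i];
            if (p->flags & REACHABLE)
                continue;
            if (top == stack_size)
                return -1;
            p->flags |= REACHABLE;
            marked++;
            stack[top++] = p;
        }
    }
    return marked;
}
```

Only after this selection step would a later phase look at which tree each selected commit points to.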
Re: Re-done kernel archive - real one?
On Sun, 17 Apr 2005, Russell King wrote:
> This will (and does) do exactly what I want. I'll also read into the above a request that you want it in forward date order. 8)

No, I actually don't _think_ I care. In many ways I'm more used to reverse date order, because that's usually how you view a changelog (with a pager, and most recent changes at the top).

Which one makes sense when asking me to merge? I don't know, and I don't think it really even matters, but maybe we can add a "for now" to whatever decision you end up coming to?

Linus
First ever real kernel git merge!
It may not be pretty, but it seems to have worked fine!

Here's my history log (with intermediate checking removed - I was being pretty anal ;):

	rsync -avz --ignore-existing master.kernel.org:/home/rmk/linux-2.6-rmk.git/ .git/
	rsync -avz --ignore-existing master.kernel.org:/home/rmk/linux-2.6-rmk.git/HEAD .git/MERGE-HEAD
	merge-base $(cat .git/HEAD) $(cat .git/MERGE-HEAD)
	for i in e7905b2f22eb5d5308c9122b9c06c2d02473dd4f $(cat .git/HEAD) $(cat .git/MERGE-HEAD); do cat-file commit $i | head -1; done
	read-tree -m cf9fd295d3048cd84c65d5e1a5a6b606bf4fddc6 9c78e08d12ae8189f3bd5e03accc39e3f08e45c9 a43c4447b2edc9fb01a6369f10c1165de4494c88
	write-tree
	commit-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6 -p $(cat .git/HEAD) -p $(cat .git/MERGE-HEAD)
	echo 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a > .git/HEAD
	fsck-cache --unreachable 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a

which looks really messy, because I really wanted to do each step slowly by hand, so those magic revision numbers are just cut-and-pasted from the results that all the previous stages had printed out.

NOTE! As expected, this merge had absolutely zero file-level clashes, which is why I could just do the "read-tree -m" followed by a "write-tree". But it's a real merge: I had some extra commits in my tree that were not in Russell's tree, and obviously vice versa.

Also note! The end result is not actually written back to the current working directory, so to see what the merge result actually is, there's another final phase:

	read-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
	update-cache --refresh
	checkout-cache -f -a

which just updates the current working directory to the results. I'm _not_ caring about old dirty state for now - the theory was to get this thing working first, and worry about making it nice to use later.
A second note: a real merge thing should notice that if the merge-base output ends up being one of the inputs (if one side is a strict subset of the other side), then the merge itself should never be done, and the script should just update directly to whichever is the non-common HEAD.

But as far as I can tell, this really did work out correctly and 100% according to plan. As a result, if you update to my current tree, the top-of-tree commit should be:

	cat-file commit $(cat .git/HEAD)

	tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
	parent 8173055926cdb8534fbaed517a792bd45aed8377
	parent df4449813c900973841d0fa5a9e9bc7186956e1e
	author Linus Torvalds [EMAIL PROTECTED] 111377 -0700
	committer Linus Torvalds [EMAIL PROTECTED] 111377 -0700

	Merge with master.kernel.org:/home/rmk/linux-2.6-rmk.git - ARM changes

	First ever true git merge. Let's see if it actually works.

Yehaa!

It did take basically zero time, btw. Except for my bumbling about, and the "rsync the objects from rmk's directory" part (which wasn't horrible, it just wasn't instantaneous like the other phases).

Btw, to see the output, you really want to have a "git log" that sorts by date. I had an old gitlog.sh that did the old recursive thing, and while it shows the right thing, the ordering ended up making it very non-obvious that rmk's changes had been added recently, since they ended up being at the very bottom.

Linus
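The "second note" above comes down to comparing the merge-base result against both heads. This is a minimal C sketch of that decision. The enum and `choose_merge_action()` are hypothetical names; commit IDs are assumed to be hex strings, and `common` is whatever merge-base printed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcome of comparing merge-base output with the two heads. */
enum merge_action {
    ALREADY_UP_TO_DATE, /* the remote head is already contained in ours */
    FAST_FORWARD,       /* ours is a strict subset: just move HEAD      */
    REAL_MERGE          /* both sides have commits the other lacks      */
};

static enum merge_action choose_merge_action(const char *head,
                                             const char *merge_head,
                                             const char *common)
{
    if (strcmp(common, merge_head) == 0)
        return ALREADY_UP_TO_DATE;
    if (strcmp(common, head) == 0)
        return FAST_FORWARD;
    return REAL_MERGE;
}
```

Only `REAL_MERGE` would need the "read-tree -m" / "write-tree" / "commit-tree" steps. In the other two cases no merge commit is created at all.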
Re: Re-done kernel archive - real one?
On Sun, 17 Apr 2005, Russell King wrote:
> I pulled it tonight into a pristine tree (which of course worked.)

Goodie.

> In doing so, I noticed that I'd messed up one of the commits - there's a missing new file. Grr. I'll put that down to being a newbie git.

Actually, you should put that down to horribly bad interface tools. With BK, we had these nice tools that pointed out that there were files that you might want to commit (ie "bk citool"), and made this very obvious.

Tools absolutely matter. And it will take time for us to build up that kind of helper infrastructure. So being newbie might be part of it, but it's the smaller part, I say. Rough interfaces is a big issue.

Linus
Re: [patch] fork optional branch point normazilation
On Sun, 17 Apr 2005, Brad Roberts wrote:
> braddr:x:1000:1000:Brad Roberts,,,:/home/braddr:/bin/bash
>
> All gecos entries on all my debian boxes are of the form:
>
>   fullname, office number, office extension, and home number

Ahh, ok.

I'll make the cleanup thing just remove strange characters from the end; that should fix this kind of thing for now. I'd just remove everything after the first strange character, but I can also see people using the "lastname, firstname" format, and I'd hate to just ignore firstname in that case.

Linus
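The trailing-only cleanup described above could be sketched as follows. This is not the actual commit-tree.c code. `is_crud()` and `strip_trailing_crud()` are hypothetical names, and the exact set of "strange" characters is an assumption chosen for illustration. Bytes with the high bit set, such as 0xA0, are deliberately not treated as crud.

```c
#include <assert.h>
#include <string.h>

/* Assumed set of "strange" characters for a name field. */
static int is_crud(unsigned char c)
{
    return c <= ' ' || c == '.' || c == ',' || c == ':' || c == ';' ||
           c == '<' || c == '>' || c == '"' || c == '\'';
}

/* Strip strange characters from the END of a gecos-style name, in place.
 * "Brad Roberts,,," becomes "Brad Roberts", while the comma in the middle
 * of "Torvalds, Linus" is left alone. Returns the new length. */
static size_t strip_trailing_crud(char *name)
{
    size_t len = strlen(name);

    while (len > 0 && is_crud((unsigned char)name[len - 1]))
        len--;
    name[len] = '\0';
    return len;
}
```

Trimming only the tail is what keeps the "lastname, firstname" case intact, at the cost of leaving any leading or embedded oddities untouched.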
Re: Merge with git-pasky II.
On Mon, 18 Apr 2005, Herbert Xu wrote:
> I wasn't disputing that of course. However, the same effect can be achieved in using a single hash with a bigger length, e.g., sha256 or sha512.

No it cannot.

If somebody actually literally totally breaks that hash, length won't matter. There are (bad) hashes where you can literally edit the content of the file, and make sure that the end result has the same hash. In that case, when the hash algorithm has actually been broken, the length of the hash ends up being not very relevant.

For example, you might hash your file by blocking it up in 16-byte blocks, and xoring all blocks together - the result is a 16-byte hash. It's a terrible hash, and obviously trivially breakable, and once broken it does _not_ help to make it use its 32-byte cousin. Not at all. You can just modify the breaking thing to equally cheaply make modifications to a file and get the 32-byte hash right again.

Is that kind of breakage likely for sha1? Hell no. Is it possible? In your "in theory" world where practice doesn't matter, yes.

Linus
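The XOR example can be made concrete. The sketch below implements the deliberately terrible block-XOR hash for an arbitrary block size `bs` (16 or 32 bytes). It also implements a forgery that appends one `bs`-byte block so that arbitrary content hashes to any chosen value. Both function names are hypothetical. The forgery costs the same linear work at either output length, which is the argument being made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The bad hash from the text: split the input into bs-byte blocks
 * (implicitly zero-padding the last one) and XOR them together. */
static void xor_hash(const unsigned char *data, size_t len, size_t bs,
                     unsigned char *out)
{
    assert(bs > 0);
    memset(out, 0, bs);
    for (size_t i = 0; i < len; i++)
        out[i % bs] ^= data[i];
}

/* "Breaking" it: compute bs bytes which, appended to data, make the hash
 * of the extended content equal 'target'. Each appended byte lands in a
 * distinct slot, so it can cancel the current value and insert the
 * target's. The cost is the same whether bs is 16 or 32. */
static void forge_suffix(const unsigned char *data, size_t len, size_t bs,
                         const unsigned char *target, unsigned char *fix)
{
    unsigned char h[64];

    assert(bs > 0 && bs <= sizeof h);
    xor_hash(data, len, bs, h);
    for (size_t k = 0; k < bs; k++) {
        size_t slot = (len + k) % bs;
        fix[k] = h[slot] ^ target[slot];
    }
}
```

A strong hash has no such shortcut, which is why the argument is about the algorithm being broken rather than about output length.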
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Russell King wrote:
> Ok, I just tried pulling your tree into the tree you pulled from, and got this:

No, that can't work. The pesky tools are helpful, but they really don't do merges worth cr*p right now, excuse my french.

The _real_ way to pull is to do the (horribly complex) thing I described by the merge, but noticing that one of the commits you are merging is a proper subset of the other one, and just updating the head instead of actually doing a real merge (ie skipping the "read-tree -m" and "write-tree" phases).

> This was with some random version of git-pasky-0.04. Unfortunately, this version doesn't have the sha1 ID appended, so I couldn't say definitively that it's the latest and greatest. It might be a day old.

I'm afraid that until Pasky's tools script this properly, a "pull" really ends up being something like this (which _can_ be scripted, never fear):

NOTE NOTE NOTE! This is untested! I'm writing this within the email editor, so do _not_ do this on a tree that you care about.

	#!/bin/sh
	#
	# use $1 or something in a real script, this
	# just hard-codes it.
	#
	merge_repo=master.kernel.org:/pub/linux/kernel/people/torvalds/linux-2.6.git

	echo "Getting object database"
	rsync -avz --ignore-existing $merge_repo/ .git/

	echo "Getting remote head"
	rsync -avz $merge_repo/HEAD .git/MERGE_HEAD

	head=$(cat .git/HEAD)
	merge_head=$(cat .git/MERGE_HEAD)
	common=$(merge-base $head $merge_head)
	if [ -z "$common" ]; then
		echo "Unable to find common commit between $merge_head $head"
		exit 1
	fi

	# Get the trees associated with those commits
	common_tree=$(cat-file commit $common | sed 's/tree //;q')
	head_tree=$(cat-file commit $head | sed 's/tree //;q')
	merge_tree=$(cat-file commit $merge_head | sed 's/tree //;q')

	if [ "$common" = "$merge_head" ]; then
		echo "Already up-to-date. Yeeah!"
		exit 0
	fi
	if [ "$common" = "$head" ]; then
		echo "Updating from $head to $merge_head."
		echo "Destroying all noncommitted data!"
		echo "Kill me within 3 seconds.."
		sleep 3
		read-tree $merge_tree
		checkout-cache -f -a
		echo $merge_head > .git/HEAD
		exit 0
	fi
	echo "Trying to merge $merge_head into $head"
	read-tree -m $common_tree $head_tree $merge_tree
	result_tree=$(write-tree) || exit 1
	result_commit=$(echo "Merge $merge_repo" | commit-tree $result_tree -p $head -p $merge_head)
	echo "Committed merge $result_commit"
	echo $result_commit > .git/HEAD
	read-tree $result_tree
	checkout-cache -f -a

The above looks like it might work, but I also warn you: it's not only untested, but it's pretty fragile in that if something breaks, you are probably left with a mess. I _tried_ to do the right thing, but... So it obviously will need testing, tweaking and just general tender loving care.

And if the merge isn't clean, it will exit early thanks to the "write-tree || exit 1", and now you have to resolve the merge yourself. There are tools to help you do so automatically, but that's really a separate script.

You shouldn't hit the merge case at all right now; you should hit the "Updating from $head to $merge_head" thing.

If Pesky wants to take the above script, test it, and see if it works, that would be good. It's definitely a much better "pull" than trying to apply the patches forward..

Linus
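The script pulls each commit's tree with `sed 's/tree //;q'`. That works because a commit object's text begins with a `tree <40 hex digits>` line. A C equivalent could look like the minimal sketch below. `commit_tree_id()` is a hypothetical name, and it is stricter than the sed: it rejects a first line that is not exactly that shape.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C counterpart of  cat-file commit $c | sed 's/tree //;q'.
 * Copies the 40-character tree ID (plus NUL) into 'out' and returns 0,
 * or returns -1 if the first line is not "tree <40 lowercase hex>". */
static int commit_tree_id(const char *commit_text, char out[41])
{
    static const char hexdigits[] = "0123456789abcdef";

    if (strncmp(commit_text, "tree ", 5) != 0)
        return -1;

    const char *id = commit_text + 5;
    for (int i = 0; i < 40; i++) {
        /* check for NUL first: strchr() would "find" the terminator */
        if (id[i] == '\0' || strchr(hexdigits, id[i]) == NULL)
            return -1;
    }
    if (id[40] != '\n' && id[40] != '\0')
        return -1;

    memcpy(out, id, 40);
    out[40] = '\0';
    return 0;
}
```

Applied to the merge commit shown earlier in the thread, it would yield `7792a93eddb3f9b8e3115daab8adb3030f258ce6`.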
Re: A couple of questions
On Mon, 18 Apr 2005, Imre Simon wrote:
> How will git handle a corrupted (git) file system? For instance, what can be done if objects/xy/z{38} does not pass the simple consistency test, i.e. if the file's sha1 hash is not xyz{38}? This might be a serious problem because, in general, one cannot reconstruct the contents of file objects/xy/z{38} from its name xyz{38}.

Nothing beats backups and distribution. The distributed nature of git means that you can replicate your objects arbitrarily.

> Another problem might come up if the file does pass the simple consistency test but the file's contents is not a valid git file,

Run fsck-cache. It not only tests SHA1 and general object sanity, but it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects), and if you use the "--unreachable" flag it will also print out objects that exist but that aren't reachable from any of the HEAD nodes (which you need to specify).

So for example

	fsck-cache --unreachable $(cat .git/HEAD)

will do quite a _lot_ of verification on the tree. There are a few extra validity tests I'm going to add (make sure that tree objects are sorted properly etc), but on the whole if fsck-cache is happy, you do have a valid tree.

Any corrupt objects you will have to find in backups or other archives (ie you can just remove them and do an rsync with some other site in the hopes that somebody else has the object you have corrupted).

Of course, "valid tree" doesn't mean that it wasn't generated by some evil person, and the end result might be crap. Git is a revision tracking system, not a quality assurance system ;)

Linus
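The `objects/xy/z{38}` layout in the question maps a 40-digit hex object name to a path. The first two digits become a directory, and the remaining 38 become the file name. The helper `object_path()` below is a hypothetical sketch of that mapping. It does not compute SHA1 itself, since that needs a hash library; the consistency test would compare the name with a freshly computed hash of the file's contents.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "<objdir>/xy/z{38}" from a 40-character hex object name.
 * Returns -1 for a malformed name; otherwise returns snprintf()'s result,
 * so a value >= outsz means the path was truncated. */
static int object_path(const char *objdir, const char *hex,
                       char *out, size_t outsz)
{
    if (strlen(hex) != 40)
        return -1;
    for (int i = 0; i < 40; i++)
        if (!isxdigit((unsigned char)hex[i]))
            return -1;
    return snprintf(out, outsz, "%s/%.2s/%s", objdir, hex, hex + 2);
}
```

Because the name is the content hash, a corrupt file can be detected from its name, but its contents cannot be rebuilt from the name. That is why the reply points at backups and other replicas.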
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Greg KH wrote:
> On Sun, Apr 17, 2005 at 04:24:24PM -0700, Linus Torvalds wrote:
> > Tools absolutely matter. And it will take time for us to build up that kind of helper infrastructure. So being newbie might be part of it, but it's the smaller part, I say. Rough interfaces is a big issue.
>
> Speaking of tools, you had a "dotest" program to apply patches in email form to a bk tree. And from what I can gather, you've changed that to handle git archives, right?

Yup. It's a git archive at kernel.org:/pub/linux/kernel/people/torvalds/git-tools.git and it seems to work. It's what I've used for all the kernel patches (except for the merge), and it's what I use for the git stuff that shows up as authored by others.

Linus
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Linus Torvalds wrote:
> No, that can't work. The pesky tools are helpful [...]
> I'm afraid that until Pasky's tools script this properly, [...]
> If Pesky wants to take the above script, test it, [...]

Ok, one out of three isn't too bad, is it?

Pesky/Pasky, so close yet so far.

Sorry,
Linus
Re: [0/5] Parsers for git objects, porting some programs
On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> This series introduces common parsers for objects, and ports the programs that currently use revision.h to them.
>
>  1: the header files
>  2: the implementations
>  3: port rev-tree
>  4: port fsck-cache
>  5: port merge-base

Ok, having now looked at the code, I don't have any objections at all. Could you clarify the fsck issue about reading the same object twice? When does that happen?

Linus
Re: [PATCH] fix bug in read-cache.c which loses files when merging a tree
On Mon, 18 Apr 2005, James Bottomley wrote:
> I had a problem with the SCSI tree in that there's a file removal in one branch. Your git-merge-one-file-script wouldn't have handled this correctly: It seems to think that the file must be removed in both branches, which is wrong.

Yes, I agree. My current merge-one-file-script doesn't actually look at what the original file was in this situation, and clearly it should. I think I'll leave it for the user to decide what happens when somebody has modified the deleted file, but clearly we should delete it if the other branch has not touched it.

I suspect that I should just pass in the SHA1 of the files to the merge-one-file-script from merge-cache, rather than unpacking it. After all, the merging script can do the unpacking itself with a simple "cat-file blob $sha1".

And the fact is, many of the trivial merges should be handled by just looking at the content, and doing a "cmp" on the files seems to be a stupid way to do that when we had the sha1 earlier.

Done, and pushed out. Does the new merge infrastructure work for you?

Linus
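With SHA1s passed in instead of unpacked files, the per-file cases described above can be decided by comparing object IDs alone. This is a minimal sketch, not the actual merge-one-file-script logic. The enum and function names are hypothetical, and a NULL ID means "the file does not exist in that version".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum resolution { TAKE_OURS, TAKE_THEIRS, CONFLICT };

/* Equal IDs mean equal content; two absent files also count as equal. */
static int same_id(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* orig: common ancestor, ours/theirs: the two branches. Choosing a side
 * whose ID is NULL means "delete the file". */
static enum resolution resolve_file(const char *orig, const char *ours,
                                    const char *theirs)
{
    if (same_id(ours, theirs))
        return TAKE_OURS;   /* identical on both sides, incl. both deleted */
    if (same_id(orig, ours))
        return TAKE_THEIRS; /* only their side changed (or deleted) it */
    if (same_id(orig, theirs))
        return TAKE_OURS;   /* only our side changed it */
    return CONFLICT;        /* both changed it, e.g. modified vs. deleted */
}
```

The removal case from the report shows up as `resolve_file(orig, orig, NULL)`. The file is untouched on one side and deleted on the other, so the result is `TAKE_THEIRS`, a deletion. "Modified on one side, deleted on the other" falls through to `CONFLICT`, which is left for the user.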
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Russell King wrote:
> Since this happened, I've been working out what state my tree is in, and I restored it back to a state where I had one dangling commit head, which was _my_ head.

For the future, if your tree gets messed up to the point where you say "screw it" and just want to go back in time, you can do this (it's equivalent to "undo" in BK speak):

	git log | less -S

.. find which HEAD it was that you trusted.. In this case your HEAD before I merged with it was this one:

	df4449813c900973841d0fa5a9e9bc7186956e1e

So to get back to that one, you can do

	echo df4449813c900973841d0fa5a9e9bc7186956e1e > .git/HEAD

and now

	cat-file commit $(cat .git/HEAD) | head -1

gives you

	tree a43c4447b2edc9fb01a6369f10c1165de4494c88

so you can restore your checked-out state with

	read-tree a43c4447b2edc9fb01a6369f10c1165de4494c88
	checkout-cache -f -a
	update-cache --refresh

and your tree should be valid again.

Now, to remove any bogus objects, you can then run my "git-prune-script" (look at it carefully first to make sure you realize what you are doing).

NOTE NOTE NOTE! This will _revert_ everything you had done after the trusted point. So you may not actually want to do this. Instead:

> It's very much like I somehow committed against the _parent_ of the head, rather than the head itself.

That's very common if you just forget to update your new .git/HEAD when you do a commit. Again, it's the tools that make it a bit too easy to mess up. The "commit-tree" thing is supposed to really only be used from scripts, which would do something like

	result=$(commit-tree ...)
	echo $result > .git/HEAD

but when doing things by hand, if you forget to update your HEAD, your next commit will be done against the wrong head, and you get dangling commits.

The good news is that this is not that hard to fix up. The _trees_ are all correct, and the objects are all correct, so what you can do is just generate a few new (proper) commit objects, with the right parents.
Then you can do the "git-prune-script" thing that will throw away the old broken commits, since they won't be reachable from your new commits (even though their _trees_ will be there and be the same).

So in this case:

	b4a9a5114b3c6da131a832a8e2cd1941161eb348
	 +- e7905b2f22eb5d5308c9122b9c06c2d02473dd4f
	     +- dc90c0db0dd5214aca5304fd17ccd741031e5493   <-- extra dangling head
	     +- 488faba31f59c5960aabbb2a5877a0f2923937a3

you can do

	cat-file commit dc90c0db0dd5214aca5304fd17ccd741031e5493

to remind you what your old tree and commit message was, and then just re-commit that tree with the same message but with the proper parent:

	commit-tree -p 488faba31f59c5960aabbb2a5877a0f2923937a3

and then you need to do the same thing for the other commits (which will now need to be re-based to have the new commit-chain as their parents). Then, when you have fixed up the final one, remember to update .git/HEAD with its commit ID, and now the prune-thing will get rid of the old dangling commits that you just created new duplicates of.

Linus
Re: [PATCH] fix bug in read-cache.c which loses files when merging a tree
On Mon, 18 Apr 2005, Petr Baudis wrote:
> So, I'm confused. Why did you introduce unpack-file instead of doing just this?

It was code that I already had (ie the old code from merge-cache just moved over), and thanks to that, I don't have to worry about broken "mktemp" crap in user space...

Linus
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Greg KH wrote:
> Hm, have you pushed all of the recent changes public?

Oops. Obviously not. Will fix.

Linus
Re: Re-done kernel archive - real one?
On Mon, 18 Apr 2005, Greg KH wrote:
> Anyway, I try it this way and get:

You should update to the newest version anyway..

> $ dotest ~/linux/patches/usb/usb-visor-tapwave_zodiac.patch
>
> Applying USB: visor Tapwave Zodiac support patch
> fatal: preparing to update file 'drivers/usb/serial/visor.c' not uptodate in cache
>
> What did I forget to do?

The most common reason is that the scripts _really_ want the index to match your current tree exactly. Run "update-cache --refresh". And if you have any uncommitted information, make sure to commit it first.

(Not _strictly_ true - you can leave edited files in your directory, and just hope the patch never touches them. The thing you should _not_ do is to do an "update-cache .c" to commit any changes to the 'index', because then the patch applicator will actually commit that one too).

Linus
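"Not uptodate in cache" means the stat data recorded in the index no longer matches the file on disk. "update-cache --refresh" exists to bring that recorded stat data back in line. This is a minimal sketch of such a comparison. `struct cached_stat`, `fill_cached_stat()` and `entry_uptodate()` are hypothetical and track only four fields, far fewer than the real index stores. It uses the POSIX `lstat()` call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, cut-down index entry: only the stat fields compared here. */
struct cached_stat {
    time_t mtime;
    off_t size;
    ino_t ino;
    mode_t mode;
};

static void fill_cached_stat(struct cached_stat *ce, const struct stat *st)
{
    ce->mtime = st->st_mtime;
    ce->size = st->st_size;
    ce->ino = st->st_ino;
    ce->mode = st->st_mode;
}

/* 1: the file still matches what the index recorded;
 * 0: it differs (the "not uptodate in cache" case);
 * -1: the file cannot be stat'ed at all. */
static int entry_uptodate(const char *path, const struct cached_stat *ce)
{
    struct stat st;

    if (lstat(path, &st) < 0)
        return -1;
    return st.st_mtime == ce->mtime && st.st_size == ce->size &&
           st.st_ino == ce->ino && st.st_mode == ce->mode;
}
```

A refresh would re-read the file, confirm its content still hashes to the recorded SHA1, and only then re-record fresh stat data. Files whose content really changed stay "not uptodate", which is why uncommitted edits to a file the patch touches still get in the way.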
Re: Re-done kernel archive - real one?
On Tue, 19 Apr 2005, Petr Baudis wrote:
> What is actually a little annoying is having to cd ,,merge and then back, though. I don't know, but the current pull-merge script does not bother with the temporary merge directory neither, even though Linus wanted it. Linus, do you still do? ;-)

No, now that the merge is done entirely in the index file, I don't care any more. The index file _is_ the temporary directory as far as I'm concerned.

Linus
Re: SCSI trees, merges and git status
On Mon, 18 Apr 2005, James Bottomley wrote:
> It looks like the merge tree has contamination from the scsi-misc-2.6 tree ... possibly because the hosting system got the merged objects when I pushed.

Nope, the way I merge, if I get a few objects it shouldn't matter at all. I'll just look at your HEAD, and merge with the objects that represents. Afterwards, if I have extra objects, I'll see them with fsck-cache.

> Could you strip it back and I'll check out the repos on www.parisc-linux.org?

Git does work like BK in the way that you cannot remove history when you have distributed it. Once it's there, it's there.

The patches from you I have in my tree are:

	scsi: add DID_REQUEUE to the error handling
	zfcp: add point-2-point support
	[PATCH] Convert i2o to compat_ioctl
	[PATCH] kill old EH constants
	[PATCH] scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field
	[PATCH] scsi: remove unused scsi_cmnd->internal_timeout field
	[PATCH] remove outdated print_* functions
	[PATCH] consolidate timeout defintions in scsi.h

or at least that's what they claim in their changelogs.
Oh, and here's the diffstat that matches scsi:

	drivers/block/scsi_ioctl.c             |    5 -
	drivers/s390/scsi/zfcp_aux.c           |    4 -
	drivers/s390/scsi/zfcp_def.h           |    5 +
	drivers/s390/scsi/zfcp_erp.c           |   20 +
	drivers/s390/scsi/zfcp_fsf.c           |   38 --
	drivers/s390/scsi/zfcp_fsf.h           |    6 +
	drivers/s390/scsi/zfcp_sysfs_adapter.c |    6 +
	drivers/scsi/53c7xx.c                  |   23 +++---
	drivers/scsi/BusLogic.c                |    7 -
	drivers/scsi/NCR5380.c                 |    9 +-
	drivers/scsi/advansys.c                |    7 -
	drivers/scsi/aha152x.c                 |   17 ++--
	drivers/scsi/arm/acornscsi.c           |    9 +-
	drivers/scsi/arm/fas216.c              |    9 +-
	drivers/scsi/arm/scsi.h                |    2
	drivers/scsi/atari_NCR5380.c           |    9 +-
	drivers/scsi/constants.c               |    2
	drivers/scsi/ips.c                     |    7 -
	drivers/scsi/ncr53c8xx.c               |   14 ---
	drivers/scsi/pci2000.c                 |    4 -
	drivers/scsi/qla2xxx/qla_dbg.c         |    6 -
	drivers/scsi/scsi.c                    |    5 -
	drivers/scsi/scsi.h                    |   43 ---
	drivers/scsi/scsi_error.c              |   11 ---
	drivers/scsi/scsi_ioctl.c              |    5 -
	drivers/scsi/scsi_lib.c                |    2
	drivers/scsi/scsi_obsolete.h           |  106 -
	drivers/scsi/scsi_priv.h               |    5 -
	drivers/scsi/seagate.c                 |    5 -
	drivers/scsi/sg.c                      |    3
	drivers/scsi/sun3_NCR5380.c            |    9 +-
	drivers/scsi/sym53c8xx_2/sym_glue.c    |    6 -
	drivers/scsi/ultrastor.c               |    4 -

so it doesn't look like there's a _lot_ wrong. Send in a patch to revert anything that needs reverting..

Linus
Re: SCSI trees, merges and git status
On Mon, 18 Apr 2005, James Bottomley wrote:
> Then the git-pull... script actually does the merge and the resulting tree checks out against BK

So? What do you intend to do with all the other stuff I've already put on top?

Yes, I can undo my tree, but my tree has had more stuff in it since I pulled from you, so not only will that confuse everybody who already got the up-to-date tree, it will also undo stuff that was correct.

In other words, HISTORY CANNOT BE UNDONE.

That's the rule, and it's a damn good one. It was the rule when we used BK, and it's the rule now. The fact that you can undo your history in _your_ tree doesn't change anything at all.

So I can merge with your new tree, but that won't actually help any: I'll just get a superset, the way you did things.

The way to remove patches is to explicitly revert them (effectively applying a reverse diff), but I'm wondering if it's worth it in this case. I looked at the patches I did get, and they didn't look horribly bad per se. Are they dangerous?

2.6.12 is some time away, if for no other reason than the fact that this SCM thing has obviously eaten two weeks of my time. So I'd be inclined to chalk this up as a learning experience with git, and just go forward.

Linus
Re: SCSI trees, merges and git status
On Mon, 18 Apr 2005, James Bottomley wrote:
> Fair enough. If you pull from
>
> 	rsync://www.parisc-linux.org/~jejb/scsi-misc-2.6.git

Thanks. Pulled and pushed out.

> Doing this exposed two bugs in your merge script:
>
> 1) It doesn't like a completely new directory (the misc tree contains a new drivers/scsi/lpfc)
> 2) the merge testing logic is wrong. You only want to exit 1 if the merge fails.

Applied.

Linus
Re: [darcs-devel] Darcs and git: plan of action
On Tue, 19 Apr 2005, Tupshin Harper wrote:
> I suspect that any use of wildcards in a new format would be impossible for darcs since it wouldn't allow darcs to construct dependencies, though I'll leave it to david to respond to that.

Note that git _does_ very efficiently (and I mean _very_) expose the changed files. So if this kind of darcs patch is always the same pattern just repeated over n files, then you really don't need to even list the files at all.

Git gives you a very efficient file listing by just doing a "diff-tree" (which does not diff the _contents_ - it really just gives you a pretty much zero-cost "which files changed" listing).

So that combination would be 100% reliable _if_ you always split up darcs patches to common elements. And note that there does not have to be a 1:1 relationship between a git commit and a darcs patch. For example, say that you have a darcs patch that does a combination of "change token x to token y in 100 files" and "rename file a into b". I don't know if you do those kind of combination patches at all, but if you do, why not just split them up into two? That way the list of files changed _does_ 100% determine the list of files for the token exchange.

Linus
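Why a "which files changed" listing is nearly free: a tree lists its entries sorted by name, each with the SHA1 of its content. Two trees can therefore be walked in lockstep, comparing IDs instead of file contents. A minimal sketch follows, assuming a hypothetical flat listing (`struct entry`, with hex-string IDs). The real diff-tree also recurses into subdirectories, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened tree entry: name plus content ID. */
struct entry {
    const char *name;
    const char *sha1;
};

/* Walk two name-sorted entry arrays in lockstep and collect names that
 * were removed, added, or whose ID differs. No file contents are read.
 * Returns how many names were written to 'changed' (at most max). */
static size_t changed_files(const struct entry *a, size_t na,
                            const struct entry *b, size_t nb,
                            const char **changed, size_t max)
{
    size_t i = 0, j = 0, n = 0;

    while ((i < na || j < nb) && n < max) {
        int cmp;

        if (i == na)
            cmp = 1;                      /* only b left: additions */
        else if (j == nb)
            cmp = -1;                     /* only a left: removals  */
        else
            cmp = strcmp(a[i].name, b[j].name);

        if (cmp < 0) {
            changed[n++] = a[i++].name;   /* removed in b */
        } else if (cmp > 0) {
            changed[n++] = b[j++].name;   /* added in b */
        } else {
            if (strcmp(a[i].sha1, b[j].sha1) != 0)
                changed[n++] = a[i].name; /* same name, new content */
            i++;
            j++;
        }
    }
    return n;
}
```

The walk is linear in the number of entries, and unchanged files cost one ID comparison each.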
Re: naive question
On Tue, 19 Apr 2005, Petr Baudis wrote: I'd actually prefer, if: (i) checkout-cache simply wouldn't touch files whose stat matches with what is in the cache; it updates the cache with the stat informations of touched files Run update-cache --refresh _before_ doing the checkout-cache, and that is exactly what will happen. But yes, if you want to make checkout-cache update the stat info (Ingo wanted to do that too), it should be possible. The end result is a combination of update-cache and checkout-cache, though: you'll effectively need to do both (just in one pass). With the current setup, you have to do

    update-cache --refresh
    checkout-cache -f -a
    update-cache --refresh

which is admittedly fairly inefficient. The real expense right now of a merge is that we always forget all the stat information when we do a merge (since it does a read-tree). I have a cunning way to fix that, though, which is to make read-tree -m read in the old index state like it used to, and then at the end just throw it away except for the stat information. Linus
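The `update-cache --refresh` step above works because an index entry whose cached stat data still matches the file does not need to be re-hashed. Here is a minimal C sketch of that comparison. The `struct cache_stat` type and the `stat_changed()` helper are hypothetical. The exact set of fields git compared in 2005 is an assumption: this sketch checks only mtime, ctime, inode and size.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical, trimmed-down index entry: just the stat fields compared here. */
struct cache_stat {
	long mtime;
	long ctime;
	unsigned long ino;
	unsigned long size;
};

/* Nonzero if the cached stat data no longer matches the file on disk, so the
 * file has to be re-read and re-hashed; zero means the entry can be trusted. */
int stat_changed(const struct cache_stat *ce, const struct stat *st)
{
	assert(ce != NULL && st != NULL);
	return ce->mtime != (long)st->st_mtime ||
	       ce->ctime != (long)st->st_ctime ||
	       ce->ino != (unsigned long)st->st_ino ||
	       ce->size != (unsigned long)st->st_size;
}
```

A refresh pass would call this for every index entry with the result of `lstat()`, and it would only re-hash the files that report a change.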
Re: naive question
On Tue, 19 Apr 2005, Linus Torvalds wrote: The real expense right now of a merge is that we always forget all the stat information when we do a merge (since it does a read-tree). I have a cunning way to fix that, though, which is to make read-tree -m read in the old index state like it used to, and then at the end just throw it away except for the stat information. Ok, done. That was really the plan all along, it just got dropped in the excitement of trying to get the dang thing to _work_ in the first place ;) The current version only does read-tree -m orig branch1 branch2 which now reads the old stat cache information, and then applies that to the end result of any trivial merges in case the merge result matches the old file stats. It really boils down to this little gem:

    /*
     * See if we can re-use the old CE directly?
     * That way we get the uptodate stat info.
     */
    if (path_matches(result, old) && same(result, old))
        *result = *old;

and it seems to work fine. HOWEVER, I'll also make it do the same for a single-tree merge: read-tree -m newtree so that you can basically say "read a new tree, and merge the stat information from the current cache". That means that if you do a read-tree -m newtree followed by a checkout-cache -f -a, the checkout-cache only checks out the stuff that really changed. You'll still need to do an update-cache --refresh for the actual new stuff. We could make checkout-cache update the cache too, but I really do prefer a "checkout-cache only reads the index, never changes it" world-view. It's nice to be able to have a read-only git tree. Final note: just doing a plain read-tree newtree will still throw all the stat info away, and you'll have to refresh it all... Linus
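The `path_matches()` / `same()` test above can be expanded into a small self-contained C sketch. The `struct ce` layout and `reuse_old_stat()` are hypothetical simplifications, not the real read-tree.c code. In particular, a single `mtime` field stands in for all of the cached stat data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified cache entry: path, object name, mode, stat data. */
struct ce {
	char path[64];
	unsigned char sha1[20];
	unsigned int mode;
	long mtime;              /* stands in for all the cached stat fields */
};

static int path_matches(const struct ce *a, const struct ce *b)
{
	return strcmp(a->path, b->path) == 0;
}

static int same(const struct ce *a, const struct ce *b)
{
	return a->mode == b->mode && memcmp(a->sha1, b->sha1, 20) == 0;
}

/* If the merge result is exactly the file we already had, copy the old
 * entry wholesale so its up-to-date stat info survives the merge. */
void reuse_old_stat(struct ce *result, const struct ce *old)
{
	assert(result != NULL);
	if (old && path_matches(result, old) && same(result, old))
		*result = *old;
}
```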
Re: [PATCH] write-tree performance problems
On Tue, 19 Apr 2005, Chris Mason wrote: Very true, you can't replace quilt with git without ruining both of them. But it would be nice to take a quilt tree and turn it into a git tree for merging purposes, or to make use of whatever visualization tools might exist someday. Fair enough. The thing is, going from quilt->git really is a pretty big decision, since it's the decision that says "I will now really commit all these quilt changes forever and ever". Which is also why I think it's actually ok to take a minute to do 100 quilt patches. This is not something you do on a whim. It's something you'd better think about. It's turning a very fluid environment into an unchangeable, final thing. That said, I agree that write-tree is expensive. It tends to be by far the most expensive op you normally do. I'll make sure it goes faster. We already have a "trust me, it hasn't changed" via update-cache. Heh. I see update-cache not as a "it hasn't changed", but a "it _has_ changed, and now I want you to reflect that fact". In other words, update-cache is an active statement: it says that you're ready to commit your changes. In contrast, to me your write-tree thing in many ways is the reverse of that: it's saying "don't look here, there's nothing interesting there". Which to me smells like trying to hide problems rather than being positive about them. Which it is, of course. It's trying to hide the fact that writing a tree is not instantaneous. With that said, I hate the patch too. I didn't see how to compare against the old tree without reading each tree object from the old tree, and that should be slower than what write-tree does now. Reading a tree is faster, simply because you uncompress instead of compress. So I can read a tree in 0.28 seconds, but it takes me 0.34 seconds to write one. That said, reading the trees has disk seek issues if it's not in the cache. What I'd actually prefer to do is to just handle tree caching the same way we handle file caching - in the index.
Ie we could have the index file track "what subtree is this directory associated with", and have an update-cache --refresh-dir thing that updates it (and any entry update in that directory obviously removes the dir-cache entry). Normally we'd not bother and it would never trigger, but it would be useful for your scripted setup: it would end up caching all the tree information in a very efficient manner. Totally transparently, apart from the one --refresh-dir at the beginning. That one would be slightly expensive (ie would do all the stuff that write-tree does, but it would be done just once). (We could also just make write-tree do it _totally_ transparently, but then we're back to having write-tree both read _and_ write the index file, which is a situation that I've been trying to avoid. It's so much easier to verify the correctness of an operation if it is purely one-way). I'll think about it. I'd love to speed up write-tree, and keeping track of it in the index is a nice little trick, but it's not quite high enough up on my worries for me to act on it right now. But if you want to try to see how nasty it would be to add tree index entries to the index file at write-tree time automatically, hey... Linus
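Here is a rough C sketch of the proposed per-directory tree cache. It assumes a flat array of hypothetical `dir_cache` records. Any entry update invalidates the cached tree of every directory above the changed file, including the top-level tree. All names here are illustrative only, since the message does not specify an on-disk format.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 16

/* Hypothetical per-directory record: the tree object a directory was last
 * written as, and whether that name is still trustworthy. */
struct dir_cache {
	char path[64];               /* "" is the top-level tree */
	unsigned char tree_sha1[20];
	int valid;
};

struct dir_cache dirs[MAX_DIRS];
int ndirs;

/* Is dir the root or an ancestor directory of file? */
static int contains(const char *dir, const char *file)
{
	size_t n = strlen(dir);
	if (n == 0)
		return 1;
	return strncmp(dir, file, n) == 0 && file[n] == '/';
}

/* Any update to a file drops the cached tree for every directory above it. */
void invalidate_for_update(const char *file)
{
	assert(file != NULL);
	for (int i = 0; i < ndirs; i++)
		if (contains(dirs[i].path, file))
			dirs[i].valid = 0;
}
```

With this design, write-tree could reuse `tree_sha1` for every directory that is still `valid` and only rebuild the invalidated ones.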
Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
On Tue, 19 Apr 2005, Greg KH wrote: Nice, it looks like the merge of this tree, and my usb tree worked just fine. Yup, it all seems to work out. So, what does this now mean? Is your kernel.org git tree now going to be the real kernel tree that you will be working off of now? Should we crank up the nightly snapshots and emails to the -commits list? I'm not quite ready to consider it real, but I'm getting there. I'm still working out some performance issues with merges (the actual merge operation itself is very fast, but I've been trying to make the subsequent update the working directory tree to the right thing be much better). Can I rely on the fact that these patches are now in your tree and I can forget about them? :) Just wondering how comfortable you feel with your git tree so far. Hold off for one more day. I'm very comfortable with how well git has worked out so far, and yes, mentally I consider this the tree, but the fact is, git isn't exactly easy on normal users. I think my merge stuff and Pasky's scripts are getting there, but I want to make sure that we have a version of Pasky's scripts that use the new read-tree -m optimizations to make tracking a tree faster, and I'd like to have them _tested_ a bit first. In other words, I want it to be at the point where people can do git pull repo-address and it will just work, at least for people who don't have any local changes in their tree. None of this check out all the files again crap. But how about a plan that we go live tomorrow - assuming nobody finds any problems before that, of course. Linus
Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
On Tue, 19 Apr 2005, Steven Cole wrote: But perhaps a progress bar right about here might be a good thing for the terminally impatient.

    real    3m54.909s
    user    0m14.835s
    sys     0m10.587s

4 minutes might be long enough to cause some folks to lose hope. Well, the real operations took only 15 seconds. What kind of horrible person are you, that you don't have all of the kernel in your disk cache already? Shame on you. Or was the 4 minutes for downloading all the objects too? Anyway, it looks like you are using pasky's scripts, and the old patch-based upgrade at that. You certainly will _not_ see the

    [many files patched]
    patching file mm/mmap.c
    ..

if you use a real git merge. That's probably the real problem here. Real merges have no patches taking place _anywhere_. And they take about half a second. Doing an update of your tree should _literally_ boil down to

    #
    # repo needs to point to the repo we update from
    #
    rsync -avz --ignore-existing $repo/objects/. .git/objects/.
    rsync -L $repo/HEAD .git/NEW_HEAD || exit 1
    read-tree -m $(cat .git/NEW_HEAD) || exit 1
    checkout-cache -f -a
    update-cache --refresh
    mv .git/NEW_HEAD .git/HEAD

and if it does anything else, it's literally broken. Btw, the above does need my read-tree -m thing which I committed today. (CAREFUL: the above is not a good script, because it _will_ just overwrite all your old contents with the stuff you updated to. You should thus not actually use something like this, but a git update should literally end up doing the above operations in the end, and just add proper checking). And if that takes 4 minutes, you've got problems. Just say no to patches. Linus PS: If you want a clean tree without any old files or anything else, for that matter, you can then do a show-files -z --others | xargs -0 rm, but be careful: that will blow away _anything_ that wasn't revision controlled with git. So don't blame me if your pr0n collection is gone afterwards.
Re: [PATCH] write-tree performance problems
On Tue, 19 Apr 2005, Chris Mason wrote: 5) right before exiting, write-tree updates the index if it made any changes. This part won't work. It needs to do the proper locking, which means that it needs to create index.lock _before_ it reads the index file, and write everything to that one and then do a rename. If it doesn't need to do the write, it can just remove index.lock without writing to it, obviously. The downside to this setup is that I've got to change other index users to deal with directory entries that are there sometimes and missing other times. The nice part is that I don't have to invalidate the directory entry, if it is present, it is valid. To me, the biggest downside is actually the complexity part, and worrying about the directory index ever getting stale. How big do the changes end up being? Linus
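The locking rule described above can be sketched with plain POSIX calls. The steps are: create index.lock before reading, write the new contents to it, then rename it over the index, or just remove the lock if nothing changed. `update_index()` is a hypothetical helper, and the actual index reading and parsing are elided.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the file at index_path with new_contents,
 * following the index.lock protocol. "changed" says whether the caller
 * actually modified anything after reading the index. */
int update_index(const char *index_path, const char *new_contents, int changed)
{
	char lock_path[4096];
	int fd;

	assert(index_path != NULL && new_contents != NULL);
	snprintf(lock_path, sizeof lock_path, "%s.lock", index_path);

	/* Take the lock *before* reading: O_EXCL fails if another writer holds it. */
	fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;

	/* ... read the current index and compute the new one here ... */

	if (!changed) {
		/* Nothing to write: just drop the lock. */
		close(fd);
		unlink(lock_path);
		return 0;
	}

	size_t len = strlen(new_contents);
	ssize_t n = write(fd, new_contents, len);
	if (close(fd) < 0 || n != (ssize_t)len) {
		unlink(lock_path);
		return -1;
	}
	/* rename() swaps the new index in atomically on POSIX file systems. */
	return rename(lock_path, index_path);
}
```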
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
On Tue, 19 Apr 2005, Junio C Hamano wrote: Let's for a moment forget what git-pasky currently does, which is not to touch .git/index until the user says "Ok, let's commit". I think git-pasky is wrong. It's true that we want to often (almost always) diff against the last released thing, and I actually think git-pasky does what it does because I never wrote a tool to diff the current working directory against a tree. At the same time, I very much worked with a model where you do _not_ have a traditional work file, but the index really _is_ the work file. I'd like to start from a different premise and see what happens: - What .git/index records is *not* the state as the last commit. It is just a cache Cogito uses to speed up access to the user's working tree. From the user's point of view, it does not even exist. Yes. Yes. YES. That is indeed the whole point of the index file. In my world-view, the index file does _everything_. It's the staging area (work file), it's the merging area (merge directory) and it's the cache file (stat cache). I'll immediately write a tool to diff the current working directory against a tree object, and hopefully that will just make pasky happy with this model too. Is there any other reason why git-pasky wants to have a work file? Linus
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
On Tue, 19 Apr 2005, Linus Torvalds wrote: That is indeed the whole point of the index file. In my world-view, the index file does _everything_. It's the staging area (work file), it's the merging area (merge directory) and it's the cache file (stat cache). I'll immediately write a tool to diff the current working directory against a tree object, and hopefully that will just make pasky happy with this model too. Ok, "immediately" took a bit longer than I wanted to, and quite frankly, the end result is not very well tested. It was a bit more complex than I was hoping for to match up the index file against a tree object, since unlike the tree-tree comparison in diff-tree, you have to compare two cases where the layout isn't the same. No matter. It seems to work to a first approximation, and the result is such a cool tool that it's worth committing and pushing out immediately. The code ain't exactly pretty, but hey, maybe that's just me having higher standards of beauty than most. Or maybe you just shudder at what I consider pretty in the first place, in which case you probably shouldn't look too closely at this one. What the new diff-cache does is basically emulate diff-tree, except one of the trees is always the index file. You can also choose whether you want to trust the index file entirely (using the --cached flag) or ask the diff logic to show any files that don't match the stat state as being tentatively changed. Both of these operations are very useful indeed. For example, let's say that you have worked on your index file, and are ready to commit. You want to see exactly _what_ you are going to commit without having to write a new tree object and compare it that way, and to do that, you just do

    diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new diff-cache can take a commit object, and it automatically just extracts the tree information from there).
Example: let's say I had renamed commit.c to git-commit.c, and I had done an update-cache to make that effective in the index file. show-diff wouldn't show anything at all, since the index file matches my working directory. But doing a diff-cache does:

    [EMAIL PROTECTED]:~/git diff-cache --cached $(cat .git/HEAD)
    -100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    commit.c
    +100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    git-commit.c

So what the above diff-cache command line does is to say "show me the differences between HEAD and the current index contents (the ones I'd write with a write-tree)". And as you can see, the output matches diff-tree -r output (we always do -r, since the index is always fully populated). All the same rules: + means added file, - means removed file, and * means changed file. You can trivially see that the above is a rename. In fact, diff-cache --cached _should_ always be entirely equivalent to actually doing a write-tree and comparing that. Except this one is much nicer for the case where you just want to check. Maybe you don't want to do the tree. So doing a diff-cache --cached is basically very useful when you are asking yourself "what have I already marked for being committed, and what's the difference to a previous tree". However, the non-cached version takes a different approach, and is potentially the even more useful of the two in that what it does can't be emulated with a write-tree + diff-tree. Thus that's the default mode. The non-cached version asks the question "show me the differences between HEAD and the currently checked out tree - index contents _and_ files that aren't up-to-date" which is obviously a very useful question too, since that tells you what you _could_ commit. Again, the output matches the diff-tree -r output to a tee, but with a twist. The twist is that if some file doesn't match the cache, we don't have a backing store thing for it, and we use the magic all-zero sha1 to show that.
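As an illustration of the output format described above, here is a hypothetical C formatter for a single "changed" line. It prints the all-zero sha1 when the working file has no object yet. The tab separators and the name `format_changed()` are assumptions, since the archive lost the original whitespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hex-encode a 20-byte object name into hex[41]. */
static void sha1_to_hex(const unsigned char *sha1, char *hex)
{
	for (int i = 0; i < 20; i++)
		sprintf(hex + 2 * i, "%02x", sha1[i]);
}

/* Format one "changed" line in the diff-tree -r style.
 * new_sha1 == NULL means the working file is not up to date in the
 * index, so there is no object for it yet: print the all-zero name. */
void format_changed(char *out, size_t outlen, unsigned int old_mode,
		    unsigned int new_mode, const unsigned char *old_sha1,
		    const unsigned char *new_sha1, const char *path)
{
	static const unsigned char null_sha1[20];
	char old_hex[41], new_hex[41];

	assert(old_sha1 != NULL && path != NULL);
	sha1_to_hex(old_sha1, old_hex);
	sha1_to_hex(new_sha1 ? new_sha1 : null_sha1, new_hex);
	snprintf(out, outlen, "*%06o->%06o blob\t%s->%s\t%s",
		 old_mode, new_mode, old_hex, new_hex, path);
}
```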
So let's say that you have edited kernel/sched.c, but have not actually done an update-cache on it yet - there is no object associated with the new state, and you get:

    [EMAIL PROTECTED]:~/v2.6/linux diff-cache $(cat .git/HEAD )
    *100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000    kernel/sched.c

ie it shows that the tree has changed, and that kernel/sched.c is not up-to-date and may contain new stuff. The all-zero sha1 means that to get the real diff, you need to look at the object in the working directory directly rather than do an object-to-object diff. NOTE! As with other commands of this type, diff-cache does not actually look at the contents of the file at all. So maybe kernel/sched.c hasn't actually changed, and it's just that you touched it. In either case, it's a note that you need to update-cache it to make the cache be in sync. NOTE 2! You can have a mixture
Re: [PATCH 2/3] init-db.c: normalize env var handling.
On Tue, 19 Apr 2005, Zach Welch wrote: This patch applies on top of: [PATCH 1/3] init-db.c: cleanup comments init-db.c | 11 +++ 1 files changed, 3 insertions(+), 8 deletions(-) Signed-Off-By: Zach Welch [EMAIL PROTECTED] Normalize init-db environment variable handling, allowing the creation of object directories with something other than DEFAULT_DB_ENVIRONMENT. --- a/init-db.c +++ b/init-db.c For future reference, this is in the wrong order. You should have the checkin comment first, then signed-off-by, then a line with three dashes, and then administrative trivia. Ie I'd much rather see the email look like

    Normalize init-db environment variable handling, allowing the creation
    of object directories with something other than DEFAULT_DB_ENVIRONMENT.

    Signed-Off-By: Zach Welch [EMAIL PROTECTED]
    ---
    This patch applies on top of:
    [PATCH 1/3] init-db.c: cleanup comments

    init-db.c | 11 +++
    1 files changed, 3 insertions(+), 8 deletions(-)

    .. actual patch goes here ..

since otherwise I'll just have to edit it that way. I like seeing the administrative stuff (diffstat etc), but I don't want to have it in the commit message, and that's exactly what the --- marker is for - my tools will automatically cut it off as if it was a signature (or the beginning of the patch). Linus
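Here is a small C sketch of what cutting a message off at the `---` marker means for a patch-applying tool. It assumes the marker must be a line containing exactly three dashes, so a `--- a/init-db.c` diff header does not match on its own. `commit_message_length()` is a hypothetical helper, not Linus's actual tool.

```c
#include <assert.h>
#include <string.h>

/* Cut an emailed patch at the first line consisting of exactly "---":
 * everything above is the commit message (including Signed-off-by),
 * everything below (diffstat, notes, the patch) is left out of it.
 * Returns the length of the commit-message part. */
size_t commit_message_length(const char *mail)
{
	assert(mail != NULL);
	const char *line = mail;
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		if (len == 3 && strncmp(line, "---", 3) == 0)
			return (size_t)(line - mail);
		if (!eol)
			break;
		line = eol + 1;
	}
	return strlen(mail);    /* no marker: the whole mail is the message */
}
```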
Re: [PATCH 2/3] init-db.c: normalize env var handling.
On Tue, 19 Apr 2005, Zach Welch wrote: I feel even more abashed for my earlier scripting faux pas. Would you like me to resend them to you off-list? No, I edited them and applied them (the first series, I'll have to think about the second one). It's only when there are tens of patches that it gets really old really quickly to edit things by hand. Three I can handle ;) Linus
Re: [PATCH] write-tree performance problems
On Tue, 19 Apr 2005, Chris Mason wrote: I'll finish off the patch once you ok the basics below. My current code works like this: Chris, before you do anything further, let me re-consider. Assuming that the real cost of write-tree is the compression (and I think it is), I really suspect that this ends up being the death-knell to my "use the sha1 of the _compressed_ object" approach. I thought it was clever, and I was ready to ignore the other arguments against it, but if it turns out that we can speed up write-tree a lot by just doing the SHA1 on the uncompressed data, and noticing that we already have the tree before we need to compress it and write it out, then that may be a good enough reason for me to just admit that I was wrong about that decision. So I'll see if I can turn the current fsck into a "convert into uncompressed format", and do a nice clean format conversion. Most of git is very format-agnostic, so that shouldn't be that painful. Knock wood. Linus
Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
On Wed, 20 Apr 2005, Jon Seymour wrote: Am I correct to understand that with this change, all the objects in the database are still being compressed (so no net performance benefit), but by doing the SHA1 calculations before compression you are keeping open the possibility that at some point in the future you may use a different compression technique (including none at all) for some or all of the objects? Correct. There is zero performance benefit to this right now, and the only reason for doing it is because it will allow other things to happen. Note that the other things include:

 - change the compression format to make it cheaper
 - _keep_ the same compression format, but notice that we already have an object by looking at the uncompressed one.

I'm actually leaning towards just #2 at this time. I like how things compress, and it sure is simple. The fact that we use the equivalent of -9 may be expensive, but the thing is, we don't actually write new files that often, and it's just CPU time (no seeking on disk or anything like that), which tends to get cheaper over time. So I suspect that once I optimize the tree writing to notice that "oh, I already have this tree object", and thus build it up but never compress it, write-tree performance will go up _hugely_ even without removing the compression. Because most of the time, write-tree actually only needs to create a couple of small new tree objects. Linus
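The "hash the uncompressed data, then notice we already have it" idea (option #2 above) can be sketched in C. To stay self-contained, the sketch makes three substitutions:

- 64-bit FNV-1a stands in for SHA-1.
- An in-memory array stands in for the object database.
- Compression is only counted, not performed.

`write_object()`, `has_object()` and `compress_calls` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-1: 64-bit FNV-1a (NOT collision resistant). */
static uint64_t toy_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

#define STORE_MAX 64
static uint64_t store[STORE_MAX];   /* hypothetical object database */
static int nstore;
int compress_calls;                 /* how often we paid for deflate */

static int has_object(uint64_t name)
{
	for (int i = 0; i < nstore; i++)
		if (store[i] == name)
			return 1;
	return 0;
}

/* Name the object by hashing the *uncompressed* data, so an existing
 * object is detected before any compression happens. The early-exit
 * path must still hand the name back to the caller. */
int write_object(const void *buf, size_t len, uint64_t *returnname)
{
	assert(buf != NULL && returnname != NULL);
	uint64_t name = toy_hash(buf, len);
	*returnname = name;
	if (has_object(name))
		return 0;           /* already have it: skip compress + write */
	if (nstore == STORE_MAX)
		return -1;
	compress_calls++;       /* stands in for deflate + writing the file */
	store[nstore++] = name;
	return 0;
}
```

Writing the same tree twice costs only a hash the second time, which is where the write-tree speedup would come from.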
Re: [PATCH 1/4] Accept commit in some places when tree is needed.
On Tue, 19 Apr 2005, Junio C Hamano wrote: This patch lifts the tree-from-tree-or-commit logic from diff-cache.c and moves it to sha1_file.c, which is a common library source for the SHA1 storage part. I don't think that's a good interface. It changes the sha1 passed into it: that may actually be nice, since you may want to know what it changed to, but I think you'd want to have that as an (optional) separate sha1_result parameter. Also, the type or size things make no sense to have as a parameter at all. IOW, it was fine when it was an internal hacky thing in diff-cache, but once it's promoted to be a real library function it should definitely be cleaned up to have sane interfaces that make sense in general, and not just within the original context. Linus
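Here is one possible shape for the cleaned-up library function, as a C sketch. The input name is const, the resolved tree goes to an optional separate result buffer, and there are no type or size parameters. `resolve_tree()` and the toy object table are hypothetical stand-ins for the real object database.

```c
#include <assert.h>
#include <string.h>

/* Toy object table standing in for the real object database. */
struct toy_obj {
	unsigned char sha1[20];
	const char *type;            /* "tree" or "commit" */
	unsigned char tree[20];      /* for commits: the tree it records */
};
struct toy_obj objs[8];
int nobjs;

static const struct toy_obj *lookup(const unsigned char *sha1)
{
	for (int i = 0; i < nobjs; i++)
		if (!memcmp(objs[i].sha1, sha1, 20))
			return &objs[i];
	return NULL;
}

/* Input is const and never modified; the resolved tree goes to an
 * optional separate result; no type/size out-parameters.
 * Returns 0 on success, -1 if sha1 is neither a tree nor a commit. */
int resolve_tree(const unsigned char *sha1, unsigned char *tree_result)
{
	const struct toy_obj *o;
	const unsigned char *tree;

	assert(sha1 != NULL);
	o = lookup(sha1);
	if (!o)
		return -1;
	if (!strcmp(o->type, "tree"))
		tree = o->sha1;
	else if (!strcmp(o->type, "commit"))
		tree = o->tree;
	else
		return -1;
	if (tree_result)
		memcpy(tree_result, tree, 20);
	return 0;
}
```

Passing `NULL` for `tree_result` turns the call into a plain "is this usable as a tree?" check.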
Re: [PATCH] write-tree performance problems
On Wed, 20 Apr 2005, C. Scott Ananian wrote: Hmm. Are our index files too large, or is there some other factor? They _are_ pretty large, but they have to be. For the kernel, the index file is about 1.6MB. That's

 - 17,000+ files and filenames
 - stat information for all of them
 - the sha1 for them all

ie for the kernel it averages to 93.5 bytes per file. Which is actually pretty dense (just the sha1 and stat information is about half of it, and those are required). I was considering using a chunked representation for *all* files (not just blobs), which would avoid the original 'trees must reference other trees or they become too large' issue -- and maybe the performance issue you're referring to, as well? No. The most common index file operation is reading, and that's the one that has to be _fast_. And it is - it's a single mmap and some parsing. In fact, writing it is pretty fast too, exactly because the index file is totally linear and isn't compressed or anything fancy like that. It's a _lot_ faster than the tree objects, exactly because it doesn't need to be as careful. The main cost of the index file is probably the fact that I add a sha1 signature of the file into itself to verify that it's ok. The advantage is that the signature means that the file is ok, and the parsing of it can be much more relaxed. You win some, you lose some. Linus
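The self-checksumming index can be sketched as follows. 64-bit FNV-1a stands in for SHA-1, and the checksum is appended after the body. The message does not say where the 2005 index stored its signature, so that layout is an assumption. `index_seal()` and `index_verify()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-1: 64-bit FNV-1a (NOT collision resistant). */
static uint64_t toy_hash(const unsigned char *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Append a checksum of the first body_len bytes; buf needs room for
 * body_len + 8 bytes. */
void index_seal(unsigned char *buf, size_t body_len)
{
	uint64_t h = toy_hash(buf, body_len);
	memcpy(buf + body_len, &h, sizeof h);
}

/* One linear pass: if the stored checksum matches, the parser can trust
 * the contents and keep its per-field sanity checks relaxed. */
int index_verify(const unsigned char *buf, size_t total_len)
{
	uint64_t stored;
	size_t body_len;

	assert(buf != NULL);
	if (total_len < sizeof stored)
		return 0;
	body_len = total_len - sizeof stored;
	memcpy(&stored, buf + body_len, sizeof stored);
	return stored == toy_hash(buf, body_len);
}
```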
Re: [PATCH] write-tree performance problems
On Wed, 20 Apr 2005, C. Scott Ananian wrote: OK, sure. But how 'bout chunking trees? Are you grown happy with the new trees-reference-other-trees paradigm, or is there a deep longing in your heart for the simplicity of 'trees-reference-blobs-period'? I'm pretty sure we do better chunking on a subdirectory basis, especially as it allows us to do various optimizations (avoid diffing common parts). Yes, you could try to do the same optimizations with chunking, but then you'd need to make sure that the chunking was always on a full tree entry boundary etc - ie much harder than blob chunking. But hey, numbers talk, bullshit walks. Linus
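To illustrate why chunking on subdirectory boundaries helps diffing, here is a C sketch of a recursive tree comparison that never opens a subdirectory whose hash is unchanged. `struct node` is a hypothetical in-memory tree. The hashes are plain integers chosen for the example, not real object names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory tree node: a file or a directory, each named by a
 * hash of its contents (for directories, a hash covering all children). */
struct node {
	const char *name;
	unsigned long hash;
	int nchildren;                 /* 0 for files */
	const struct node *children;   /* sorted by name */
};

/* Count differing entries between two trees. Equal hashes mean the whole
 * subtree is identical, so it is skipped without being opened; *opened
 * counts the directory pairs actually read. A removed or added directory
 * counts as one entry. */
int diff_trees(const struct node *a, const struct node *b, int *opened)
{
	assert(a != NULL && b != NULL && opened != NULL);
	if (a->hash == b->hash)
		return 0;                  /* identical subtree: never opened */
	if (a->nchildren == 0 || b->nchildren == 0)
		return 1;                  /* a changed file (or file<->dir) */
	(*opened)++;
	int changed = 0, i = 0, j = 0;
	while (i < a->nchildren || j < b->nchildren) {
		int cmp = i == a->nchildren ? 1 : j == b->nchildren ? -1 :
			  strcmp(a->children[i].name, b->children[j].name);
		if (cmp < 0) {
			changed++;             /* only in a: removed */
			i++;
		} else if (cmp > 0) {
			changed++;             /* only in b: added */
			j++;
		} else {
			changed += diff_trees(&a->children[i++], &b->children[j++], opened);
		}
	}
	return changed;
}
```

With blob-style chunking that ignores entry boundaries, an unchanged chunk would not map to an unchanged directory, so this early exit would not be available.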
Re: [PATCH] write-tree performance problems
On Wed, 20 Apr 2005, Linus Torvalds wrote: To actually go faster, it _should_ need this patch. Untested. See if it works.. NO! Don't see if this works. For the sha1 file already exists file, it forgot to return the SHA1 value in returnsha1, and would thus corrupt the trees it wrote. So don't apply, don't test. You won't corrupt your archive (you'll just write bogus tree objects), but if you commit the bogus trees you're going to be in a world of hurt and will have to undo everything you did. It's a good test for fsck though. It core-dumps because it tries to add references to NULL objects. Linus
Re: [PATCH] write-tree performance problems
On Wed, 20 Apr 2005, Linus Torvalds wrote: NO! Don't see if this works. For the sha1 file already exists file, it forgot to return the SHA1 value in returnsha1, and would thus corrupt the trees it wrote. Proper version with fixes checked in. For me, it brings down the time to write a kernel tree from 0.34s to 0.24s, so a third of the time was just compressing objects that we ended up already having. Two thirds to go ;) Linus