[RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches

2014-01-22 Thread Linus Torvalds

From: Linus Torvalds torva...@linux-foundation.org
Date: Wed, 22 Jan 2014 12:32:30 -0800
Subject: [PATCH] Make 'git request-pull' more strict about matching
 local/remote branches

The current 'request-pull' will try to find a matching commit on the given
remote, and rewrite the "please pull" line to match that remote ref.

That may be very helpful if your local tree doesn't match the layout of 
the remote branches, but for the common case it's been a recurring 
disaster, when request-pull is done against a delayed remote update, and 
it rewrites the target branch randomly to some other branch name that 
happens to have the same expected SHA1 (or more commonly, leaves it 
blank).

To avoid that recurring problem, this changes git request-pull so that 
it matches the ref name to be pulled against the *local* repository, and 
then warns if the remote repository does not have that exact same branch 
or tag name and content.

This means that git request-pull will never rewrite the ref-name you gave 
it.  If the local branch name is xyzzy, that is the only branch name 
that request-pull will ask the other side to fetch.

If the remote has that branch under a different name, that's your problem 
and git request-pull will not try to fix it up (but git request-pull will 
warn about the fact that no exact matching branch is found, and you can 
edit the end result to then have the remote name you want if it doesn't 
match your local one).

The new "find local ref" code will also complain loudly if you give an
ambiguous refname (e.g. you have both a tag and a branch with that same
name, and you don't specify "heads/name" or "tags/name").

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---

This should fix the problem we've had multiple times with kernel 
maintainers, where git request-pull ends up leaving the target branch 
name blank, because people either forgot to push it, or (more commonly) 
people pushed it just before doing the pull request, and it hadn't 
actually had time to mirror out to the public site.

Now, git request-pull will *warn* about the fact that the matching ref 
isn't found on the remote (and the new matching code is stricter at that), 
but it will never try to re-write the branch name that it asks the other 
end to pull.

So if the remote branch doesn't exist, you'll get a warning, but the pull 
request will still have the branch you specified.

The whole checking thing is both simplified (removing more lines than it 
adds) and made more strict.

Comments? It passes the tests I put it through locally, but I did *not* 
make it pass the test-suite, since it very much does change the rules. 
Some of the test suite code literally tests for the old completely broken 
case (at least t5150, subtests 4 and 5).

Thus the RFC part. Because the current git request-pull behavior has been
horrible.

 git-request-pull.sh | 110 
 1 file changed, 43 insertions(+), 67 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index fe21d5db631c..659a412155d8 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -35,20 +35,7 @@ do
 	shift
 done
 
-base=$1 url=$2 head=${3-HEAD} status=0 branch_name=
-
-headref=$(git symbolic-ref -q "$head")
-if git show-ref -q --verify "$headref"
-then
-	branch_name=${headref#refs/heads/}
-	if test "z$branch_name" = "z$headref" ||
-		! git config "branch.$branch_name.description" >/dev/null
-	then
-		branch_name=
-	fi
-fi
-
-tag_name=$(git describe --exact "$head^0" 2>/dev/null)
+base=$1 url=$2 status=0
 
 test -n "$base" && test -n "$url" || usage
 
@@ -58,55 +45,68 @@ then
 	die "fatal: Not a valid revision: $base"
 fi
 
+#
+# $3 must be a symbolic ref, a unique ref, or
+# a SHA object expression
+#
+head=$(git symbolic-ref -q "${3-HEAD}")
+head=${head:-$(git show-ref "${3-HEAD}" | cut -d' ' -f2)}
+head=${head:-$(git rev-parse --quiet --verify "$3")}
+
+# None of the above? Bad.
+test -z "$head" && die "fatal: Not a valid revision: $3"
+
+# This also verifies that the resulting head is unique:
+# "git show-ref" could have shown multiple matching refs..
 headrev=$(git rev-parse --verify --quiet "$head"^0)
-if test -z "$headrev"
+test -z "$headrev" && die "fatal: Ambiguous revision: $3"
+
+# Was it a branch with a description?
+branch_name=${head#refs/heads/}
+if test "z$branch_name" = "z$headref" ||
+	! git config "branch.$branch_name.description" >/dev/null
 then
-	die "fatal: Not a valid revision: $head"
+	branch_name=
 fi
 
+prettyhead=${head#refs/}
+prettyhead=${prettyhead#heads/}
+
 merge_base=$(git merge-base $baserev $headrev) ||
 die "fatal: No commits in common between $base and $head"
-# $head is the token given from the command line, and $tag_name, if
-# exists, is the tag we are going to show the commit information for.
-# If that tag exists at the remote and it points at the commit, use it.
-# Otherwise, if a branch with the same name as $head exists at the remote

Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches

2014-01-22 Thread Linus Torvalds
On Wed, Jan 22, 2014 at 1:46 PM, Junio C Hamano gits...@pobox.com wrote:

 The new find local ref code will also complain loudly if you give an
 ambiguous refname (eg you have both a tag and a branch with that same
 name, and you don't specify heads/name or tags/name).

 But this part might be a bit problematic.  $3=master will almost
 always have refs/heads/master and refs/remotes/origin/master listed
 because the call to show-ref comes before rev-parse --verify,
 no?

Hmm. Yes.

It's done that way very much on purpose, to avoid the branch/tag
ambiguity (which we have had problems with), but you're right, it also
ends up being ambiguous wrt remote branches, which wasn't the
intention, and you're right, that is not acceptable.

Damn. I very much want to get the full ref-name (i.e. "master" should
become "refs/heads/master"), and I do want to avoid the branch/tag
ambiguity, but you're right, "show-ref" plus the subsequent "rev-parse
--verify" comes close but not quite close enough.

Any ideas? The hacky way is to do "| head -1" to take the first
show-ref output, and then check if you get a different result if you
re-do it using "show-ref --tags". But that sounds really excessively
hacky. Is there a good way to do it?

 Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches

2014-01-22 Thread Linus Torvalds
On Wed, Jan 22, 2014 at 2:03 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Any ideas? The hacky way is to do | head -1 to take the first
 show-ref output, and then check if you get a different result if you
 re-do it using show-ref --tags. But that sounds really excessively
 hacky. Is there a good way to do it?

Using "git show-ref --tags --heads" would work for the common case
(since that ignores remote branches), but would then disallow remote
branches entirely.

That might be ok in practice, but it's definitely wrong too.
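
To make the ambiguity concrete, here is a minimal sketch (not part of the patch) that takes "git show-ref"-style lines on stdin, keeps only local branches and tags, and reports whether exactly one name is left. The helper name pick_local_ref and its return codes are made up for illustration.

```shell
# pick_local_ref: hypothetical helper, not part of git.
# Reads "<sha> <refname>" lines (the format git show-ref prints) on stdin,
# ignores everything outside refs/heads/ and refs/tags/, and prints the
# single remaining refname. Returns 1 if nothing is left, 2 if the name is
# still ambiguous (e.g. a branch and a tag share it).
pick_local_ref () {
	count=0 found=
	while read -r sha ref
	do
		case "$ref" in
		refs/heads/*|refs/tags/*)
			count=$((count + 1))
			found=$ref
			;;
		esac
	done
	test "$count" -eq 0 && return 1
	test "$count" -gt 1 && return 2
	printf '%s\n' "$found"
}
```

Fed the two lines that "master" typically produces (refs/heads/master and refs/remotes/origin/master), this prints refs/heads/master. The cost is the one described above: a remote-tracking branch can never be selected this way.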

I'm probably missing some obvious solution.

Linus


Re: [RFC PATCH] Make 'git request-pull' more strict about matching local/remote branches

2014-01-22 Thread Linus Torvalds
On Wed, Jan 22, 2014 at 2:14 PM, Junio C Hamano gits...@pobox.com wrote:

 I looked at 5150.4 and found that what it attempts to do is halfway
 sensible.

I agree that it is half-way sensible. The important bit being the HALF part.

The half part is why we have the semantics we have. There's no
question about that.

The problem is, the *other* half is pure and utter crap. The half-way
sensible solution then generates pure and utter garbage in the
totally sensible case.

And that's why I think it needs to be fixed. Not because the existing
behavior can never make sense in some circumstances, but because the
existing behavior can screw up really really badly in other (arguably
more common, and definitely real) circumstances.

For the kernel, the broken "missing branch name" situation has come up
pretty regularly. This is definitely not a one-time event, it's more
like almost every merge window somebody gets screwed by this and I
have to guess what the branch name should have been.

I think that we could potentially do a local:remote syntax for that
half-way sensible case, so that if you do

   git push .. master:for-linus

then you have to do

   git request-pull .. master:for-linus

to match the fact that you renamed your local branch on the remote.
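
A minimal sketch of how such an argument could be split in plain shell, using the "%" and "#" parameter expansions. The function and variable names are hypothetical, and the fallback rules (no colon means the same name on both sides, empty means HEAD) are assumptions for the sake of the example.

```shell
# split_refspec: hypothetical helper illustrating the proposed syntax.
# Sets $local_name and $remote_name from an argument of the form
# "local:remote". Without a colon both names are the same; an empty
# argument falls back to HEAD.
split_refspec () {
	local_name=${1%:*}                        # drop the last ":..." part, if any
	local_name=${local_name:-HEAD}
	remote_name=${1#*:}                       # drop everything up to the first ":"
	remote_name=${remote_name:-$local_name}
}
```

With "master:for-linus" this yields local_name=master and remote_name=for-linus; with plain "for-linus" both variables end up as for-linus, because neither expansion finds a colon to cut at.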

  Linus


[RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote

2014-01-22 Thread Linus Torvalds

From: Linus Torvalds torva...@linux-foundation.org
Date: Wed, 22 Jan 2014 15:23:48 -0800
Subject: [PATCH] Make request-pull able to take a refspec of form local:remote

This allows a user to say that a local branch has a different name on
the remote server, using the same syntax that git push uses to create
that situation.

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---

So this relaxes the remote matching, and allows using the local:remote 
syntax to say that the local branch is differently named from the remote 
one.

It is probably worth folding it into the previous patch if you think this 
whole approach is workable.

 git-request-pull.sh | 50 +-
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index 659a412155d8..c8ab0e912011 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -47,19 +47,23 @@ fi
 
 #
 # $3 must be a symbolic ref, a unique ref, or
-# a SHA object expression
+# a SHA object expression. It can also be of
+# the format 'local-name:remote-name'.
 #
-head=$(git symbolic-ref -q "${3-HEAD}")
-head=${head:-$(git show-ref "${3-HEAD}" | cut -d' ' -f2)}
-head=${head:-$(git rev-parse --quiet --verify "$3")}
+local=${3%:*}
+local=${local:-HEAD}
+remote=${3#*:}
+head=$(git symbolic-ref -q "$local")
+head=${head:-$(git show-ref --heads --tags "$local" | cut -d' ' -f2)}
+head=${head:-$(git rev-parse --quiet --verify "$local")}
 
 # None of the above? Bad.
-test -z "$head" && die "fatal: Not a valid revision: $3"
+test -z "$head" && die "fatal: Not a valid revision: $local"
 
 # This also verifies that the resulting head is unique:
 # "git show-ref" could have shown multiple matching refs..
 headrev=$(git rev-parse --verify --quiet "$head"^0)
-test -z "$headrev" && die "fatal: Ambiguous revision: $3"
+test -z "$headrev" && die "fatal: Ambiguous revision: $local"
 
 # Was it a branch with a description?
 branch_name=${head#refs/heads/}
@@ -69,9 +73,6 @@ then
 	branch_name=
 fi
 
-prettyhead=${head#refs/}
-prettyhead=${prettyhead#heads/}
-
 merge_base=$(git merge-base $baserev $headrev) ||
 die "fatal: No commits in common between $base and $head"
 
@@ -81,30 +82,37 @@ die fatal: No commits in common between $base and $head
 #
 # Otherwise find a random ref that matches $headrev.
 find_matching_ref='
-	my ($exact,$found);
+	my ($head,$headrev) = (@ARGV);
+	my ($found);
+
 	while (<STDIN>) {
+		chomp;
 		my ($sha1, $ref, $deref) = /^(\S+)\s+([^^]+)(\S*)$/;
-		next unless ($sha1 eq $ARGV[1]);
-		if ($ref eq $ARGV[0]) {
-			$exact = $ref;
+		my ($pattern);
+		next unless ($sha1 eq $headrev);
+
+		$pattern="/$head\$";
+		if ($ref eq $head) {
+			$found = $ref;
+		}
+		if ($ref =~ /$pattern/) {
+			$found = $ref;
 		}
-		if ($sha1 eq $ARGV[0]) {
+		if ($sha1 eq $head) {
 			$found = $sha1;
 		}
 	}
-	if ($exact) {
-		print "$exact\n";
-	} elsif ($found) {
+	if ($found) {
 		print "$found\n";
 	}
 '
 
-ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "$head" "$headrev")
+ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "${remote:-HEAD}" "$headrev")
 
 if test -z "$ref"
 then
-	echo "warn: No match for $prettyhead found at $url" >&2
-	echo "warn: Are you sure you pushed '$prettyhead' there?" >&2
+	echo "warn: No match for commit $headrev found at $url" >&2
+	echo "warn: Are you sure you pushed '${remote:-HEAD}' there?" >&2
 	status=1
 fi
 
@@ -116,7 +124,7 @@ git show -s --format='The following changes since commit %H:
 
 are available in the git repository at:
 ' $merge_base &&
-echo "  $url $prettyhead" &&
+echo "  $url $remote" &&
 git show -s --format='
 for you to fetch changes up to %H:
 
-- 
1.9.rc0.10.gf0799f9.dirty



Re: [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote

2014-01-23 Thread Linus Torvalds
On Thu, Jan 23, 2014 at 11:43 AM, Junio C Hamano gits...@pobox.com wrote:

 I am not sure if it is a good idea to hand-craft resulting head is
 unique constraint here.  We already have disambiguation rules (and
 warning mechanism) we use in other places---this part should use the
 same rule, I think.

If you can fix that, then yes, that would be lovely. As it is, I
couldn't find any easily scriptable way to do that.

  #
  # Otherwise find a random ref that matches $headrev.
  find_matching_ref='
 + my ($head,$headrev) = (@ARGV);
 + my ($found);
 +
   while (STDIN) {
 + chomp;
   my ($sha1, $ref, $deref) = /^(\S+)\s+([^^]+)(\S*)$/;
 + my ($pattern);
 + next unless ($sha1 eq $headrev);
 +
 + $pattern=/$head\$;

 I think $head is constant inside the loop, so lift it outside?

Yes. I'm not really a perl person, and this came from me trying to
make the code more readable (and it used to do that magic quoting
thing inside the loop, I just used a helper pattern variable).

 + if ($sha1 eq $head) {

 I think this is $headrev ($head may be $remote or HEAD), but then
 anything that does not point at $headrev has already been rejected
 at the beginning of this loop, so...?

No, this is for when head ends up not being a ref, but a SHA1 expression.

IOW, for when you do something odd like

git request-pull HEAD^^ origin HEAD^

when hacking things together. It doesn't actually generate the right
request-pull message (because there's no valid branch name), but it
*works* in the sense that you can get the diffstat etc and edit things
manually.

It's not a big deal - it has never really worked, and I actually
broke that when I then used $remote that doesn't actually have the
SHA1 any more.

 + if ($found) {
   print $found\n;
   }
  '

 I somehow feel that this is inadequate to catch the delayed
 propagation error in the opposite direction.  The publish
 repository may have an unrelated ref pointing at the $headrev and we
 may guess that is the ref to be fetched by the integrator based on
 that, but by the time the integrator fetches from the repository,
 the ref may have been updated to its new value that does not match
 $headrev.  But I do not think of a way to solve that one.

Yes, so you'll get a warning (or, if you get a partial match, maybe
not even that), but the important part about all these changes is that
it DOESN'T MATTER.

Why? Because it no longer re-writes the target branch name based on
that match or non-match. So the pull request will be fine.

In other words, the really fundamental change here is that the "oops, I
couldn't find things on the remote" no longer affects the output. It
only affects the warning. And I think that's important.

It used to be that the remote matching actually changed the output of
the request-pull, and *THAT* was the fundamental problem.
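
The separation can be sketched in shell roughly as follows. This is an illustration only: find_remote_ref is a simplified stand-in for the Perl snippet (it ignores peeled "^{}" entries), pull_target is a made-up name for the step that writes the request line, and the URL in the usage note is a placeholder.

```shell
# find_remote_ref: hypothetical shell stand-in for the Perl matcher.
# Reads "<sha><TAB><refname>" lines (git ls-remote format) on stdin and
# prints the ref that points at $2 and is named $1 or ends in "/$1".
# Prints nothing and returns non-zero when there is no such ref.
find_remote_ref () {
	want=$1 headrev=$2 found=
	while read -r sha ref
	do
		test "$sha" = "$headrev" || continue
		case "$ref" in
		"$want"|*/"$want")
			found=$ref
			;;
		esac
	done
	test -n "$found" && printf '%s\n' "$found"
}

# pull_target: the name written into the request never depends on the
# lookup result ($3); a failed lookup only produces a warning on stderr.
pull_target () {
	url=$1 remote=$2 ref=$3
	test -n "$ref" ||
		echo "warn: Are you sure you pushed '$remote' there?" >&2
	printf '%s %s\n' "$url" "$remote"
}
```

Calling pull_target git://example.org/repo.git for-linus "" still prints "git://example.org/repo.git for-linus"; the empty lookup result only adds the warning.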

 In any case, shouldn't we be catching duplicate matches here, if the
 real objective is to make it less likely for the users to make
 mistakes?

It would be good, yes. But my perl-fu is weak, and I really didn't
want to worry about it. Also, as above: my primary issue was to not
screw up the output, so the remote matching actually has become much
less important, and now the warning about it is purely about being
helpful, it no longer fundamentally alters any semantics.

So I agree that there is room for improvement, but that's kind of
separate from the immediate problem I was trying to solve.

  Linus


Re: [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote

2014-01-23 Thread Linus Torvalds
On Thu, Jan 23, 2014 at 2:58 PM, Junio C Hamano gits...@pobox.com wrote:

 Will be fine, provided if they always use local:remote syntax, I'd
 agree.

Why? No sane user should actually need to use the local:remote syntax.

The normal situation should be that you create the correctly named
branch or tag locally, and then push it out under that name.

So I don't actually think anybody should need to be retrained, or
always use the local:remote syntax. The local:remote syntax exists
only for that special insane case where you used (the same)
local:remote syntax to push out a branch under a different name.

[ And yeah, maybe that behavior is more common than I think, but even
if it is, such behavior would always be among people who are *very*
aware of the whole "local branch vs remote branch name is different"
situation. ]

   Linus


Re: Re* [RFC PATCH 2/1] Make request-pull able to take a refspec of form local:remote

2014-01-29 Thread Linus Torvalds
On Wed, Jan 29, 2014 at 3:34 PM, Junio C Hamano gits...@pobox.com wrote:

 I am not yet doing the docs, but here is a minimal (and I think is
 the most sensible) fix to the "If I asked a tag to be pulled, I used
 to get the message from the tag in the output---the updated code no
 longer does so" problem.

That was a complete oversight/bug on my part, due to just removing the
tag_name special cases, not thinking about the tag message.

Thinking some more about the tag_name issue, I realize that the other
patch ("Make request-pull able to take a refspec of form
local:remote") broke another thing.

The first patch pretty-printed the local branch-name, removing refs/
and possibly heads/ from the local refname. So for a branch, it
would ask people to just pull from the branch-name, and for a tag it
would ask people to pull from tags/name, which is good policy. So if
you had a tag called for-linus, it would say so (using
tags/for-linus).
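
The pretty-printing described here is just the two prefix removals from the first patch. A small sketch, wrapped in a function whose name (pretty_ref) is invented for the example:

```shell
# pretty_ref: strips "refs/" and then "heads/" from a full refname, the
# same two expansions the first patch applied to $head.
pretty_ref () {
	pretty=${1#refs/}
	pretty=${pretty#heads/}
	printf '%s\n' "$pretty"
}
```

So refs/heads/for-linus becomes for-linus, while refs/tags/for-linus becomes tags/for-linus, which keeps the tag explicit in the request.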

But the local:remote syntax thing ends up breaking that nice feature.
The old find_matching_ref would actually cause us to show the "tags/"
part if it existed on the remote, but that had become pointless and
counter-productive with the first patch. But with the second patch,
maybe we should reinstate that logic..

Linus


Odd git diff breakage

2014-03-31 Thread Linus Torvalds
I hit this oddity when not remembering the right syntax for --color-words..

Try this (outside of a git repository):

   touch a b
   git diff -u --color=words a b

and watch it scroll (infinitely) printing out

   error: option `color' expects always, auto, or never

forever.

I haven't tried to root-cause it, since I'm supposed to be merging stuff..

Linus


Re: Odd git diff breakage

2014-03-31 Thread Linus Torvalds
On Mon, Mar 31, 2014 at 11:30 AM, Junio C Hamano gits...@pobox.com wrote:

 Hmph, interesting.  outside a repository is the key, it seems.

Well, you can do it inside a repository too, but then you need to use
the --no-index flag to get the "diff two files" behavior. It will
result in the same infinite error messages.

  Linus


Re: [PATCH] diff-no-index: correctly diagnose error return from diff_opt_parse()

2014-03-31 Thread Linus Torvalds
On Mon, Mar 31, 2014 at 11:47 AM, Junio C Hamano gits...@pobox.com wrote:

 Instead, make it act like so:

 $ git diff --no-index --color=words a b
 error: option `color' expects always, auto, or never
 fatal: invalid diff option/value: --color=words

Thanks,

   Linus


Re: Sources for 3.18-rc1 not uploaded

2014-10-20 Thread Linus Torvalds
Junio, Brian,

  it seems that the stability of the git tar output is broken.

On Mon, Oct 20, 2014 at 4:59 AM, Konstantin Ryabitsev
konstan...@linuxfoundation.org wrote:

 Looks like 3.18-rc1 upload didn't work:

 This is why the front page still lists 3.17 as the latest mainline. Want
 to try again?

Ok, tried again, and failed again.

 If that still doesn't work, you may have to use version 1.7 of git when
 generating the tarball and signature -- I recall Greg having a similar
 problem in the past.

Ugh, yes, that seems to be it. Current git generates different
tar-files than older releases do:

   tar-1.7.9.7 tar-cur differ: byte 107, line 1

and a quick bisection shows that it is due to commit 10f343ea814f
("archive: honor tar.umask even for pax headers") in the current git
development version.

Junio, quite frankly, I don't think that that fix was a good idea. I'd
suggest having a *separate* umask for the pax headers, so that we do
not break this long-lasting stability of "git archive" output in ways
that are unfixable and not compatible. kernel.org has relied (for a
*long* time) on being able to just upload the signature of the
resulting tar-file, because both sides can generate the same tar-file
bit-for-bit.

So instead of using tar_umask, please make it use tar_pax_umask,
and have that default to 000. Ok?
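
As a rough illustration of why the archives stopped matching, the sketch below redoes the mode arithmetic from archive-tar.c in shell. The helper name tar_mode_field is made up, and the "& 07777" mask and seven-digit octal formatting are assumptions about how the header field gets filled in, not something shown in this thread.

```shell
# tar_mode_field: hypothetical helper. Prints the 7-digit octal mode that
# would land in a tar header for an entry created with mode 0100666 after
# applying the umask given as $1. Write the umask with a leading 0 so the
# shell parses it as octal.
tar_mode_field () {
	printf '%07o\n' $(( (0100666 & ~$1) & 07777 ))
}
```

With the default tar.umask of 002 this gives 0000664, and with 000 it gives 0000666, so only the last digit of the field changes. In a ustar header the mode field follows the 100-byte name field, which puts that last digit at byte 107 counting from 1. That would be consistent with the "differ: byte 107" report above if the first header in the archive is a pax header.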

Something like the attached patch.

Or just revert 10f343ea814f entirely.

   Linus
From d5ca7ae0a34e31c48397f59b03ecabda7c5c40b2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds torva...@linux-foundation.org
Date: Mon, 20 Oct 2014 08:21:38 -0700
Subject: [PATCH] Don't use the default 'tar.umask' for pax headers

That wasn't the original behavior, and doing so breaks the fact that
tar-files are bit-for-bit compatible across git versions.

If you really want to work around broken receiving tar implementations
(dubious, we've not needed to do so before), use [tar] paxumask in the
git config file.  Or maybe we could expose some command line flag to do
so.  But don't break existing format compatibility for dubious gains.

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---
 archive-tar.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index df2f4c8a6437..40139ea4ee4e 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -14,6 +14,7 @@ static char block[BLOCKSIZE];
 static unsigned long offset;
 
 static int tar_umask = 002;
+static int tar_pax_umask = 000;
 
 static int write_tar_filter_archive(const struct archiver *ar,
 struct archiver_args *args);
@@ -192,7 +193,7 @@ static int write_extended_header(struct archiver_args *args,
 	unsigned int mode;
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_EXT_HEADER;
-	mode = 0100666 & ~tar_umask;
+	mode = 0100666 & ~tar_pax_umask;
 	sprintf(header.name, "%s.paxheader", sha1_to_hex(sha1));
 	prepare_header(args, &header, mode, size);
 	write_blocked(&header, sizeof(header));
@@ -300,7 +301,7 @@ static int write_global_extended_header(struct archiver_args *args)
 	strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
-	mode = 0100666 & ~tar_umask;
+	mode = 0100666 & ~tar_pax_umask;
 	strcpy(header.name, "pax_global_header");
 	prepare_header(args, &header, mode, ext_header.len);
 	write_blocked(&header, sizeof(header));
@@ -374,6 +375,15 @@ static int git_tar_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "tar.paxumask")) {
+		if (value && !strcmp(value, "user")) {
+			tar_pax_umask = umask(0);
+		} else {
+			tar_pax_umask = git_config_int(var, value);
+		}
+		return 0;
+	}
+
 	return tar_filter_config(var, value, cb);
 }
 
-- 
2.1.2.330.g565301e



Re: Sources for 3.18-rc1 not uploaded

2014-10-20 Thread Linus Torvalds
On Mon, Oct 20, 2014 at 3:28 PM, brian m. carlson
sand...@crustytoothpaste.net wrote:

 It doesn't appear that the stability of git archive --format=tar is
 documented anywhere.  Given that, it doesn't seem reasonable to expect
 that any tar implementation produces bit-for-bit compatible output
 between versions.

The kernel has simple stability rules: if it breaks users, it gets
fixed or reverted. That is a damn good rule.

I realize that some other projects are crap, and don't care about
their users. I hope and believe that git is not in that sad group.

The whole "it's not documented" excuse is pure and utter bollocks.
Users don't care. And stability of data should be *expected*, not need
some random documentation entry to make it explicit.

  Linus


Re: Sources for 3.18-rc1 not uploaded

2014-10-21 Thread Linus Torvalds
On Tue, Oct 21, 2014 at 1:08 AM, Michael J Gruber
g...@drmicha.warpmail.net wrote:

 Unfortunately, the git archive doc clearly says that the umask is
 applied to all archive entries. And that clearly wasn't the case (for
 extended metadata headers) before Brian's fix.

Hey, it's time for another round of the world-famous Captain Obvious
Quiz Game! Yay!

The questions this week are:

 (1) If reality and documentation do not match, where is the bug?
(a) Documentation is buggy
(b) Reality is buggy

 (2) Where would you put the horse in relationship to a horse-drawn carriage?
(a) in front
(b) in the carriage

Now, if you answered (a) to both these questions, and had this been a
real quiz show, you might have been a winner and the happy new owner
of a remote-controlled four-slice toaster with a fancy digital timer.

Sadly, this was just a dry-run for the real thing, to give people a
quick taste of the world-famous Captain Obvious Quiz Game. I hope
you tune in next week for our exciting all-new questions.

   Linus


Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Linus Torvalds
On Mon, Aug 19, 2013 at 8:41 AM, Steffen Prohaska proha...@zib.de wrote:

 The reason was that read() immediately returns with EINVAL if nbyte >=
 2GB.  According to POSIX [1], if the value of nbyte passed to read() is
 greater than SSIZE_MAX, the result is implementation-defined.

Yeah, the OS X filesystem layer is an incredible piece of shit. Not
only doesn't it follow POSIX, it fails *badly*. Because OS X kernel
engineers apparently have the mental capacity of a retarded rodent on
crack.

Linux also refuses to actually read more than a maximum value in one
go (because quite frankly, doing more than 2GB at a time is just not
reasonable, especially in unkillable disk wait), but at least Linux
gives you the partial read, so that the usual "read until you're
happy" works (which you have to do anyway with sockets, pipes, NFS
intr mounts, etc etc). Returning EINVAL is a sign of a diseased mind.

I hate your patch for other reasons, though:

 The problem for read() is addressed in a similar way by introducing
 a wrapper function in compat that always reads less than 2GB.

Why do you do that? We already _have_ wrapper functions for read(),
namely xread().  Exactly because you basically have to, in order to
handle signals on interruptible filesystems (which aren't POSIX
either, but at least sanely so) or from other random sources. And to
handle the "you can't do reads that big" issue.

So why isn't the patch much more straightforward? Like the attached
totally untested one that just limits the read/write size to 8MB
(which is totally arbitrary, but small enough to not have any latency
issues even on slow disks, and big enough that any reasonable IO
subsystem will still get good throughput).
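
The idea is only "cap each request, and let the caller's existing loop pick up the rest". The attached patch itself is not shown here, so the following is merely the arithmetic of that idea sketched in shell; MAX_IO_SIZE, clamp_io_len and count_io_calls are illustrative names, and the loop assumes every call transfers everything it asked for.

```shell
# MAX_IO_SIZE: the arbitrary 8MB cap mentioned above (name is illustrative).
MAX_IO_SIZE=$((8 * 1024 * 1024))

# clamp_io_len: the length that would actually be passed to a single
# read()/write() call for a request of $1 bytes.
clamp_io_len () {
	if test $(( $1 > MAX_IO_SIZE )) -eq 1
	then
		echo "$MAX_IO_SIZE"
	else
		echo "$1"
	fi
}

# count_io_calls: how many capped calls a "read until done" loop makes
# for a buffer of $1 bytes.
count_io_calls () {
	remaining=$1 calls=0
	while test $(( remaining > 0 )) -eq 1
	do
		len=$(clamp_io_len "$remaining")
		remaining=$((remaining - len))
		calls=$((calls + 1))
	done
	echo "$calls"
}
```

A 2GB buffer (2147483648 bytes) is then moved in 256 calls of 8MB each, none of which comes anywhere near the size that triggers EINVAL.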

And by "totally untested" I mean that it actually passes the git test
suite, but since I didn't apply your patch nor do I have OS X
anywhere, I can't actually test that it fixes *your* problem. But it
should.


   Linus


patch.diff
Description: Binary data


Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Linus Torvalds
On Mon, Aug 19, 2013 at 10:16 AM, Junio C Hamano gits...@pobox.com wrote:
 Linus Torvalds torva...@linux-foundation.org writes:

 The same argument applies to xwrite(), but currently we explicitly
 catch EINTR and EAGAIN knowing that on sane systems these are the
 signs that we got interrupted.

 Do we catch EINVAL unconditionally in the same codepath?

No, and we shouldn't. If EINVAL happens, it will keep happening.

But with the size limiter, it doesn't matter, since we won't hit the
OS X braindamage.

 Could
 EINVAL on saner systems mean completely different thing (like our
 caller is passing bogus parameters to underlying read/write, which
 is a program bug we would want to catch)?

Yes. Even on OS X, it means that - it's just that OS X notion of what
is bogus is pure crap. But the thing is, looping on EINVAL would be
wrong even on OS X, since unless you change the size, it will keep
happening forever.

But with the "limit IO to 8MB (or whatever)" patch, the issue is moot.
If you get an EINVAL, it will be due to something else being horribly
horribly wrong.

 Linus


Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Linus Torvalds
On Mon, Aug 19, 2013 at 2:56 PM, Kyle J. McKay mack...@gmail.com wrote:

 The fact that the entire file is read into memory when applying the filter
 does not seem like a good thing (see #7-#10 above).

Yeah, that's horrible. It's likely bad for performance too, because
even if you have enough memory, it blows everything out of the L2/L3
caches, and if you don't have enough memory it obviously causes other
problems.

So it would probably be a great idea to make the filtering code able
to do things in smaller chunks, but I suspect that the patch to chunk
up xread/xwrite is the right thing to do anyway.

  Linus


Re: [PATCH/RFC] Developer's Certificate of Origin: default to COPYING

2013-09-12 Thread Linus Torvalds
On Thu, Sep 12, 2013 at 3:30 PM, Junio C Hamano gits...@pobox.com wrote:
 Linus, this is not limited to us, so I am bothering you; sorry about
 that.

 My instinct tells me that some competent lawyers at linux-foundation
 helped you with the wording of DCO, and we amateurs shouldn't be
 mucking with the text like this patch does at all, but just in case
 you might find it interesting...

There were lawyers involved, yes.

I'm not sure there is any actual confusion, because the fact is,
lawyers aren't robots or programmers, and they have the human
qualities of understanding implications. So I'm actually inclined to
not change legal text unless a lawyer actually tells me that it's
needed.

Plus even if this change was needed, why would anybody point to
COPYING? It's much better to just say the copyright license of the
file, knowing that different projects have different rules about this
all, and some projects mix files from different sources, where parts
of the tree may be under different licenses that may be explained
elsewhere..

Linus


Re: [PATCH/RFC] Developer's Certificate of Origin: default to COPYING

2013-09-12 Thread Linus Torvalds
On Thu, Sep 12, 2013 at 4:15 PM, Richard Hansen rhan...@bbn.com wrote:

 Is it worthwhile to poke a lawyer about this as a precaution?  (If so,
 who?)  Or do we wait for a motivating event?

I can poke the lawyer that was originally involved. If people know
other lawyers, feel free to poke them too. Just ask them to be
realistic, not go into some kind of super-anal lawyer mode where they
go off on some "what if" thing.

Note that one issue is that this is kind of like a license change,
even if it's arguably just a clarification. I'd expect that a lawyer
who is so anal that they think this wording needs change would also
think that the DCO version number needs change and then spend half an
hour (and $500) talking about how this only affects new sign-offs and
how you'd want to make it very obvious how things have changed, Yadda
yadda.

IOW, my personal opinion is that if you get a lawyer that is _that_
interested in irrelevant details, you have much bigger problems than
this particular wording. Lawyers do tend to be particular about
wording, but in the end, they tend to also agree that intent matters.
At least the good ones who have a case. Once they start talking about
the meaning of the word 'is', you know they are just weaselwording
and don't actually have any real argument.

  Linus


Re: [PATCH] git-compat-util: Avoid strcasecmp() being inlined

2013-09-13 Thread Linus Torvalds
On Fri, Sep 13, 2013 at 12:53 PM, Sebastian Schuberth
sschube...@gmail.com wrote:

 +#ifdef __MINGW32__
 +#ifdef __NO_INLINE__

Why do you want to push this insane workaround for a clear Mingw bug?

Please have mingw just fix the nasty bug, and the git patch with the
trivial wrapper looks much simpler than just saying "don't inline
anything" and that crazy block of nasty mingw magic #defines/.

And then document loudly that the wrapper is due to the mingw bug.

   Linus


Re: [GIT PULL] sound fixes for 3.6-rc6

2012-09-13 Thread Linus Torvalds
On Thu, Sep 13, 2012 at 7:43 PM, Takashi Iwai ti...@suse.de wrote:
 are available in the git repository at:

   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus

*PLEASE* don't do this.

You point to a branch, but then the pull request clearly implies there
is a tag with extra information in it.

And indeed, the actual thing I should pull is not at all for-linus,
it seems to be your tags/sound-3.6 tag.

I don't know if this is the old git pull-request breakage where it
stupidly corrects the remote branch when it verifies the branch
name, or whether it's some other scripting problem. I think current
git versions should not mess up the tag information, if that's the
cause, but please verify.

Linus


mailinfo: don't require text mime type for attachments

2012-09-30 Thread Linus Torvalds

Currently git am does insane things if the mbox it is given contains 
attachments with a MIME type that isn't text/*.

In particular, it will still decode them, and pass them one line at a 
time to the mail body filter, but because it has determined that they 
aren't text (without actually looking at the contents, just at the mime 
type) the line will be the encoding line (eg 'base64') rather than a 
line of *content*.

Which then will cause the text filtering to fail, because we won't 
correctly notice when the attachment text switches from the commit message 
to the actual patch. Resulting in a patch failure, even if the patch may be a 
perfectly well-formed attachment, it's just that the message type may be 
(for example) application/octet-stream instead of text/plain.

Just remove all the bogus games with the message_type. The only difference 
that code creates is how the data is passed to the filter function 
(chunked per pre-decode line or per post-decode line), and that difference 
is *wrong*, since chunking things per pre-decode line can never be a 
sensible operation, and cannot possibly matter for binary data anyway.
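
To illustrate what "per post-decode line" chunking means, here is a small sketch. It is not the mailinfo code: for_each_decoded_line, line_fn, struct patch_scan and scan_line are hypothetical names, and the "diff --git" check is only a stand-in for the real logic that spots where the commit message ends and the patch begins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type standing in for the body filter. */
typedef void (*line_fn)(const char *line, size_t len, void *data);

/*
 * Walk an already-decoded buffer and hand it to fn one line at a
 * time.  Each chunk keeps its trailing '\n' when one is present.
 * Returns the number of chunks passed on.
 */
size_t for_each_decoded_line(const char *buf, size_t len,
			     line_fn fn, void *data)
{
	size_t count = 0;

	assert(buf != NULL && fn != NULL);
	while (len) {
		const char *nl = memchr(buf, '\n', len);
		size_t chunk = nl ? (size_t)(nl - buf) + 1 : len;

		fn(buf, chunk, data);
		buf += chunk;
		len -= chunk;
		count++;
	}
	return count;
}

/* Example filter state: where does the patch start? (-1 = not seen) */
struct patch_scan {
	size_t lines_seen;
	long patch_start;
};

/* Example filter: remember the index of the first "diff --git " line. */
void scan_line(const char *line, size_t len, void *data)
{
	struct patch_scan *scan = data;

	if (scan->patch_start < 0 && len >= 11 &&
	    !memcmp(line, "diff --git ", 11))
		scan->patch_start = (long)scan->lines_seen;
	scan->lines_seen++;
}
```

The filter only ever sees decoded content, so it does not matter whether the attachment was declared text/plain or application/octet-stream.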

This code goes all the way back to March of 2007, in commit 87ab79923463 
(builtin-mailinfo.c infrastrcture changes), and apparently Don used to 
pass random mbox contents to git. However, the pre-decode vs post-decode 
logic really shouldn't matter even for that case, and more importantly, "I 
fed git am crap" is not a valid reason to break *real* patch attachments.

If somebody really cares, and determines that some attachment is binary 
data (by looking at the data, not the MIME-type), the whole attachment 
should be dismissed, rather than fed in random-sized chunks to 
handle_filter().

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
Cc: Don Zickus dzic...@redhat.com
---
 builtin/mailinfo.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index 2b3f4d955eaa..da231400b327 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -19,9 +19,6 @@ static struct strbuf email = STRBUF_INIT;
 static enum  {
 	TE_DONTCARE, TE_QP, TE_BASE64
 } transfer_encoding;
-static enum  {
-	TYPE_TEXT, TYPE_OTHER
-} message_type;
 
 static struct strbuf charset = STRBUF_INIT;
 static int patch_lines;
@@ -184,8 +181,6 @@ static void handle_content_type(struct strbuf *line)
 	struct strbuf *boundary = xmalloc(sizeof(struct strbuf));
 	strbuf_init(boundary, line->len);
 
-	if (!strcasestr(line->buf, "text/"))
-		 message_type = TYPE_OTHER;
 	if (slurp_attr(line->buf, "boundary=", boundary)) {
 		strbuf_insert(boundary, 0, "--", 2);
 		if (++content_top > &content[MAX_BOUNDARIES]) {
@@ -657,7 +652,6 @@ again:
 	/* set some defaults */
 	transfer_encoding = TE_DONTCARE;
 	strbuf_reset(&charset);
-	message_type = TYPE_TEXT;
 
 	/* slurp in this section's info */
 	while (read_one_header_line(&line, fin))
@@ -871,11 +865,6 @@ static void handle_body(void)
 			strbuf_insert(&line, 0, prev.buf, prev.len);
 			strbuf_reset(&prev);
 
-			/* binary data most likely doesn't have newlines */
-			if (message_type != TYPE_TEXT) {
-				handle_filter(&line);
-				break;
-			}
 			/*
 			 * This is a decoded line that may contain
 			 * multiple new lines.  Pass only one chunk


Fix git diff --stat for interesting - but empty - file changes

2012-10-17 Thread Linus Torvalds
The behavior of git diff --stat is rather odd for files that have
zero lines of changes: it will discount them entirely unless they were
renames.

Which means that the stat output will simply not show files that only
had other changes: they were created or deleted, or their mode was
changed.

Now, those changes do show up in the summary, but so do renames, so
the diffstat logic is inconsistent. Why does it show renames with zero
lines changed, but not mode changes or added files with zero lines
changed?

So change the logic to not check for is_renamed, but for
is_interesting instead, where interesting is judged to be any
action but a pure data change (because a pure data change with zero
data changed really isn't worth showing, if we ever get one in our
diffpairs).

So if you did

   chmod +x Makefile
   git diff --stat

before, it would show empty ( 0 files changed), with this it shows

 Makefile | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

which I think is a more correct diffstat (and then with --summary it
shows *what* the metadata change to Makefile was - this is completely
consistent with our handling of renamed files).

Side note: the old behavior was *really* odd. With no changes at all,
git diff --stat output was empty. With just a chmod, it said 0
files changed. No way is our legacy behavior sane.

Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---

This was triggered by kernel developers not noticing that they had
added zero-sized files, because those additions never showed up in the
diffstat.

NOTE! This does break two of our tests, so we clearly did this on
purpose, or at least tested for it. I just commented out the subtests
that this makes irrelevant, and changed the output of another one.

Another test was simply buggy. It used git diff --root cmit, and
thought that would be the diff against root. It isn't, and never has
been. It just happened to give the same (no file) output before.
Fixing --stat to show new files showed how buggy the test was. The
--root thing matters for git show or git log (when showing a
root commit) and for git diff-tree with a single tree.

Maybe we would *want* to make git diff --root cmit be the diff
between root and cmit, but that's not what it actually is.

Comments?


patch.diff
Description: Binary data


Re: Fix git diff --stat for interesting - but empty - file changes

2012-10-17 Thread Linus Torvalds
On Wed, Oct 17, 2012 at 11:28 AM, Junio C Hamano gits...@pobox.com wrote:

 I think listing a file whose content remain unchanged with 0 as the
 number of lines affected makes sense, and it will mesh well with
 Duy's

   http://thread.gmane.org/gmane.comp.version-control.git/207749

 I first wondered if we would get a division-by-zero while scaling
 the graph, but we do not scale smaller numbers up to fill the
 columns, so we should be safe.

Note that we should be safe for a totally different - and more
fundamental - reason: the zero line case is by no means new. We've
always done it for the rename case.

 These days, we omit 0 insertions and 0 deletions, so I am not sure
 what you should get for this case, though:

  Makefile | 0
  1 file changed, 0 insertions(+), 0 deletions(-)

 Should we just say 1 file changed?

If that is what it does for the rename case, then yes. I think it
should fall out naturally.

  Linus


Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE

2012-10-23 Thread Linus Torvalds
On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner t...@linutronix.de wrote:

 It is spelled:

   git notes add -m comment SHA1

 Cool!

Don't use them for anything global.

Use them for local codeflow, but don't expect them to be distributed.
It's a separate flow, and while it *can* be distributed, it's not
going to be for the kernel, for example. So no, don't start using this
to ack things, because the acks *will* get lost.

 Linus


Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE

2012-10-23 Thread Linus Torvalds
On Wed, Oct 24, 2012 at 4:56 AM, Al Viro v...@zeniv.linux.org.uk wrote:

 How about git commit --allow-empty, with
 belated ACK for commit

Don't bother. It's not that important, and it's just distracting.

It's not like this is vital information. If you pushed it out without
the ack, it's out without the ack. Big deal.

  Linus


Re: first parent, commit graph layout, and pull merge direction

2013-05-23 Thread Linus Torvalds
On Thu, May 23, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:

 If the proposal were to make pull.rebase the default at a major
 version bump and force all integrators and other people who are
 happy with how pull = fetch + merge (not fetch + rebase) works
 to say pull.rebase = false in their configuration, I think I can
 see why some people may think it makes sense, though.

 But neither is an easy sell, I would imagine.  It is not about
 passing me, but about not hurting users like kernel folks we
 accumulated over 7-8 years.

It would be a *horrible* mistake to make rebase the default, because
it's so much easier to screw things up that way.

That said, making no-ff the default, and then if that fails, saying

   The pull was not a fast-forward pull, please say if you want to
merge or rebase.
   Use either

git pull --rebase
git pull --merge

   You can also use git config pull.merge true or git config
pull.rebase true
   to set this once for this project and forget about it.

That way, people who want the existing behavior could just do that

git config pull.merge true

once, and they'd not even notice.

Hmm? Better yet, make it per-branch.

   Linus


Re: first parent, commit graph layout, and pull merge direction

2013-05-23 Thread Linus Torvalds
On Thu, May 23, 2013 at 5:21 PM, Junio C Hamano gits...@pobox.com wrote:

 I would assume that no-ff above was meant to be --ff-only from
 the first part of the message.

Yeah, I may need more coffee..

 I also would assume that I can rephrase that setting pull.merge
 (which does not exist) as setting pull.rebase explicitly to false
 instead (i.e. missing pull.rebase and pull.rebase that is explicitly
 set to false would mean two different things).

Yeah, sounds good to me, and doesn't really sound like it would
confuse/annoy anybody as long as it was clearly documented.
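
The three-state idea (unset, explicitly false, explicitly true) can be sketched as a small decision function. This only models the proposal being discussed, not any behavior git is known to ship, and the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* pull.rebase as a tri-state: a missing key differs from "false". */
enum pull_rebase { PULL_REBASE_UNSET, PULL_REBASE_FALSE, PULL_REBASE_TRUE };

enum pull_action { PULL_FAST_FORWARD, PULL_MERGE, PULL_REBASE, PULL_ASK_USER };

/*
 * Fast-forward whenever possible.  Otherwise follow the explicit
 * configuration, and only stop to ask when nothing was configured.
 */
enum pull_action decide_pull(bool can_fast_forward, enum pull_rebase cfg)
{
	if (can_fast_forward)
		return PULL_FAST_FORWARD;
	switch (cfg) {
	case PULL_REBASE_TRUE:
		return PULL_REBASE;
	case PULL_REBASE_FALSE:
		return PULL_MERGE;
	case PULL_REBASE_UNSET:
		return PULL_ASK_USER;
	}
	assert(0 && "unknown pull.rebase value");
	return PULL_ASK_USER;
}
```

People who set pull.rebase to false once keep the existing fetch + merge behavior and never see the prompt.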

  Linus


Re: New feature discussion: git rebase --status

2013-06-11 Thread Linus Torvalds
On Tue, Jun 11, 2013 at 10:18 AM, Hilco Wijbenga
hilco.wijbe...@gmail.com wrote:

 Having git status display (even more) context sensitive
 information during git rebase or git merge would be very welcome.
 Please, if at all possible, don't make that a separate command.

I agree. The rebase state etc is something that would be much better
in git status output, and would avoid having people learn about
another new flag to random commands.

Linus


Re: [PATCH] build: get rid of the notion of a git library

2013-06-11 Thread Linus Torvalds
On Tue, Jun 11, 2013 at 11:06 AM, Felipe Contreras
felipe.contre...@gmail.com wrote:

 Moreover, if you are going to argue that we shouldn't be closing the
 door [...]

Felipe, you saying if you are going to argue ... to anybody else is
kind of ironic.

Why is it every thread I see you in, you're being a dick and arguing
for some theoretical thing that nobody else cares about?

This whole thread has been one long argument about totally pointless
things that wouldn't improve anything one way or the other. It's
bikeshedding of the worst kind. Just let it go.

  Linus


Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace

2012-11-13 Thread Linus Torvalds
On Tue, Nov 13, 2012 at 11:15 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:

 Linus, do you remember if you left them out on purpose?

Umm, no.

I have to wonder why you care? As far as I'm concerned, the only valid
space is space, TAB and CR/LF.

Anything else is *noise*, not space. What's the reason for even caring?

  Linus


Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace

2012-11-13 Thread Linus Torvalds
On Tue, Nov 13, 2012 at 11:40 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 I have to wonder why you care? As far as I'm concerned, the only valid
 space is space, TAB and CR/LF.

 Anything else is *noise*, not space. What's the reason for even caring?

Btw, expanding the whitespace selection may actually be very
counter-productive. It is used primarily for things like removing
extraneous space at the end of lines etc, and for that, the current
selection of SPACE, TAB and LF/CR is the right thing to do.

Adding things like FF etc - that are *technically* whitespace, but
aren't the normal kind of silent whitespace - is potentially going to
change things too much. People might *want* a form-feed in their
messages, for all we know.
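
As a rough sketch of why the narrow definition matters for trailing-space cleanup: plain_isspace and strip_trailing_space are hypothetical names, not the macros in git's own ctype code, and they only show the SPACE/TAB/LF/CR rule described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locale-free check: only SPACE, TAB, LF and CR count as whitespace. */
int plain_isspace(unsigned char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Trim trailing whitespace in place; returns the new length. */
size_t strip_trailing_space(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	while (len && plain_isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';
	return len;
}
```

With this definition a line ending in a form-feed is left alone, whereas the C library isspace() would treat the form-feed as whitespace and strip it.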

So I really object to changing things just because. There's a reason
we do our own ctype.c: it avoids the crazy crap. It avoids the idiotic
localization issues, and it avoids the ambiguous cases.

So just let it be, unless you have some major real reason to actually
care about a real-world case. And if you do, please explain it. Don't
change things just because.

   Linus


Re: [PATCH 2/2] fix clang -Wtautological-compare with unsigned enum

2013-01-17 Thread Linus Torvalds
On Thu, Jan 17, 2013 at 3:00 AM, John Keeping j...@keeping.me.uk wrote:

 There's also a warning that triggers with clang 3.2 but not clang trunk, which
 I think is a legitimate warning - perhaps someone who understands integer type
 promotion better than me can explain why the code is OK (patch->score is
 declared as 'int'):

 builtin/apply.c:1044:47: warning: comparison of constant 18446744073709551615
 with expression of type 'int' is always false
 [-Wtautological-constant-out-of-range-compare]
 if ((patch->score = strtoul(line, NULL, 10)) == ULONG_MAX)
  ^  ~

The warning seems to be very very wrong, and implies that clang has
some nasty bug in it.

Since patch->score is 'int', and ULONG_MAX is 'unsigned long', the
conversion rules for the comparison are that the int result from the
assignment is cast to unsigned long. And if you cast (int)-1 to
unsigned long, you *do* get ULONG_MAX. That's true regardless of
whether long has the same number of bits as int or is bigger. The
implicit cast will be done as a sign-extension (unsigned long is not
signed, but the source type of 'int' *is* signed, and that is what
determines the sign extension on casting).
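
The conversion can be checked with a couple of lines of C. This is only a sketch with made-up helper names. The narrowing to int in the second helper is implementation-defined by the standard, though common compilers wrap it to -1.

```c
#include <assert.h>
#include <limits.h>

/* The conversion the comparison performs: int operand to unsigned long. */
unsigned long int_to_ulong(int v)
{
	assert(sizeof(unsigned long) >= sizeof(int));
	return (unsigned long)v;
}

/* Mirrors the original expression: store into an int, then compare. */
int score_is_ulong_max(unsigned long parsed)
{
	int score = (int)parsed;	/* implementation-defined when out of range */

	return (unsigned long)score == ULONG_MAX;
}
```

On a typical two's complement machine int_to_ulong(-1) equals ULONG_MAX whether long is 32 or 64 bits wide, so the comparison is not "always false".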

So the is always false is pure and utter crap. clang is wrong, and
it is wrong in a way that implies that it actually generates incorrect
code. It may well be worth making a clang bug report about this.

That said, clang is certainly understandably confused. The code
depends on subtle conversion rules and bit patterns, and is clearly
very confusingly written.

So it would probably be good to rewrite it as

unsigned long val = strtoul(line, NULL, 10);
if (val == ULONG_MAX) ..
patch->score = val;

instead. At which point you might as well make the comparison be >=
INT_MAX instead, since anything bigger than that is going to be
bogus.
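
Spelled out as a standalone helper, the cleanup might look like the sketch below. parse_score is a hypothetical name and this is not the code in builtin/apply.c. It rejects anything that does not fit in an int, which also covers the ULONG_MAX that strtoul() returns on overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal score.  Returns 0 on success and stores the value,
 * or -1 (leaving *score untouched) if the number cannot fit in an int.
 */
int parse_score(const char *line, int *score)
{
	unsigned long val;

	assert(line != NULL && score != NULL);
	val = strtoul(line, NULL, 10);
	if (val > INT_MAX)	/* also catches ULONG_MAX from overflow */
		return -1;
	*score = (int)val;
	return 0;
}
```

Keeping the parsed value in an unsigned long until after the range check means the code no longer depends on how an out-of-range value converts to int.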

So the git code is probably worth cleaning up, but for git it would be
a cleanup. For clang, this implies a major bug and bad code
generation.

   Linus


Re: [PATCH 2/2] fix clang -Wtautological-compare with unsigned enum

2013-01-18 Thread Linus Torvalds
On Fri, Jan 18, 2013 at 9:15 AM, Phil Hord phil.h...@gmail.com wrote:

 Yes, I can tell by the wording of the error message that you are right
 and clang has a problem.  But the git code it complained about does
 have a real problem, because the result of signed int a = ULONG_MAX
 is implementation-defined.

Only theoretically.

Git won't work on machines that don't have 8-bit bytes anyway, so
worrying about the theoretical crazy architectures that aren't two's
complement etc isn't something I'd care about.

There's a whole class of technically implementation-defined issues
in C that simply aren't worth caring for. Yes, the standard is written
so that it works on machines that aren't byte-addressable, or EBCDIC
or have things like 18-bit words and 36-bit longwords. Or 16-bit int
for microcontrollers etc.
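
If a project wants to state these assumptions rather than just rely on them, a compile-time check is one option. This is a sketch, not something the git source is known to contain, and int_from_ulong_max is a made-up helper.

```c
#include <assert.h>
#include <limits.h>

/* Refuse to build on machines without 8-bit bytes. */
_Static_assert(CHAR_BIT == 8, "8-bit bytes assumed");

/* In two's complement, -1 has every value bit set. */
_Static_assert((-1 & 3) == 3, "two's complement assumed");

/*
 * The conversion under discussion.  The result is implementation-defined
 * by the standard; mainstream compilers on two's complement machines
 * wrap it to -1.
 */
int int_from_ulong_max(void)
{
	unsigned long big = ULONG_MAX;

	assert(sizeof(big) >= sizeof(int));
	return (int)big;
}
```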

That doesn't make those implementation-defined issues worth worrying
about these days. A compiler writer could in theory make up some
idiotic rules that are still valid by the C standard even on modern
machines, but such a compiler should simply not be used, and the
compiler writer in question should be called out for being an ass-hat.

Paper standards are only worth so much. And that so much really
isn't very much.

Linus


Re: [PULL] Module fixes, and a virtio block fix.

2013-01-20 Thread Linus Torvalds
On Sun, Jan 20, 2013 at 5:32 PM, Rusty Russell ru...@rustcorp.com.au wrote:

 Due to the delay on git.kernel.org, git request-pull fails.  It *looks*
 like it succeeds, except the warning, but (as we learned last time I
 screwed up), it doesn't put the branchname because it can't know.

I think this should be fixed in modern git versions.

And it sure as hell knows the proper tag name, since you *gave* it the
name and it used it for generating the actual contents. The fact that
some versions then screw that up and re-write the tag-name to
something randomly matching that isn't a tag was just a bug.

 For want of a better solution, I'll now resort to sending pull requests
 with the anti-social gitolite URL in it, like so:

That's even worse, fwiw. It means that the pull request address makes
no sense to anybody who doesn't have a kernel.org address, and then
I'm forced to just edit things by hand instead to not pollute the
kernel changelog history with crap.

Junio, didn't git request-pull get fixed so that it *warns* about
missing tagnames/branches, but never actually corrupts the pull
request? Or did it just get fixed to be a hard error instead of
corrupting things? Because this is annoying.

Linus


Re: [PULL] Module fixes, and a virtio block fix.

2013-01-20 Thread Linus Torvalds
On Sun, Jan 20, 2013 at 6:00 PM, Junio C Hamano gits...@pobox.com wrote:

 What you mean by corrupt is not clear to me

Some versions would just silently change the actual name you were using.

So if you said for-linus, it might change it to linus, just
because that branch happened to have the same SHA1 commit ID.

That's not right.

Other versions would replace the for-linus with **missing-branch**
because for-linus hadn't mirrored out yet.

That's not right either.

Basically, if git request-pull is given a branch/tag name, that is
the only valid output (although going from branch->tag *might* be
acceptable). The whole verify that it actually exists on the remote
side must never *ever* actually change the message itself, it should
just cause a warning outside of the message.
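
The rule can be put as a small sketch: the ref name the user gave always goes into the message, and the remote check only decides whether a warning is printed. format_pull_target is a hypothetical helper, not part of the git-request-pull script, and the warning text is invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the "please pull" target from the ref the user named.
 * remote_has_ref only controls the warning on stderr; it never
 * changes what is written to out.
 * Returns 1 if a warning was issued, 0 otherwise.
 */
int format_pull_target(char *out, size_t outlen, const char *url,
		       const char *ref, int remote_has_ref)
{
	assert(out != NULL && url != NULL && ref != NULL);
	snprintf(out, outlen, "  %s %s", url, ref);
	if (!remote_has_ref) {
		fprintf(stderr, "warn: no match for %s at %s\n", ref, url);
		return 1;
	}
	return 0;
}
```

A mirror that has not caught up yet therefore produces the same message body as one that has, plus a warning outside the message.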

I can't say from the commit message whether that's the thing that
fixed it or not, but at least some people stopped sending me broken
pull requests after updating to git. I'm just not sure which of the
two different failure cases they happened to have (Rusty seems to have
hit both)

Linus


Re: [PULL] Module fixes, and a virtio block fix.

2013-01-20 Thread Linus Torvalds
On Sun, Jan 20, 2013 at 6:57 PM, Rusty Russell ru...@rustcorp.com.au wrote:

 I'm confused.  The default argument is HEAD: what does it know about tag
 names?

Ugh. I actually thought that if you give it the tag name directly (as
the end) it will use that.

But no. It figures it out with git describe --exact internally.
Regardless, if your HEAD is actually tagged, it *will* have the
tag-name in git-request-pull.

And it will have it based on your *local* repo, so the fact that it
hasn't been mirrored out yet doesn't really matter. git request-pull
knows that tag name regardless of mirroring issues.

 The bug is that if it can't find that commit at the remote end, it
 still generates a valid-looking request (with a warning at the end),
 where it guesses you're talking about the master branch.

It really shouldn't do that any more, but you seem to have the older
version with the bug.

At least one of the annoying problems was fixed in the 1.7.11 series,
you have 1.7.10.

The nice thing about git is that it is *really* easy to upgrade. Just
fetch the sources, do "make; make install" all as a normal user, and
you do not need to worry about package management or distro issues or
any crap like that. It installs into your $(HOME)/bin, and as long as
your PATH has that first, you'll get it. I've long suggested that as
the workaround for distros having old versions (some more so than
others).

 Since I use a wrapper script now for your pull requests I can use sed to
 unscrew it:

 [alias]
 for-linus = !check-commits && TAGNAME=`git symbolic-ref HEAD | cut 
 -d/ -f3`-for-linus && git tag -f -u D1ADB8F1 $TAGNAME HEAD && git push korg 
 tag $TAGNAME && git request-pull master korg | sed 
 s,gitol...@ra.kernel.org:/pub,git://git.kernel.org/pub, && git log --stat 
 --reverse master..$TAGNAME | emails-from-log | grep -v 'rusty@rustcorp' | 
 grep -v 'sta...@kernel.org' | sed 's/^/Cc: /'

Heh. Ok. That will at least hide the breakage. But I suspect you could
fix it by just updating git.

 Linus


Re: linux-next: unneeded merge in the security tree

2013-03-12 Thread Linus Torvalds
[ Added Junio and git to the recipients, and leaving a lot of stuff
quoted due to that... ]

On Mon, Mar 11, 2013 at 9:16 PM, Theodore Ts'o ty...@mit.edu wrote:
 On Tue, Mar 12, 2013 at 03:10:53PM +1100, James Morris wrote:
 On Tue, 12 Mar 2013, Stephen Rothwell wrote:
  The top commit in the security tree today is a merge of v3.9-rc2.  This
  is a completely unnecessary merge as the tree before the merge was a
  subset of v3.9-rc1 and so if the merge had been done using anything but
  the tag, it would have just been a fast forward.  I know that this is now
  deliberate behaviour on git's behalf, but isn't there some way we can
  make this easier on maintainers who are really just trying to pick a
  new starting point for their trees after a release?  (at least I assume
  that is what James was trying to do)

 Yes, and I was merging to a tag as required by Linus.

Now, quite frankly, I'd prefer people not merge -rc tags either, just
real releases. -rc tags are certainly *much* better than merging
random daily stuff, but the basic rule should be don't back-merge AT
ALL rather than back-merge tags.

That said, you didn't really want a merge at all, you just wanted to
sync up and start development. Which is different (but should still
prefer real releases, and only use rc tags if it's fixing stuff that
happened in the merge window - which may be the case here).

 Why not just force the head of the security tree to be v3.9-rc2?  Then
 you don't end up creating a completely unnecessary merge commit, and
 users who were at the previous head of the security tree will
 experience a fast forward when they pull your new head.

So I think that may *technically* be the right solution, but it's a
rather annoying UI issue, partly because you can't just do it in a
single operation (you can't do a pull of the tag to both fetch and
fast-forward it), but partly because git reset --hard is also an
operation that can lose history, so it's something that people should
be nervous about, and shouldn't use as some kind of standard let's
just fast-forward to Linus' tree thing.

At the same time, it's absolutely true that when *I* pull a signed tag
from a downstream developer, I don't want a fast-forward, because then
I'd lose the signature. So when a maintainer pulls a submaintainer
tree, you want the signature to come upstream, but when a
submaintainer wants to just sync up with upstream, you don't want to
generate the pointless signed merge commit, because the signature is
already upstream because it's a public tag. So the behavior of git
pull is fundamentally ambiguous.

But git doesn't know the difference between official public upstream
tag and signed tag used to verify the pull request.

I'm adding the git list just to get this issue out there and see if
people have any ideas. I've got a couple of workarounds, but they
aren't wonderful..

One is simple:

    git config alias.sync "pull --ff-only"

which works fine, but forces submaintainers to be careful when doing
things like this, and using a special command to do back-merges.

And maybe that's the right thing to do? Back-merges *are* special,
after all. But the above alias is particularly fragile, in that
there's both pull and merge that people want to use this for, and
it doesn't really handle both. And --ff-only will obviously fail if
you actually have some work in your tree, and want to do a real merge,
so then you have to do that differently. So I'm mentioning this as a
better model than git reset, but not really a *solution*.

That said, the fact that --ff-only errors out if you have local
development may actually be a big bonus - because you really shouldn't
do merges at all if you have local development, but syncing up to my
tree if you don't have it (and are going to start it) may be something
reasonable.

Now, the other approach - and perhaps preferable, but requiring actual
changes to git itself - is to do the non-fast-forward merge *only* for
FETCH_HEAD, which already has magic semantics in other ways. So if
somebody does

git fetch linus
git merge v3.8

to sync with me, they would *not* get a merge commit with a signature,
just a fast-forward. But if you do

git pull linus v3.8

or a

git fetch linus v3.8
git merge FETCH_HEAD

it would look like a maintainer merge and stash the signature in the
merge commit rather than fast-forward. It would probably work in
practice.

The final approach might be to make it like the merge summary and
simply make it configurable _and_ have a command line flag for it,
defaulting to our current behavior or to the above suggested default
on for FETCH_HEAD, off for anything else.
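
The FETCH_HEAD proposal can be written down as a small decision function. This is only a sketch of the rule as described in this mail, not anything git implements: the function name, the argument names and the returned labels are all invented for illustration, and the last branch (a named ref that cannot fast-forward) is an assumption, since the mail does not spell that case out.

```python
def merge_mode(merge_source, target_is_signed_tag, can_fast_forward):
    """Pick how to integrate under the proposed FETCH_HEAD-only rule (hypothetical)."""
    if merge_source == "FETCH_HEAD" and target_is_signed_tag:
        # Looks like a maintainer pulling a signed tag from a pull request:
        # keep the signature by always creating a merge commit.
        return "merge-commit-with-signature"
    if can_fast_forward:
        # Plain sync with upstream: no pointless merge commit.
        return "fast-forward"
    # Assumption: anything else is an ordinary merge.
    return "merge-commit"


# "git pull linus v3.8" merges FETCH_HEAD; "git merge v3.8" names the tag directly.
print(merge_mode("FETCH_HEAD", True, True))
print(merge_mode("v3.8", True, True))
```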

Hmm?

Linus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-next: unneeded merge in the security tree

2013-03-12 Thread Linus Torvalds
On Tue, Mar 12, 2013 at 2:20 PM, Theodore Ts'o ty...@mit.edu wrote:
 What if we added the ability to do something like this:

 [remote "origin"]
 url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
 fetch = +refs/heads/master:refs/heads/master
 mergeoptions = --ff-only

Hmm. Something like this could be interesting for other things:

 - use --rebase when pulling (this is common for people who maintain
a set of patches and do *not* export their git tree - I use it for
projects like git and subsurface where there is an upstream maintainer
and I usually send patches by email rather than git)

 - --no-summary. As a maintainer, you probably do want to
enable summaries for people you pull from, but *not* from upstream.
So this might even make sense to do by default when you clone a new
repository.

 - I do think that we might want a --no-signatures for the specific
case of merging signed tags without actually taking the signature
(because it's an upstream repo). The --ff-only thing is *too*
strict. Sometimes you really do want to merge in new code, disallowing
it entirely is tough.

Of course, I'm not really sure if we want to list the flags. Maybe
it's better to just introduce the notion of upstream directly, and
make that a flag, and make origin default to that when you clone.
And then have git use different heuristics for pulling upstream (like
warning by default when doing a back-merge, perhaps?)

   Linus


Re: linux-next: unneeded merge in the security tree

2013-03-12 Thread Linus Torvalds
On Tue, Mar 12, 2013 at 2:47 PM, Junio C Hamano gits...@pobox.com wrote:

 I agree that --ff-only thing is too strict and sometimes you would
 want to allow back-merges, but when you do allow such a back-merge,
 is there a reason you want it to be --no-signatures merge?  When a
 subtree maintainer decides to merge a stable release point from you
 with a good reason, I do not see anything wrong in recording that
 the resulting commit _did_ merge what you released with a signature.

No, there's nothing really bad with adding the signature to the merge
commit if you do make a merge. It's the fact that it currently makes a
non-ff merge when that is pointless that hurts.

That said, adding the signature from an upstream tag doesn't really
seem to be hugely useful. I'm not seeing much of an upside, in other
words. I'd *expect* that people would pick up upstream tags
regardless, no?

   Linus


Re: git status takes 30 seconds on Windows 7. Why?

2013-03-27 Thread Linus Torvalds
On Wed, Mar 27, 2013 at 12:04 PM, Jeff King p...@peff.net wrote:

 Yes, I think that's pretty much the case (though most of my
 Git-on-Windows experience is from cygwin long ago, where the stat
 performance was truly horrendous). Have you tried setting
 core.preloadindex, which should run the stats in parallel?

I wonder if preloadindex shouldn't be enabled by default.. It's a huge
deal on NFS, and the only real downside is that it expects threading
to work. It potentially slows things down a tiny bit for single-CPU
cases with everything cached, but that isn't likely to be a relevant
case.

Of course, it can trigger filesystem scalability issues, and as a
result it will often not help very much if you have the bulk of your
files in one (or a few) directories. But anybody who has so many files
that performance is an issue is not likely to have them all in one
place.

And apparently the Windows FS metadata caching sucks, and things fall
out of the cache for large trees. Color me not-very-surprised. It's
probably some size limit on the metadata that you can tweak. So I'm
sure there's some registry setting or other that would make windows
able to cache more than a few thousand filenames, and it would
probably improve performance a lot, but I do think preloadindex has
been around long enough that it could just be the default.

Of course, Jim should verify that preloadindex actually does solve his
problem.  With 20k+ files, it should max out the 20 IO threads for
preloading, and assuming the filesystem IO scales reasonably well, it
should fix the problem. But we do do a number of metadata ops
synchronously even with preloadindex, so things won't scale perfectly.

(In particular: we do open each directory and do the readdir stuff and
try to open .gitignore whether it exists or not. So you'll get
synchronous IO for each directory, but at least the per-file IO to
check all the file stat data should scale).

 Linus


Re: git status takes 30 seconds on Windows 7. Why?

2013-03-27 Thread Linus Torvalds
On Wed, Mar 27, 2013 at 1:00 PM, Junio C Hamano gits...@pobox.com wrote:

 Given that we haven't tweaked the parallelism or thread-cost
 parameters since the inception of the mechanism in Nov 2008, I
 suspect that we would see praises from some and grievances from
 other corners of the user base for a while until we find acceptable
 values for them

Looking at the parameters again, I really think they are pretty sane,
and I don't think the numbers are all that likely to have shifted from
2008. The maximum thread value is quite reasonable: twenty threads is
sufficient to cover quite a bit of latency, and brings several
seconds down to under half a second for any truly IO-limited load,
while not being disastrous for the case where everything is in cache
and we only have a limited number of CPU cores.

And the "at least 500 files per thread" limit is eminently reasonable
too - smaller projects like git won't have more than five or so
threads.
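
The 20/500 rule is easy to state as arithmetic. The sketch below is a reading of the numbers quoted in this thread, not a copy of git's source: the function and constant names are made up, and what happens when the result is 0 or 1 (presumably no threading at all) is left out.

```python
MAX_THREADS = 20        # the "20" in the 20/500 rule
FILES_PER_THREAD = 500  # the "500" in the 20/500 rule


def preload_threads(num_files, max_threads=MAX_THREADS,
                    files_per_thread=FILES_PER_THREAD):
    """One thread per `files_per_thread` files, capped at `max_threads`."""
    return min(max_threads, num_files // files_per_thread)


print(preload_threads(20000))  # a 20k-file tree hits the cap
print(preload_threads(2500))   # a project of a few thousand files gets a handful
```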

So I'd be very surprised if the values need much tweaking. Sure, there
might be some extreme cases that might tune for some particular
patterns, and maybe we should make the values be tunable rather than
totally hardcoded, but I suspect there's limited up-side.

It might be interesting for the people who really like tuning, though.
So in addition to "index.preload=true", maybe an extended config
format like "index_preload=50,200" to say "maximum of fifty threads,
for every 200 files" could be done just so people could play around
with the numbers and see how much (if at all) they actually matter.
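
Parsing such a tunable would be trivial. The sketch below assumes the "max threads, files per thread" value format floated above; that format is only a suggestion in this mail, and the function name and the fallback to the 20/500 defaults are illustrative assumptions.

```python
def parse_index_preload(value, default=(20, 500)):
    """Parse a hypothetical 'max_threads,files_per_thread' setting such as '50,200'."""
    if value in ("true", ""):
        # Plain "true" keeps the built-in defaults.
        return default
    max_threads_text, files_per_thread_text = value.split(",")
    max_threads = int(max_threads_text)
    files_per_thread = int(files_per_thread_text)
    if max_threads < 1 or files_per_thread < 1:
        raise ValueError("both numbers must be positive")
    return max_threads, files_per_thread


print(parse_index_preload("50,200"))
print(parse_index_preload("true"))
```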

But I really don't think the original 20/500 rule is likely to be all
that bad for anybody. Unless there is some *really* sucky thread
library out there (ie fully user-space threads, so filename lookup
isn't actually parallelised at all), but at least for that case the
fix is to just say ok, your threads aren't real threads, so just
disable index preloading entirely).

 Linus


Re: Handling renames.

2005-04-14 Thread Linus Torvalds


On Thu, 14 Apr 2005, David Woodhouse wrote:

 I've been looking at tracking file revisions. One proposed solution was
 to have a separate revision history for individual files, with a new
 kind of 'filecommit' object which parallels the existing 'commit',
 referencing a blob instead of a tree. Then trees would reference such
 objects instead of referencing blobs directly.

Please don't.  It's fundamentally against the git notion of "content determines
objects".

It also has no relevance. A rename really doesn't exist in the git 
model. The git model really is about tracking data, not about tracking 
what happened to _create_ that data.

The one exception is the commit log. That's where you put the explanations 
of _why_ the data changed. And git itself doesn't care what the format is, 
apart from the git header.

So, you really need to think of git as a filesystem. You can then 
implement an SCM _on_top_of_it_, which means that your second suggestion 
is not only acceptable, it really is the _only_ way to handle this in git:

 So a commit involving a rename would look something like this...
 
   tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
   parent bb95843a5a0f397270819462812735ee29796fb4
   rename foo.c bar.c
   author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
   Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the free-form  
section. The rename part really isn't part of the git header. It's not 
what git tracks, it was tracked by an SCM system on top of git.

So the git header is an inode in the git filesystem, and like an inode 
it has a ctime and an mtime, and pointers to the data. So as far as git is 
concerned, this part:

tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
parent bb95843a5a0f397270819462812735ee29796fb4
author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100
committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100

really is the filesystem inode. The rest is whatever the filesystem user
puts into it, and git won't care.
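
The split between "what git tracks" and "what the SCM on top writes" can be illustrated by cutting a commit at the first empty line. This is a Python sketch for illustration only; the function name is made up, the commit text is the example from this mail with the rename line moved into the free-form section, and the redacted address is kept as it appears in the archive.

```python
def split_commit(text):
    """Split commit text at the first blank line into (header fields, free-form message)."""
    header_text, _, message = text.partition("\n\n")
    header = []
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        header.append((key, value))
    return header, message


commit = (
    "tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa\n"
    "parent bb95843a5a0f397270819462812735ee29796fb4\n"
    "author David Woodhouse [EMAIL PROTECTED] 1113499881 +0100\n"
    "committer David Woodhouse [EMAIL PROTECTED] 1113499881 +0100\n"
    "\n"
    "rename foo.c bar.c\n"
    "\n"
    "Rename foo.c to bar.c and s/foo_/bar_/g\n"
)

header, message = split_commit(commit)
print([key for key, _ in header])  # the "inode" part that git itself cares about
print(message.splitlines()[0])     # SCM-level annotation, opaque to git
```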

 Opinions? Dissent? We'd probably need to escape the filenames in some
 way -- handwave over that for now.

The fact that git handles arbitrary filenames (stuff starting with . 
excepted) doesn't mean that the SCM above it needs to. Quite frankly, I 
think an SCM that handles newlines in filenames is being silly. But a 
_filesystem_ needs to not care.

There are too many messy SCM's out there that do not have a philosophy. 
Dammit, I'm not interested in creating another one. This thing has a 
mental model, and we keep to that model.

The reason UNIX is beautiful is that it has a mental model of processes 
and files. Git has a mental model of objects and certain very very limited 
relationships. The relationships git cares about are encoded in the C 
files, the extra crap (like rename info) is just that - stuff that 
random scripts wrote, and that is just informational and not central to 
the model.

Linus


Re: Re: Re: write-tree is pasky-0.4

2005-04-15 Thread Linus Torvalds


On Fri, 15 Apr 2005, Daniel Barkalow wrote:
 
 Is there some reason you don't commit before merging? All of the current
 merge theory seems to want to merge two commits, using the information git
 keeps about them.

Note that the 3-way merge would _only_ merge the committed state. The 
thing is, 99% of all merges end up touching files that I never touch 
myself (ie other architectures), so me being able to merge them even when 
_I_ am in the middle of something is a good thing.

So even when I have dirty state, the merge would only merge the clean
state. And then before the merge information is put back into my working
directory, I'd do a check-files on the result, making sure that everything
that got changed by the merge is up-to-date.

 How much do you care about the situation where there is no best common
 ancestor

I care. Even if the best common parent is 3 months ago, I care. I'd much 
rather get a big explicit conflict than a clean merge that ends up being 
debatable because people played games with per-file merging or something 
questionable like that.

 I think that the time spent on I/O will be overwhelmed by the time spent
 issuing the command at that rate.

There is no time at all spent on IO.

All my email is local, and if this all ends up working out well, I can 
track the other people's object trees in local subdirectories with some 
daily rsyncs. And I have enough memory in my machines that there is 
basically no disk IO - the only tree I normally touch is the kernel trees, 
they all stay in cache.

Linus


Re: Re: Add clone support to lntree

2005-04-15 Thread Linus Torvalds


On Sat, 16 Apr 2005, Petr Baudis wrote:
 
 I'm wondering, whether each tree should be fixed to a certain branch.

I'm wondering why you talk about branches at all.

No such thing should exist. There are no branches. There are just 
repositories. You can track somebody else's repository, but you should 
track it by location, not by any branch name.

And you track it by just merging it.

Yeah, we don't have really usable merges yet, but..

Linus


Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-15 Thread Linus Torvalds


On Fri, 15 Apr 2005, Junio C Hamano wrote:
 
 I'd take the hint, but I would say the current Perl version
 would be far more usable than the C version I would come up with
 by the end of this weekend because:

Actually, it turns out that I have a cunning plan.

I'm full of cunning plans, in fact. It turns out that I can do merges even
more simply, if I just allow the notion of state into an index entry,
and allow multiple index entries with the same name as long as they differ
in state.

And that means that I can do all the merging in the regular index tree, 
using very simple rules.

Let's see how that works out. I'm writing the code now.

Linus


Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:
 
 LT NOTE NOTE NOTE! I could make read-tree do some of these nontrivial 
 LT merges, but I ended up deciding that only the matches in all three 
 LT states thing collapses by default.
 
  * Understood and agreed.

Having slept on it, I think I'll merge all the trivial cases that don't 
involve a file going away or being added. Ie if the file is in all three 
trees, but it's the same in two of them, we know what to do.

That way we'll leave things where the tree itself changed (files added or 
removed at any point) and/or cases where you actually need a 3-way merge.
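
The trivial cases described here can be sketched per path as follows. This is a Python illustration of the rule as stated in this mail, not the read-tree code itself: the function name is made up, each argument stands for the blob hash of one file in the original tree and the two new trees, and the single-letter hashes are placeholders. Files that were added or removed are not handled at all, matching the text above.

```python
def trivial_merge(base, ours, theirs):
    """Resolve one path present in all three trees, or return None if a real merge is needed."""
    if ours == theirs:
        return ours    # both sides agree (including "nobody changed it")
    if base == ours:
        return theirs  # only their side changed the file
    if base == theirs:
        return ours    # only our side changed the file
    return None        # all three differ: needs a real 3-way merge


print(trivial_merge("a", "a", "c"))  # only one side changed
print(trivial_merge("a", "b", "c"))  # left for the merge policy
```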

 The userland merge policies need ways to extract the stage
 information and manipulate them.  Am I correct to say that you
 mean by ls-files -l the extracting part?

No, I meant show-files, since we need to show the index, not a tree (no 
valid tree can ever have the modes information, since (a) it doesn't 
have the space for it anyway and (b) we refuse to write out a dirty index 
file).



 
 LT I should make ls-files have a -l format, which shows the
 LT index and the mode for each file too.
 
 You probably meant ls-tree.  You used the word mode but it
 already shows the mode so I take it to mean stage.  Perhaps
 something like this?
 
 $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
 100644 blob fe2a4177a760fd110e78788734f167bd633be8de    COPYING
 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:1  man/frotz.6
 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:2  man/frotz.6
 100664 blob eeed997e557fb079f38961354473113ca0d0b115:3  man/frotz.6

Apart from the fact that it would be

show-files -l

since there are no tree objects that can have anything but fully merged
state, yes.

 Assuming that you would be working on that, I'd like to take the
 dircache manipulation part.  Let's think about the minimally
 necessary set of operations:
 
  * The merge policy decides to take one of the existing stage.
 
In this case we need a way to register a known mode/sha1 at a
path.  We already have this as update-cache --cacheinfo.
We just need to make sure that when update-cache puts
things at stage 0 it clears other stages as well.
 
  * The merge policy comes up with a desired blob somewhere on
the filesystem (perhaps by running an external merge
program).  It wants to register it as the result of the
merge.
 
We could do this today by first storing the desired blob
in a temporary file somewhere in the path the dircache
controls, update-cache --add the temporary file, ls-tree to
find its mode/sha1, update-cache --remove the temporary
file and finally update-cache --cacheinfo the mode/sha1.
This is workable but clumsy.  How about:
 
$ update-cache --graft [--add] desired-blob path
 
to say I want to register mode/sha1 from desired-blob, which
may not be of verify_path() satisfying name, at path in the
dircache?
 
  * The merge policy decides to delete the path.
 
We could do this today by first stashing away the file at the
path if it exists, update-cache --remove it, and restore
if necessary.  This is again workable but clumsy.  How about:
 
$ update-cache --force-remove path
 
to mean I want to remove the path from dircache even though
it may exist in my working tree?

Yes.

 Am I on the right track?

Exactly.

 You might want to go even lower level by letting them say
 something like:
 
  * update-cache --register-stage mode sha1 stage path
 
Registers the mode/sha1 at stage for path.  Does not look at
the working tree.  stage is [0-3]

I'd prefer not. I'd avoid playing games with the stages at any other level
than the full tree level until we show a real need for it.

Let's go with the known-needed minimal cases that are high-level enough to
make the scripting simple, and see if there is any reason to ever touch
the tree any other way.

Linus


Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Linus Torvalds wrote:
 
 Having slept on it, I think I'll merge all the trivial cases that don't 
 involve a file going away or being added. Ie if the file is in all three 
 trees, but it's the same in two of them, we know what to do.

Junio, I pushed this out, along with the two patches from you. It's still
more anal than my original tree-diff algorithm, in that it refuses to
touch anything where the name isn't the same in all three versions
(original, new1 and new2), but now it does the "if two of them match, just
select the result directly" trivial merges.

I really cannot see any sane case where user policy might dictate doing
anything else, but if somebody can come up with an argument for a merge
algorithm that wouldn't do what that trivial merge does, we can make a
flag for don't merge at all.

The reason I do want to merge at all in read-tree is that I want to
avoid having to write out a huge index-file (it's 1.6MB on the kernel, so
if you don't do _any_ trivial merges, it would be 4.8MB after reading
three trees) and then having people read it and parse it just to do stuff
that is obvious. Touching 5MB of data isn't cheap, even if you don't do a 
whole lot to it.

Anyway, with the modified read-tree, as far as I can tell it will now 
merge all the cases where one side has done something to a file, and the 
other side has left it alone (or where both sides have done the exact same 
modification). That should _really_ cut down the cases to just a few files 
for most of the kernel merges I can think of. 

Does it do the right thing for your tests?

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Ingo Molnar wrote:
 
 i've converted the Linux kernel CVS tree into 'flat patchset' format, 
 which gave a series of 28237 separate patches. (Each patch represents a 
 changeset, in the order they were applied. I've used the cvsps utility.)
 
 the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
 script that will apply all the patches in order and will create a 
 pristine 2.6.12-rc2 tree.

Hey, that's great. I got the CVS repo too, and I was looking at it, but 
the more I looked at it, the more I felt that the main reason I want to 
import it into git ends up being to validate that my size estimates are at 
all realistic.

I see that Thomas Gleixner seems to have done that already, and come to a 
figure of 3.2GB for the last three years, which I'm very happy with, 
mainly because it seems to match my estimates to a tee. Which means that I 
just feel that much more confident about git actually being able to handle 
the kernel long-term, and not just as a stop-gap measure.

But I wonder if we actually want to populate the whole history.. 
Now that my size estimates have been verified, I have little actual real 
reason to put the history into git. There are no visualization tools done 
for git yet, and no helpers to actually find problems, and by the time 
there will be, we'll have new history.

So I'd _almost_ suggest just starting from a clean slate after all.  
Keeping the old history around, of course, but not necessarily putting it
into git now. It would just force everybody who is getting used to git in 
the first place to work with a 3GB archive from day one, rather than 
getting into it a bit more gradually.

What do people think? I'm not so much worried about the data itself: the
git architecture is _so_ damn simple that, now that the size estimate has
been confirmed, I don't think it would be a problem per se to put
3.2GB into the archive. But it will bog down rsync horribly, so it will
actually hurt synchronization until somebody writes the rev-tree-like
stuff to communicate changes more efficiently..

IOW, it smells to me like we don't have the infrastructure to really work 
with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can 
build up the infrastructure in parallel with starting to really need it.

But it's _great_ to have the history in this format, especially since 
looking at CVS just reminded me how much I hated it.

Comments?

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 One remark on the tree blob storage format. 
 The binary storage of the sha1sum of the referred object is a PITA for
 scripting. 
 Converting the ASCII -> binary for the sha1sum comparison should not
 take much longer than the binary -> ASCII conversion for the file
 reference. Can this be changed ?

I'd really rather not. Why don't you just use ls-tree for scripting? 
That's why it exists in the first place. 

It might make sense to have some simple selection capabilities built into 
ls-tree (ie ls-tree --match drivers/char/ -z treesha1 to get just a 
subtree out), but that depends entirely on how you end up using it.

The fact is, there should _never_ be any reason to look at the objects
themselves directly. cat-file is a debugging aid, it shouldn't be
scripted (with the possible exception of cat-file blob  to just
extract the blob contents, since that object doesn't have any internal
structure).

That level of abstraction (we never look directly at the objects) is 
what allows us to change the object structure later. For example, we 
already changed the commit date thing once, and the tree object has 
obviously evolved a bit, and if we ever change the hash, the objects will 
change too, but if you always just script them using nice helper tools, 
you won't ever need to _care_. And that's how it should be.

If there's a tool missing, holler. THAT is the part I've been trying to
write: all the plumbing so that you _can_ script the thing sanely, and not
worry about how objects are created and worked with. 

For example, that index file format likely _will_ change. I ended up
doing the new stage flags in a way that kept the index file compatible
with old ones, but I did that mainly because it also happened to be the
easiest way to enforce the rule I wanted to enforce (ie the stage really
_is_ a part of the filename from a compare filenames standpoint, in
order to make sure that the stages are always ordered).

So if the index file change hadn't had that property, I'd have just said
I'll change the format, and anybody who tried to parse the index file
would have been _broken_.
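
The "stage is part of the filename for comparison purposes" rule can be pictured as a sort key. The Python sketch below uses a deliberately simplified, hypothetical entry type (real index entries carry stat data and full hashes; the short hashes here are abbreviated placeholders) and only shows that sorting by (name, stage) keeps all stages of a path adjacent and in order.

```python
from collections import namedtuple

# Hypothetical, simplified index entry for illustration.
Entry = namedtuple("Entry", ["name", "stage", "sha1"])


def sort_key(entry):
    """Compare by name first, then by stage, as if the stage were part of the name."""
    return (entry.name, entry.stage)


entries = [
    Entry("man/frotz.6", 3, "eeed997e"),
    Entry("COPYING", 0, "fe2a4177"),
    Entry("man/frotz.6", 1, "b39b4ea3"),
    Entry("man/frotz.6", 2, "b39b4ea3"),
]

ordered = sorted(entries, key=sort_key)
for entry in ordered:
    print(entry.name, entry.stage)
```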

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 For the export stuff its terrible slow. :(

I don't really see your point.

If you already know what the tree is like you say, you don't care about
the tree object. And if you don't know what the tree is, what _are_ you
doing?

In other words, show us what you're complaining about. If you're looking
into the trees yourself, then the binary representation of the sha1 is
already what you want. That _is_ the hash. So why do you want it in ASCII?  
And if you're not looking into the tree directly, but using cat-file
tree and you were hoping to see ASCII data, then that's certainly not
going to be any faster than just doing ls-tree instead.

In other words, I don't see your point. Either you want ascii output for 
scripting, or you don't. First you claimed that you did, and that you 
would want the tree object to change in order to do so. Now you claim that 
you can't use ls-tree because it's too slow. 

That just isn't making any sense. You're mixing two totally different
levels, and complaining about performance when scripting things. Yet
you're talking about a 20-byte data structure that is trivial to convert
to any format you want.

What kind of _strange_ scripting architecture is so fast that there's a
difference between cat-file and ls-tree and can handle 17,000 files in
60,000 revisions, yet so slow that you can't trivially convert 20 bytes of 
data?
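
For scale, converting between the 20-byte binary form and the 40-character ASCII form is a one-liner in most scripting languages. A Python sketch follows; the helper names are made up, and the hash is simply one that appears elsewhere in this thread.

```python
def sha1_to_hex(raw):
    """20 raw bytes -> 40 hex characters."""
    if len(raw) != 20:
        raise ValueError("a SHA-1 is exactly 20 bytes")
    return raw.hex()


def hex_to_sha1(text):
    """40 hex characters -> 20 raw bytes."""
    raw = bytes.fromhex(text)
    if len(raw) != 20:
        raise ValueError("a SHA-1 is exactly 40 hex characters")
    return raw


ascii_form = "82ba574c85e9a2e4652419c88244e9dd1bfa8baa"
binary_form = hex_to_sha1(ascii_form)
print(len(binary_form), sha1_to_hex(binary_form) == ascii_form)
```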

Linus


Re: [PATCH] update-cache --refresh cache entry leak

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:

 When update-cache --refresh replaces an existing cache entry
 with a new one, it forgets to free the original.

I've seen this patch now three times, and it's been wrong every single 
time. Maybe we should add a comment?

That active-cache entry you free()'d was not necessarily allocated with 
malloc(). Most cache-entries are just mmap'ed directly from the index 
file.

Leaking is ok. We cannot leak too much.

Linus


Re: [PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Paul Jackson wrote:

 Daniel wrote:
  I'm working off of Linus's tree when not working on scripts, and it
  doesn't have that section at all.
 
 Ah so - nevermind my README comments then.

Well, actually, I suspect that something like this should go to Pasky. I
really see my repo as purely the internal git data structures, and when it
gets to "how do we interact with other people's web-sites", I suspect 
Pasky's tree is better.

Linus


Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Paul Jackson wrote:

 Morten wrote:
  It makes some sense in principle, but without storing what they mean
  (i.e., group==?) it certainly makes no sense. 
 
 There's no they there.
 
 I think Martin's proposal, to which I agreed, was to store a _single_
 bit.  If any of the execute permissions of the incoming file are set,
 then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
 bit is ON, then the file permission is set mode 0777 (modulo umask),
 else it is set mode 0666 (modulo umask).

I think I agree.

Anybody willing to send me a patch? One issue is that if done the obvious
way it's an incompatible change, and old tree objects won't be valid any
more. It might be ok to just change the compare cache check to only care
about a few bits, though: S_IXUSR and S_IFDIR. And then always write new 
tree objects out with mode set to one of
 - 040000: we already do this for directories
 - 100644: normal files without S_IXUSR set
 - 100755: normal files _with_ S_IXUSR set

Then, at compare time, we only look at S_IXUSR matching for files (we
never compare directory modes anyway). And at file create time, we create
them with 0666 and 0777 respectively, and let the user's umask sort it out
(and if the user has 0100 set in his umask, he can damn well blame
himself).
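
The scheme can be sketched in a few lines of Python. This is an illustration of the rules as described in this mail, not the actual patch; the three function names are made up, and 040000 for directories is git's usual tree mode.

```python
import stat


def tree_mode(st_mode):
    """Canonical mode to record in a tree object under the proposed scheme."""
    if stat.S_ISDIR(st_mode):
        return 0o040000
    if st_mode & stat.S_IXUSR:
        return 0o100755
    return 0o100644


def checkout_permissions(recorded_mode, umask):
    """Permission bits to create a file with: 0777 or 0666, filtered through the umask."""
    wanted = 0o777 if recorded_mode & stat.S_IXUSR else 0o666
    return wanted & ~umask


def modes_differ(a, b):
    """Only the owner-execute bit counts when comparing file modes."""
    return (a & stat.S_IXUSR) != (b & stat.S_IXUSR)


print(oct(tree_mode(0o100664)))                     # group-writable file is normalized
print(oct(checkout_permissions(0o100755, 0o022)))   # executable, typical umask
print(modes_differ(0o100664, 0o100644))             # not a difference any more
```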

This would pretty much match the existing kernel tree, for example. We'd 
end up with some new trees there (and in git), but not a lot of 
incompatibility. And old trees would still work fine, they'd just get 
written out differently.

Anybody want to send a patch to do this?

Linus


Re: Issues with higher-order stages in dircache

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:
 
 I am wondering if you have a particular reason not to do the
 same for the removing half.

No. Except for me being silly.

Please just make it so.

 Also do you have any comments on this one from the same message?
 
  * read-tree
 
- When merging two trees, i.e. read-tree -m A B, shouldn't
  we collapse identical stage-1/2 into stage-0?

How do you actually intend to merge two trees? 

That sounds like a total special case, and better done with diff-tree.  
But regardless, since I assume the result is the later tree, why do a 
read-tree -m A B, since what you really want is read-tree B?

The real merge always needs the base tree, and I'd hate to complicate the 
real merge with some special-case that isn't relevant for that real case.

Linus


Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Linus Torvalds wrote:
 
 Anybody want to send a patch to do this?

Actually, I just did it. Seems to work for the only test-case I tried,
namely I just committed it, and checked that the permissions all ended up
being recorded as 0644 in the tree (if it has the -x bit set, they get
recorded as 0755).

When checking out, we always check out with 0666 or 0777, and just let 
umask do its thing. We only test bit 0100 when checking for differences.
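
A minimal sketch of the mode handling described above, in Python rather than the C that git itself is written in. The helper names are made up for illustration; only the 0644/0755, 0666/0777 and 0100 values come from the text.

```python
# Illustrative only: these helpers are hypothetical, not git's actual code.

def recorded_mode(perm_bits: int) -> int:
    """Mode stored in the tree: 0755 if the owner-execute bit is set, else 0644."""
    return 0o755 if perm_bits & 0o100 else 0o644


def checkout_mode(stored: int, umask: int) -> int:
    """Mode a file ends up with on checkout: 0777 or 0666, filtered by umask."""
    requested = 0o777 if stored & 0o100 else 0o666
    return requested & ~umask


def modes_differ(a: int, b: int) -> bool:
    """Only bit 0100 is compared when checking for differences."""
    return ((a ^ b) & 0o100) != 0
```

With this scheme a file that is 0664 on one machine and 0644 on another is recorded identically, so the tree object does not change just because two people have different umasks.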

Maybe I missed some case, but this does indeed seem saner than the "try to 
restore all bits" case. If somebody sees any problems, please holler.

(Btw, you may or may not need to blow away your index file by just 
re-creating it with a read-tree after you've updated to this. I _tried_ 
to make sure that the compare just ignored the ce_mode bits, but the fact 
is, your index file may be corrupt in the sense that it has permission 
sets that sparse expects to never generate in an index file any more..)

Linus


Re: Parseable commit header

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Stefan-W. Hahn wrote:
 
 after playing a while with git-pasky it is a crap to interpret the date of
 commit logs. Though it was a good idea to put the date in a parseable format
 (seconds since), but the format of the commit itself is not good parseable.

Actually, it is. The commit stuff removes all special characters from the 
strings, so '<' and '>' around the email do indeed act as delimiters, and 
cannot exist anywhere else.
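
A rough sketch of what that buys a parser, assuming an author/committer line of the shape "author NAME <EMAIL> SECONDS TZ". The function name and the sample address are invented for the example; the only property relied on is that '<' and '>' cannot appear inside the name or the email.

```python
# Hypothetical helper: splits one author/committer line on the '<' and '>'
# delimiters. Assumes the layout "kind NAME <EMAIL> SECONDS TZ".

def parse_ident(line: str) -> dict:
    kind, rest = line.split(" ", 1)      # "author" or "committer"
    name, rest = rest.split(" <", 1)     # name cannot contain '<'
    email, rest = rest.split("> ", 1)    # email cannot contain '>'
    seconds, tz = rest.split()           # seconds since the epoch, timezone
    return {
        "kind": kind,
        "name": name,
        "email": email,
        "seconds": int(seconds),
        "tz": tz,
    }
```

Because the delimiters are guaranteed unique, plain string splitting is enough and no escaping rules are needed.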

Linus


Re: Storing permissions

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, David A. Wheeler wrote:
 
 There's a minor reason to write out ALL the perm bit data, but
 only care about a few bits coming back in: Some people use
 SCM systems as a generalized backup system

Yes. I was actually thinking about having system config files in a git 
repository when I started it, since I noticed how nicely it would do 
exactly that.

However, since the mode bits also end up being part of the name of the 
tree object (ie they are most certainly part of the hash), it's really 
basically impossible to only care about one bit but writing out many bits: 
it's the same issue of having multiple identical blocks with different 
names.
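
To make the point concrete, here is a simplified sketch of why the mode ends up in the name. It hashes a list of "mode name NUL sha1" entries with SHA1; the real on-disk object has more to it (header, compression), so treat the exact byte layout as an assumption. The function name is made up.

```python
import hashlib

# Simplified model of a tree object: the mode is part of the hashed bytes,
# so two trees that differ only in permission bits get different names.

def tree_id(entries):
    """entries: list of (mode, name, blob_sha1_hex) tuples."""
    body = b"".join(
        b"%o %s\0" % (mode, name.encode()) + bytes.fromhex(blob_hex)
        for mode, name, blob_hex in sorted(entries, key=lambda e: e[1])
    )
    return hashlib.sha1(body).hexdigest()
```

Same file content, same file name, but a group-write bit that differs because of somebody's umask: the tree name changes, which is exactly the "identical blocks with different names" problem.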

It's ok if it happens occasionally (it _will_ happen at the point of a
tree conversion to the new format, for example), but it's not ok if it
happens all the time - which it would, since some people have umask 002
(and individual groups) and others have umask 022 (and shared groups), and
I can imagine that some anal people have umask 0077 (I don't want to play
with others).

The trees would constantly bounce between a million different combinations 
(since _some_ files would be checked out with the other mode).

At least if you always honor umask or always totally ignore umask, you get 
a nice repeatable thing. We tried the "always ignore umask" thing, and the 
problem with that is that while _git_ ended up always doing a fchmod() 
to reset the whole permission mask, anybody who created files any other 
way and then checked them in would end up using umask.

One solution is to tell git with a command line flag and/or config file 
entry that for this repo, I want you to honor all bits. That should be 
easy enough to add at some point, and then you really get what you want.

That said, git won't be really good at doing system backup. I actually 
_do_ save a full 32 bits of mode (hey, you could have immutable bits 
etc set), but anybody who does anything fancy at all with mtime would be 
screwed, for example.

Also, right now we don't actually save any other type of file than
regular/directory, so you'd have to come up with a good save-format for
symlinks (easy, I guess - just make a link blob) and device nodes (that
one probably should be saved in the cache_entry  itself, possibly
encoded where the sha1 hash normally is).

Also, I made a design decision that git only cares about non-dotfiles. Git 
literally never sees or looks at _anything_ that starts with a '.'. I 
think that's absolutely the right thing to do for an SCM (if you hide your 
files, I really don't think you should expect the SCM to see it), but it's 
obviously not the right thing for a backup thing.

(It _might_ be the right thing for a system config file, though, eg 
tracking something like /etc with git might be ok, modulo the other 
issues).

Linus


Re: using git directory cache code in darcs?

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, David Roundy wrote:
 
 That's all right.  Darcs would only access the cached data through a
 git-caching layer, and we've already got an abstraction layer over the
 pristine cache.  As long as the git layer can quickly retrieve the contents
 of a given file, we should be fine.

Yes.

In fact, one of my hopes was that other SCM's could just use the git
plumbing. But then I'd really suggest that you use git itself, not any
libgit. Ie you take _all_ the plumbing as real programs, and instead of
trying to link against individual routines, you'd _script_ it.

In other words, git would be an independent cache of the real SCM,
and/or the old history (ie an SCM that uses git could decide that the
git stuff is fine for archival, and really use git as the base: and then
the SCM could entirely concentrate on _only_ the interesting parts, ie
the actual merging etc).

That was really what I always personally saw git as, just the plumbing
beneath the surface. For example, something like arch, which is based on
patches and tar-balls (I think darcs is similar in that respect), could
use git as a _hell_ of a better history of tar-balls.

The thing is, unless you take the git object database approach, using 
_just_ the index part doesn't really mean all that much. Sure, you could 
just keep the current objects in the object database, but quite 
frankly, there would probably not be a whole lot of point to that. You'd 
waste so much time pruning and synchronizing with your real database 
that I suspect you'd be better off not using it.

(Or you could prune nightly or something, I guess).

Linus


Re: Re-done kernel archive - real one?

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Russell King wrote:

 On Sat, Apr 16, 2005 at 04:01:45PM -0700, Linus Torvalds wrote:
  So I re-created the dang thing (hey, it takes just a few minutes), and
  pushed it out, and there's now an archive on kernel.org in my public
  personal directory called linux-2.6.git. I'll continue the tradition
  of naming git-archive directories as *.git, since that really ends up
  being the .git directory for the checked-out thing.
 
 We need to work out how we're going to manage to get our git changes to
 you.  At the moment, I've very little idea how to do that.  Ideas?

To me, merging is my highest priority. I suspect that once I have a tree 
from you (or anybody else) that I actually _test_ merging with, I'll be 
motivated as hell to make sure that my plumbing actually works. 

After all, it's not just you who wants to avoid the pain of 
merging: it's definitely in my own best interests to make merging as 
easy as possible. You're _the_ most obvious initial candidate, because 
your merges almost never have any conflicts at all, even on a file level 
(much less within a file).

 However, I've made a start to generate the necessary emails.  How about
 this format?
 
 I'm not keen on the tree, parent, author and committer objects appearing
 in this - they appear to clutter it up.  What're your thoughts?

Indeed. I'd almost drop the whole header except for the author line. 

Oh, and you need a separator between commits, right now your 
Signed-off-by: line ends up butting up with the header of the next 
commit ;)

 I'd rather not have the FQDN of the machine where the commit happened
 appearing in the logs.

That's fine. Our short-logs have always tried to have just the real name 
in them, and I do want an email-like thing for tracking the developer, but 
yes, if you remove the email, that's fine. It should be easy enough to do 
with a simple

sed 's/<.*//'

or similar.

And if you replace author with From: and do the date conversion, it
might look more natural.

Linus


Re: Re-done kernel archive - real one?

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Russell King wrote:
 
 BTW, there appears to be errors in the history committed thus far.
 I'm not sure where this came from though.  Some of them could be
 UTF8 vs ASCII issues, but there's a number which seem to have extra
 random crap in them (^M) and lots of blank lines).

Ah, yes. That is actually from the original emails from Andrew. I do not 
know why, but I see them there. It's his script that does something 
strange.

(Andrew: in case you care, the first one is

[patch 003/198] arm: fix SIGBUS handling

which has the email looking like

...
From: [EMAIL PROTECTED]
Date: Tue, 12 Apr 2005 03:30:35 -0700
Status: 
X-Status: 
X-Keywords:   

^M)


From: Russell King [EMAIL PROTECTED]

ARM wasn't raising a SIGBUS with a siginfo structure.  Fix
__do_user_fault() to allow us to use it for SIGBUS conditions, and 
arrange
for the sigbus path to use this.
...

 One thing which definitely needs to be considered is - what character
 encoding are the comments to be stored as?

To git, it's just a byte stream, and you can have binary comments if you
want to. I personally would prefer to move towards UTF eventually, but I
really don't think it matters a whole lot as long as 99.9% of everything
we'd see there is still 7-bit ascii.

 ID: 75f86bac962b7609b0f3c21d25e10647ff8ed280
 [PATCH] intel8x0: AC'97 audio patch for Intel ESB2
  
 This patch adds the Intel ESB2 DID's to the intel8x0.c file for AC'97 
 audio
 support.
  
 Signed-off-by: A0Jason Gaston [EMAIL PROTECTED]

That A0 is also there in Andrew's original email. It's space with the
high bit set, and I have no idea why.
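
A small sketch for hunting down that kind of byte, assuming the comment is handled as a raw byte stream (which is how git treats it). The helper name is invented. It simply lists every byte that is not 7-bit ASCII.

```python
# Hypothetical helper: report (offset, value) for every byte with the
# high bit set. 0xA0 is 0x20 (space) with bit 0x80 turned on.

def high_bit_bytes(data: bytes):
    return [(offset, value) for offset, value in enumerate(data) if value & 0x80]
```

For what it is worth, 0xA0 is the no-break space in Latin-1, which is one plausible way such a character could sneak into a Signed-off-by line.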

Linus


Re: [patch] fork optional branch point normazilation

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Brad Roberts wrote:

 (ok, author looks better, but committer doesn't obey the AUTHOR_ vars yet)

They shouldn't, but maybe I should add COMMITTER_xxx overrides. I just do 
_not_ want people to think that they should claim to be somebody else: 
it's not a security issue (you could compile your own commit-tree.c 
after all), it's more of a social rule thing. I prefer seeing bad email 
addresses that at least match the system setup to seeing good email 
addresses that people made up just to make them look clean.

Mind showing what your /etc/passwd file looks like (just your own entry, 
and please just remove your password entry if you don't use shadow 
passwords).

Maybe I should just remove _all_ strange characters when I do the name 
cleanup in commit. Right now I just remove the ones that matter to 
parsing it unambiguously: '\n', '<' and '>'.

(The ',' character really is special: some people have

Torvalds, Linus

and maybe I should not just remove the commas, I should convert it to 
always be Linus Torvalds. But your gecos entry is just _strange_. Why 
the extra commas, I wonder?)

Linus


Re: [1/5] Parsing code in revision.h

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Daniel Barkalow wrote:

 --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 
 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
 +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 
 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
 @@ -24,6 +24,7 @@
   unsigned int flags;
   unsigned char sha1[20];
   unsigned long date;
 + unsigned char tree[20];
   struct parent *parent;
  };
  

I think this is really wrong.

The whole point of revision.h is that it's a generic framework for 
keeping track of relationships between different objects. And those 
objects are in no way just commit objects.

For example, fsck uses this struct revision to create a full tree of 
_all_ the object dependencies, which means that a struct revision can be 
any object at all - it's not in any way limited to commit objects, and 
there is no tree object that is associated with these things at all.

Besides, why do you want the tree? There's really nothing you can do with 
the tree to a first approximation - you need to _first_ do the 
reachability analysis entirely on the commit dependencies, and then when 
you've selected a set of commits, you can just output those.

Later phases will indeed look up what the tree is, but that's only after
you've decided on the commit object. There's no point in looking up (or
even trying to just remember) _all_ the tree objects.

Hmm?

Linus


Re: Re-done kernel archive - real one?

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Russell King wrote:
 
 This will (and does) do exactly what I want.  I'll also read into the
 above a request that you want it in forward date order. 8)

No, I actually don't _think_ I care. In many ways I'm more used to
reverse date order, because that's usually how you view a changelog
(with a pager, and most recent changes at the top).

Which one makes sense when asking me to merge? I don't know, and I don't
think it really even matters, but maybe we can add a "for now" to whatever 
decision you end up coming to?

Linus


First ever real kernel git merge!

2005-04-17 Thread Linus Torvalds

It may not be pretty, but it seems to have worked fine!

Here's my history log (with intermediate checking removed - I was being
pretty anal ;):

rsync -avz --ignore-existing \
	master.kernel.org:/home/rmk/linux-2.6-rmk.git/ .git/
rsync -avz --ignore-existing \
	master.kernel.org:/home/rmk/linux-2.6-rmk.git/HEAD .git/MERGE-HEAD
merge-base $(cat .git/HEAD) $(cat .git/MERGE-HEAD)
for i in e7905b2f22eb5d5308c9122b9c06c2d02473dd4f $(cat .git/HEAD) \
	$(cat .git/MERGE-HEAD); do cat-file commit $i | head -1; done
read-tree -m cf9fd295d3048cd84c65d5e1a5a6b606bf4fddc6 \
	9c78e08d12ae8189f3bd5e03accc39e3f08e45c9 \
	a43c4447b2edc9fb01a6369f10c1165de4494c88
write-tree
commit-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6 \
	-p $(cat .git/HEAD) -p $(cat .git/MERGE-HEAD)
echo 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a > .git/HEAD
fsck-cache --unreachable 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a

which looks really messy, because I really wanted to do each step slowly 
by hand, so those magic revision numbers are just cut-and-pasted from the 
results that all the previous stages had printed out.

NOTE! As expected, this merge had absolutely zero file-level clashes,
which is why I could just do the read-tree -m followed by a write-tree. 
But it's a real merge: I had some extra commits in my tree that were not
in Russell's tree, and obviously vice versa.

Also note! The end result is not actually written back to the current 
working directory, so to see what the merge result actually is, there's 
another final phase:

read-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
update-cache --refresh
checkout-cache -f -a

which just updates the current working directory to the results. I'm _not_
caring about old dirty state for now - the theory was to get this thing
working first, and worry about making it nice to use later.

A second note: a real merge thing should notice that if the merge-base
output ends up being one of the inputs (if one side is a strict subset of
the other side), then the merge itself should never be done, and the
script should just update directly to which-ever is non-common HEAD.
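
The check described in that second note can be sketched roughly as follows. This is a toy model, not what any git script does: the commit graph is a plain dict mapping each commit to its list of parents, and the function names are made up.

```python
# Toy model: 'parents' maps a commit id to the list of its parent ids.

def ancestors(parents, commit):
    """Return the commit itself plus everything reachable through parents."""
    seen, todo = set(), [commit]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen


def plan_pull(parents, head, merge_head):
    """Decide whether a real merge is needed at all."""
    if merge_head in ancestors(parents, head):
        return "up-to-date"      # the other side is a subset of ours
    if head in ancestors(parents, merge_head):
        return "fast-forward"    # we are a subset: just move HEAD
    return "merge"               # both sides have commits the other lacks
```

Only the last case needs the read-tree -m / write-tree / commit-tree dance; the other two are what you get when merge-base prints one of its own inputs.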

But as far as I can tell, this really did work out correctly and 100% 
according to plan. As a result, if you update to my current tree, the 
top-of-tree commit should be:

cat-file commit $(cat .git/HEAD)

tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
parent 8173055926cdb8534fbaed517a792bd45aed8377
parent df4449813c900973841d0fa5a9e9bc7186956e1e
author Linus Torvalds [EMAIL PROTECTED] 111377 -0700
committer Linus Torvalds [EMAIL PROTECTED] 111377 -0700

Merge with master.kernel.org:/home/rmk/linux-2.6-rmk.git - ARM changes

First ever true git merge. Let's see if it actually works.

Yehaa! It did take basically zero time, btw. Except for my bumbling about,
and the first "rsync the objects from rmk's directory" part (which wasn't
horrible, it just wasn't instantaneous like the other phases).

Btw, to see the output, you really want to have a git log that sorts by 
date. I had an old gitlog.sh that did the old recursive thing, and while 
it shows the right thing, the ordering ended up making it be very 
non-obvious that rmk's changes had been added recently, since they ended 
up being at the very bottom.
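
To illustrate the ordering problem, here is a small sketch comparing the two ways of walking history. It is a toy model: commits live in a dict mapping an id to (date, parents), and both function names are invented for the example.

```python
import heapq

# Toy model: commits maps id -> (date, [parent ids]).

def log_recursive(commits, head, seen=None):
    """Depth-first walk, first parent first (the 'old recursive thing')."""
    seen = set() if seen is None else seen
    if head in seen:
        return []
    seen.add(head)
    order = [head]
    for parent in commits[head][1]:
        order += log_recursive(commits, parent, seen)
    return order


def log_by_date(commits, head):
    """Walk from head, always emitting the most recent pending commit."""
    seen = {head}
    heap = [(-commits[head][0], head)]
    order = []
    while heap:
        _, commit = heapq.heappop(heap)
        order.append(commit)
        for parent in commits[commit][1]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-commits[parent][0], parent))
    return order
```

With a merge whose second parent carries the newer commits, the recursive walk pushes those commits to the bottom, while the date-ordered walk lists them right after the merge.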

Linus


Re: Re-done kernel archive - real one?

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Russell King wrote:
 
 I pulled it tonight into a pristine tree (which of course worked.)

Goodie.

 In doing so, I noticed that I'd messed up one of the commits - there's
 a missing new file.  Grr.  I'll put that down to being a newbie git.

Actually, you should put that down to horribly bad interface tools.  With
BK, we had these nice tools that pointed out that there were files that
you might want to commit (ie bk citool), and made this very obvious.

Tools absolutely matter. And it will take time for us to build up that 
kind of helper infrastructure. So being newbie might be part of it, but 
it's the smaller part, I say. Rough interfaces is a big issue.

Linus


Re: [patch] fork optional branch point normazilation

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, Brad Roberts wrote:

 braddr:x:1000:1000:Brad Roberts,,,:/home/braddr:/bin/bash
 
 All gecos entries on all my debian boxes are of the form:
 
fullname, office number, office extension, and home number

Ahh, ok.

I'll make the cleanup thing just remove strange characters from the end, 
that should fix this kind of thing for now.

I'd just remove everything after the first strange character, but I can also 
see people using the lastname, firstname format, and I'd hate to just 
ignore firstname in that case.
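
A sketch of that cleanup, under stated assumptions: the real code is C in commit-tree, the function name here is made up, and the set of trailing "strange" characters (commas, semicolons, colons, whitespace) is a guess for illustration. What does come from this thread is that '\n', '<' and '>' are dropped everywhere, and that junk is trimmed only from the end.

```python
# Hypothetical cleanup of a gecos full-name field.
# Assumption: "strange" trailing characters are , ; : and whitespace.

def clean_name(gecos: str) -> str:
    # Characters that would break parsing of the commit header go everywhere.
    name = "".join(ch for ch in gecos if ch not in "\n<>")
    # Trailing junk only, so "lastname, firstname" keeps its first name.
    return name.rstrip(",;: \t")
```

Trimming only from the end handles the Debian-style "fullname,,," entry without throwing away the first name in a "Torvalds, Linus" style entry.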

Linus


Re: Merge with git-pasky II.

2005-04-17 Thread Linus Torvalds


On Mon, 18 Apr 2005, Herbert Xu wrote:
 
 I wasn't disputing that of course.  However, the same effect can be
 achieved in using a single hash with a bigger length, e.g., sha256
 or sha512.

No it cannot.

If somebody actually literally totally breaks that hash, length won't 
matter. There are (bad) hashes where you can literally edit the content of 
the file, and make sure that the end result has the same hash.

In that case, when the hash algorithm has actually been broken, the length 
of the hash ends up being not very relevant. 

For example, you might hash your file by blocking it up in 16-byte
blocks, and xoring all blocks together - the result is a 16-byte hash.  
It's a terrible hash, and obviously trivially breakable, and once broken
it does _not_ help to make it use its 32-byte cousin. Not at all. You can 
just modify the breaking thing to equally cheaply make modifications to a 
file and get the 32-byte hash right again.
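
The xor example is easy to try out. The sketch below is purely illustrative (the function names are made up, and zero-padding the last block is an assumption): it computes the block-xor hash for any block size and shows that one appended fix-up block forges a match at 16 bytes just as cheaply as at 32.

```python
# Deliberately terrible hash: xor all fixed-size blocks together.

def xor_hash(data: bytes, size: int = 16) -> bytes:
    padded = data + b"\0" * (-len(data) % size)   # assumption: zero padding
    digest = bytearray(size)
    for start in range(0, len(padded), size):
        for i in range(size):
            digest[i] ^= padded[start + i]
    return bytes(digest)


def forge(tampered: bytes, target: bytes) -> bytes:
    """Extend 'tampered' with one block so that it hashes to 'target'."""
    size = len(target)
    padded = tampered + b"\0" * (-len(tampered) % size)
    current = xor_hash(padded, size)
    fixup = bytes(a ^ b for a, b in zip(current, target))
    return padded + fixup
```

The cost of the forgery is one extra block regardless of the digest length, which is the whole point: once the algorithm is broken, making the hash longer buys nothing.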

Is that kind of breakage likely for sha1? Hell no. Is it possible? In your 
in theory world where practice doesn't matter, yes.

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Russell King wrote:
 
 Ok, I just tried pulling your tree into the tree you pulled from, and
 got this:

No, that can't work. The pesky tools are helpful, but they really don't do 
merges worth cr*p right now, excuse my french. 

The _real_ way to pull is to do the (horribly complex) thing I described
by the merge, but noticing that one of the commits you are merging is a
proper subset of the other one, and just updating the head instead of
actually doing a real merge (ie skipping the read-tree -m and
write-tree phases).

 This was with some random version of git-pasky-0.04.  Unfortunately,
 this version doesn't have the sha1 ID appended, so I couldn't say
 definitively that it's the latest and greatest.  It might be a day
 old.

I'm afraid that until Pasky's tools script this properly, a pull really 
ends up being something like this (which _can_ be scripted, never fear):

NOTE NOTE NOTE! This is untested! I'm writing this within the email 
editor, so do _not_ do this on a tree that you care about.

#!/bin/sh
#
# use $1 or something in a real script, this
# just hard-codes it.
#

merge_repo=master.kernel.org:/pub/linux/kernel/people/torvalds/linux-2.6.git

echo "Getting object database"
rsync -avz --ignore-existing $merge_repo/ .git/

echo "Getting remote head"
rsync -avz $merge_repo/HEAD .git/MERGE_HEAD

head=$(cat .git/HEAD)
merge_head=$(cat .git/MERGE_HEAD)
common=$(merge-base $head $merge_head)
if [ -z "$common" ]; then
	echo "Unable to find common commit between" $merge_head $head
	exit 1
fi

# Get the trees associated with those commits
common_tree=$(cat-file commit $common | sed 's/tree //;q')
head_tree=$(cat-file commit $head | sed 's/tree //;q')
merge_tree=$(cat-file commit $merge_head | sed 's/tree //;q')

# The remote head is already part of our history: nothing to do
if [ "$common" = "$merge_head" ]; then
	echo "Already up-to-date. Yeeah!"
	exit 0
fi
# Our head is part of the remote history: just move HEAD forward
if [ "$common" = "$head" ]; then
	echo "Updating from $head to $merge_head."
	echo "Destroying all noncommitted data!"
	echo "Kill me within 3 seconds.."
	sleep 3
	read-tree $merge_tree && checkout-cache -f -a
	echo $merge_head > .git/HEAD
	exit 0
fi
# Both sides have new commits: do a real three-way merge
echo "Trying to merge $merge_head into $head"
read-tree -m $common_tree $head_tree $merge_tree
result_tree=$(write-tree) || exit 1
result_commit=$(echo "Merge $merge_repo" | commit-tree $result_tree \
	-p $head -p $merge_head)
echo "Committed merge $result_commit"
echo $result_commit > .git/HEAD
read-tree $result_tree && checkout-cache -f -a

The above looks like it might work, but I also warn you: it's not only
untested, but it's pretty fragile in that if something breaks, you are
probably left with a mess. I _tried_ to do the right thing, but... So it
obviously will need testing, tweaking and just general tender loving care.

And if the merge isn't clean, it will exit early thanks to the

write-tree || exit 1

and now you have to resolve the merge yourself. There are tools to help
you do so automatically, but that's really a separate script.

You shouldn't hit the merge case at all right now, you should hit the 
Updating from $head to $merge_head thing.

If Pesky wants to take the above script, test it, and see if it works,
that would be good. It's definitely a much better pull than trying to
apply the patches forward..

Linus


Re: A couple of questions

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Imre Simon wrote:

 How will git handle a corrupted (git) file system?
 
 For instance, what can be done if objects/xy/z{38} does not pass the
 simple consistency test, i.e. if the file's sha1 hash is not xyz{38}?
 This might be a serious problem because, in general, one cannot
 reconstruct the contents of file objects/xy/z{38} from its name
 xyz{38}.

Nothing beats backups and distribution. The distributed nature of git 
means that you can replicate your objects arbitrarily.

 Another problem might come up if the file does pass the simple
 consistency test but the file's contents is not a valid git file,

Run fsck-cache. It not only tests SHA1 and general object sanity, but it
does full tracking of the resulting reachability and everything else. It
prints out any corruption it finds (missing or bad objects), and if you
use the --unreachable flag it will also print out objects that exist but 
that aren't reachable from any of the HEAD nodes (which you need to 
specify).

So for example

fsck-cache --unreachable $(cat .git/HEAD)

will do quite a _lot_ of verification on the tree. There are a few extra 
validity tests I'm going to add (make sure that tree objects are sorted 
properly etc), but on the whole if fsck-cache is happy, you do have a 
valid tree.
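
The reachability part of that check can be sketched as a plain graph walk. This is a toy model, not fsck-cache itself: objects are a dict mapping an id to the ids it refers to (commit to parents and tree, tree to blobs), the SHA1 verification step is left out, and the function name is made up.

```python
# Toy reachability check: returns the referenced-but-absent objects and
# the objects that exist but cannot be reached from any given head.

def fsck(objects, heads):
    reachable, missing, todo = set(), set(), list(heads)
    while todo:
        oid = todo.pop()
        if oid in reachable or oid in missing:
            continue
        if oid not in objects:
            missing.add(oid)        # something points at it, but it is gone
            continue
        reachable.add(oid)
        todo.extend(objects[oid])
    unreachable = set(objects) - reachable
    return missing, unreachable
```

The "missing" set corresponds to corruption you would have to repair from a backup or another site; the "unreachable" set is what the --unreachable flag is described as printing.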

Any corrupt objects you will have to find in backups or other archives (ie
you can just remove them and do an rsync with some other site in the
hopes that somebody else has the object you have corrupted).

Of course, valid tree doesn't mean that it wasn't generated by some evil 
person, and the end result might be crap. Git is a revision tracking 
system, not a quality assurance system ;)

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Greg KH wrote:

 On Sun, Apr 17, 2005 at 04:24:24PM -0700, Linus Torvalds wrote:
  
  Tools absolutely matter. And it will take time for us to build up that 
  kind of helper infrastructure. So being newbie might be part of it, but 
  it's the smaller part, I say. Rough interfaces is a big issue.
 
 Speaking of tools, you had a dotest program to apply patches in email
 form to a bk tree.  And from what I can gather, you've changed that to
 handle git archives, right?

Yup.

It's a git archive at 

kernel.org:/pub/linux/kernel/people/torvalds/git-tools.git

and it seems to work. It's what I've used for all the kernel patches 
(except for the merge), and it's what I use for the git stuff that shows 
up as authored by others.

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Linus Torvalds wrote:
 
 No, that can't work. The pesky tools are helpful [...]
 I'm afraid that until Pasky's tools script this properly, [... ]
 If Pesky wants to take the above script, test it, [...]

Ok, one out of three isn't too bad, is it? Pesky/Pasky, so close yet so 
far. Sorry,

Linus


Re: [0/5] Parsers for git objects, porting some programs

2005-04-18 Thread Linus Torvalds


On Sun, 17 Apr 2005, Daniel Barkalow wrote:

 This series introduces common parsers for objects, and ports the programs
 that currently use revision.h to them.
 
  1: the header files
  2: the implementations
  3: port rev-tree
  4: port fsck-cache
  5: port merge-base

Ok, having now looked at the code, I don't have any objections at all. 
Could you clarify the fsck issue about reading the same object twice? 
When does that happen?

Linus


Re: [PATCH] fix bug in read-cache.c which loses files when merging a tree

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, James Bottomley wrote:
 
 I had a problem with the SCSI tree in that there's a file removal in one
 branch.  Your git-merge-one-file-script wouldn't have handled this
 correctly: It seems to think that the file must be removed in both
 branches, which is wrong.

Yes, I agree. My current merge-one-file-script doesn't actually look at 
what the original file was in this situation, and clearly it should. I 
think I'll leave it for the user to decide what happens when somebody has 
modified the deleted file, but clearly we should delete it if the other 
branch has not touched it.

I suspect that I should just pass in the SHA1 of the files to the
merge-one-file-script from merge-cache, rather than unpacking it.  
After all, the merging script can do the unpacking itself with a simple
cat-file blob $sha1.

And the fact is, many of the trivial merges should be handled by just
looking at the content, and doing a cmp on the files seems to be a
stupid way to do that when we had the sha1 earlier.
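
A sketch of the per-file decision this implies, working purely on SHA1 names. It is not the actual merge-one-file-script: the function name is invented, and None stands for "the file does not exist in that tree".

```python
# Hypothetical per-file resolver. Each argument is the blob's SHA1 in the
# base, our branch and their branch, or None if the file is absent there.

def resolve(base, ours, theirs):
    if ours == theirs:
        return ("resolved", ours)      # same result on both sides
    if ours == base:
        return ("resolved", theirs)    # only they changed (or deleted) it
    if theirs == base:
        return ("resolved", ours)      # only we changed (or deleted) it
    return ("conflict", None)          # both changed it: leave it to the user
```

A result of ("resolved", None) means the file should be removed, which covers the SCSI case: deleted in one branch, untouched in the other. A deletion on one side plus a modification on the other falls through to "conflict".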

Done, and pushed out. Does the new merge infrastructure work for you?

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Russell King wrote:
 
 Since this happened, I've been working out what state my tree is in,
 and I restored it back to a state where I had one dangling commit head,
 which was _my_ head.

For the future, if your tree gets messed up to the point where you say 
screw it and just want to go back in time, you can do this (it's 
equivalent to undo in BK speak):

git log | less -S

.. find which HEAD it was that you trusted..

In this case your HEAD before I merged with it was this one:

df4449813c900973841d0fa5a9e9bc7186956e1e

So to get back to that one, you can do

echo df4449813c900973841d0fa5a9e9bc7186956e1e > .git/HEAD

and now

cat-file commit $(cat .git/HEAD) | head -1

gives you

tree a43c4447b2edc9fb01a6369f10c1165de4494c88

so you can restore your checked-out state with

read-tree a43c4447b2edc9fb01a6369f10c1165de4494c88
checkout-cache -f -a
update-cache --refresh

and your tree should be valid again.

Now, to remove any bogus objects, you can then run my git-prune-script
(look at it carefully first to make sure you realize what you are doing).

NOTE NOTE NOTE! This will _revert_ everything you had done after the 
trusted point. So you may not actually want to do this. Instead:

 It's very much like I somehow committed against the _parent_ of the
 head, rather than the head itself.

That's very common if you just forget to update your new .git/HEAD when 
you do a commit.

Again, it's the tools that make it a bit too easy to mess up. The 
commit-tree thing is supposed to really only be used from scripts, which 
would do something like

result=$(commit-tree ...) && echo $result > .git/HEAD

But when doing things by hand, if you forget to update your HEAD, your 
next commit will be done against the wrong head, and you get dangling 
commits.

The good news is that this is not that hard to fix up. The _trees_ are all
correct, and the objects are all correct, so what you can do is just
generate a few new (proper) commit objects, with the right parents. Then
you can do the git-prune-script thing that will throw away the old
broken commits, since they won't be reachable from your new commits (even
though their _trees_ will be there and be the same).

So in this case:

b4a9a5114b3c6da131a832a8e2cd1941161eb348
+- e7905b2f22eb5d5308c9122b9c06c2d02473dd4f
   +- dc90c0db0dd5214aca5304fd17ccd741031e5493 -- extra dangling head
   +- 488faba31f59c5960aabbb2a5877a0f2923937a3

you can do

cat-file commit dc90c0db0dd5214aca5304fd17ccd741031e5493

to remind you what your old tree and commit message was, and then just 
re-commit that tree with the same message but with the proper parent:

commit-tree <tree> -p 488faba31f59c5960aabbb2a5877a0f2923937a3

and then you need to do the same thing for the other commits (which will 
now need to be re-based to have the new commit-chain as their parents).

Then, when you fixed up the final one, remember to update .git/HEAD with 
its commit ID, and now the prune-thing will get rid of the old dangling 
commits that you just created new duplicates of.
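
Why every later commit has to be redone can be seen in a toy model. The Python sketch below is not git's real object format: it assumes a commit ID is simply a SHA-1 over the tree, the parent and the message, and the tree names and messages are hypothetical. Changing the parent of the first commit changes its ID, which changes the parent (and so the ID) of each commit after it.

```python
import hashlib


def commit_id(tree, parent, message):
    """Toy commit ID: a SHA-1 over tree, parent, and message."""
    body = f"tree {tree}\nparent {parent}\n\n{message}"
    return hashlib.sha1(body.encode()).hexdigest()


def recommit_chain(new_parent, commits):
    """Re-create (tree, message) commits in order on top of new_parent."""
    head = new_parent
    new_ids = []
    for tree, message in commits:
        head = commit_id(tree, head, message)  # each commit hangs off the last
        new_ids.append(head)
    return new_ids


# Hypothetical trees and messages for two dangling commits.
dangling = [("tree-1", "first fix"), ("tree-2", "second fix")]
new_ids = recommit_chain("488faba31f59c5960aabbb2a5877a0f2923937a3", dangling)
print(new_ids[-1])  # this is what would go into .git/HEAD
```

The trees are reused untouched; only the commit objects are new.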

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix bug in read-cache.c which loses files when merging a tree

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Petr Baudis wrote:
 
 So, I'm confused. Why did you introduce unpack-file instead of doing
 just this?

It was code that I already had (ie the old code from merge-cache just
moved over), and thanks to that, I don't have to worry about broken
mktemp crap in user space...

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Greg KH wrote:
 
 Hm, have you pushed all of the recent changes public?

Oops. Obviously not. Will fix.

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, Greg KH wrote:
 
 Anyway, I try it this way and get:

You should update to the newest version anyway..

 $ dotest ~/linux/patches/usb/usb-visor-tapwave_zodiac.patch 
 
 Applying USB: visor Tapwave Zodiac support patch
 
 fatal: preparing to update file 'drivers/usb/serial/visor.c' not uptodate in cache
 
 What did I forget to do?

The most common reason is that the scripts _really_ want the index to 
match your current tree exactly. Run update-cache --refresh. And if you 
have any uncommitted information, make sure to commit it first.

(Not _strictly_ true - you can leave edited files in your directory, and 
just hope the patch never touches them. The thing you should _not_ do is 
to do an update-cache .c to commit any changes to the 'index', 
because then the patch applicator will actually commit that one too).

Linus


Re: Re-done kernel archive - real one?

2005-04-18 Thread Linus Torvalds


On Tue, 19 Apr 2005, Petr Baudis wrote:
 
 What is actually a little annoying is having to cd ,,merge and then
 back, though. I don't know, but the current pull-merge script does not
 bother with the temporary merge directory neither, even though Linus
 wanted it. Linus, do you still do? ;-)

No, now that the merge is done entirely in the index file, I don't care 
any more. The index file _is_ the temporary directory as far as I'm 
concerned.

Linus


Re: SCSI trees, merges and git status

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, James Bottomley wrote:
 
 It looks like the merge tree has contamination from the scsi-misc-2.6
 tree ... possibly because the hosting system got the merged objects when
 I pushed.

Nope, the way I merge, if I get a few objects it shouldn't matter at all. 
I'll just look at your HEAD, and merge with the objects that represents.

Afterwards, if I have extra objects, I'll see them with fsck-cache. 

 Could you strip it back and I'll check out the repos on
 www.parisc-linux.org?

Git does work like BK in the way that you cannot remove history when you
have distributed it. Once it's there, it's there.

The patches from you I have in my tree are:

scsi: add DID_REQUEUE to the error handling
zfcp: add point-2-point support
[PATCH] Convert i2o to compat_ioctl
[PATCH] kill old EH constants
[PATCH] scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field
[PATCH] scsi: remove unused scsi_cmnd->internal_timeout field
[PATCH] remove outdated print_* functions
[PATCH] consolidate timeout defintions in scsi.h

or at least that's what they claim in their changelogs.

Oh, and here's the diffstat that matches scsi:

 drivers/block/scsi_ioctl.c             |    5 -
 drivers/s390/scsi/zfcp_aux.c           |    4 -
 drivers/s390/scsi/zfcp_def.h           |    5 +
 drivers/s390/scsi/zfcp_erp.c           |   20 +
 drivers/s390/scsi/zfcp_fsf.c           |   38 --
 drivers/s390/scsi/zfcp_fsf.h           |    6 +
 drivers/s390/scsi/zfcp_sysfs_adapter.c |    6 +
 drivers/scsi/53c7xx.c                  |   23 +++---
 drivers/scsi/BusLogic.c                |    7 -
 drivers/scsi/NCR5380.c                 |    9 +-
 drivers/scsi/advansys.c                |    7 -
 drivers/scsi/aha152x.c                 |   17 ++--
 drivers/scsi/arm/acornscsi.c           |    9 +-
 drivers/scsi/arm/fas216.c              |    9 +-
 drivers/scsi/arm/scsi.h                |    2
 drivers/scsi/atari_NCR5380.c           |    9 +-
 drivers/scsi/constants.c               |    2
 drivers/scsi/ips.c                     |    7 -
 drivers/scsi/ncr53c8xx.c               |   14 ---
 drivers/scsi/pci2000.c                 |    4 -
 drivers/scsi/qla2xxx/qla_dbg.c         |    6 -
 drivers/scsi/scsi.c                    |    5 -
 drivers/scsi/scsi.h                    |   43 ---
 drivers/scsi/scsi_error.c              |   11 ---
 drivers/scsi/scsi_ioctl.c              |    5 -
 drivers/scsi/scsi_lib.c                |    2
 drivers/scsi/scsi_obsolete.h           |  106 -
 drivers/scsi/scsi_priv.h               |    5 -
 drivers/scsi/seagate.c                 |    5 -
 drivers/scsi/sg.c                      |    3
 drivers/scsi/sun3_NCR5380.c            |    9 +-
 drivers/scsi/sym53c8xx_2/sym_glue.c    |    6 -
 drivers/scsi/ultrastor.c               |    4 -

so it doesn't look like there's a _lot_ wrong. Send in a patch to revert 
anything that needs reverting..

Linus


Re: SCSI trees, merges and git status

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, James Bottomley wrote:
 
 Then the git-pull... script actually does the merge and the resulting
 tree checks out against BK

So?

What do you intend to do with all the other stuff I've already put on top?

Yes, I can undo my tree, but my tree has had more stuff in it since I 
pulled from you, so not only will that confuse everybody who already got 
the up-to-date tree, it will also undo stuff that was correct.

In other words, HISTORY CANNOT BE UNDONE.

That's the rule, and it's a damn good one. It was the rule when we used
BK, and it's the rule now. The fact that you can undo your history in
_your_ tree doesn't change anything at all.

So I can merge with your new tree, but that won't actually help any: I'll 
just get a superset, the way you did things. 

The way to remove patches is to explicitly revert them (effectively
applying a reverse diff), but I'm wondering if it's worth it in this case. 
I looked at the patches I did get, and they didn't look horribly bad per 
se. Are they dangerous?

2.6.12 is some time away, if for no other reason than the fact that this 
SCM thing has obviously eaten two weeks of my time. So I'd be inclined to 
chalk this up as a learning experience with git, and just go forward.

Linus


Re: SCSI trees, merges and git status

2005-04-18 Thread Linus Torvalds


On Mon, 18 Apr 2005, James Bottomley wrote:
 
 Fair enough.  If you pull from
 
 rsync://www.parisc-linux.org/~jejb/scsi-misc-2.6.git

Thanks. Pulled and pushed out.

 Doing this exposed two bugs in your merge script:
 
 1) It doesn't like a completely new directory (the misc tree contains a
 new drivers/scsi/lpfc)
 2) the merge testing logic is wrong.  You only want to exit 1 if the
 merge fails.

Applied.

Linus


Re: [darcs-devel] Darcs and git: plan of action

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Tupshin Harper wrote:
 
 I suspect that any use of wildcards in a new format would be impossible 
 for darcs since it wouldn't allow darcs to construct dependencies, 
 though I'll leave it to david to respond to that.

Note that git _does_ very efficiently (and I mean _very_) expose the 
changed files.

So if this kind of darcs patch is always the same pattern just repeated
over n files, then you really don't need to even list the files at all.  
Git gives you a very efficient file listing by just doing a "diff-tree"
(which does not diff the _contents_ - it really just gives you a pretty
much zero-cost "which files changed" listing).
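
To illustrate why such a listing is nearly free, here is a minimal Python sketch. It assumes each tree has been flattened into a hypothetical {path: object ID} map (the real tool walks tree objects instead), so finding changed files is only a comparison of IDs and never opens a file.

```python
def changed_files(old_tree, new_tree):
    """Compare two {path: object_id} maps without reading file contents."""
    changes = []
    for path in sorted(old_tree.keys() | new_tree.keys()):
        old_id, new_id = old_tree.get(path), new_tree.get(path)
        if old_id == new_id:
            continue                      # same ID means same content
        if old_id is None:
            changes.append(("+", path))   # only in the new tree
        elif new_id is None:
            changes.append(("-", path))   # only in the old tree
        else:
            changes.append(("*", path))   # present in both, ID differs
    return changes


# Hypothetical trees with made-up short IDs.
old = {"a": "1", "b": "2", "c": "3"}
new = {"a": "1", "b": "9", "d": "4"}
changes = changed_files(old, new)
print(changes)
```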

So that combination would be 100% reliable _if_ you always split up darcs 
patches to common elements. 

And note that there does not have to be a 1:1 relationship between a git
commit and a darcs patch. For example, say that you have a darcs patch
that does a combination of "change token x to token y in 100 files" and
"rename file a into b". I don't know if you do those kind of combination 
patches at all, but if you do, why not just split them up into two? That 
way the list of files changed _does_ 100% determine the list of files for 
the token exchange.

Linus


Re: naive question

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Petr Baudis wrote:
 
 I'd actually prefer, if:
 
 (i) checkout-cache simply wouldn't touch files whose stat matches with
 what is in the cache; it updates the cache with the stat informations
 of touched files

Run update-cache --refresh _before_ doing the checkout-cache, and that 
is exactly what will happen.

But yes, if you want to make checkout-cache update the stat info (Ingo 
wanted to do that too), it should be possible. The end result is a 
combination of update-cache and checkout-cache, though: you'll 
effectively need to do both (just in one pass).

With the current setup, you have to do

update-cache --refresh
checkout-cache -f -a
update-cache --refresh

which is admittedly fairly inefficient.

The real expense right now of a merge is that we always forget all the
stat information when we do a merge (since it does a read-tree). I have a
cunning way to fix that, though, which is to make read-tree -m read in
the old index state like it used to, and then at the end just throw it
away except for the stat information.

Linus


Re: naive question

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Linus Torvalds wrote:
 
 The real expense right now of a merge is that we always forget all the
 stat information when we do a merge (since it does a read-tree). I have a
 cunning way to fix that, though, which is to make read-tree -m read in
 the old index state like it used to, and then at the end just throw it
 away except for the stat information.

Ok, done. That was really the plan all along, it just got dropped in the 
excitement of trying to get the dang thing to _work_ in the first place ;)

The current version only does

read-tree -m orig branch1 branch2

which now reads the old stat cache information, and then applies that to 
the end result of any trivial merges in case the merge result matches the 
old file stats. It really boils down to this little gem:

/*
 * See if we can re-use the old CE directly?
 * That way we get the uptodate stat info.
 */
if (path_matches(result, old) && same(result, old))
	*result = *old;


and it seems to work fine.
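
The same idea, restated as a small Python sketch for readers who want to play with it. The Entry fields and the merge_stat_info name are invented for the illustration and are not git's data structures; "mtime" stands in for the whole stat record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    sha1: str
    mode: int
    mtime: int = 0  # 0 stands for "stat info unknown"


def merge_stat_info(result, old_index):
    """Reuse old entries (with their stat info) where path and content match."""
    merged = []
    for entry in result:
        old = old_index.get(entry.path)
        if old and old.sha1 == entry.sha1 and old.mode == entry.mode:
            merged.append(old)    # same file: keep the up-to-date stat info
        else:
            merged.append(entry)  # new or changed: needs a refresh later
    return merged


old_index = {
    "a.c": Entry("a.c", "s1", 0o100644, mtime=111),
    "b.c": Entry("b.c", "s2", 0o100644, mtime=222),
}
result = [
    Entry("a.c", "s1", 0o100644),
    Entry("b.c", "s3", 0o100644),
    Entry("c.c", "s4", 0o100644),
]
merged = merge_stat_info(result, old_index)
print([e.mtime for e in merged])
```

Only a.c keeps its stat info; b.c changed and c.c is new, so both would still need a refresh.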

HOWEVER, I'll also make it do the same for a single-tree merge:

read-tree -m newtree

so that you can basically say "read a new tree, and merge the stat 
information from the current cache".  That means that if you do a
"read-tree -m newtree" followed by a "checkout-cache -f -a", the 
checkout-cache only checks out the stuff that really changed.

You'll still need to do an "update-cache --refresh" for the actual new
stuff. We could make "checkout-cache" update the cache too, but I really
do prefer a "checkout-cache only reads the index, never changes it"
world-view. It's nice to be able to have a read-only git tree.

Final note: just doing a plain read-tree newtree will still throw all
the stat info away, and you'll have to refresh it all...

Linus


Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Chris Mason wrote:
 
 Very true, you can't replace quilt with git without ruining both of them.
 But it would be nice to take a quilt tree and turn it into a git tree for
 merging purposes, or to make use of whatever visualization tools might
 exist someday.

Fair enough. The thing is, going from quilt->git really is a pretty big
decision, since it's the decision that says "I will now really commit all
these quilt changes forever and ever".

Which is also why I think it's actually ok to take a minute to do 100
quilt patches. This is not something you do on a whim. It's something
you'd better think about. It's turning a very fluid environment into an
unchangeable, final thing.

That said, I agree that write-tree is expensive. It tends to be by far
the most expensive op you normally do. I'll make sure it goes faster.

 We already have a "trust me, it hasn't changed" via update-cache.

Heh. I see update-cache not as a "it hasn't changed", but a "it _has_ 
changed, and now I want you to reflect that fact". In other words, 
update-cache is an active statement: it says that you're ready to commit 
your changes.

In contrast, to me your write-tree thing in many ways is the reverse of 
that: it's saying "don't look here, there's nothing interesting there".

Which to me smells like trying to hide problems rather than being positive 
about them.

Which it is, of course. It's trying to hide the fact that writing a tree 
is not instantaneous.

 With that said, I hate the patch too.  I didn't see how to compare against
 the old tree without reading each tree object from the old tree, and that
 should be slower than what write-tree does now.

Reading a tree is faster, simply because you uncompress instead of
compress. So I can read a tree in 0.28 seconds, but it takes me 0.34
seconds to write one. That said, reading the trees has disk seek issues if
it's not in the cache.

What I'd actually prefer to do is to just handle tree caching the same way
we handle file caching - in the index.

Ie we could have the index file track "what subtree is this directory
associated with", and have an "update-cache --refresh-dir" thing that
updates it (and any entry update in that directory obviously removes the
dir-cache entry).

Normally we'd not bother and it would never trigger, but it would be
useful for your scripted setup: it would end up caching all the tree
information in a very efficient manner. Totally transparently, apart from
the one "--refresh-dir" at the beginning. That one would be slightly
expensive (ie would do all the stuff that "write-tree" does, but it would
be done just once).
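
A rough Python sketch of the bookkeeping this would need, purely as an illustration. The DirCache class and its method names are hypothetical. One assumption goes slightly beyond the text: updating a file drops the cached tree not only for its own directory but for every enclosing directory too, since their tree objects would change as well.

```python
import posixpath


class DirCache:
    """Maps a directory path to the ID of its already-written tree object."""

    def __init__(self):
        self.trees = {}

    def remember(self, directory, tree_id):
        self.trees[directory] = tree_id

    def invalidate(self, file_path):
        """Drop cached trees for every directory containing file_path."""
        directory = posixpath.dirname(file_path)
        while True:
            self.trees.pop(directory, None)
            if not directory:        # "" is the top-level directory
                break
            directory = posixpath.dirname(directory)


cache = DirCache()
for name in ("", "drivers", "drivers/scsi", "fs"):
    cache.remember(name, "tree-of-" + (name or "root"))  # made-up tree IDs
cache.invalidate("drivers/scsi/sg.c")
print(sorted(cache.trees))
```

After the update only "fs" still has a cached tree, so a later write-tree would have to rebuild just the three directories on the path to the changed file.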

(We could also just make write-tree do it _totally_ transparently, but
then we're back to having write-tree both read _and_ write the index file,
which is a situation that I've been trying to avoid. It's so much easier 
to verify the correctness of an operation if it is purely one-way).

I'll think about it. I'd love to speed up write-tree, and keeping track of 
it in the index is a nice little trick, but it's not quite high enough up 
on my worries for me to act on it right now.

But if you want to try to see how nasty it would be to add tree index
entries to the index file at write-tree time automatically, hey...

Linus


Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Greg KH wrote:
 
 Nice, it looks like the merge of this tree, and my usb tree worked just
 fine.

Yup, it all seems to work out.

 So, what does this now mean?  Is your kernel.org git tree now going to
 be the real kernel tree that you will be working off of now?  Should
 we crank up the nightly snapshots and emails to the -commits list?

I'm not quite ready to consider it "real", but I'm getting there.

I'm still working out some performance issues with merges (the actual
merge operation itself is very fast, but I've been trying to make the
subsequent "update the working directory tree to the right thing" be much
better).

 Can I rely on the fact that these patches are now in your tree and I can
 forget about them? :)
 
 Just wondering how comfortable you feel with your git tree so far.

Hold off for one more day. I'm very comfortable with how well git has 
worked out so far, and yes, mentally I consider this "the tree", but the 
fact is, git isn't exactly easy on normal users.

I think my merge stuff and Pasky's scripts are getting there, but I want
to make sure that we have a version of Pasky's scripts that use the new
read-tree -m optimizations to make tracking a tree faster, and I'd like
to have them _tested_ a bit first.

In other words, I want it to be at the point where people can do

git pull repo-address

and it will just work, at least for people who don't have any local
changes in their tree. None of this "check out all the files again" crap.

But how about a plan that we go live tomorrow - assuming nobody finds
any problems before that, of course.

Linus


Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Steven Cole wrote:

 But perhaps a progress bar right about here might be
 a good thing for the terminally impatient.
 
 real    3m54.909s
 user    0m14.835s
 sys     0m10.587s
 
 4 minutes might be long enough to cause some folks to lose hope.

Well, the real operations took only 15 seconds. What kind of horrible 
person are you, that you don't have all of the kernel in your disk cache 
already? Shame on you.

Or was the 4 minutes for downloading all the objects too?

Anyway, it looks like you are using pasky's scripts, and the old 
patch-based upgrade at that. You certainly will _not_ see the

[many files patched]
patching file mm/mmap.c
..

if you use a real git merge. That's probably the real problem here.

Real merges have no patches taking place _anywhere_. And they take about 
half a second. Doing an update of your tree should _literally_ boil down 
to

#
# repo needs to point to the repo we update from
#
rsync -avz --ignore-existing $repo/objects/. .git/objects/.
rsync -L $repo/HEAD .git/NEW_HEAD || exit 1
read-tree -m $(cat .git/NEW_HEAD) || exit 1
checkout-cache -f -a
update-cache --refresh
mv .git/NEW_HEAD .git/HEAD

and if it does anything else, it's literally broken. Btw, the above does
need my read-tree -m thing which I committed today.

(CAREFUL: the above is not a good script, because it _will_ just overwrite 
all your old contents with the stuff you updated to. You should thus not 
actually use something like this, but a "git update" should literally end 
up doing the above operations in the end, and just add proper checking).

And if that takes 4 minutes, you've got problems.

Just say no to patches. 

Linus

PS: If you want a clean tree without any old files or anything else, for
that matter, you can then do a show-files -z --others | xargs -0 rm, but
be careful: that will blow away _anything_ that wasn't revision controlled
with git. So don't blame me if your pr0n collection is gone afterwards.


Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Chris Mason wrote:
 
 5) right before exiting, write-tree updates the index if it made any changes.

This part won't work. It needs to do the proper locking, which means that 
it needs to create index.lock _before_ it reads the index file, and 
write everything to that one and then do a rename.

If it doesn't need to do the write, it can just remove index.lock without 
writing to it, obviously.
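
As an illustration of that locking order, here is a Python sketch. The update_index helper and the transform callback are hypothetical, and the sketch assumes a POSIX-style filesystem where exclusive creation and rename are atomic. The lock file is created before the index is read, the new contents go into the lock file, and a rename publishes them. If nothing changed, the lock file is simply removed.

```python
import os
import tempfile


def update_index(index_path, transform):
    """Rewrite index_path under an index.lock-style lock file."""
    lock_path = index_path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        with os.fdopen(fd, "wb") as lock_file:
            with open(index_path, "rb") as f:  # read only after locking
                old = f.read()
            new = transform(old)
            if new != old:
                lock_file.write(new)
    except BaseException:
        os.unlink(lock_path)                   # never leave a stale lock
        raise
    if new == old:
        os.unlink(lock_path)                   # nothing to write
        return False
    os.replace(lock_path, index_path)          # rename over the old index
    return True


with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "index")
    with open(index, "wb") as f:
        f.write(b"old")
    changed = update_index(index, lambda data: data + b"+new")
    unchanged = update_index(index, lambda data: data)
    with open(index, "rb") as f:
        final = f.read()
    lock_left = os.path.exists(index + ".lock")
print(changed, unchanged, final, lock_left)
```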

 The downside to this setup is that I've got to change other index users to
 deal with directory entries that are there sometimes and missing other
 times. The nice part is that I don't have to invalidate the directory
 entry, if it is present, it is valid.

To me, the biggest downside is actually the complexity part, and worrying
about the directory index ever getting stale. How big do the changes end
up being?

Linus


Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Junio C Hamano wrote:
 
 Let's for a moment forget what git-pasky currently does, which
 is not to touch .git/index until the user says Ok, let's
 commit. 

I think git-pasky is wrong.

It's true that we want to often (almost always) diff against the last 
released thing, and I actually think git-pasky does what it does because 
I never wrote a tool to diff the current working directory against a 
tree.

At the same time, I very much worked with a model where you do _not_ have 
a traditional work file, but the index really _is_ the work file.

 I'd like to start from a different premise and see what happens:
 
  - What .git/index records is *not* the state as the last
commit.  It is just an cache Cogito uses to speed up access
to the user's working tree.  From the user's point of view,
it does not even exist.

Yes. Yes. YES.

That is indeed the whole point of the index file. In my world-view, the
index file does _everything_. It's the staging area ("work file"), it's
the merging area ("merge directory") and it's the cache file ("stat
cache").

I'll immediately write a tool to diff the current working directory 
against a tree object, and hopefully that will just make pasky happy with 
this model too. 

Is there any other reason why git-pasky wants to have a work file?

Linus


Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Linus Torvalds wrote:
 
 That is indeed the whole point of the index file. In my world-view, the
 index file does _everything_. It's the staging area (work file), it's
 the merging area (merge directory) and it's the cache file (stat
 cache).
 
 I'll immediately write a tool to diff the current working directory 
 against a tree object, and hopefully that will just make pasky happy with 
 this model too. 

Ok, "immediately" took a bit longer than I wanted to, and quite frankly,
the end result is not very well tested. It was a bit more complex than I
was hoping for to match up the index file against a tree object, since
unlike the tree-tree comparison in diff-tree, you have to compare two
cases where the layout isn't the same.

No matter. It seems to work to a first approximation, and the result is
such a cool tool that it's worth committing and pushing out immediately. 

The code ain't exactly pretty, but hey, maybe that's just me having higher 
standards of beauty than most. Or maybe you just shudder at what I 
consider pretty in the first place, in which case you probably shouldn't 
look too closely at this one.

What the new diff-cache does is basically emulate diff-tree, except 
one of the trees is always the index file.

You can also choose whether you want to trust the index file entirely
(using the --cached flag) or ask the diff logic to show any files that
don't match the stat state as being tentatively changed.  Both of these
operations are very useful indeed.

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit,
without having to write a new tree object and compare it that way, and to
do that, you just do

diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new 
diff-cache can take a commit object, and it automatically just extracts 
the tree information from there).

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
done an "update-cache" to make that effective in the index file. 
"show-diff" wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:

[EMAIL PROTECTED]:~/git diff-cache --cached $(cat .git/HEAD)
-100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    commit.c
+100644 blob    4161aecc6700a2eb579e842af0b7f22b98443f74    git-commit.c

So what the above diff-cache command line does is to say

   show me the differences between HEAD and the current index contents 
(the ones I'd write with a write-tree)

And as you can see, the output matches "diff-tree -r" output (we always do
"-r", since the index is always fully populated). All the same rules: "+"
means added file, "-" means removed file, and "*" means changed file. You 
can trivially see that the above is a rename.
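
The rename is visible because the removed and the added entry carry the same object ID. A small Python sketch of that check follows. The find_renames helper and the tuple layout are made up for the illustration, and only exact content matches are detected.

```python
def find_renames(entries):
    """Pair '-' and '+' entries that carry the same object ID."""
    removed = {}
    for sign, mode, sha1, path in entries:
        if sign == "-":
            removed.setdefault(sha1, []).append(path)
    renames = []
    for sign, mode, sha1, path in entries:
        if sign == "+" and removed.get(sha1):
            renames.append((removed[sha1].pop(0), path))
    return renames


# The two entries from the diff-cache output above.
entries = [
    ("-", "100644", "4161aecc6700a2eb579e842af0b7f22b98443f74", "commit.c"),
    ("+", "100644", "4161aecc6700a2eb579e842af0b7f22b98443f74", "git-commit.c"),
]
renames = find_renames(entries)
print(renames)
```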

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check. Maybe you don't want to
do the tree.

So doing a "diff-cache --cached" is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

However, the non-cached version takes a different approach, and is
potentially the even more useful of the two in that what it does can't be
emulated with a write-tree + diff-tree. Thus that's the default mode.  
The non-cached version asks the question

   show me the differences between HEAD and the currently checked out 
tree - index contents _and_ files that aren't up-to-date

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the diff-tree -r output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic all-zero sha1 to show
that. So let's say that you have edited kernel/sched.c, but have not
actually done an update-cache on it yet - there is no object associated
with the new state, and you get:

[EMAIL PROTECTED]:~/v2.6/linux diff-cache $(cat .git/HEAD )
*100644->100664 blob    7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000    kernel/sched.c

ie it shows that the tree has changed, and that kernel/sched.c is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.
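
The difference between the two modes can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: trees and the index are flat dictionaries, a single number stands in for the stat information, and the function name is invented.

```python
ZERO_SHA1 = "0" * 40


def diff_against_tree(tree, index, worktree_stat, cached=False):
    """tree: {path: sha1}; index: {path: (sha1, stat)}; worktree_stat: {path: stat}."""
    changes = []
    for path in sorted(tree.keys() | index.keys()):
        old = tree.get(path)
        new = None
        if path in index:
            sha1, stat = index[path]
            # Non-cached mode: a stat mismatch means "possibly changed",
            # and there is no object for it yet, hence the all-zero ID.
            if not cached and worktree_stat.get(path) != stat:
                new = ZERO_SHA1
            else:
                new = sha1
        if old != new:
            changes.append((path, old, new))
    return changes


sched = "7476bbcfe5ef5a1dd87d745f298b831143e4d77e"
tree = {"kernel/sched.c": sched}
index = {"kernel/sched.c": (sched, 100)}   # stat recorded in the index
worktree_stat = {"kernel/sched.c": 200}    # the file was touched since
print(diff_against_tree(tree, index, worktree_stat))
print(diff_against_tree(tree, index, worktree_stat, cached=True))
```

With cached=True the touched file is not reported at all, because only the index contents are compared.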

NOTE! As with other commands of this type, "diff-cache" does not actually 
look at the contents of the file at all. So maybe kernel/sched.c hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you need to "update-cache" it to make the cache be in sync.

NOTE 2! You can have a mixture

Re: [PATCH 2/3] init-db.c: normalize env var handling.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Zach Welch wrote:

 This patch applies on top of:
 [PATCH 1/3] init-db.c: cleanup comments
 
  init-db.c |   11 +++
  1 files changed, 3 insertions(+), 8 deletions(-)
 
 Signed-Off-By: Zach Welch [EMAIL PROTECTED]
 
 Normalize init-db environment variable handling, allowing the creation
 of object directories with something other than DEFAULT_DB_ENVIRONMENT.
 
 --- a/init-db.c
 +++ b/init-db.c

For future reference, this is in the wrong order.

You should have checkin comment first, then signed-off-by, then a line 
with three dashes, and then administrative trivia.

Ie I'd much rather see the email look like

Normalize init-db environment variable handling, allowing the creation
of object directories with something other than DEFAULT_DB_ENVIRONMENT.

Signed-Off-By: Zach Welch [EMAIL PROTECTED]
---
This patch applies on top of:
[PATCH 1/3] init-db.c: cleanup comments

 init-db.c |   11 +++
 1 files changed, 3 insertions(+), 8 deletions(-)

.. actual patch goes here ..

since otherwise I'll just have to edit it that way. I like seeing the 
administrative stuff (diffstat etc), but I don't want to have it in the 
commit message, and that's exactly what the "---" marker is for - my tools 
will automatically cut it off as if it was a signature (or the beginning 
of the patch).
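
How a tool might apply that rule can be sketched in Python. This is not the actual script used to apply mails; split_commit_message is a hypothetical helper that cuts at the first line consisting only of three dashes, and the sender address in the example is made up.

```python
def split_commit_message(body):
    """Return (commit message, trailing notes) split at the first '---' line."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip() == "---":
            message = "\n".join(lines[:i]).strip()
            notes = "\n".join(lines[i + 1:]).strip()
            return message, notes
    return body.strip(), ""


mail = """Normalize init-db environment variable handling.

Signed-Off-By: Some Developer <dev@example.com>
---
This patch applies on top of:
[PATCH 1/3] init-db.c: cleanup comments

 init-db.c |   11 +++
"""
message, notes = split_commit_message(mail)
print(message)
```

A diff header such as "--- a/init-db.c" is not mistaken for the marker, because the whole line has to be exactly three dashes.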

Linus


Re: [PATCH 2/3] init-db.c: normalize env var handling.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Zach Welch wrote:
 
 I feel even more abashed for my earlier scripting faux pas. Would you
 like me to resend them to you off-list?

No, I edited them and applied them (the first series, I'll have to think 
about the second one).

It's only when there are tens of patches that it gets really old really 
quickly to edit things by hand. Three I can handle ;)

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Tue, 19 Apr 2005, Chris Mason wrote:
 
 I'll finish off the patch once you ok the basics below.  My current code
 works like this:

Chris, before you do anything further, let me re-consider.

Assuming that the real cost of write-tree is the compression (and I think
it is), I really suspect that this ends up being the death-knell to my
"use the sha1 of the _compressed_ object" approach. I thought it was
clever, and I was ready to ignore the other arguments against it, but if
it turns out that we can speed up write-tree a lot by just doing the SHA1
on the uncompressed data, and noticing that we already have the tree
before we need to compress it and write it out, then that may be a good
enough reason for me to just admit that I was wrong about that decision.

So I'll see if I can turn the current fsck into a "convert into
uncompressed format" tool, and do a nice clean format conversion.

Most of git is very format-agnostic, so that shouldn't be that painful. 
Knock wood.

Linus


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Jon Seymour wrote:
 
 Am I correct to understand that with this change, all the objects in the 
 database are still being compressed (so no net performance benefit), but by 
 doing the SHA1 calculations before compression you are keeping open the 
 possibility that at some point in the future you may use a different 
 compression technique (including none at all) for some or all of the 
 objects?

Correct. There is zero performance benefit to this right now, and the only 
reason for doing it is because it will allow other things to happen.

Note that the other things include:
 - change the compression format to make it cheaper
 - _keep_ the same compression format, but notice that we already have an 
   object by looking at the uncompressed one.

I'm actually leaning towards just #2 at this time. I like how things
compress, and it sure is simple. The fact that we use the equivalent of
-9 may be expensive, but the thing is, we don't actually write new files
that often, and it's just CPU time (no seeking on disk or anything like
that), which tends to get cheaper over time.

So I suspect that once I optimize the tree writing to notice that "oh, I
already have this tree object", and thus build it up but never compress
it, write-tree performance will go up _hugely_ even without removing the
compression. Because most of the time, write-tree actually only needs to
create a couple of small new tree objects.
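
A minimal sketch of that idea, in Python rather than git's own C, and not
the code that was actually checked in: the object is named by the SHA1 of
its uncompressed bytes, so the "do we already have it?" check happens
before any compression. ObjectStore, compress_calls and the in-memory dict
are hypothetical names made up for illustration.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}        # hex SHA-1 of uncompressed data -> compressed bytes
        self.compress_calls = 0  # how many times we actually paid for compression

    def write(self, data: bytes) -> str:
        # Name the object by the SHA-1 of the *uncompressed* bytes.
        name = hashlib.sha1(data).hexdigest()
        if name in self.objects:
            # Already stored: skip compression, but still return the name.
            return name
        self.compress_calls += 1
        self.objects[name] = zlib.compress(data, 9)  # level 9, the "-9" equivalent
        return name

    def read(self, name: str) -> bytes:
        return zlib.decompress(self.objects[name])
```

Writing the same bytes twice returns the same name both times and bumps
compress_calls only once. Note that the early-return path still has to hand
back the name, otherwise the caller ends up building trees out of garbage.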

Linus


Re: [PATCH 1/4] Accept commit in some places when tree is needed.

2005-04-20 Thread Linus Torvalds


On Tue, 19 Apr 2005, Junio C Hamano wrote:
 
 This patch lifts the tree-from-tree-or-commit logic from
 diff-cache.c and moves it to sha1_file.c, which is a common
 library source for the SHA1 storage part.

I don't think that's a good interface. It changes the sha1 passed into it:
that may actually be nice, since you may want to know what it changed to,
but I think you'd want to have that as an (optional) separate
"sha1_result" parameter.

Also, the "type" or "size" things make no sense to have as parameters
at all.

IOW, it was fine when it was an internal hacky thing in diff-cache, but 
once it's promoted to be a real library function it should definitely be 
cleaned up to have sane interfaces that make sense in general, and not 
just within the original context.
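
As a rough illustration of the shape being asked for (a Python sketch, not
the sha1_file.c function from the patch): the caller's id is left alone, the
id that was actually resolved comes back as a separate result, and no type
or size arguments leak into the signature. The toy store layout and the name
read_tree_from_tree_or_commit are assumptions made up for this example; in
Python the separate result is modeled as a second return value rather than
an out parameter.

```python
from typing import Dict, Tuple

# Hypothetical toy store: object id -> (type, payload).
# For a "commit" the payload is the id of its tree;
# for a "tree" the payload is the tree's contents.
Store = Dict[str, Tuple[str, str]]


def read_tree_from_tree_or_commit(store: Store, object_id: str) -> Tuple[str, str]:
    """Return (tree_id, tree_contents) for either a tree id or a commit id.

    The caller's object_id is never modified; the resolved tree id is
    reported separately so the caller can ignore it if it does not care.
    """
    kind, payload = store[object_id]
    tree_id = object_id
    if kind == "commit":
        # Follow the commit to the tree it points at.
        tree_id = payload
        kind, payload = store[tree_id]
    if kind != "tree":
        raise ValueError(f"{object_id} does not resolve to a tree")
    return tree_id, payload
```

A caller that only wants the contents can drop the first element of the
result; a caller that wants to know what the id "changed to" reads it from
there instead of having its input overwritten.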

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, C. Scott Ananian wrote:
 
 Hmm.  Are our index files too large, or is there some other factor?

They _are_ pretty large, but they have to be.

For the kernel, the index file is about 1.6MB. That's 

 - 17,000+ files and filenames
 - stat information for all of them
 - the sha1 for them all

ie for the kernel it averages to 93.5 bytes per file. Which is actually 
pretty dense (just the sha1 and stat information is about half of it, and 
those are required).

 I was considering using a chunked representation for *all* files (not just 
 blobs), which would avoid the original 'trees must reference other trees 
 or they become too large' issue -- and maybe the performance issue you're 
 referring to, as well?

No. The most common index file operation is reading, and that's the one 
that has to be _fast_. And it is - it's a single mmap and some parsing.

In fact, writing it is pretty fast too, exactly because the index file is 
totally linear and isn't compressed or anything fancy like that. It's a 
_lot_ faster than the tree objects, exactly because it doesn't need to 
be as careful.

The main cost of the index file is probably the fact that I add a sha1 
signature of the file into itself to verify that it's ok. The advantage is 
that the signature means that the file is ok, and the parsing of it can be 
much more relaxed. You win some, you lose some.
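
The trade-off can be sketched like this (Python, purely illustrative): pay
for one SHA1 pass over the file, and in exchange the parser gets to trust
what it reads. Where the checksum lives is an assumption here; this sketch
appends it as a 20-byte trailer, which is not necessarily how the index file
lays it out. seal and unseal are hypothetical names.

```python
import hashlib

SHA1_LEN = 20  # size of a raw SHA-1 digest in bytes


def seal(payload: bytes) -> bytes:
    """Return the payload with the SHA-1 of the payload appended to it."""
    return payload + hashlib.sha1(payload).digest()


def unseal(blob: bytes) -> bytes:
    """Verify the trailing SHA-1 and return the payload, or raise ValueError."""
    if len(blob) < SHA1_LEN:
        raise ValueError("file too short to hold a SHA-1 trailer")
    payload, trailer = blob[:-SHA1_LEN], blob[-SHA1_LEN:]
    if hashlib.sha1(payload).digest() != trailer:
        raise ValueError("checksum mismatch: file is corrupt")
    # Past this point the payload is known to be what was written, so the
    # parsing that follows can skip most of its defensive checks.
    return payload
```

unseal(seal(data)) gives back data unchanged, and flipping any byte of the
sealed file makes unseal raise instead of handing a damaged file to the
parser.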

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, C. Scott Ananian wrote:
 
 OK, sure.  But how 'bout chunking trees?  Are you grown happy with the new 
 trees-reference-other-trees paradigm, or is there a deep longing in your 
 heart for the simplicity of 'trees-reference-blobs-period'?

I'm pretty sure we do better chunking on a subdirectory basis, especially 
as it allows us to do various optimizations (avoid diffing common parts).

Yes, you could try to do the same optimizations with chunking, but then 
you'd need to make sure that the chunking was always on a full tree entry 
boundary etc - ie much harder than blob chunking. 

But hey, numbers talk, bullshit walks. 

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Linus Torvalds wrote:
 
 To actually go faster, it _should_ need this patch. Untested. See if it 
 works..

NO! Don't see if this works. For the "sha1 file already exists" case, it
forgot to return the SHA1 value in "returnsha1", and would thus corrupt
the trees it wrote.

So don't apply, don't test. You won't corrupt your archive (you'll just
write bogus tree objects), but if you commit the bogus trees you're going
to be in a world of hurt and will have to undo everything you did.

It's a good test for fsck though. It core-dumps because it tries to add 
references to NULL objects.

Linus


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds


On Wed, 20 Apr 2005, Linus Torvalds wrote:
 
 NO! Don't see if this works. For the "sha1 file already exists" case, it
 forgot to return the SHA1 value in "returnsha1", and would thus corrupt
 the trees it wrote.

Proper version with fixes checked in. For me, it brings down the time to
write a kernel tree from 0.34s to 0.24s, so a third of the time was just
compressing objects that we ended up already having.

Two thirds to go ;)

Linus

