Re: [RFC/PATCH] connect: add GIT_SSH_{SEND,RECEIVE}{,_COMMAND} env variables

2018-01-03 Thread Jeff King
On Thu, Jan 04, 2018 at 01:08:28AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Hopefully this is clearer, and depending on how the rest of the
> discussion goes I'll submit v2 with something like this in the commit
> message:
> 
> SSH keys A and B are known to the remote service, and used to identify
> two different users.
> 
> A can only push to repository X, and B can only fetch from repository Y.
> 
> Thus, if you have a script that does:
> 
> GIT_SSH_COMMAND="ssh -i A -i B" git ...
> 
> It'll always fail for pulling from X, and pushing to Y. Supply:
> 
> GIT_SSH_COMMAND="ssh -i B -i A" git ...
> 
> And now pulling will work, but pushing won't.

I get that you may have two different keys to go with two different
identities on a remote system. But I'm not sure I understand why
"sending" or "receiving" is the right way to split those up. Wouldn't
you also sometimes want to fetch from repository X? IOW, wouldn't you
want to tie identity "A" to repository "X", and "B" to repository "Y"?

> So now I just have a GIT_SSH_COMMAND that dispatches to different keys
> depending on the operation, as noted in the commit message, and I can
> assure you that without that logic it doesn't work.

You mentioned host aliases later, which is the solution I've seen in the
wild. And then you can map each remote to a different host alias.
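
The host-alias approach can be sketched as follows. This is an illustration only: the alias names (example-x, example-y), the host git.example.com and the key paths are assumptions, not taken from the thread. Host, HostName, User, IdentityFile and IdentitiesOnly are standard ssh_config keywords. The shell function simply writes the two stanzas to a file of your choosing.

```shell
# Sketch only: alias names, host name and key paths are hypothetical.
# Writes two ssh host aliases, each pinned to one key, to the file "$1".
write_alias_config () {
	cat >"$1" <<'EOF'
Host example-x
    HostName git.example.com
    User git
    IdentityFile ~/.ssh/A
    IdentitiesOnly yes

Host example-y
    HostName git.example.com
    User git
    IdentityFile ~/.ssh/B
    IdentitiesOnly yes
EOF
}
```

With stanzas like these in ~/.ssh/config, a remote URL such as example-x:X.git would select key A and example-y:Y.git key B, for fetches and pushes alike.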

-Peff


Re: Bug report: git clone with dest

2018-01-03 Thread Jeff King
On Wed, Jan 03, 2018 at 02:42:51PM -0800, Isaac Shabtay wrote:

> Indeed interesting... this one's for the books...
> Thanks for the patches. Any idea when these are going to make it to the
> official Git client builds? (specifically the Windows one)

They haven't even been reviewed yet. If they get good feedback, then the
maintainer will pick them up, then merge them to 'next', and then
eventually to 'master', after which they'd become part of the next
major release. For a pure bug-fix, it may instead go to 'maint' and
become part of the next minor release.

Right now we're entering release freeze for v2.16.0. We'd still take
fixes for recent breakages there, but given the age of the problem I
doubt it will make the cutoff. But as this is a bug-fix, it might make
it into v2.16.1.

-Peff


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread suzuki toshiya

Hi,


Hmm, it could be reasonable to assume that --append-file
would serve more cases than --uid --gid options. There
might be many people who don't care about multiple UID/GID in
the source tarball, but want to append some files to the
archive generated by git-archive. I will take a look at how
to do that. One concern is that some people may
ask to pass a file listing the pathnames instead of
giving many --append-file options (and a few people could
want to have a built-in default list specified by GNU
convention :-)).


Glancing at parse-options.h, I could not find an
existing facility that collects the operands of multiple
"--xxx=yyy" options into an array (or linked list). The
closest thing might be the collection of pathnames into a
pathspec structure. Should I write something with OPTION_CALLBACK?

Regards,
mpsuzuki


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread suzuki toshiya

Dear René ,

Having overlooked your response, I was writing a patch to add
uid/gid to zip archives X-D (not finished yet)
https://github.com/mpsuzuki/git/tree/add-zip-uid-gid
However, I found that most unix platforms use infozip's
extension to store uid/gid instead of pkzip's extension...


So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:


Oh, I should check other software's tarball :-)


The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.


Hmm, I was not aware of such a security concern about the
tarball including the developer's username.


So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.


Hmm, it could be reasonable to assume that --append-file
would serve more cases than --uid --gid options. There
might be many people who don't care about multiple UID/GID in
the source tarball, but want to append some files to the
archive generated by git-archive. I will take a look at how
to do that. One concern is that some people may
ask to pass a file listing the pathnames instead of
giving many --append-file options (and a few people could
want to have a built-in default list specified by GNU
convention :-)).

I would like to hear other experts' comments: is there no need
for me to work on "--uid" and "--gid" anymore, and should I
switch to an "--append-file" option?

Regards,
mpsuzuki

René Scharfe wrote:

Am 02.01.2018 um 07:58 schrieb suzuki toshiya:

Dear René ,

René Scharfe wrote:

Am 29.12.2017 um 15:05 schrieb suzuki toshiya:

The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)

Thank you for asking about the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of course, both GNU tar and bsdtar have options to set
UID/GID manually, but their syntaxes differ.

In the recent source package of poppler (poppler.freedesktop.org),
2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html


So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:

  dist: git-archive$(X) configure
  ./git-archive --format=tar \
  --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
  @mkdir -p $(GIT_TARNAME)
  @cp configure $(GIT_TARNAME)
  @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
  @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
  $(TAR) rf $(GIT_TARNAME).tar \
  $(GIT_TARNAME)/configure \
  $(GIT_TARNAME)/version \
  $(GIT_TARNAME)/git-gui/version
  @$(RM) -r $(GIT_TARNAME)
  gzip -f -9 $(GIT_TARNAME).tar

Having files with different owners and groups is a non-issue when
extracting with --no-same-owner, which is the default for regular users.
I assume this covers most use cases in the wild.

The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.

Changing the build procedure to set owner and group to root as well as
UID and GID to zero seems like a better idea.  This is complicated by
the inconsistent command line options for GNU tar and bsdtar, as you
mentioned.
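
The option mismatch mentioned above could be papered over in a build script roughly like this. It is a sketch under assumptions: the function name tar_owner_flags is made up, and anything that does not identify itself as GNU tar is assumed to be bsdtar (libarchive). GNU tar spells the override --owner/--group, while bsdtar uses --uid/--gid/--uname/--gname.

```shell
# Hypothetical helper: print the options that make the tar program "$1"
# record root as owner and group of the entries it adds.
tar_owner_flags () {
	if "$1" --version 2>/dev/null | grep -q 'GNU tar'
	then
		# GNU tar
		echo "--owner=root --group=root"
	else
		# Assumption: every non-GNU tar is bsdtar (libarchive).
		echo "--uid 0 --gid 0 --uname root --gname root"
	fi
}
```

The output would be spliced, unquoted, into the tar invocation that appends the untracked files.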

So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.

Not sure what kind of file name transformation abilities would be
needed and how to package them nicely.  The --transform option of GNU
tar with its sed replace expressions seems quite heavy to me.  With
poppler it's only used to add the --prefix string; I'd expect that to
be done for all 

Re: [PATCH 20/26] fetch-pack: perform a fetch using v2

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> When communicating with a v2 server, perform a fetch by requesting the
> 'fetch' command.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/fetch-pack.c   |   2 +-
>  fetch-pack.c           | 267
>  fetch-pack.h           |   4 +-
>  t/t5701-protocol-v2.sh |  40
>  transport.c            |   8 +-
>  5 files changed, 314 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
> index f492e8abd..867dd3cc7 100644
> --- a/builtin/fetch-pack.c
> +++ b/builtin/fetch-pack.c
> @@ -213,7 +213,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
> }
>
> ref = fetch_pack(&args, fd, conn, ref, dest, sought, nr_sought,
> -&shallow, pack_lockfile_ptr);
> +&shallow, pack_lockfile_ptr, protocol_v0);
> if (pack_lockfile) {
> printf("lock %s\n", pack_lockfile);
> fflush(stdout);
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 9f6b07ad9..c26fdc539 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1008,6 +1008,262 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
> return ref;
>  }
>
> +static void add_wants(const struct ref *wants, struct strbuf *req_buf)
> +{
> +   for ( ; wants ; wants = wants->next) {
> +   const struct object_id *remote = &wants->old_oid;
> +   const char *remote_hex;
> +   struct object *o;
> +
> +   /*
> +* If that object is complete (i.e. it is an ancestor of a
> +* local ref), we tell them we have it but do not have to
> +* tell them about its ancestors, which they already know
> +* about.
> +*
> +* We use lookup_object here because we are only
> +* interested in the case we *know* the object is
> +* reachable and we have already scanned it.
> +*/
> +   if (((o = lookup_object(remote->hash)) != NULL) &&
> +   (o->flags & COMPLETE)) {
> +   continue;
> +   }
> +
> +   remote_hex = oid_to_hex(remote);
> +   packet_buf_write(req_buf, "want %s\n", remote_hex);
> +   }
> +}
> +
> +static int add_haves(struct strbuf *req_buf, int *in_vain)
> +{
> +   int ret = 0;
> +   int haves_added = 0;
> +   const struct object_id *oid;
> +
> +   while ((oid = get_rev())) {
> +   packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
> +   if (++haves_added >= INITIAL_FLUSH)
> +   break;
> +   };
> +
> +   *in_vain += haves_added;
> +   if (!haves_added || *in_vain >= MAX_IN_VAIN) {
> +   /* Send Done */
> +   packet_buf_write(req_buf, "done\n");
> +   ret = 1;
> +   }
> +
> +   return ret;
> +}
> +
> +static int send_haves(int fd_out, int *in_vain)
> +{
> +   int ret = 0;
> +   struct strbuf req_buf = STRBUF_INIT;
> +
> +   ret = add_haves(&req_buf, in_vain);
> +
> +   /* Send request */
> +   packet_buf_flush(&req_buf);
> +   write_or_die(fd_out, req_buf.buf, req_buf.len);
> +
> +   strbuf_release(&req_buf);
> +   return ret;
> +}
> +
> +static int send_fetch_request(int fd_out, const struct fetch_pack_args *args,
> + const struct ref *wants, struct oidset *common,
> + int *in_vain)
> +{
> +   int ret = 0;
> +   struct strbuf req_buf = STRBUF_INIT;
> +
> +   packet_buf_write(&req_buf, "command=fetch");
> +   packet_buf_write(&req_buf, "agent=%s", git_user_agent_sanitized());
> +   if (args->stateless_rpc)
> +   packet_buf_write(&req_buf, "stateless-rpc=true");
> +
> +   packet_buf_delim(&req_buf);
> +   if (args->use_thin_pack)
> +   packet_buf_write(&req_buf, "thin-pack");
> +   if (args->no_progress)
> +   packet_buf_write(&req_buf, "no-progress");
> +   if (args->include_tag)
> +   packet_buf_write(&req_buf, "include-tag");
> +   if (prefer_ofs_delta)
> +   packet_buf_write(&req_buf, "ofs-delta");
> +
> +   /* add wants */
> +   add_wants(wants, &req_buf);
> +
> +   /*
> +* If we are running stateless-rpc we need to add all the common
> +* commits we've found in previous rounds
> +*/
> +   if (args->stateless_rpc) {
> +   struct oidset_iter iter;
> +   const struct object_id *oid;
> +   oidset_iter_init(common, &iter);
> +
> +   while ((oid = oidset_iter_next(&iter))) {
> +   packet_buf_write(&req_buf, "have %s\n", oid_to_hex(oid));
> +   }
> +   }
> +
> +   /* Add initial haves */
> +   ret = add_haves(&req_buf, in_vain);
> +
> +   /* 

Re: [PATCH 19/26] upload-pack: introduce fetch server command

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> Introduce the 'fetch' server command.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/technical/protocol-v2.txt |  14 ++
>  serve.c                                 |   2 +
>  upload-pack.c                           | 290
>  upload-pack.h                           |   9 +
>  4 files changed, 315 insertions(+)
>  create mode 100644 upload-pack.h
>
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 5f4d0e719..2a8e2f226 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -115,3 +115,17 @@ The output of ls-refs is as follows:
>
>  symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
>  shallow = PKT-LINE("shallow" SP obj-id LF)
> +
> + Fetch
> +---
> +
> +Fetch will need to be a modified version of the v1 fetch protocol.  Some
> +potential areas for improvement are: Ref-in-want, CDN offloading,
> +Fetch-options.
> +
> +Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> +to perform a ref-advertisement, instead a client can run the 'ls-refs'
> +service first, in order to find out what refs the server has, and then
> +request those refs directly using the fetch service.
> +
> +//TODO Flesh out the design

TODO: actually do it. ;)

a couple notes from the discussion in office:
* Could we split fetch into multiple phases
  (negotiation + getting the pack)
* negotiation could be reused in forced push to
  minimize the pack to be sent
* negotiation in a half duplex is actually better
  called 'discovery', which discovers about the set
  of objects available on the remote side.
  (the opposite would be reveal, or 'ask-for-discovery', which
  could be used for a symmetric design of fetch and push)


> diff --git a/serve.c b/serve.c
> index 88d548410..ca3bb7190 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -6,6 +6,7 @@
>  #include "argv-array.h"
>  #include "ls-refs.h"
>  #include "serve.h"
> +#include "upload-pack.h"
>
>  static int always_advertise(struct repository *r,
> struct strbuf *value)
> @@ -46,6 +47,7 @@ static struct protocol_capability capabilities[] = {
> { "agent", agent_advertise, NULL },
> { "stateless-rpc", always_advertise, NULL },
> { "ls-refs", always_advertise, ls_refs },
> +   { "fetch", always_advertise, upload_pack_v2 },
>  };
>
>  static void advertise_capabilities(void)
> diff --git a/upload-pack.c b/upload-pack.c
> index 2ca60d27c..c41f6f528 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -20,6 +20,7 @@
>  #include "prio-queue.h"
>  #include "protocol.h"
>  #include "serve.h"
> +#include "upload-pack.h"
>
>  static const char * const upload_pack_usage[] = {
> N_("git upload-pack [<options>] <dir>"),
> @@ -1040,6 +1041,295 @@ static void upload_pack(void)
> }
>  }
>
> +struct upload_pack_data {
> +   struct object_array wants;
> +   struct oid_array haves;
> +
> +   unsigned stateless_rpc : 1;
> +
> +   unsigned use_thin_pack : 1;
> +   unsigned use_ofs_delta : 1;
> +   unsigned no_progress : 1;
> +   unsigned use_include_tag : 1;
> +   unsigned done : 1;
> +};
> +
> +#define UPLOAD_PACK_DATA_INIT { OBJECT_ARRAY_INIT, OID_ARRAY_INIT, 0, 0, 0, 0, 0, 0 }
> +
> +static void upload_pack_data_clear(struct upload_pack_data *data)
> +{
> +   object_array_clear(&data->wants);
> +   oid_array_clear(&data->haves);
> +}
> +
> +static int parse_want(const char *line)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "want ", &arg)) {
> +   struct object_id oid;
> +   struct object *o;
> +
> +   if (get_oid_hex(arg, &oid))
> +   die("git upload-pack: protocol error, "
> +   "expected to get oid, not '%s'", line);
> +
> +   o = parse_object(&oid);
> +   if (!o) {
> +   packet_write_fmt(1,
> +"ERR upload-pack: not our ref %s",
> +oid_to_hex(&oid));
> +   die("git upload-pack: not our ref %s",
> +   oid_to_hex(&oid));
> +   }
> +
> +   if (!(o->flags & WANTED)) {
> +   o->flags |= WANTED;
> +   add_object_array(o, NULL, &want_obj);
> +   }
> +
> +   return 1;
> +   }
> +
> +   return 0;
> +}
> +
> +static int parse_have(const char *line, struct oid_array *haves)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "have ", &arg)) {
> +   struct object_id oid;
> +
> +   if (get_oid_hex(arg, &oid))
> +   die("git upload-pack: expected SHA1 object, got '%s'", arg);
> +   oid_array_append(haves, &oid);
> +   return 1;
> +

Re: [PATCH v5 13/34] directory rename detection: tests for handling overwriting untracked files

2018-01-03 Thread SZEDER Gábor

> ---
>  t/t6043-merge-rename-directories.sh | 337
>  1 file changed, 337 insertions(+)

> +test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' '
> + (
> + cd 10b &&
> +
> + git checkout A^0 &&
> + echo very >y/c &&
> + echo important >y/d &&
> + echo contents >y/e &&
> +
> + test_must_fail git merge -s recursive B^0 >out 2>err &&
> + test_i18ngrep "CONFLICT (rename/delete).*Version B^0 of y/d left in tree at y/d~B^0" out &&
> + test_i18ngrep "Error: Refusing to lose untracked file at y/e; writing to y/e~B^0 instead" out &&
> +
> + test 3 -eq $(git ls-files -s | wc -l) &&
> + test 2 -eq $(git ls-files -u | wc -l) &&
> + test 5 -eq $(git ls-files -o | wc -l) &&
> +
> + test $(git rev-parse :0:y/b) = $(git rev-parse O:z/b) &&

There is a test helper for that :)

  test_cmp_rev :0:y/b O:z/b

Note that this is not only a matter of useful output on failure, but
also that of correctness and robustness.  A failing command inside
those command substitutions won't cause the whole command above to
fail, and if both 'git rev-parse' were to fail without writing
anything to stdout, the whole condition would still be fulfilled:

  $ test $(false) = $(false) && echo true
  true
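
One way to avoid that pitfall in plain shell is to run each command as its own step of the && chain. The helper below is a hypothetical sketch, not part of Git's test library; it uses eval purely to keep the example short.

```shell
# Hypothetical helper (not from test-lib.sh): succeed only if both
# commands succeed, print something, and print the same thing.
same_output () {
	expect=$(eval "$1") &&
	actual=$(eval "$2") &&
	test -n "$expect" &&
	test "$expect" = "$actual"
}
```

Unlike the bare 'test $(false) = $(false)' form, 'same_output false false' fails, because the exit status of each command substitution is checked before the comparison is made.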

I noticed that this patch series adds several similar

  test $(git hash-object this) = $(git rev-parse that)

conditions.  Well, for that we don't have a test helper
function.  Similar 'hash-object = rev-parse' comparisons are already
present in two other test scripts, so perhaps it's worth adding a
helper function.  Or you could perhaps

  git cat-file -p that >out &&
  test_cmp this out

I also noticed that all existing 'hash-object = rev-parse' conditions
came from you, so I would leave it up to you to decide which is easier
to work with and whether it's worth it.


> + test "very" = "$(cat y/c)" &&
> +
> + test "important" = "$(cat y/d)" &&

The 'verbose' helper could make conditions like these more, well,
verbose about their failure:

  verbose test "very" = "$(cat y/c)" &&
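
For readers who have not seen such a wrapper, the idea is simple. The function below is a simplified stand-in written for illustration (the name verbose_sketch is made up, and it is not the implementation in Git's test library): it runs the command and, if that fails, reports which command it was.

```shell
# Simplified, hypothetical stand-in for a "verbose"-style wrapper.
verbose_sketch () {
	"$@" && return 0
	# Report the failing command on stderr, then propagate the failure.
	echo >&2 "command failed: $*"
	return 1
}
```

A silent failure of 'test "very" = "$(cat y/c)"' would then leave at least the expanded command line in the test output.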

> + test "important" != "$(git rev-parse :3:y/d)" &&

I'm not sure what this condition is supposed to check.

I'm not particularly well versed in the intricacies of 'git rev-parse'
operating on different stages of the index, but to my understanding
'git rev-parse rev' either outputs the object name pointed to by 'rev',
or 'rev' verbatim if that doesn't resolve to a valid object.  IOW, it
would never output "important" and the condition would always be
fulfilled.
What am I missing?

> + test $(git rev-parse :3:y/d) = $(git rev-parse O:z/c) &&
> +
> + test "contents" = "$(cat y/e)" &&
> + test "contents" != "$(git rev-parse :3:y/e)" &&
> + test $(git rev-parse :3:y/e) = $(git rev-parse B:z/e)
> + )
> +'


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread René Scharfe
[replying only to the list because emails to per...@pluto.rain.com
 are rejected by my mail server with the following error message:
 "Requested action not taken: mailbox unavailable
  invalid DNS MX or A/AAAA resource record."]

Am 02.01.2018 um 01:32 schrieb Perry Hutchison:
> René Scharfe  wrote:
>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>> The ownership of files created by git-archive is always
>>> root:root. Add --owner and --group options which work
>>> like the GNU tar equivalent to allow overriding these
>>> defaults.
>> ... the receiver would need to be root to set both IDs, or be a
>> group member to set the group ID; I guess the latter is more common.
> 
> If the received files are owned by root:root as stated, I guess the
> receiver must be running as root, no?

That depends on what you mean by "must".  Users who want the files
they extract to be owned by root need root permissions on Unix and
Linux.  If they are OK with owning the files themselves then regular
user accounts suffice.  I assume the latter is much more common.

René


Re: [PATCH 0/2] Several fixes for the test suite related to spaces in filenames

2018-01-03 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Jan 03, 2018 at 05:54:44PM +0100, Johannes Schindelin wrote:
>
>> The second issue was found long ago, and the patch carried in Git for
>> Windows, although nothing about it is specific to Windows.
>> 
>> The first patch was developed today, when I tried to verify that Git's
>> test suite passes if Git is cloned to a directory called `with spaces/`.
>
> Heh, the whole point of the space in the trash directory was to find
> these issues early, but obviously it is not foolproof. :)

Exactly.

> The patches themselves look good to me from inspection. Thanks.

Yes, these changes look good.  Thanks both.


Re: [PATCH 12/26] ls-refs: introduce ls-refs server command

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> Introduce the ls-refs server command.  In protocol v2, the ls-refs
> command is used to request the ref advertisement from the server.  Since
> it is a command which can be requested (as opposed to mandatory in v1),
> a client can sent a number of parameters in its request to limit the ref
> advertisement based on provided ref-patterns.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/technical/protocol-v2.txt | 26 +
>  Makefile                                |  1 +
>  ls-refs.c                               | 97
>  ls-refs.h                               |  9 +++

Maybe consider putting any served command into a sub directory?

For example the code in builtin/ has laxer rules w.r.t. die()ing
as it is a user facing command, whereas some devs want to see
code at the root of the repo to not die() at all as the eventual goal
is to have a library there.
All this code is on the remote side, which also has different traits than
the code at the root of the git.git repo; non-localisation comes to mind,
but there might be other aspects as well (security?).


>  serve.c |  2 +
>  5 files changed, 135 insertions(+)
>  create mode 100644 ls-refs.c
>  create mode 100644 ls-refs.h
>
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index b87ba3816..5f4d0e719 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -89,3 +89,29 @@ terminate the connection.
>  Commands are the core actions that a client wants to perform (fetch, push,
>  etc).  Each command will be provided with a list capabilities and
>  arguments as requested by a client.
> +
> + Ls-refs

So is it ls-refs or Ls-refs or is any capitalization valid?

> +-
> +
> +Ls-refs is the command used to request a reference advertisement in v2.
> +Unlike the current reference advertisement, ls-refs takes in parameters
> +which can be used to limit the refs sent from the server.
> +
> +Ls-ref takes in the following parameters wraped in packet-lines:
> +
> +  symrefs: In addition to the object pointed by it, show the underlying
> +  ref pointed by it when showing a symbolic ref.
> +  peel: Show peeled tags.
> +  ref-pattern : When specified, only references matching the
> +given patterns are displayed.

What kind of pattern matching is allowed here?
strictly prefix only, or globbing, regexes?
Is there a given grammar to follow? Maybe a link to the git
glossary or somewhere else would be fine.

Seeing that we do wildmatch() down there (as opposed to regexes),
I wonder if it provides an entry for a denial of service attack, by crafting
a pattern that is very expensive for the server to compute but cheap to
ask for from a client. (cf. 94da9193a6 (grep: add support for PCRE v2,
2017-06-01), but that is regexes!)
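
As background for the "packet-lines" mentioned in the quoted documentation: in Git's pkt-line framing each line is prefixed with its total length (payload plus the four header bytes) as four hexadecimal digits, 0000 is the flush packet, and protocol v2 adds 0001 as a delimiter packet. A minimal shell sketch, assuming ASCII payloads (so that the character count equals the byte count) and using made-up function names:

```shell
# Sketch: frame "$1" as a pkt-line (4 hex digits of length, then payload).
# Assumes an ASCII payload, so the character count is the byte count.
pkt_line () {
	printf '%04x%s' $((${#1} + 4)) "$1"
}

# The special packets carry no payload.
pkt_flush () { printf '0000'; }
pkt_delim () { printf '0001'; }
```

For example, pkt_line "command=fetch" prints 0011command=fetch: 13 payload bytes plus the 4-byte header make 17, which is 0x11.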

> +The output of ls-refs is as follows:
> +
> +output = *ref
> +flush-pkt
> +ref = PKT-LINE((tip | peeled) LF)
> +tip = obj-id SP refname (SP symref-target)
> +peeled = obj-id SP refname "^{}"
> +
> +symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
> +shallow = PKT-LINE("shallow" SP obj-id LF)
> diff --git a/Makefile b/Makefile
> index 5f3b5fe8b..152a73bec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -820,6 +820,7 @@ LIB_OBJS += list-objects-filter-options.o
>  LIB_OBJS += ll-merge.o
>  LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
> +LIB_OBJS += ls-refs.o
>  LIB_OBJS += mailinfo.o
>  LIB_OBJS += mailmap.o
>  LIB_OBJS += match-trees.o
> diff --git a/ls-refs.c b/ls-refs.c
> new file mode 100644
> index 0..ac4904a40
> --- /dev/null
> +++ b/ls-refs.c
> @@ -0,0 +1,97 @@
> +#include "cache.h"
> +#include "repository.h"
> +#include "refs.h"
> +#include "remote.h"
> +#include "argv-array.h"
> +#include "ls-refs.h"
> +#include "pkt-line.h"
> +
> +struct ls_refs_data {
> +   unsigned peel;
> +   unsigned symrefs;
> +   struct argv_array patterns;
> +};
> +
> +/*
> + * Check if one of the patterns matches the tail part of the ref.
> + * If no patterns were provided, all refs match.
> + */
> +static int ref_match(const struct argv_array *patterns, const char *refname)
> +{
> +   char *pathbuf;
> +   int i;
> +
> +   if (!patterns->argc)
> +   return 1; /* no restriction */
> +
> +   pathbuf = xstrfmt("/%s", refname);
> +   for (i = 0; i < patterns->argc; i++) {
> +   if (!wildmatch(patterns->argv[i], pathbuf, 0)) {
> +   free(pathbuf);
> +   return 1;
> +   }
> +   }
> +   free(pathbuf);
> +   return 0;
> +}
> +
> +static int send_ref(const char *refname, const struct object_id *oid,
> +   int flag, void *cb_data)
> +{
> +   struct ls_refs_data *data = 

Re: [RFC/PATCH] connect: add GIT_SSH_{SEND,RECEIVE}{,_COMMAND} env variables

2018-01-03 Thread Ævar Arnfjörð Bjarmason

On Wed, Jan 03 2018, Junio C. Hamano jotted:

> Ævar Arnfjörð Bjarmason   writes:
>
>> This is useful for talking to systems such as Github or Gitlab that
>> identify user accounts (or deploy keys) by ssh keys. Normally, ssh
>> could do this itself by supplying multiple keys via -i, but that trick
>> doesn't work on these systems as the connection will have already been
>> accepted when the "wrong" key gets rejected.
>
> You need to explain this a lot better than the above.
>
> I am sure systems such as Github have more than dozens of users who
> push over ssh and these users identify themselves by which key to
> use when establishing connection just fine (presumably by using a
> "Host" entry for the github URL in ~/.ssh/config), and presumably we
> are not sending "wrong" keys over there.  So there needs to be a lot
> more clear description of the problem you are trying to solve in the
> first place.

Hopefully this is clearer, and depending on how the rest of the
discussion goes I'll submit v2 with something like this in the commit
message:

SSH keys A and B are known to the remote service, and used to identify
two different users.

A can only push to repository X, and B can only fetch from repository Y.

Thus, if you have a script that does:

GIT_SSH_COMMAND="ssh -i A -i B" git ...

It'll always fail for pulling from X, and pushing to Y. Supply:

GIT_SSH_COMMAND="ssh -i B -i A" git ...

And now pulling will work, but pushing won't.

If you were to do, where C is a completely unknown key:

GIT_SSH_COMMAND="ssh -i C -i A" git push X ...

It would work, since ssh wouldn't get far enough in the key negotiation
to drop you into a shell. This is the case you had in mind, but is
unrelated to the problem I'm trying to address.

I tested this on a Gitlab instance, but as far as I know this property
is going to be intrinsic to anything that uses ssh in this way,
i.e. once you get past the step where the server says "this key is OK"
and drops you into a shell, it's not going to retry the whole
negotiation with another key just because the command you ran exited
with non-zero.

So now I just have a GIT_SSH_COMMAND that dispatches to different keys
depending on the operation, as noted in the commit message, and I can
assure you that without that logic it doesn't work.

I thought that use-case might be useful enough to be natively supported,
since right now you either need to hack it up like that, or perform
similar hacks with url/pushurl and ssh host aliases in your config.


Re: [PATCH] doc/SubmittingPatches: improve text formatting

2018-01-03 Thread brian m. carlson
On Tue, Jan 02, 2018 at 10:33:50AM -0500, Todd Zullinger wrote:
> 049e64aa50 ("Documentation: convert SubmittingPatches to AsciiDoc",
> 2017-11-12) changed the `git blame` and `git shortlog` examples given in
> the section on sending your patches.
> 
> In order to italicize the `$path` argument the commands are enclosed in
> plus characters as opposed to backticks.  The difference between the
> quoting methods is that backtick enclosed text is not subject to further
> expansion.  This formatting makes reading SubmittingPatches in a git
> clone a little more difficult.  In addition to the underscores around
> `$path` the `--` chars in `git shortlog --no-merges` must be replaced
> with `{litdd}`.
> 
> Use backticks to quote these commands.  The italicized `$path` is lost
> from the html version but the commands can be read (and copied) more
> easily by users reading the text version.  These readers are more likely
> to use the commands while submitting patches.  Make it easier for them.

I think this change is fine.  I don't have a strong opinion either way
and if others think the change makes the plain text more readable, I'm
all for it.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204



Re: Misleading documentation for git-diff-files (diff-filter)

2018-01-03 Thread Junio C Hamano
John Cheng  writes:

> I wanted to know if git diff-files shows files that are not in the
> index but are in the working tree.

At least in the original design of Git, that would fundamentally be
impossible, as Git _only_ cares about paths that are in the index,
so a new file won't be in the picture until it is added.  Because a
change is shown as "A"dded by the diff family of commands only when
the old side lacks a path that appears in the new side, there is no
way "diff-files" that compares the index and the working tree would
see a path that is missing from the old (i.e. the index) side.


Re: [RFC/PATCH] connect: add GIT_SSH_{SEND,RECEIVE}{,_COMMAND} env variables

2018-01-03 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> This is useful for talking to systems such as Github or Gitlab that
> identify user accounts (or deploy keys) by ssh keys. Normally, ssh
> could do this itself by supplying multiple keys via -i, but that trick
> doesn't work on these systems as the connection will have already been
> accepted when the "wrong" key gets rejected.

You need to explain this a lot better than the above.  

I am sure systems such as Github have more than dozens of users who
push over ssh and these users identify themselves by which key to
use when establishing connection just fine (presumably by using a
"Host" entry for the github URL in ~/.ssh/config), and presumably we
are not sending "wrong" keys over there.  So there needs to be a lot
more clear description of the problem you are trying to solve in the
first place.


Re: Bug report: git clone with dest

2018-01-03 Thread Isaac Shabtay
Indeed interesting... this one's for the books...
Thanks for the patches. Any idea when these are going to make it to
the official Git client builds? (specifically the Windows one)

On 3 January 2018 at 14:28, Jeff King  wrote:
> On Wed, Jan 03, 2018 at 12:59:48PM -0800, Isaac Shabtay wrote:
>
>> Target directory is deleted on clone failures.
>>
>> Steps to reproduce, for example on Windows:
>>
>> cd /d %TEMP%
>> mkdir dest
>> git clone https://some-fake-url/whatever-makes-git-clone-fail dest
>>
>> Of course, the clone will fail as it should. But looks like the Git
>> client also ends up deleting the "dest" directory.
>
> Interesting. AFAICT Git has behaved this way for almost 9 years, and now
> we have two reports in two days. Serendipity, or did something else
> change? :)
>
> Anyway, you might be interested in the patch series I posted yesterday:
>
>   https://public-inbox.org/git/20180102210753.ga10...@sigill.intra.peff.net/
>
> -Peff


Re: Bug report: git clone with dest

2018-01-03 Thread Jeff King
On Wed, Jan 03, 2018 at 12:59:48PM -0800, Isaac Shabtay wrote:

> Target directory is deleted on clone failures.
> 
> Steps to reproduce, for example on Windows:
> 
> cd /d %TEMP%
> mkdir dest
> git clone https://some-fake-url/whatever-makes-git-clone-fail dest
> 
> Of course, the clone will fail as it should. But looks like the Git
> client also ends up deleting the "dest" directory.

Interesting. AFAICT Git has behaved this way for almost 9 years, and now
we have two reports in two days. Serendipity, or did something else
change? :)

Anyway, you might be interested in the patch series I posted yesterday:

  https://public-inbox.org/git/20180102210753.ga10...@sigill.intra.peff.net/

-Peff


Re: [PATCH v5 00/34] Add directory rename detection to git

2018-01-03 Thread Johannes Sixt

On 03.01.2018 at 22:02, Elijah Newren wrote:
> On Wed, Jan 3, 2018 at 2:57 AM, Johannes Sixt  wrote:
>> I tested the series on Windows recently. It requires the patch below.
>> I don't know whether this is indicating some portability issues of grep
>> (^ being used in the middle of a RE instead of at the very beginning) or
>> just a quirk in my setup.
>
> Thanks for testing it out.  What version of Windows were you running
> on?  With cygwin or without?  I tested previously on cygwin (I think
> on Windows Server 2012??) and got all the tests passing there,
> eventually[1].  I'm not sure I can find access to any other Windows
> systems, but I'd be happy to take a look if I can.
>
> [1] https://public-inbox.org/git/cabpp-bej6-mry0ocz1wwetrtg_iehkzodcuon_puukvywau...@mail.gmail.com/

I have an ancient MinGW setup, where I build "vanilla" Git (not exactly
vanilla, but also not with the many patches that Git for Windows carries).

> The need to backslash escape a caret for a literal match when it
> appears in the middle of the string makes sense.  Thanks for sending
> along the patch.  Would you prefer I squashed it into the series
> (still sitting in 'pu'), or keep your patch separate?  I'm fine with
> either, I'm just unsure the protocol here.

Please squash into the relevant commits so that the series is bisectable
if the need arises.

>> But it still does not pass the test suite because the system does not
>> like file names such as y/c~HEAD:
>>
>> ++ grep 'Refusing to lose dirty file at z/c' out
>> Refusing to lose dirty file at z/c
>> ++ grep -q stuff x/b y/a y/c y/c~HEAD z/c
>> grep: y/c: Invalid request code
>> error: last command exited with $?=2
>> not ok 94 - 11d-check: Avoid losing not-uptodate with rename + D/F conflict
>
> This is exceptionally odd.  The actual line from the testsuite was
>    grep -q stuff */*
>
> which suggests your shell is both doing the pathname expansion and
> treating the resulting filename not as a string but as something to be
> interpreted that happens to have some kind of special
> characters/commands, and then choking on the result.  Super weird.  I
> could probably work around this by just running
>    grep -q stuff z/c
>
> I think I had the asterisks in there because I was thinking in terms
> of directory rename detection potentially moving the file, but that's
> probably just overkill.  Does the test pass for you with that change?

I can test on Monday at the earliest. If it's that easy to fix my
failures, I'd appreciate going this route. But otherwise, I can deal
with the situation, so we don't need to complicate things just to please
my exotic setup.

> (If so, there are also two similar tests that I'd need to make similar
> changes to.)
>
> However, although that might fix this particular case, it suggests
> some fragility of the tests and filenames for whatever system you
> happen to be using.  merge-recursive.c's unique_path has created
> filenames with tildes in them for many years, it may just be that I'm
> the first to use the resulting file in combination with grep to ensure
> the contents are as we expect.  There may be other issues lurking
> (even if not yet appearing in the testsuite) for your system when
> dealing with merge conflicts.

I can't recall having seen issues around tildes in file names, either.
It may be a new situation. I'll investigate.


-- Hannes


Re: [PATCH 5/5] diff: properly error out when combining multiple pickaxe options

2018-01-03 Thread Stefan Beller
On Wed, Jan 3, 2018 at 2:08 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> + count = 0;
>> +
>> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_S)
>> + count++;
>> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_G)
>> + count++;
>> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_OBJFIND)
>> + count++;
>> + if (count > 1)
>> + die(_("-G, -S, --find-object are mutually exclusive"));
>
> I thought the reason you defined pickaxe-kind bitmask was so that
> you can mask this field to grab these (and only these) bits.

Originally I only wanted to mask out the 'ignore case'
bit and keep it future-proof for any similar bits.

> Once you have that mask, you should be able to use HAS_MULTI_BITS()
> on the masked result without counting, no?

Oh, what a nice macro! Thanks for pointing at it.

As soon as I figured out the right place to put this check,
I saw the lines above, whose style I imitated.
(I guess there is just no mask defined for "--name-only, --name-status,
--check and -s", nor would it make sense to define one; though the
macro should work just fine even for non-contiguous masks.)


Re: [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index

2018-01-03 Thread Thomas Gummerer
[sorry for the late reply.  I was on Christmas holidays until today
and am still catching up on the mailing list.  It will probably take
me until the weekend to send a re-roll]

On 12/18, Brandon Williams wrote:
> On 12/17, Thomas Gummerer wrote:
> > be489d02d2 ("revision.c: --indexed-objects add objects from all
> > worktrees", 2017-08-23) made sure that pruning takes objects from all
> > worktrees into account.
> > 
> > It did that by reading the index of every worktree and adding the
> > necessary index objects to the set of pending objects.  The index is
> > read by read_index_from.  As mentioned in the previous commit,
> > read_index_from depends on the CWD for the location of the split index,
> 
> As I mentioned before this doesn't actually depend on the CWD but
> rather the per-worktree gitdir.

Right, will fix.

> > and add_index_objects_to_pending doesn't set that before using
> > read_index_from.
> > 
> > Instead of using read_index_from, use repo_read_index, which is aware of
> > the proper paths for the worktree.
> > 
> > This fixes t5304-prune when run with GIT_TEST_SPLIT_INDEX set.
> > 
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  repository.c | 11 +++
> >  repository.h |  2 ++
> >  revision.c   | 14 +-
> >  3 files changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/repository.c b/repository.c
> > index 928b1f553d..3c9bfbd1b8 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -2,6 +2,7 @@
> >  #include "repository.h"
> >  #include "config.h"
> >  #include "submodule-config.h"
> > +#include "worktree.h"
> >  
> >  /* The main repository */
> >  static struct repository the_repo = {
> > @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char 
> > *gitdir, const char *worktree)
> > return -1;
> >  }
> >  
> > +/*
> > + * Initialize 'repo' based on the provided worktree
> > + * Return 0 upon success and a non-zero value upon failure.
> > + */
> > +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> > +{
> > +   return repo_init(repo, get_worktree_git_dir(worktree),
> > +worktree->path);
> 
> I still feel very unsettled about this and don't think it's a good idea.
> get_worktree_git_dir depends implicitly on the global the_repository
> object and I would like to avoid relying on it for an initialization
> function like this.
> 
> > +}
> > +
> >  /*
> >   * Initialize 'submodule' as the submodule given by 'path' in parent 
> > repository
> >   * 'superproject'.
> > diff --git a/repository.h b/repository.h
> > index 7f5e24a0a2..2adeb05bf4 100644
> > --- a/repository.h
> > +++ b/repository.h
> > @@ -4,6 +4,7 @@
> >  struct config_set;
> >  struct index_state;
> >  struct submodule_cache;
> > +struct worktree;
> >  
> >  struct repository {
> > /* Environment */
> > @@ -87,6 +88,7 @@ extern struct repository *the_repository;
> >  extern void repo_set_gitdir(struct repository *repo, const char *path);
> >  extern void repo_set_worktree(struct repository *repo, const char *path);
> >  extern int repo_init(struct repository *repo, const char *gitdir, const 
> > char *worktree);
> > +extern int repo_worktree_init(struct repository *repo, struct worktree 
> > *worktree);
> >  extern int repo_submodule_init(struct repository *submodule,
> >struct repository *superproject,
> >const char *path);
> > diff --git a/revision.c b/revision.c
> > index e2e691dd5a..34e1e4b799 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -22,6 +22,7 @@
> >  #include "packfile.h"
> >  #include "worktree.h"
> >  #include "argv-array.h"
> > +#include "repository.h"
> >  
> >  volatile show_early_output_fn_t show_early_output;
> >  
> > @@ -1346,15 +1347,18 @@ void add_index_objects_to_pending(struct rev_info 
> > *revs, unsigned int flags)
> > worktrees = get_worktrees(0);
> > for (p = worktrees; *p; p++) {
> > struct worktree *wt = *p;
> > -   struct index_state istate = { NULL };
> > +   struct repository *repo;
> >  
> > +   repo = xmalloc(sizeof(struct repository));
> > if (wt->is_current)
> > continue; /* current index already taken care of */
> > +   if (repo_worktree_init(repo, wt))
> > +   BUG("couldn't initialize repository object from 
> > worktree");
> >  
> > -   if (read_index_from(&istate,
> > -   worktree_git_path(wt, "index")) > 0)
> 
> Ok, after thinking this through a bit more I think a better approach may
> be to restructure the call to read_index_from to take in both an index
> file as well as the explicit gitdir to use when constructing a path to
> the sharedindex file.  That way you can fix this for worktrees and
> submodules without having to pass in a repository object to the logic
> which is reading an index file as well as avoiding needing to init a
> repository object for every 

Re: [PATCH 5/5] diff: properly error out when combining multiple pickaxe options

2018-01-03 Thread Junio C Hamano
Stefan Beller  writes:

> + count = 0;
> +
> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_S)
> + count++;
> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_G)
> + count++;
> + if (options->pickaxe_opts & DIFF_PICKAXE_KIND_OBJFIND)
> + count++;
> + if (count > 1)
> + die(_("-G, -S, --find-object are mutually exclusive"));

I thought the reason you defined pickaxe-kind bitmask was so that
you can mask this field to grab these (and only these) bits.
Once you have that mask, you should be able to use HAS_MULTI_BITS()
on the masked result without counting, no?


Re: [PATCH 3/5] diff: introduce DIFF_PICKAXE_KINDS_MASK

2018-01-03 Thread Junio C Hamano
Stefan Beller  writes:

> Currently the check whether to perform pickaxing is done via checking
> `diffopt->pickaxe`, which contains the command line argument that we
> want to pickaxe for. Soon we'll introduce a new type of pickaxing, that
> will not store anything in the `.pickaxe` field, so let's migrate the
> check to be dependent on pickaxe_opts.
>
> It is not enough to just replace the check for pickaxe by pickaxe_opts,
> because flags might be set, but pickaxing was not requested ('-i').
> To cope with that, introduce a mask to check only for the bits indicating
> the modes of operation.

The resulting code after this series would leave a few "huh?" if it
were new code, but the series is not making anything worse, so take
this as just something noticed, not as something that needs further work.

Because we do not allow "log -S<something> -G<something-else>", there is
no legitimate reason why they have to be a bit in the pickaxe_opts
flag word.  A single enum that says "We are doing pickaxe search and
_this_ is the kind of pickaxe search we are doing" would suffice,
i.e. the NULL-ness check of rev->diffopt.pickaxe string can be
replaced with a check of that enum field against PICKAXE_NONE or
something that signals us that no pickaxe is in effect.

On the other hand, if somebody comes up with a sensible way to
combine more than one pickaxe queries in a single traversal
(e.g. "log -S<something> -S<something-else>" might mean "find a
change that loses or gains <something> and <something-else> in the
same commit", or it may mean the same with "... or
<something-else>"), then a more sensible data structure to represent
the pickaxe request may have been a list of struct, each of which
records the kind and the parameter (e.g. "-S" and "<something>"
would be in a single struct, and "-S" and "<something-else>" would be
in another, and these two are in the list that is diffopt->pickaxe).
The NULL-ness check of rev->diffopt.pickaxe string would be replaced
with a check of the length of that list.

In any case, this step looks sensible.

>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/log.c  | 4 ++--
>  combine-diff.c | 2 +-
>  diff.c | 4 ++--
>  diff.h | 2 ++
>  revision.c | 2 +-
>  5 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index 6c1fa896ad..bd6f2d1efb 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -180,8 +180,8 @@ static void cmd_log_init_finish(int argc, const char 
> **argv, const char *prefix,
>   if (rev->show_notes)
>   init_display_notes(&rev->notes_opt);
>  
> - if (rev->diffopt.pickaxe || rev->diffopt.filter ||
> - rev->diffopt.flags.follow_renames)
> + if ((rev->diffopt.pickaxe_opts & DIFF_PICKAXE_KINDS_MASK) ||
> + rev->diffopt.filter || rev->diffopt.flags.follow_renames)
>   rev->always_show_header = 0;
>  
>   if (source)
> diff --git a/combine-diff.c b/combine-diff.c
> index 2505de119a..bc08c4c5b1 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1438,7 +1438,7 @@ void diff_tree_combined(const struct object_id *oid,
>   opt->flags.follow_renames   ||
>   opt->break_opt != -1||
>   opt->detect_rename  ||
> - opt->pickaxe||
> + (opt->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK)   ||
>   opt->filter;
>  
>  
> diff --git a/diff.c b/diff.c
> index 0763e89263..5508745dc8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4173,7 +4173,7 @@ void diff_setup_done(struct diff_options *options)
>   /*
>* Also pickaxe would not work very well if you do not say recursive
>*/
> - if (options->pickaxe)
> + if (options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK)
>   options->flags.recursive = 1;
>   /*
>* When patches are generated, submodules diffed against the work tree
> @@ -5777,7 +5777,7 @@ void diffcore_std(struct diff_options *options)
>   if (options->break_opt != -1)
>   diffcore_merge_broken();
>   }
> - if (options->pickaxe)
> + if (options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK)
>   diffcore_pickaxe(options);
>   if (options->orderfile)
>   diffcore_order(options->orderfile);
> diff --git a/diff.h b/diff.h
> index 8af1213684..9ec4f824fe 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -326,6 +326,8 @@ extern void diff_setup_done(struct diff_options *);
>  #define DIFF_PICKAXE_KIND_S  4 /* traditional plumbing counter */
>  #define DIFF_PICKAXE_KIND_G  8 /* grep in the patch */
>  
> +#define DIFF_PICKAXE_KINDS_MASK (DIFF_PICKAXE_KIND_S | DIFF_PICKAXE_KIND_G)
> +
>  #define DIFF_PICKAXE_IGNORE_CASE 32
>  
>  extern void diffcore_std(struct diff_options *);
> diff --git a/revision.c b/revision.c
> index ccf1d212ce..5d11ecaf27 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2407,7 +2407,7 @@ int setup_revisions(int argc, const char **argv, struct 
> rev_info *revs, struct s
>   

Re: [PATCH 1/5] diff.h: Make pickaxe_opts an unsigned bit field

2018-01-03 Thread Junio C Hamano
Stefan Beller  writes:

> This variable is used as a bit field[1], and as we are about to add more
> fields, indicate its usage as a bit field by making it unsigned.
>
> [1] containing the bits
>
> #define DIFF_PICKAXE_ALL      1
> #define DIFF_PICKAXE_REGEX    2
> #define DIFF_PICKAXE_KIND_S   4
> #define DIFF_PICKAXE_KIND_G   8
>
> Signed-off-by: Stefan Beller 
> ---
>  diff.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Makes perfect sense.

>
> diff --git a/diff.h b/diff.h
> index 0fb18dd735..ea310f76fd 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -146,7 +146,7 @@ struct diff_options {
>   int skip_stat_unmatch;
>   int line_termination;
>   int output_format;
> - int pickaxe_opts;
> + unsigned pickaxe_opts;
>   int rename_score;
>   int rename_limit;
>   int needed_rename_limit;


[PATCH v3 4/5] status: support --no-ahead-behind in long format

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach long (normal) status format to respect the --no-ahead-behind
parameter and skip the possibly expensive ahead/behind computation
between the branch and the upstream.

Long status also respects "status.aheadBehind" config setting.

Signed-off-by: Jeff Hostetler 
---
 builtin/checkout.c   |  2 +-
 remote.c | 18 +-
 remote.h |  3 ++-
 t/t6040-tracking-info.sh | 47 +++
 wt-status.c  |  2 +-
 5 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index fc4f8fd..655dac2 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -605,7 +605,7 @@ static void report_tracking(struct branch_info *new)
struct strbuf sb = STRBUF_INIT;
struct branch *branch = branch_get(new->name);
 
-   if (!format_tracking_info(branch, &sb))
+   if (!format_tracking_info(branch, &sb, AHEAD_BEHIND_FULL))
return;
fputs(sb.buf, stdout);
strbuf_release(&sb);
diff --git a/remote.c b/remote.c
index 32706bc..bbbe3e5 100644
--- a/remote.c
+++ b/remote.c
@@ -2068,15 +2068,16 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
 /*
  * Return true when there is anything to report, otherwise false.
  */
-int format_tracking_info(struct branch *branch, struct strbuf *sb)
+int format_tracking_info(struct branch *branch, struct strbuf *sb,
+enum ahead_behind_flags abf)
 {
-   int ours, theirs;
+   int ours, theirs, sti;
const char *full_base;
char *base;
int upstream_is_gone = 0;
 
-   if (stat_tracking_info(branch, &ours, &theirs, &full_base,
-  AHEAD_BEHIND_FULL) < 0) {
+   sti = stat_tracking_info(branch, &ours, &theirs, &full_base, abf);
+   if (sti < 0) {
if (!full_base)
return 0;
upstream_is_gone = 1;
@@ -2090,10 +2091,17 @@ int format_tracking_info(struct branch *branch, struct strbuf *sb)
if (advice_status_hints)
strbuf_addstr(sb,
_("  (use \"git branch --unset-upstream\" to fixup)\n"));
-   } else if (!ours && !theirs) {
+   } else if (!sti) {
strbuf_addf(sb,
_("Your branch is up to date with '%s'.\n"),
base);
+   } else if (abf == AHEAD_BEHIND_QUICK) {
+   strbuf_addf(sb,
+   _("Your branch and '%s' refer to different commits.\n"),
+   base);
+   if (advice_status_hints)
+   strbuf_addf(sb, _("  (use \"%s\" for details)\n"),
+   "git status --ahead-behind");
} else if (!theirs) {
strbuf_addf(sb,
Q_("Your branch is ahead of '%s' by %d commit.\n",
diff --git a/remote.h b/remote.h
index 27feb63..b2fa5cc 100644
--- a/remote.h
+++ b/remote.h
@@ -265,7 +265,8 @@ enum ahead_behind_flags {
 /* Reporting of tracking info */
 int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
   const char **upstream_name, enum ahead_behind_flags abf);
-int format_tracking_info(struct branch *branch, struct strbuf *sb);
+int format_tracking_info(struct branch *branch, struct strbuf *sb,
+enum ahead_behind_flags abf);
 
 struct ref *get_local_heads(void);
 /*
diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh
index 053dff3..febf63f 100755
--- a/t/t6040-tracking-info.sh
+++ b/t/t6040-tracking-info.sh
@@ -173,6 +173,53 @@ test_expect_success 'status.aheadbehind=false status -s -b (diverged from upstre
 '
 
 cat >expect <<\EOF
+On branch b1
+Your branch and 'origin/master' have diverged,
+and have 1 and 1 different commits each, respectively.
+EOF
+
+test_expect_success 'status --long --branch' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git status --long -b | head -3
+   ) >actual &&
+   test_i18ncmp expect actual
+'
+
+test_expect_success 'status --long --branch' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git -c status.aheadbehind=true status --long -b | head -3
+   ) >actual &&
+   test_i18ncmp expect actual
+'
+
+cat >expect <<\EOF
+On branch b1
+Your branch and 'origin/master' refer to different commits.
+EOF
+
+test_expect_success 'status --long --branch --no-ahead-behind' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git status --long -b --no-ahead-behind | head -2
+   ) >actual &&
+   test_i18ncmp expect actual
+'
+
+test_expect_success 'status.aheadbehind=false status --long --branch' '
+   (
+   cd test &&
+   git 

[PATCH v3 5/5] status: add status.aheadBehind value for porcelain output

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

Add status.aheadBehind=2 value to enable --no-ahead-behind
for all formats (both porcelain and non-porcelain).  The
current boolean values only affect non-porcelain formats.

Signed-off-by: Jeff Hostetler 
---
 Documentation/config.txt |  5 +
 builtin/commit.c | 31 +--
 remote.h |  8 
 t/t6040-tracking-info.sh |  9 +
 t/t7064-wtstatus-pv2.sh  |  6 +-
 5 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index affb0d6..eaa1058 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3040,6 +3040,11 @@ status.aheadBehind::
--no-ahead-behind by default in linkgit:git-status[1] for
non-porcelain formats.  This setting is ignored by porcelain
formats for backwards compatibility.
++
+(EXPERIMENTAL) Set to 2 to allow both porcelain and non-porcelain
+formats to inherit --no-ahead-behind.  This may break backward
+compatibility for scripts using porcelain status formats and expecting
+ahead/behind information in the output.
 
 status.displayCommentPrefix::
If set to true, linkgit:git-status[1] will insert a comment
diff --git a/builtin/commit.c b/builtin/commit.c
index 416fe2c..194a6eb 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1109,13 +1109,32 @@ static const char *read_commit_message(const char *name)
 static struct status_deferred_config {
enum wt_status_format status_format;
int show_branch;
-   enum ahead_behind_flags ahead_behind;
+   enum ahead_behind_config_flags ahead_behind_config;
 } status_deferred_config = {
STATUS_FORMAT_UNSPECIFIED,
-1, /* unspecified */
AHEAD_BEHIND_UNSPECIFIED,
 };
 
+static inline enum ahead_behind_flags inherit_deferred_ab_flags(
+   int is_porcelain)
+{
+   switch (status_deferred_config.ahead_behind_config) {
+   case AHEAD_BEHIND_CONFIG_UNSPECIFIED:
+   case AHEAD_BEHIND_CONFIG_FULL:
+   return AHEAD_BEHIND_FULL;
+
+   case AHEAD_BEHIND_CONFIG_QUICK2:
+   return AHEAD_BEHIND_QUICK;
+
+   case AHEAD_BEHIND_CONFIG_QUICK:
+   return is_porcelain ? AHEAD_BEHIND_FULL : AHEAD_BEHIND_QUICK;
+
+   default: /* don't complain about bogus config settings */
+   return AHEAD_BEHIND_FULL;
+   }
+}
+
 static void finalize_deferred_config(struct wt_status *s)
 {
int use_deferred_config = (status_format != STATUS_FORMAT_PORCELAIN &&
@@ -1140,11 +1159,9 @@ static void finalize_deferred_config(struct wt_status *s)
if (s->show_branch < 0)
s->show_branch = 0;
 
-   if (use_deferred_config &&
-   s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED)
-   s->ahead_behind_flags = status_deferred_config.ahead_behind;
if (s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED)
-   s->ahead_behind_flags = AHEAD_BEHIND_FULL;
+   s->ahead_behind_flags =
+   inherit_deferred_ab_flags(!use_deferred_config);
 }
 
 static int parse_and_validate_options(int argc, const char *argv[],
@@ -1306,7 +1323,9 @@ static int git_status_config(const char *k, const char *v, void *cb)
return 0;
}
if (!strcmp(k, "status.aheadbehind")) {
-   status_deferred_config.ahead_behind = git_config_bool(k, v);
+   int is_bool;
+   status_deferred_config.ahead_behind_config =
+   git_config_bool_or_int(k, v, &is_bool);
return 0;
}
if (!strcmp(k, "status.showstash")) {
diff --git a/remote.h b/remote.h
index b2fa5cc..bcf846a 100644
--- a/remote.h
+++ b/remote.h
@@ -262,6 +262,14 @@ enum ahead_behind_flags {
AHEAD_BEHIND_FULL=  1,  /* traditional a/b reporting */
 };
 
+/* Flags for status.aheadBehind values. */
+enum ahead_behind_config_flags {
+   AHEAD_BEHIND_CONFIG_UNSPECIFIED = -1,
+   AHEAD_BEHIND_CONFIG_QUICK   =  0, /* eq/neq for non-porcelain only */
+   AHEAD_BEHIND_CONFIG_FULL    =  1, /* a/b reporting for all formats */
+   AHEAD_BEHIND_CONFIG_QUICK2  =  2, /* eq/neq for all formats */
+};
+
 /* Reporting of tracking info */
 int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
   const char **upstream_name, enum ahead_behind_flags abf);
diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh
index febf63f..5003366 100755
--- a/t/t6040-tracking-info.sh
+++ b/t/t6040-tracking-info.sh
@@ -219,6 +219,15 @@ test_expect_success 'status.aheadbehind=false status --long --branch' '
test_i18ncmp expect actual
 '
 
+test_expect_success 'status.aheadbehind=2 status --long --branch' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git -c status.aheadbehind=false status 

[PATCH v3 2/5] status: add --[no-]ahead-behind to status and commit for V2 format.

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach "git status" and "git commit" to accept "--no-ahead-behind"
and "--ahead-behind" arguments to request quick or full ahead/behind
reporting.

When "--no-ahead-behind" is given, the existing porcelain V2 line
"branch.ab x y" is replaced with a new "branch equal eq|neq" line.
This indicates that the branch and its upstream are or are not equal
without the expense of computing the full ahead/behind values.

Added "status.aheadBehind" config setting.  This is only used by
non-porcelain format for backward-compatibility.

Signed-off-by: Jeff Hostetler 
---
 Documentation/config.txt |  6 
 Documentation/git-status.txt |  5 
 builtin/commit.c | 18 +++-
 remote.c |  2 ++
 remote.h |  5 ++--
 t/t7064-wtstatus-pv2.sh  | 69 
 wt-status.c  | 27 +
 wt-status.h  |  2 ++
 8 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9593bfa..affb0d6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3035,6 +3035,12 @@ status.branch::
Set to true to enable --branch by default in linkgit:git-status[1].
The option --no-branch takes precedence over this variable.
 
+status.aheadBehind::
+   Set to true to enable --ahead-behind and to false to enable
+   --no-ahead-behind by default in linkgit:git-status[1] for
+   non-porcelain formats.  This setting is ignored by porcelain
+   formats for backwards compatibility.
+
 status.displayCommentPrefix::
If set to true, linkgit:git-status[1] will insert a comment
prefix before each output line (starting with
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index 9f3a78a..603bf40 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -111,6 +111,11 @@ configuration variable documented in linkgit:git-config[1].
without options are equivalent to 'always' and 'never'
respectively.
 
+--ahead-behind::
+--no-ahead-behind::
+   Display or do not display detailed ahead/behind counts for the
+   branch relative to its upstream branch.  Defaults to true.
+
 ...::
See the 'pathspec' entry in linkgit:gitglossary[7].
 
diff --git a/builtin/commit.c b/builtin/commit.c
index be370f6..416fe2c 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1109,9 +1109,11 @@ static const char *read_commit_message(const char *name)
 static struct status_deferred_config {
enum wt_status_format status_format;
int show_branch;
+   enum ahead_behind_flags ahead_behind;
 } status_deferred_config = {
STATUS_FORMAT_UNSPECIFIED,
-   -1 /* unspecified */
+   -1, /* unspecified */
+   AHEAD_BEHIND_UNSPECIFIED,
 };
 
 static void finalize_deferred_config(struct wt_status *s)
@@ -1137,6 +1139,12 @@ static void finalize_deferred_config(struct wt_status *s)
s->show_branch = status_deferred_config.show_branch;
if (s->show_branch < 0)
s->show_branch = 0;
+
+   if (use_deferred_config &&
+   s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED)
+   s->ahead_behind_flags = status_deferred_config.ahead_behind;
+   if (s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED)
+   s->ahead_behind_flags = AHEAD_BEHIND_FULL;
 }
 
 static int parse_and_validate_options(int argc, const char *argv[],
@@ -1297,6 +1305,10 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
status_deferred_config.show_branch = git_config_bool(k, v);
return 0;
}
+   if (!strcmp(k, "status.aheadbehind")) {
+   status_deferred_config.ahead_behind = git_config_bool(k, v);
+   return 0;
+   }
if (!strcmp(k, "status.showstash")) {
s->show_stash = git_config_bool(k, v);
return 0;
@@ -1351,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
 N_("show branch information")),
OPT_BOOL(0, "show-stash", _stash,
 N_("show stash information")),
+   OPT_BOOL(0, "ahead-behind", _behind_flags,
+N_("compute full ahead/behind values")),
{ OPTION_CALLBACK, 0, "porcelain", _format,
  N_("version"), N_("machine-readable output"),
  PARSE_OPT_OPTARG, opt_parse_porcelain },
@@ -1628,6 +1642,8 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
OPT_SET_INT(0, "short", _format, N_("show status 
concisely"),
STATUS_FORMAT_SHORT),
OPT_BOOL(0, "branch", _branch, N_("show branch 
information")),
+   OPT_BOOL(0, "ahead-behind", 

[PATCH v3 3/5] status: update short status to respect --no-ahead-behind

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach "git status --short --branch" to respect the "--no-ahead-behind"
parameter, skipping the ahead/behind counts for the branch and its
upstream and just reporting '[different]'.

Short status also respects the "status.aheadBehind" config setting.

Signed-off-by: Jeff Hostetler 
---
 t/t6040-tracking-info.sh | 26 ++
 wt-status.c  | 11 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh
index 8f17fd9..053dff3 100755
--- a/t/t6040-tracking-info.sh
+++ b/t/t6040-tracking-info.sh
@@ -147,6 +147,32 @@ test_expect_success 'status -s -b (diverged from 
upstream)' '
 '
 
 cat >expect <<\EOF
+## b1...origin/master [different]
+EOF
+
+test_expect_success 'status -s -b --no-ahead-behind (diverged from upstream)' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git status -s -b --no-ahead-behind | head -1
+   ) >actual &&
+   test_i18ncmp expect actual
+'
+
+cat >expect <<\EOF
+## b1...origin/master [different]
+EOF
+
+test_expect_success 'status.aheadbehind=false status -s -b (diverged from 
upstream)' '
+   (
+   cd test &&
+   git checkout b1 >/dev/null &&
+   git -c status.aheadbehind=false status -s -b | head -1
+   ) >actual &&
+   test_i18ncmp expect actual
+'
+
+cat >expect <<\EOF
 ## b5...brokenbase [gone]
 EOF
 
diff --git a/wt-status.c b/wt-status.c
index 3959d31..df6cc33 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1766,7 +1766,7 @@ static void wt_shortstatus_print_tracking(struct 
wt_status *s)
const char *base;
char *short_base;
const char *branch_name;
-   int num_ours, num_theirs;
+   int num_ours, num_theirs, sti;
int upstream_is_gone = 0;
 
color_fprintf(s->fp, color(WT_STATUS_HEADER, s), "## ");
@@ -1792,8 +1792,9 @@ static void wt_shortstatus_print_tracking(struct 
wt_status *s)
 
color_fprintf(s->fp, branch_color_local, "%s", branch_name);
 
-   if (stat_tracking_info(branch, &num_ours, &num_theirs, &base,
-  AHEAD_BEHIND_FULL) < 0) {
+   sti = stat_tracking_info(branch, &num_ours, &num_theirs, &base,
+s->ahead_behind_flags);
+   if (sti < 0) {
if (!base)
goto conclude;
 
@@ -1805,12 +1806,14 @@ static void wt_shortstatus_print_tracking(struct 
wt_status *s)
color_fprintf(s->fp, branch_color_remote, "%s", short_base);
free(short_base);
 
-   if (!upstream_is_gone && !num_ours && !num_theirs)
+   if (!upstream_is_gone && !sti)
goto conclude;
 
color_fprintf(s->fp, header_color, " [");
if (upstream_is_gone) {
color_fprintf(s->fp, header_color, LABEL(N_("gone")));
+   } else if (s->ahead_behind_flags == AHEAD_BEHIND_QUICK) {
+   color_fprintf(s->fp, header_color, LABEL(N_("different")));
} else if (!num_ours) {
color_fprintf(s->fp, header_color, LABEL(N_("behind ")));
color_fprintf(s->fp, branch_color_remote, "%d", num_theirs);
-- 
2.9.3



[PATCH v3 0/5] Add --no-ahead-behind to status

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

This is version 3 of my patch series to avoid expensive
ahead/behind calculations in status.  This version tries
to address most of the comments in V2.

I've switched back to a "status.aheadBehind" config setting
rather than in "core.*".  This has been better integrated
with the existing status_deferred_config mechanism in
builtin/commit.c and lets both status and commit inherit it.

Config values of true and false control non-porcelain formats
for compatibility reasons as previously discussed.  In the
last commit I added a new value of 2 for the config setting
to allow porcelain formats to inherit the new setting.  I've
marked this experimental for now so that we can discuss
it.

Jeff Hostetler (5):
  stat_tracking_info: return +1 when branches not equal
  status: add --[no-]ahead-behind to status and commit for V2 format.
  status: update short status to respect --no-ahead-behind
  status: support --no-ahead-behind in long format
  status: add status.aheadBehind value for porcelain output

 Documentation/config.txt | 11 ++
 Documentation/git-status.txt |  5 +++
 builtin/checkout.c   |  2 +-
 builtin/commit.c | 37 +++-
 ref-filter.c |  8 ++---
 remote.c | 42 +--
 remote.h | 20 +--
 t/t6040-tracking-info.sh | 82 
 t/t7064-wtstatus-pv2.sh  | 73 +++
 wt-status.c  | 38 +++-
 wt-status.h  |  2 ++
 11 files changed, 292 insertions(+), 28 deletions(-)

-- 
2.9.3



[PATCH v3 1/5] stat_tracking_info: return +1 when branches not equal

2018-01-03 Thread Jeff Hostetler
From: Jeff Hostetler 

Extend stat_tracking_info() to return +1 when branches are not equal and to
take a new "enum ahead_behind_flags" argument to allow skipping the (possibly
expensive) ahead/behind computation.

This will be used in the next commit to allow "git status" to avoid full
ahead/behind calculations for performance reasons.
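
As a rough illustration of the quick/full distinction (this is not the C
implementation in this patch), the same two answers can be obtained from
plumbing commands. The helper names compare_quick and compare_full and the
throwaway repository are made up for this sketch, and it assumes a git that
supports "git rev-list --left-right --count".

```shell
#!/bin/sh
# Illustrative sketch only; compare_quick and compare_full are
# hypothetical helpers, not part of git.

# Quick mode: compare the two commit ids and report eq/neq without
# walking any history.
compare_quick () {
	ours=$(git rev-parse --verify --quiet "$1^{commit}") || return 2
	theirs=$(git rev-parse --verify --quiet "$2^{commit}") || return 2
	if test "$ours" = "$theirs"
	then
		echo eq
	else
		echo neq
	fi
}

# Full mode: walk the symmetric difference and print
# "<ahead><TAB><behind>", which can be slow on long histories.
compare_full () {
	git rev-list --left-right --count "$1...$2"
}

# Demo in a throwaway repository: "topic" is one commit ahead of "base".
repo=$(mktemp -d) &&
git init -q "$repo" &&
(
	cd "$repo" &&
	git -c user.name=Example -c user.email=example@example.com \
		commit -q --allow-empty -m one &&
	git branch base &&
	git checkout -q -b topic &&
	git -c user.name=Example -c user.email=example@example.com \
		commit -q --allow-empty -m two &&
	compare_quick topic base &&
	compare_full topic base
)
rm -rf "$repo"
```

The quick helper only ever answers eq or neq, which mirrors the 0 / +1
return values described above; the counts come from the full helper alone.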

Signed-off-by: Jeff Hostetler 
---
 ref-filter.c |  8 
 remote.c | 26 ++
 remote.h |  8 +++-
 wt-status.c  |  6 --
 4 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index e728b15..23bcdc4 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1238,8 +1238,8 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
if (atom->u.remote_ref.option == RR_REF)
*s = show_ref(&atom->u.remote_ref.refname, refname);
else if (atom->u.remote_ref.option == RR_TRACK) {
-   if (stat_tracking_info(branch, &num_ours,
-  &num_theirs, NULL)) {
+   if (stat_tracking_info(branch, &num_ours, &num_theirs,
+  NULL, AHEAD_BEHIND_FULL) < 0) {
*s = xstrdup(msgs.gone);
} else if (!num_ours && !num_theirs)
*s = "";
@@ -1256,8 +1256,8 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
free((void *)to_free);
}
} else if (atom->u.remote_ref.option == RR_TRACKSHORT) {
-   if (stat_tracking_info(branch, &num_ours,
-  &num_theirs, NULL))
+   if (stat_tracking_info(branch, &num_ours, &num_theirs,
+  NULL, AHEAD_BEHIND_FULL) < 0)
return;
 
if (!num_ours && !num_theirs)
diff --git a/remote.c b/remote.c
index b220f0d..ca5a416 100644
--- a/remote.c
+++ b/remote.c
@@ -1977,16 +1977,23 @@ int ref_newer(const struct object_id *new_oid, const 
struct object_id *old_oid)
 }
 
 /*
- * Compare a branch with its upstream, and save their differences (number
- * of commits) in *num_ours and *num_theirs. The name of the upstream branch
- * (or NULL if no upstream is defined) is returned via *upstream_name, if it
- * is not itself NULL.
+ * Lookup the upstream branch for the given branch and if present, optionally
+ * compute the commit ahead/behind values for the pair.
+ *
+ * If abf is AHEAD_BEHIND_FULL, compute the full ahead/behind and return the
+ * counts in *num_ours and *num_theirs.  If abf is AHEAD_BEHIND_QUICK, skip
+ * the (potentially expensive) a/b computation (*num_ours and *num_theirs are
+ * left undefined).
+ *
+ * The name of the upstream branch (or NULL if no upstream is defined) is
+ * returned via *upstream_name, if it is not itself NULL.
  *
  * Returns -1 if num_ours and num_theirs could not be filled in (e.g., no
- * upstream defined, or ref does not exist), 0 otherwise.
+ * upstream defined, or ref does not exist).  Returns 0 if the commits are
+ * identical.  Returns 1 if commits are different.
  */
 int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
-  const char **upstream_name)
+  const char **upstream_name, enum ahead_behind_flags abf)
 {
struct object_id oid;
struct commit *ours, *theirs;
@@ -2019,6 +2026,8 @@ int stat_tracking_info(struct branch *branch, int 
*num_ours, int *num_theirs,
*num_theirs = *num_ours = 0;
return 0;
}
+   if (abf == AHEAD_BEHIND_QUICK)
+   return 1;
 
/* Run "rev-list --left-right ours...theirs" internally... */
argv_array_push(, ""); /* ignored */
@@ -2051,7 +2060,7 @@ int stat_tracking_info(struct branch *branch, int 
*num_ours, int *num_theirs,
clear_commit_marks(theirs, ALL_REV_FLAGS);
 
argv_array_clear();
-   return 0;
+   return 1;
 }
 
 /*
@@ -2064,7 +2073,8 @@ int format_tracking_info(struct branch *branch, struct 
strbuf *sb)
char *base;
int upstream_is_gone = 0;
 
-   if (stat_tracking_info(branch, , , _base) < 0) {
+   if (stat_tracking_info(branch, , , _base,
+  AHEAD_BEHIND_FULL) < 0) {
if (!full_base)
return 0;
upstream_is_gone = 1;
diff --git a/remote.h b/remote.h
index 2ecf4c8..00932f5 100644
--- a/remote.h
+++ b/remote.h
@@ -255,9 +255,15 @@ enum match_refs_flags {
MATCH_REFS_FOLLOW_TAGS  = (1 << 3)
 };
 
+/* Flags for --ahead-behind option. */
+enum ahead_behind_flags {
+   AHEAD_BEHIND_QUICK = 0,  /* just eq/neq reporting */
+   AHEAD_BEHIND_FULL  = 1,  /* traditional a/b reporting */
+};
+
 /* Reporting of tracking info */
 int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
-  

Re: [PATCH 06/26] transport: use get_refs_via_connect to get refs

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
>
> Signed-off-by: Brandon Williams 

Reviewed-by: Stefan Beller 

> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/transport.c b/transport.c
> index fc802260f..8e8779096 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
> *transport,
> args.cloning = transport->cloning;
> args.update_shallow = data->options.update_shallow;
>
> -   if (!data->got_remote_heads) {
> -   connect_setup(transport, 0);
> -   get_remote_heads(data->fd[0], NULL, 0, _tmp, 0,
> -NULL, >shallow);
> -   data->got_remote_heads = 1;
> -   }
> +   if (!data->got_remote_heads)
> +   refs_tmp = get_refs_via_connect(transport, 0);
>
> refs = fetch_pack(, data->fd, data->conn,
>   refs_tmp ? refs_tmp : transport->remote_refs,
> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
> *transport, struct ref *remote_re
> struct send_pack_args args;
> int ret;
>
> -   if (!data->got_remote_heads) {
> -   struct ref *tmp_refs;
> -   connect_setup(transport, 1);
> -
> -   get_remote_heads(data->fd[0], NULL, 0, _refs, REF_NORMAL,
> -NULL, >shallow);
> -   data->got_remote_heads = 1;
> -   }
> +   if (!data->got_remote_heads)
> +   get_refs_via_connect(transport, 1);
>
> memset(, 0, sizeof(args));
> args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
> --
> 2.15.1.620.gb9897f4670-goog
>


Re: [PATCHv3 0/5] Fix --recurse-submodules for submodule worktree changes

2018-01-03 Thread Stefan Beller
On Wed, Jan 3, 2018 at 12:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Thanks Junio for review of this series!
>> The only change in this version of the series is
>>
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -2140,7 +2140,7 @@ int oneway_merge(const struct cache_entry * const *src,
>> update |= CE_UPDATE;
>> }
>> if (S_ISGITLINK(old->ce_mode) && should_update_submodules() 
>> &&
>> -   !verify_uptodate(old, o))
>> +   o->update && !verify_uptodate(old, o))
>> update |= CE_UPDATE;
>> add_entry(o, old, update, 0);
>>
>
> Sounds OK.
>
> I wonder why o->update is not at the very beginning of the &&-chain,
> though.  After all, the one above this addition begins with o->reset
> && o->update *not* because of the performance concern, but primarily
> due to logic flow.  I.e. "if we are resetting and updating the
> working tree, then..." comes first before saying "we may need to
> flip CE_UPDATE bit in update variable if the file in the working
> tree is not up to date and it is within a narrow checkout area".

It shows that I work too much with submodules. ;)
"If we have a submodule and ..." seemed to be the important
part when writing the patch.

> Of course, because verify_uptodate() is rather expensive, checking
> o->update before that makes sense from micro-optimization's point of
> view, too.

I would think S_ISGITLINK, should_update_submodules and o->update
all cost about the same (a handful of operations each) when
compared to verify_uptodate (spawning processes),
so as long as verify_uptodate goes last we'd be fine.
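
To make the ordering point concrete, here is a toy shell sketch (C
short-circuits && the same way). The names update_enabled and
expensive_check are stand-ins invented for this example, not functions
from unpack-trees.c.

```shell
#!/bin/sh
# Toy sketch; update_enabled and expensive_check are hypothetical
# stand-ins, not git functions.

expensive_calls=0

# Cheap condition: a plain flag comparison.
update_enabled () {
	test "$1" = yes
}

# Expensive condition: counts how often it actually runs.
expensive_check () {
	expensive_calls=$((expensive_calls + 1))
	return 0
}

# Cheap test first, expensive test last, so the expensive one is
# skipped whenever the cheap one already decides the outcome.
should_flag_update () {
	update_enabled "$1" && expensive_check
}

if should_flag_update no
then
	echo "flagged"
fi
echo "expensive calls with update disabled: $expensive_calls"

if should_flag_update yes
then
	echo "flagged"
fi
echo "expensive calls with update enabled: $expensive_calls"
```

The counter only moves once the cheap flag is set, which is the effect
of keeping the expensive call at the end of the chain.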

>
> So after thinking aloud like the above, I am reasonably sure that
> you want to check o->update as the very first thing in this new if
> statement.

Thanks for double checking and thinking about the code base with
a less submodule centric point of view.

Would you mind squashing it locally, or do you want me to resend?
For a resend I'll wait a couple of days to see if there are more
comments needing to be addressed.


>
>> v2:
>> I dropped the patch to `same()` as I realized we only need to fix the
>> oneway_merge function, the others (two, three way merge) are fine as
>> they have the checks already in place.
>
> This is a bit flawed argument, no?  Checking working tree paths
> unconditionally in same(), which does not even know if we are
> touching the working tree paths, is broken.  Unless "they have the
> checks already in place" refers to checks that bypasses calls to
> same() when we are not touching working tree paths, that is, but
> obviously that is not what is going on.
>
> Will queue.  Thanks for working on this.
>
>


Re: [ANNOUNCE] Git v2.16.0-rc0

2018-01-03 Thread Junio C Hamano
Jonathan Nieder  writes:

> It's good you caught this flaw in the detection.  Would something like
> the following make sense?  If so, I can resend with a commit message
> and tests tomorrow or the day after.

So the idea is to keep the 'simple' for implementations that do not
support OpenSSH options, but declare that the auto-detection was
overly conservative and assume that -4/6/p are supported by
everybody?

This change means that those who were meant to be helped by the
original change, that introduced 'simple' and made the (overly
conservative) auto-detection, would now have to explicitly set
ssh.variant to simple, because otherwise they will be passed one of
these three options.  Am I reading the intention of the change
correctly?  If so, I tend to agree that it is the lesser of the two
evils to make sure things continue to work for older openssh users
with their current setting like this patch does, even with the cost
of telling users with implementations that do not honor -4/6/p to
set things up manually, I guess.
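
For readers following along, setting the variant by hand could look
like the sketch below. It uses a throwaway repository so that no real
configuration is touched; the temporary path is arbitrary.

```shell
#!/bin/sh
# Sketch: set ssh.variant explicitly instead of relying on
# auto-detection, then read the value back.
repo=$(mktemp -d) &&
git init -q "$repo" &&
git -C "$repo" config ssh.variant simple &&
git -C "$repo" config --get ssh.variant
rm -rf "$repo"
```

A one-off override with "git -c ssh.variant=simple ..." should work the
same way, in the style of the "git -c ssh.variant=putty clone" test
quoted below.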

Thanks, both, for digging this issue through.

>
> diff --git i/Documentation/config.txt w/Documentation/config.txt
> index 64c1dbba94..75eafd8db6 100644
> --- i/Documentation/config.txt
> +++ w/Documentation/config.txt
> @@ -2118,8 +2118,8 @@ ssh.variant::
>   unrecognized, Git will attempt to detect support of OpenSSH
>   options by first invoking the configured SSH command with the
>   `-G` (print configuration) option and will subsequently use
> - OpenSSH options (if that is successful) or no options besides
> - the host and remote command (if it fails).
> + OpenSSH options if that is successful or a conservative set of
> + OpenSSH-style options if it fails.
>  +
>  The config variable `ssh.variant` can be set to override this detection.
>  Valid values are `ssh` (to use OpenSSH options), `plink`, `putty`,
> diff --git i/connect.c w/connect.c
> index c3a014c5ba..3784c2be53 100644
> --- i/connect.c
> +++ w/connect.c
> @@ -941,10 +941,9 @@ static void push_ssh_options(struct argv_array *args, 
> struct argv_array *env,
>  
>   if (flags & CONNECT_IPV4) {
>   switch (variant) {
> - case VARIANT_AUTO:
> - BUG("VARIANT_AUTO passed to push_ssh_options");
>   case VARIANT_SIMPLE:
>   die("ssh variant 'simple' does not support -4");
> + case VARIANT_AUTO:
>   case VARIANT_SSH:
>   case VARIANT_PLINK:
>   case VARIANT_PUTTY:
> @@ -953,10 +952,9 @@ static void push_ssh_options(struct argv_array *args, 
> struct argv_array *env,
>   }
>   } else if (flags & CONNECT_IPV6) {
>   switch (variant) {
> - case VARIANT_AUTO:
> - BUG("VARIANT_AUTO passed to push_ssh_options");
>   case VARIANT_SIMPLE:
>   die("ssh variant 'simple' does not support -6");
> + case VARIANT_AUTO:
>   case VARIANT_SSH:
>   case VARIANT_PLINK:
>   case VARIANT_PUTTY:
> @@ -970,10 +968,9 @@ static void push_ssh_options(struct argv_array *args, 
> struct argv_array *env,
>  
>   if (port) {
>   switch (variant) {
> - case VARIANT_AUTO:
> - BUG("VARIANT_AUTO passed to push_ssh_options");
>   case VARIANT_SIMPLE:
>   die("ssh variant 'simple' does not support setting 
> port");
> + case VARIANT_AUTO:
>   case VARIANT_SSH:
>   argv_array_push(args, "-p");
>   break;
> @@ -1026,7 +1023,7 @@ static void fill_ssh_args(struct child_process *conn, 
> const char *ssh_host,
>VARIANT_SSH, port, flags);
>   argv_array_push(, ssh_host);
>  
> - variant = run_command() ? VARIANT_SIMPLE : VARIANT_SSH;
> + variant = run_command() ? VARIANT_AUTO : VARIANT_SSH;
>   }
>  
>   argv_array_push(>args, ssh);
> diff --git i/t/t5601-clone.sh w/t/t5601-clone.sh
> index 0f895478f0..0224edc85b 100755
> --- i/t/t5601-clone.sh
> +++ w/t/t5601-clone.sh
> @@ -365,6 +365,11 @@ test_expect_success 'OpenSSH variant passes -4' '
>   expect_ssh "-4 -p 123" myhost src
>  '
>  
> +test_expect_success 'OpenSSH passes GIT_PROTOCOL envvar' '
> + git -c protocol.version=1 clone [myhost:123]:src ssh-v1-clone &&
> + expect_ssh "-o SendEnv=GIT_PROTOCOL -p 123" myhost src
> +'
> +
>  test_expect_success 'variant can be overridden' '
>   copy_ssh_wrapper_as "$TRASH_DIRECTORY/putty" &&
>   git -c ssh.variant=putty clone -4 "[myhost:123]:src" ssh-putty-clone &&
> @@ -377,19 +382,32 @@ test_expect_success 'variant=auto picks based on 
> basename' '
>   expect_ssh "-4 -P 123" myhost src
>  '
>  
> -test_expect_success 'simple does not support -4/-6' '
> +test_expect_success 'variant=simple does not support -4/-6' '
>   

Re: [PATCH v5 00/34] Add directory rename detection to git

2018-01-03 Thread Elijah Newren
On Wed, Jan 3, 2018 at 2:57 AM, Johannes Sixt  wrote:

> I tested the series on Windows recently. It requires the patch below.
> I don't know whether this is indicating some portability issues of grep
> (^ being used in the middle of a RE instead of at the very beginning) or
> just a quirk in my setup.

Thanks for testing it out.  What version of Windows were you running
on?  With cygwin or without?  I tested previously on cygwin (I think
on Windows Server 2012??) and got all the tests passing there,
eventually[1].  I'm not sure I can find access to any other Windows
systems, but I'd be happy to take a look if I can.

[1] 
https://public-inbox.org/git/cabpp-bej6-mry0ocz1wwetrtg_iehkzodcuon_puukvywau...@mail.gmail.com/

The need to backslash escape a caret for a literal match when it
appears in the middle of the string makes sense.  Thanks for sending
along the patch.  Would you prefer I squashed it into the series
(still sitting in 'pu'), or keep your patch separate?  I'm fine with
either; I'm just unsure of the protocol here.
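
For reference, a small sketch of the two portable ways to match a
literal caret with plain grep. The helper names and the sample line are
made up for the example, and nothing here is specific to test_i18ngrep.

```shell
#!/bin/sh
# Sketch only; the helper names and the sample line are made up.

# Match "y/e~B^0" as a regular expression, with the caret escaped so
# that no grep implementation can read it as an anchor.
has_conflict_path () {
	printf '%s\n' "$1" | grep -q 'y/e~B\^0'
}

# Same check as a fixed string, which sidesteps regex syntax entirely.
has_conflict_path_fixed () {
	printf '%s\n' "$1" | grep -qF 'y/e~B^0'
}

line='writing to y/e~B^0 instead'
if has_conflict_path "$line"
then
	echo "escaped pattern matched"
fi
if has_conflict_path_fixed "$line"
then
	echo "fixed-string pattern matched"
fi
```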

> But it still does not pass the test suite because the system does not
> like file names such as y/c~HEAD:
>
> ++ grep 'Refusing to lose dirty file at z/c' out
> Refusing to lose dirty file at z/c
> ++ grep -q stuff x/b y/a y/c y/c~HEAD z/c
> grep: y/c: Invalid request code
> error: last command exited with $?=2
> not ok 94 - 11d-check: Avoid losing not-uptodate with rename + D/F conflict

This is exceptionally odd.  The actual line from the testsuite was
  grep -q stuff */*

which suggests your shell is both doing the pathname expansion and
treating the resulting filename not as a string but as something to be
interpreted that happens to have some kind of special
characters/commands, and then choking on the result.  Super weird.  I
could probably work around this by just running
  grep -q stuff z/c

I think I had the asterisks in there because I was thinking in terms
of directory rename detection potentially moving the file, but that's
probably just overkill.  Does the test pass for you with that change?
(If so, there are also two similar tests that I'd need to make similar
changes to.)

However, although that might fix this particular case, it suggests
some fragility of the tests and filenames for whatever system you
happen to be using.  merge-recursive.c's unique_path has created
filenames with tildes in them for many years; it may just be that I'm
the first to use the resulting file in combination with grep to ensure
the contents are as we expect.  There may be other issues lurking
(even if not yet appearing in the testsuite) for your system when
dealing with merge conflicts.


Thanks,
Elijah









>
>  8< 
> From: Johannes Sixt 
> Date: Fri, 22 Dec 2017 09:33:13 +0100
> Subject: [PATCH] fixup directory rename tests
>
> Signed-off-by: Johannes Sixt 
> ---
>  t/t6043-merge-rename-directories.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/t/t6043-merge-rename-directories.sh 
> b/t/t6043-merge-rename-directories.sh
> index f0af66b8a9..b8cd428341 100755
> --- a/t/t6043-merge-rename-directories.sh
> +++ b/t/t6043-merge-rename-directories.sh
> @@ -2940,8 +2940,8 @@ test_expect_success '10b-check: Overwrite untracked 
> with dir rename + delete' '
> echo contents >y/e &&
>
> test_must_fail git merge -s recursive B^0 >out 2>err &&
> -   test_i18ngrep "CONFLICT (rename/delete).*Version B^0 of y/d 
> left in tree at y/d~B^0" out &&
> -   test_i18ngrep "Error: Refusing to lose untracked file at y/e; 
> writing to y/e~B^0 instead" out &&
> +   test_i18ngrep "CONFLICT (rename/delete).*Version B\^0 of y/d 
> left in tree at y/d~B\^0" out &&
> +   test_i18ngrep "Error: Refusing to lose untracked file at y/e; 
> writing to y/e~B\^0 instead" out &&
>
> test 3 -eq $(git ls-files -s | wc -l) &&
> test 2 -eq $(git ls-files -u | wc -l) &&
> @@ -3010,7 +3010,7 @@ test_expect_success '10c-check: Overwrite untracked 
> with dir rename/rename(1to2)
>
> test_must_fail git merge -s recursive B^0 >out 2>err &&
> test_i18ngrep "CONFLICT (rename/rename)" out &&
> -   test_i18ngrep "Refusing to lose untracked file at y/c; adding 
> as y/c~B^0 instead" out &&
> +   test_i18ngrep "Refusing to lose untracked file at y/c; adding 
> as y/c~B\^0 instead" out &&
>
> test 6 -eq $(git ls-files -s | wc -l) &&
> test 3 -eq $(git ls-files -u | wc -l) &&
> --
> 2.14.2.808.g3bc32f2729


Bug report: git clone with dest

2018-01-03 Thread Isaac Shabtay
Hello,
Target directory is deleted on clone failures.

Steps to reproduce, for example on Windows:

cd /d %TEMP%
mkdir dest
git clone https://some-fake-url/whatever-makes-git-clone-fail dest

Of course, the clone will fail as it should. But it looks like the Git
client also ends up deleting the "dest" directory.

This shouldn't happen.


[PATCH v2 4/5] update-index doc: note a fixed bug in the untracked cache

2018-01-03 Thread Ævar Arnfjörð Bjarmason
Document the bug tested for in my "status: add a failing test showing
a core.untrackedCache bug" and fixed in Duy's "dir.c: fix missing dir
invalidation in untracked code".

Since this is very likely something others will encounter in the
future on older versions, and it's not obvious how to fix it, let's
document both that it exists and how to "fix" it with a one-off
command.

As noted in that commit, even though this bug gets the untracked cache
into a bad state, we have not yet found a case where this is user
visible, and thus it makes sense for these docs to focus on the
symlink case only.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-update-index.txt | 16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index bdb0342593..128e0c671f 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -464,6 +464,22 @@ command reads the index; while when 
`--[no-|force-]untracked-cache`
 are used, the untracked cache is immediately added to or removed from
 the index.
 
+Before 2.16, the untracked cache had a bug where replacing a directory
+with a symlink to another directory could cause it to incorrectly show
+files tracked by git as untracked. See the "status: add a failing test
+showing a core.untrackedCache bug" commit to git.git. A workaround for
+that was (and this might work for other undiscovered bugs in the
+future):
+
+
+$ git -c core.untrackedCache=false status
+
+
+This bug has also been shown to affect non-symlink cases of replacing
+a directory with a file when it comes to the internal structures of
+the untracked cache, but no case has been found where this resulted in
+wrong "git status" output.
+
 File System Monitor
 ---
 
-- 
2.15.1.424.g9478a66081



Re: [PATCHv3 0/5] Fix --recurse-submodules for submodule worktree changes

2018-01-03 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks Junio for review of this series!
> The only change in this version of the series is
>
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -2140,7 +2140,7 @@ int oneway_merge(const struct cache_entry * const *src,
> update |= CE_UPDATE;
> }
> if (S_ISGITLINK(old->ce_mode) && should_update_submodules() &&
> -   !verify_uptodate(old, o))
> +   o->update && !verify_uptodate(old, o))
> update |= CE_UPDATE;
> add_entry(o, old, update, 0);
>

Sounds OK.  

I wonder why o->update is not at the very beginning of the &&-chain,
though.  After all, the one above this addition begins with o->reset
&& o->update *not* because of the performance concern, but primarily
due to logic flow.  I.e. "if we are resetting and updating the
working tree, then..." comes first before saying "we may need to
flip CE_UPDATE bit in update variable if the file in the working
tree is not up to date and it is within a narrow checkout area".

Of course, because verify_uptodate() is rather expensive, checking
o->update before that makes sense from micro-optimization's point of
view, too.

So after thinking aloud like the above, I am reasonably sure that
you want to check o->update as the very first thing in this new if
statement.

> v2:
> I dropped the patch to `same()` as I realized we only need to fix the
> oneway_merge function, the others (two, three way merge) are fine as
> they have the checks already in place.

This is a bit flawed argument, no?  Checking working tree paths
unconditionally in same(), which does not even know if we are
touching the working tree paths, is broken.  Unless "they have the
checks already in place" refers to checks that bypasses calls to
same() when we are not touching working tree paths, that is, but
obviously that is not what is going on.

Will queue.  Thanks for working on this.




[PATCH v2 2/5] dir.c: avoid stat() in valid_cached_dir()

2018-01-03 Thread Ævar Arnfjörð Bjarmason
From: Nguyễn Thái Ngọc Duy 

stat() may follow a symlink and return stat data of the link's target
instead of the link itself. We are concerned about the link itself.
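
The distinction can be seen from the shell as well. In the sketch below,
"test -d" resolves the link the way stat() does, while "test -L" looks at
the link itself the way lstat() does. The helper names and the temporary
directory are made up for the example.

```shell
#!/bin/sh
# Sketch only; classify_following and classify_nofollow are
# hypothetical helpers.

# Classify a path after resolving symlinks, as stat() would.
classify_following () {
	if test -d "$1"
	then
		echo directory
	elif test -f "$1"
	then
		echo file
	else
		echo other
	fi
}

# Classify the path itself without resolving a final symlink,
# as lstat() would.
classify_nofollow () {
	if test -L "$1"
	then
		echo symlink
	else
		classify_following "$1"
	fi
}

# Demo: y is a symlink to the directory x.
tmp=$(mktemp -d) &&
mkdir "$tmp/x" &&
ln -s x "$tmp/y" &&
echo "following the link:  $(classify_following "$tmp/y")" &&
echo "looking at the link: $(classify_nofollow "$tmp/y")"
rm -rf "$tmp"
```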

It's kind of hard to demonstrate the bug. I think when path->buf is a
symlink, we most likely find that its target's stat data does not
match our cached one, which means we ignore the cache and fall back to
slow path.

This is a performance issue, not a correctness one (though we could
still catch it by verifying test-dump-untracked-cache). The less likely
case is that the link target's stat data matches the cached version and
we incorrectly take the fast path, ignoring the real data on disk. A
test for this may involve manipulating stat data, which may not be
portable.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..edcb7bb462 100644
--- a/dir.c
+++ b/dir.c
@@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 */
refresh_fsmonitor(istate);
if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
-   if (stat(path->len ? path->buf : ".", )) {
+   if (lstat(path->len ? path->buf : ".", )) {
invalidate_directory(dir->untracked, untracked);
memset(>stat_data, 0, 
sizeof(untracked->stat_data));
return 0;
-- 
2.15.1.424.g9478a66081



[PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir()

2018-01-03 Thread Ævar Arnfjörð Bjarmason
From: Nguyễn Thái Ngọc Duy 

A follow-up to the recently fixed bugs in the untracked cache
invalidation. If opendir() fails, show a warning. Perhaps this should
die instead, but if this ever happens the error is probably
recoverable for the user, and dying would just make things worse.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 dir.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 163ca69df0..a605e01692 100644
--- a/dir.c
+++ b/dir.c
@@ -1857,17 +1857,22 @@ static int open_cached_dir(struct cached_dir *cdir,
   struct strbuf *path,
   int check_only)
 {
+   const char *c_path;
+
memset(cdir, 0, sizeof(*cdir));
cdir->untracked = untracked;
if (valid_cached_dir(dir, untracked, istate, path, check_only))
return 0;
-   cdir->fdir = opendir(path->len ? path->buf : ".");
+   c_path = path->len ? path->buf : ".";
+   cdir->fdir = opendir(c_path);
if (dir->untracked) {
invalidate_directory(dir->untracked, untracked);
dir->untracked->dir_opened++;
}
-   if (!cdir->fdir)
+   if (!cdir->fdir) {
+   warning_errno(_("could not open directory '%s'"), c_path);
return -1;
+   }
return 0;
 }
 
-- 
2.15.1.424.g9478a66081



[PATCH v2 1/5] status: add a failing test showing a core.untrackedCache bug

2018-01-03 Thread Ævar Arnfjörð Bjarmason
The untracked cache gets confused when a directory is swapped out for
a file. It is easiest to reproduce this by swapping out a directory
with a symlink to another directory, and as the tests show the symlink
case is the only case we've found where "git status" will subsequently
report incorrect information, even though it's possible to otherwise
get the untracked cache into a state where its internal data
structures don't reflect reality.

In the symlink case, whatever files are inside the target of the
symlink will be incorrectly shown as untracked. This issue does not
happen if the symlink links to another file, only if it links to
another directory.

A stand-alone testcase for copying into a terminal:

(
rm -rf /tmp/testrepo &&
git init /tmp/testrepo &&
cd /tmp/testrepo &&
mkdir x y &&
touch x/a y/b &&
git add x/a y/b &&
git commit -msnap &&
git rm -rf y &&
ln -s x y &&
git add y &&
git commit -msnap2 &&
git checkout HEAD~ &&
git status &&
git checkout master &&
sleep 1 &&
git status &&
git status
)

This will incorrectly show y/a as an untracked file. Both the "git
status" call right before "git checkout master" and the "sleep 1"
after the "checkout master" are needed to reproduce this, presumably
due to the untracked cache tracking on the basis of cached whole
seconds from stat(2).

When git gets into this state, a workaround to fix it is to issue a
one-off:

git -c core.untrackedCache=false status

For the non-symlink case, the bug is that the output of
test-dump-untracked-cache should not include:

   /one/  recurse valid

It being in the output implies that cached traversal of root includes
the directory "one" which does not exist on disk anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7063-status-untracked-cache.sh | 87 +++
 1 file changed, 87 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index e5fb892f95..dba7f50bbb 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -22,6 +22,12 @@ avoid_racy() {
sleep 1
 }
 
+status_is_clean() {
+   >../status.expect &&
+   git status --porcelain >../status.actual &&
+   test_cmp ../status.expect ../status.actual
+}
+
 test_lazy_prereq UNTRACKED_CACHE '
{ git update-index --test-untracked-cache; ret=$?; } &&
test $ret -ne 1
@@ -683,4 +689,85 @@ test_expect_success 'untracked cache survives a commit' '
test_cmp ../before ../after
 '
 
+test_expect_success 'teardown worktree' '
+   cd ..
+'
+
+test_expect_success SYMLINKS 'setup worktree for symlink test' '
+   git init worktree-symlink &&
+   cd worktree-symlink &&
+   git config core.untrackedCache true &&
+   mkdir one two &&
+   touch one/file two/file &&
+   git add one/file two/file &&
+   git commit -m"first commit" &&
+   git rm -rf one &&
+   ln -s two one &&
+   git add one &&
+   git commit -m"second commit"
+'
+
+test_expect_failure SYMLINKS '"status" after symlink replacement should be clean with UC=true' '
+   git checkout HEAD~ &&
+   status_is_clean &&
+   status_is_clean &&
+   git checkout master &&
+   avoid_racy &&
+   status_is_clean &&
+   status_is_clean
+'
+
+test_expect_success SYMLINKS '"status" after symlink replacement should be clean with UC=false' '
+   git config core.untrackedCache false &&
+   git checkout HEAD~ &&
+   status_is_clean &&
+   status_is_clean &&
+   git checkout master &&
+   avoid_racy &&
+   status_is_clean &&
+   status_is_clean
+'
+
+test_expect_success 'setup worktree for non-symlink test' '
+   git init worktree-non-symlink &&
+   cd worktree-non-symlink &&
+   git config core.untrackedCache true &&
+   mkdir one two &&
+   touch one/file two/file &&
+   git add one/file two/file &&
+   git commit -m"first commit" &&
+   git rm -rf one &&
+   cp two/file one &&
+   git add one &&
+   git commit -m"second commit"
+'
+
+test_expect_failure '"status" after file replacement should be clean with UC=true' '
+   git checkout HEAD~ &&
+   status_is_clean &&
+   status_is_clean &&
+   git checkout master &&
+   avoid_racy &&
+   status_is_clean &&
+   test-dump-untracked-cache >../actual &&
+   grep -F "recurse valid" ../actual >../actual.grep &&
+   cat >../expect.grep <

[PATCH v2 3/5] dir.c: fix missing dir invalidation in untracked code

2018-01-03 Thread Ævar Arnfjörð Bjarmason
From: Nguyễn Thái Ngọc Duy 

Let's start with how we create a new directory cache after the last one
becomes invalid (e.g. because its dir mtime has changed...). In
open_cached_dir():

1. We start out with valid_cached_dir() returning false, which should
   call invalidate_directory() to put the directory back into its
   initial state: no untracked entries (untracked_nr zero), no
   subdirectory traversal (dirs[].recurse zero).

2. Since the cache cannot be used, we take the slow path, opendir(),
   and go through the items one by one via readdir(). All the
   directories on disk will be added back to the cache (if they do
   not already exist in dirs[]) and their "recurse" flag gets set to
   one to note that they are part of the cached dir traversal next
   time.

3. By the time we reach close_cached_dir() we should have a good
   subdir list in dirs[]. Those with "recurse" flag set are the ones
   present in the on-disk directory. The directory is now marked
   "valid".

Next time read_directory() is called, since the directory is marked
valid, it will skip readdir(), take the fast path and traverse the
dirs[] array instead.

Steps one and two need some tight cooperation. If a subdir is removed,
readdir() will not find it and of course we cannot examine/invalidate
it. To make sure directories removed from disk are gone from the
cache, step one must make sure the recurse flag of all subdirs is zero.

But that is not always the case. If the "valid" flag is already false,
there is a chance we go straight to the end of valid_cached_dir()
without calling invalidate_directory(). Or we fail to meet the
"if (untracked->valid)" condition and skip over the
invalidate_directory().

After step 3, we mark the cache valid. Any stale subdir with an
incorrect recurse flag becomes a real subdir next time we traverse the
directory using the dirs[] array.

We could avoid this by making sure invalidate_directory() is always
called (and therefore dirs[].recurse cleared) at the beginning of
open_cached_dir(), which is what this patch does.

As to how we get into this situation, the key in the test is this
command

git checkout master

where "one/file" is replaced with "one" in the index. This index
update triggers untracked_cache_invalidate_path(), which clears the
"valid" flag of the root directory while leaving the "recurse" flag of
the subdir "one" set. On the next git-status, we go through steps 1-3
above and save an incorrect cache on disk. The second git-status
blindly follows the bad cache data and shows the problem.

This is arguably because of a bad design where the "recurse" flag
plays a double role: whether a directory should be saved on disk, and
whether it is part of a directory traversal.

We need to keep the recurse flag set at "checkout master" because of
the first role: we need to keep subdir caches (dir "two" for example
has not been touched at all, so there is no reason to throw its cache
away).

As long as we make sure to ignore/reset the "recurse" flag at the
beginning of a directory traversal, we're good. But maybe eventually
we should separate these two roles.
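
The stale-flag problem described above can be modelled in a few lines.
This is only a toy sketch under simplifying assumptions: the toy_*
types and functions are made up for illustration and are not Git's
untracked_cache_dir or its API, and the "on disk" listing is just an
array of names standing in for readdir():

```c
#include <string.h>

#define MAX_SUBDIRS 8

/* Toy stand-in for a cached directory; all names are illustrative. */
struct toy_subdir {
	const char *name;
	int recurse;	/* 1 if part of the cached traversal */
};

struct toy_dir {
	int valid;
	int nr;
	struct toy_subdir subdirs[MAX_SUBDIRS];
};

/* Step 1: put the directory back into its initial state. */
void toy_invalidate(struct toy_dir *dir)
{
	int i;

	dir->valid = 0;
	for (i = 0; i < dir->nr; i++)
		dir->subdirs[i].recurse = 0;
}

/*
 * Steps 2 and 3: rescan using the names found on disk, flag them for
 * traversal, and mark the directory valid again. Entries missing from
 * on_disk[] are never visited, so their old flag survives unless
 * toy_invalidate() ran first.
 */
void toy_rescan(struct toy_dir *dir, const char **on_disk, int on_disk_nr,
		int always_invalidate)
{
	int i, j;

	if (always_invalidate)
		toy_invalidate(dir);
	for (i = 0; i < on_disk_nr; i++) {
		for (j = 0; j < dir->nr; j++)
			if (!strcmp(dir->subdirs[j].name, on_disk[i]))
				break;
		if (j == dir->nr) {
			if (dir->nr == MAX_SUBDIRS)
				continue;	/* toy model: ignore overflow */
			dir->subdirs[dir->nr++].name = on_disk[i];
		}
		dir->subdirs[j].recurse = 1;
	}
	dir->valid = 1;
}

/* Count the subdirs a later fast-path traversal would descend into. */
int toy_count_recursed(const struct toy_dir *dir)
{
	int i, n = 0;

	for (i = 0; i < dir->nr; i++)
		n += dir->subdirs[i].recurse;
	return n;
}
```

Starting from a cache that holds "one" and "two" with both flags set
and "valid" cleared, a rescan that finds only "two" on disk leaves two
flagged subdirs when always_invalidate is 0, and one when it is 1.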

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 dir.c | 22 ++
 t/t7063-status-untracked-cache.sh |  4 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/dir.c b/dir.c
index edcb7bb462..163ca69df0 100644
--- a/dir.c
+++ b/dir.c
@@ -774,7 +774,16 @@ static void invalidate_directory(struct untracked_cache *uc,
 struct untracked_cache_dir *dir)
 {
int i;
-   uc->dir_invalidated++;
+
+   /*
+* Invalidation increment here is just roughly correct. If
+* untracked_nr or any of dirs[].recurse is non-zero, we
+* should increment dir_invalidated too. But that's more
+* expensive to do.
+*/
+   if (dir->valid)
+   uc->dir_invalidated++;
+
dir->valid = 0;
dir->untracked_nr = 0;
for (i = 0; i < dir->dirs_nr; i++)
@@ -1810,23 +1819,18 @@ static int valid_cached_dir(struct dir_struct *dir,
refresh_fsmonitor(istate);
if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
if (lstat(path->len ? path->buf : ".", &st)) {
-   invalidate_directory(dir->untracked, untracked);
memset(&untracked->stat_data, 0,
sizeof(untracked->stat_data));
return 0;
}
if (!untracked->valid ||
match_stat_data_racy(istate, &untracked->stat_data,
&st)) {
-   if (untracked->valid)
-   invalidate_directory(dir->untracked, untracked);
fill_stat_data(&untracked->stat_data, &st);
return 0;
}
}
 
-   if (untracked->check_only != !!check_only) {
-   invalidate_directory(dir->untracked, untracked);
+   if (untracked->check_only != !!check_only)
 

[PATCH v2 0/5] untracked cache bug fixes

2018-01-03 Thread Ævar Arnfjörð Bjarmason
Took me a while to get around to this. This is a replacement for the
patches Duy and I have floating around on the list so far related to
the untracked cache bugs raised recently.

Part of this has been me incorporating Duy's work and writing commit
messages etc. for him.

Nguyễn Thái Ngọc Duy (3):
  dir.c: avoid stat() in valid_cached_dir()
  dir.c: fix missing dir invalidation in untracked code
  dir.c: stop ignoring opendir() error in open_cached_dir()

Ævar Arnfjörð Bjarmason (2):
  status: add a failing test showing a core.untrackedCache bug
  update-index doc: note a fixed bug in the untracked cache

 Documentation/git-update-index.txt | 16 +++
 dir.c  | 33 ++-
 t/t7063-status-untracked-cache.sh  | 87 ++
 3 files changed, 125 insertions(+), 11 deletions(-)

-- 
2.15.1.424.g9478a66081



Re: [PATCH v1] convert: add support for 'encoding' attribute

2018-01-03 Thread Lars Schneider

On 03 Jan 2018, at 20:15, Junio C Hamano  wrote:

> Torsten Bögershausen  writes:
> 
>> May be.
>> Originally utf8.c was about encoding and all kind of UTF-8 related stuff.
>> Especially it didn't know anything about strbuf.
>> I don't know why strbuf.h and other functions had been added here,
>> 
>> I once moved them into strbuf.c without any problems, but never sent out
>> a patch, because of possible merge conflicts in ongoing patches.
>> 
>> In any case, if it is about strbuf, I would try to put it into strbuf.c
> 
> Please don't.
> 
> A code that happens to use strbuf as a container and about
> manipulating the contents is quite different from a code about
> strbuf.  The latter is to enhance and extend how the strbuf as a
> container behaves.  An operation about character encoding for a
> string that happens to be stored in strbuf is more about the
> encoding, and much much less about strbuf.
> 
> convert.c is about massaging contents coming from the outside world
> into a shape stored in Git and the other way around, and there are
> multiple ways the contents are massaged.  EOL convention may be
> adjusted, characters may be reencoded, end-user defined conversion
> may be applied.  Some of these operations may use helpers specific
> for the task from other more library-ish files, like checking if a
> string looks like encoded in UTF-8 from utf8.[ch].

Agreed. I did that in v2. See these patches:

https://public-inbox.org/git/2017122915.39680-3-lars.schnei...@autodesk.com/
https://public-inbox.org/git/2017122915.39680-4-lars.schnei...@autodesk.com/

- Lars

Re: [PATCH 04/26] upload-pack: convert to a builtin

2018-01-03 Thread Brandon Williams
On 01/03, Stefan Beller wrote:
> On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> 
> What is the security aspect of this patch?
> 
> By making upload-pack builtin, it gains additional abilities,
> such as answers to '-h' or '--help' (which would start a pager).
> Is there an easy way to soothe my concerns? (best put into the
> commit message)

receive-pack is already a builtin, so there's that.

> 
> Thanks,
> Stefan
> 

-- 
Brandon Williams


Re: [PATCH 05/26] upload-pack: factor out processing lines

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> Factor out the logic for processing shallow, deepen, deepen_since, and
> deepen_not lines into their own functions to simplify the
> 'receive_needs()' function in addition to making it easier to reuse some
> of this logic when implementing protocol_v2.
>
> Signed-off-by: Brandon Williams 
> ---
>  upload-pack.c | 113 
> ++
>  1 file changed, 74 insertions(+), 39 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 20acaa49d..9a507ae53 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -731,6 +731,75 @@ static void deepen_by_rev_list(int ac, const char **av,
> packet_flush(1);
>  }
>
> +static int process_shallow(const char *line, struct object_array *shallows)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "shallow ", &arg)) {

stylistic nit:

You could invert the condition in each of the process_* functions
to just have

    if (!skip_prefix(...))
        return 0;

    /* less indented code goes here */

    return 1;

That way we have less indentation as well as code that is easier to
follow. (The reader doesn't need to keep in mind what the else part is
about; bailing out early is a rather local decision, compared to
having the return at the end of the function.)
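
To make the suggested shape concrete, here is a small standalone
sketch. It does not use Git's headers: toy_skip_prefix() is a
simplified stand-in for skip_prefix(), toy_process_deepen() is a
hypothetical name, and a -1 return replaces the die() call so that the
snippet can be compiled on its own:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for Git's skip_prefix(): if str starts with
 * prefix, store a pointer to the remainder in *out and return 1.
 */
int toy_skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Parse a "deepen <n>" line. Returns 0 if the line is not a deepen
 * line, 1 if it was parsed into *depth, and -1 if it is malformed
 * (the code under review calls die() in that case).
 */
int toy_process_deepen(const char *line, int *depth)
{
	const char *arg;
	char *end = NULL;
	long val;

	/* Bail out early; everything below handles a matching line. */
	if (!toy_skip_prefix(line, "deepen ", &arg))
		return 0;

	val = strtol(arg, &end, 0);
	if (end == arg || *end || val <= 0 || val > INT_MAX)
		return -1;
	*depth = (int)val;
	return 1;
}
```

The body after the early return sits at one indentation level, which
is the whole point of the nit.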


> +   struct object_id oid;
> +   struct object *object;
> +   if (get_oid_hex(arg, &oid))
> +   die("invalid shallow line: %s", line);
> +   object = parse_object(&oid);
> +   if (!object)
> +   return 1;
> +   if (object->type != OBJ_COMMIT)
> +   die("invalid shallow object %s", oid_to_hex(&oid));
> +   if (!(object->flags & CLIENT_SHALLOW)) {
> +   object->flags |= CLIENT_SHALLOW;
> +   add_object_array(object, NULL, shallows);
> +   }
> +   return 1;
> +   }
> +
> +   return 0;
> +}
> +
> +static int process_deepen(const char *line, int *depth)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "deepen ", &arg)) {
> +   char *end = NULL;
> +   *depth = strtol(arg, &end, 0);
> +   if (!end || *end || depth <= 0)
> +   die("Invalid deepen: %s", line);
> +   return 1;
> +   }
> +
> +   return 0;
> +}
> +
> +static int process_deepen_since(const char *line, timestamp_t *deepen_since, 
> int *deepen_rev_list)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "deepen-since ", &arg)) {
> +   char *end = NULL;
> +   *deepen_since = parse_timestamp(arg, &end, 0);
> +   if (!end || *end || !deepen_since ||
> +   /* revisions.c's max_age -1 is special */
> +   *deepen_since == -1)
> +   die("Invalid deepen-since: %s", line);
> +   *deepen_rev_list = 1;
> +   return 1;
> +   }
> +   return 0;
> +}
> +
> +static int process_deepen_not(const char *line, struct string_list 
> *deepen_not, int *deepen_rev_list)
> +{
> +   const char *arg;
> +   if (skip_prefix(line, "deepen-not ", &arg)) {
> +   char *ref = NULL;
> +   struct object_id oid;
> +   if (expand_ref(arg, strlen(arg), &oid, &ref) != 1)
> +   die("git upload-pack: ambiguous deepen-not: %s", line);
> +   string_list_append(deepen_not, ref);
> +   free(ref);
> +   *deepen_rev_list = 1;
> +   return 1;
> +   }
> +   return 0;
> +}
> +
>  static void receive_needs(void)
>  {
> struct object_array shallows = OBJECT_ARRAY_INIT;
> @@ -752,49 +821,15 @@ static void receive_needs(void)
> if (!line)
> break;
>
> -   if (skip_prefix(line, "shallow ", &arg)) {
> -   struct object_id oid;
> -   struct object *object;
> -   if (get_oid_hex(arg, &oid))
> -   die("invalid shallow line: %s", line);
> -   object = parse_object(&oid);
> -   if (!object)
> -   continue;
> -   if (object->type != OBJ_COMMIT)
> -   die("invalid shallow object %s", oid_to_hex(&oid));
> -   if (!(object->flags & CLIENT_SHALLOW)) {
> -   object->flags |= CLIENT_SHALLOW;
> -   add_object_array(object, NULL, &shallows);
> -   }
> +   if (process_shallow(line, &shallows))
> continue;
> -   }
> -   if (skip_prefix(line, "deepen ", &arg)) {
> -   char *end = NULL;
> -   depth = strtol(arg, &end, 0);
> -  

Re: [PATCH 04/26] upload-pack: convert to a builtin

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> In order to allow for code sharing with the server-side of fetch in
> protocol-v2 convert upload-pack to be a builtin.

What is the security aspect of this patch?

By making upload-pack builtin, it gains additional abilities,
such as answers to '-h' or '--help' (which would start a pager).
Is there an easy way to soothe my concerns? (best put into the
commit message)

Thanks,
Stefan

>
> Signed-off-by: Brandon Williams 
> ---
>  Makefile  | 3 ++-
>  builtin.h | 1 +
>  git.c | 1 +
>  upload-pack.c | 2 +-
>  4 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 2a81ae22e..e0740b452 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -636,7 +636,6 @@ PROGRAM_OBJS += imap-send.o
>  PROGRAM_OBJS += sh-i18n--envsubst.o
>  PROGRAM_OBJS += shell.o
>  PROGRAM_OBJS += show-index.o
> -PROGRAM_OBJS += upload-pack.o
>  PROGRAM_OBJS += remote-testsvn.o
>
>  # Binary suffix, set to .exe for Windows builds
> @@ -701,6 +700,7 @@ BUILT_INS += git-merge-subtree$X
>  BUILT_INS += git-show$X
>  BUILT_INS += git-stage$X
>  BUILT_INS += git-status$X
> +BUILT_INS += git-upload-pack$X
>  BUILT_INS += git-whatchanged$X
>
>  # what 'all' will build and 'install' will install in gitexecdir,
> @@ -904,6 +904,7 @@ LIB_OBJS += tree-diff.o
>  LIB_OBJS += tree.o
>  LIB_OBJS += tree-walk.o
>  LIB_OBJS += unpack-trees.o
> +LIB_OBJS += upload-pack.o
>  LIB_OBJS += url.o
>  LIB_OBJS += urlmatch.o
>  LIB_OBJS += usage.o
> diff --git a/builtin.h b/builtin.h
> index 42378f3aa..f332a1257 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -231,6 +231,7 @@ extern int cmd_update_ref(int argc, const char **argv, 
> const char *prefix);
>  extern int cmd_update_server_info(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_upload_archive(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_upload_archive_writer(int argc, const char **argv, const char 
> *prefix);
> +extern int cmd_upload_pack(int argc, const char **argv, const char *prefix);
>  extern int cmd_var(int argc, const char **argv, const char *prefix);
>  extern int cmd_verify_commit(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
> diff --git a/git.c b/git.c
> index c870b9719..f71073dc8 100644
> --- a/git.c
> +++ b/git.c
> @@ -478,6 +478,7 @@ static struct cmd_struct commands[] = {
> { "update-server-info", cmd_update_server_info, RUN_SETUP },
> { "upload-archive", cmd_upload_archive },
> { "upload-archive--writer", cmd_upload_archive_writer },
> +   { "upload-pack", cmd_upload_pack },
> { "var", cmd_var, RUN_SETUP_GENTLY },
> { "verify-commit", cmd_verify_commit, RUN_SETUP },
> { "verify-pack", cmd_verify_pack },
> diff --git a/upload-pack.c b/upload-pack.c
> index d5de18127..20acaa49d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1032,7 +1032,7 @@ static int upload_pack_config(const char *var, const 
> char *value, void *unused)
> return parse_hide_refs_config(var, value, "uploadpack");
>  }
>
> -int cmd_main(int argc, const char **argv)
> +int cmd_upload_pack(int argc, const char **argv, const char *prefix)
>  {
> const char *dir;
> int strict = 0;
> --
> 2.15.1.620.gb9897f4670-goog
>


Re: [PATCH 01/26] pkt-line: introduce packet_read_with_status

2018-01-03 Thread Stefan Beller
On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:
> The current pkt-line API encodes the status of a pkt-line read in the
> length of the read content.  An error is indicated with '-1', a flush
> with '0' (which can be confusing since a return value of '0' can also
> indicate an empty pkt-line), and a positive integer for the length of
> the read content otherwise.  This doesn't leave much room for allowing
> the addition of additional special packets in the future.
>
> To solve this introduce 'packet_read_with_status()' which reads a packet
> and returns the status of the read encoded as an 'enum packet_status'
> type.  This allows for easily identifying between special and normal
> packets as well as errors.  It also enables easily adding a new special
> packet in the future.
>
> Signed-off-by: Brandon Williams 
> ---
>  pkt-line.c | 55 ++-
>  pkt-line.h | 15 +++
>  2 files changed, 57 insertions(+), 13 deletions(-)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index 2827ca772..8d7cd389f 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -280,28 +280,33 @@ static int packet_length(const char *linelen)
> return (val < 0) ? val : (val << 8) | hex2chr(linelen + 2);
>  }
>
> -int packet_read(int fd, char **src_buf, size_t *src_len,
> -   char *buffer, unsigned size, int options)
> +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
> size_t *src_len,
> +   char *buffer, unsigned size, 
> int *pktlen,
> +   int options)
>  {
> -   int len, ret;
> +   int len;
> char linelen[4];
>
> -   ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
> -   if (ret < 0)
> -   return ret;
> +   if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0)
> +   return PACKET_READ_EOF;
> +
> len = packet_length(linelen);
> if (len < 0)
> die("protocol error: bad line length character: %.4s", 
> linelen);
> -   if (!len) {
> +
> +   if (len == 0) {
> packet_trace("", 4, 0);
> -   return 0;
> +   return PACKET_READ_FLUSH;
> +   } else if (len >= 1 && len <= 3) {
> +   die("protocol error: bad line length character: %.4s", 
> linelen);

I wonder how much libified code we want here already; maybe we could
have PACKET_READ_ERROR as a return value here instead of die()ing.
There could also be an option that tells this code to die on error.
This reminds me of the repository discovery as well as the refs code,
both of which have this pattern.

Currently this series is only upgrading commands that use the network
anyway, so I guess die()ing in an ls-remote or fetch is no big deal,
but it could be interesting to keep going once we have more of the
partial clone stuff working (e.g. remote assisted log/blame would want
to gracefully fall back instead of die()ing without any useful output,
I would think.)
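
For illustration, a sketch of what reporting the error to the caller
could look like. This is standalone code, not the pkt-line.c API: the
toy_* names are hypothetical, TOY_PKT_ERROR plays the role of the
suggested PACKET_READ_ERROR, and only the 4-byte hex length header is
handled, with the same rules as the quoted hunk (0 is a flush, 1-3 are
invalid, anything else includes the header's own 4 bytes):

```c
#include <stddef.h>

/* Hypothetical status codes; TOY_PKT_ERROR is the suggested addition. */
enum toy_pkt_status {
	TOY_PKT_ERROR = -1,
	TOY_PKT_FLUSH,
	TOY_PKT_NORMAL
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
int toy_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Classify a 4-byte pkt-line length header. On TOY_PKT_NORMAL,
 * *payload_len is set to the number of bytes following the header.
 * Malformed headers are reported to the caller instead of aborting.
 */
enum toy_pkt_status toy_parse_pkt_header(const char header[4],
					 size_t *payload_len)
{
	int i, len = 0;

	for (i = 0; i < 4; i++) {
		int v = toy_hexval(header[i]);

		if (v < 0)
			return TOY_PKT_ERROR;
		len = (len << 4) | v;
	}
	if (len == 0)
		return TOY_PKT_FLUSH;
	if (len < 4)
		return TOY_PKT_ERROR;	/* too short to cover the header */
	*payload_len = (size_t)(len - 4);
	return TOY_PKT_NORMAL;
}
```

A caller that wants the current behaviour can still die() when it sees
the error status, while a caller with a fallback can choose to
continue.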

> }
> +
> len -= 4;
> -   if (len >= size)
> +   if ((len < 0) || ((unsigned)len >= size))
> die("protocol error: bad line length %d", len);
> -   ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
> -   if (ret < 0)
> -   return ret;
> +
> +   if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 
> 0)
> +   return PACKET_READ_EOF;
>
> if ((options & PACKET_READ_CHOMP_NEWLINE) &&
> len && buffer[len-1] == '\n')
> @@ -309,7 +314,31 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
>
> buffer[len] = 0;
> packet_trace(buffer, len, 0);
> -   return len;
> +   *pktlen = len;
> +   return PACKET_READ_NORMAL;
> +}
> +
> +int packet_read(int fd, char **src_buffer, size_t *src_len,
> +   char *buffer, unsigned size, int options)
> +{
> +   enum packet_read_status status;
> +   int pktlen;
> +
> +   status = packet_read_with_status(fd, src_buffer, src_len,
> +buffer, size, &pktlen,
> +options);
> +   switch (status) {
> +   case PACKET_READ_EOF:
> +   pktlen = -1;
> +   break;
> +   case PACKET_READ_NORMAL:
> +   break;
> +   case PACKET_READ_FLUSH:
> +   pktlen = 0;
> +   break;
> +   }
> +
> +   return pktlen;
>  }
>
>  static char *packet_read_line_generic(int fd,
> diff --git a/pkt-line.h b/pkt-line.h
> index 3dad583e2..06c468927 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -65,6 +65,21 @@ int write_packetized_from_buf(const char *src_in, size_t 
> len, int fd_out);
>  int packet_read(int fd, char **src_buffer, size_t *src_len, char
> *buffer, unsigned size, int options);
>
> 

Re: [PATCH] Add shell completion for git remote rm

2018-01-03 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Sat, Dec 30 2017, Todd Zullinger jotted:
>
>> And I think that should also apply to
>> not offering completion for commands/subcommands/options
>> which are only kept for backward compatibility.
>
> Yeah I think it makes sense to at some point stop completing things if
> we're going to remove stuff, if we decide to remove it.
>
>> Here's one way to make 'git remote rm ' work without
>> including it in the output of 'git remote ':
>>
>> diff --git i/contrib/completion/git-completion.bash 
>> w/contrib/completion/git-completion.bash
>> index 3683c772c5..aa63f028ab 100644
>> --- i/contrib/completion/git-completion.bash
>> +++ w/contrib/completion/git-completion.bash
>> @@ -2668,7 +2668,9 @@ _git_remote ()
>>  add rename remove set-head set-branches
>>  get-url set-url show prune update
>>  "
>> -local subcommand="$(__git_find_on_cmdline "$subcommands")"
>> +# Don't advertise rm by including it in subcommands, but complete
>> +# remotes if it is used.
>> +local subcommand="$(__git_find_on_cmdline "$subcommands rm")"
>>  if [ -z "$subcommand" ]; then
>>  case "$cur" in
>>  --*)
>
> Neat!

Yes, indeed it is nice.




Re: [PATCH v2 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-03 Thread Ævar Arnfjörð Bjarmason

On Wed, Jan 03 2018, Adam Dinwoodie jotted:

> On Wednesday 03 January 2018 at 02:31 pm +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Jan 03 2018, Adam Dinwoodie jotted:
>>
>> > On Monday 25 December 2017 at 12:28 am +, Ævar Arnfjörð Bjarmason 
>> > wrote:
>> >> There has never been any full roundtrip testing of what git-ls-files
>> >> and other functions that use wildmatch() actually do, rather we've
>> >> been satisfied with just testing the underlying C function.
>> >>
>> >> Due to git-ls-files and friends having their own codepaths before they
>> >> call wildmatch() there's sometimes differences in the behavior between
>> >> the two, and even when we test for those (as with
>> >> 9e4e8a64c2 ("pathspec: die on empty strings as pathspec", 2017-06-06))
>> >> there was no one place where you can review how these two modes
>> >> differ.
>> >>
>> >> Now there is. We now attempt to create a file called $haystack and
>> >> match $needle against it for each pair of $needle and $haystack that
>> >> we were passing to test-wildmatch.
>> >>
>> >> If we can't create the file we skip the test. This ensures that we can
>> >> run this on all platforms and not maintain some infinitely growing
>> >> whitelist of e.g. platforms that don't support certain characters in
>> >> filenames.
>> >>
>> >> As a result of doing this we can now see the cases where these two
>> >> ways of testing wildmatch differ:
>> >>
>> >>  * Creating a file called 'a[]b' and running ls-files 'a[]b' will show
>> >>that file, but wildmatch("a[]b", "a[]b") will not match
>> >>
>> >>  * wildmatch() won't match a file called \ against \, but ls-files
>> >>will.
>> >>
>> >>  * `git --glob-pathspecs ls-files 'foo**'` will match a file
>> >>'foo/bba/arr', but wildmatch won't, however pathmatch will.
>> >>
>> >>This seems like a bug to me, the two are otherwise equivalent as
>> >>these tests show.
>> >>
>> >> This also reveals the case discussed in 9e4e8a64c2 above, where '' is
>> >> now an error as far as ls-files is concerned, but wildmatch() itself
>> >> happily accepts it.
>> >>
>> >> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> >
>> > I'm seeing this test script failing on the pu branch as a result of this
>> > commit when building on Cygwin.  Specifically, the test fails at
>> > 9d45e1ca4 ("Merge branch 'bw/oidmap-autoinit' into pu", 2017-12-28), and
>> > bisecting points the blame at 2ee0c785a ("wildmatch test: create & test
>> > files on disk in addition to in-memory", 2017-12-25).
>> >
>> > I've copied the verbose error output for the first error below, and
>> > uploaded the full output, including verbose and trace output for the
>> > unexpectedly failing tests, at [0].  (With 42 failures among 1512 tests,
>> > there's a lot of it, so I didn't want to include it in an email.)
>>
>> Does the fixup above in <878tdm8k2d@evledraar.gmail.com> work for
>> you, i.e. changing $10 in the script to ${10}?
>
> This fixes some but not all of the failures: I'm now down from 42 to 24
> failures.
>
> Updated verbose test output is at
> https://gist.github.com/me-and/04443bcb00e12436f0eacce079b56d02

Thanks a lot. Looking through our own commit logs, I believe the rest
should be fixed by this (prior art in 6fd1106aa4). It would be great if
you could test it; I don't have access to a Windows machine:

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index f985139b6f..5838fcb77d 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -23,6 +23,15 @@ create_test_file() {
*//*)
return 1
;;
+   # On Windows, \ in paths is silently converted to /, which
+   # would result in the "touch" below working, but the test
+   # itself failing.
+   *\\*)
+   if ! test_have_prereq BSLASHPSPEC
+   then
+   return 1
+   fi
+   ;;
# When testing the difference between foo/bar and foo/bar/ we
# can't test the latter.
*/)


Re: [PATCH v1] convert: add support for 'encoding' attribute

2018-01-03 Thread Junio C Hamano
Torsten Bögershausen  writes:

> May be.
> Originally utf8.c was about encoding and all kind of UTF-8 related stuff.
> Especially it didn't know anything about strbuf.
> I don't know why strbuf.h and other functions had been added here,
>
> I once moved them into strbuf.c without any problems, but never sent out
> a patch, because of possible merge conflicts in ongoing patches.
>
> In any case, if it is about strbuf, I would try to put it into strbuf.c

Please don't.

A code that happens to use strbuf as a container and about
manipulating the contents is quite different from a code about
strbuf.  The latter is to enhance and extend how the strbuf as a
container behaves.  An operation about character encoding for a
string that happens to be stored in strbuf is more about the
encoding, and much much less about strbuf.

convert.c is about massaging contents coming from the outside world
into a shape stored in Git and the other way around, and there are
multiple ways the contents are massaged.  EOL convention may be
adjusted, characters may be reencoded, end-user defined conversion
may be applied.  Some of these operations may use helpers specific
for the task from other more library-ish files, like checking if a
string looks like encoded in UTF-8 from utf8.[ch].



Re: [PATCH 0/2] Several fixes for the test suite related to spaces in filenames

2018-01-03 Thread Jeff King
On Wed, Jan 03, 2018 at 05:54:44PM +0100, Johannes Schindelin wrote:

> The second issue was found long ago, and the patch carried in Git for
> Windows, although nothing about it is specific to Windows.
> 
> The first patch was developed today, when I tried to verify that Git's
> test suite passes if Git is cloned to a directory called `with spaces/`.

Heh, the whole point of the space in the trash directory was to find
these issues early, but obviously it is not foolproof. :)

The patches themselves look good to me from inspection. Thanks.

-Peff


[PATCH] bisect: fix a regression causing a segfault

2018-01-03 Thread Ævar Arnfjörð Bjarmason
In 7c117184d7 ("bisect: fix off-by-one error in
`best_bisection_sorted()`", 2017-11-05) the more careful logic dealing
with freeing p->next in 50e62a8e70 ("rev-list: implement
--bisect-all", 2007-10-22) was removed.

Restore the more careful check to avoid segfaulting. Ideally this
would come with a test case, but we don't have steps to reproduce
this, only a backtrace from gdb pointing to this being the issue.
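
The failure mode can be shown in isolation. The following is only a
sketch with a made-up list type (struct toy_list and the toy_*
functions are not Git's commit_list API); it mirrors the shape of the
fix, where the tail is only cut off if the walk ended on a real node:

```c
#include <stdlib.h>

/* Minimal singly linked list; a stand-in for a commit list. */
struct toy_list {
	int item;
	struct toy_list *next;
};

/* Allocate a node holding item and put it in front of next. */
struct toy_list *toy_list_prepend(int item, struct toy_list *next)
{
	struct toy_list *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->item = item;
	n->next = next;
	return n;
}

/* Free every node of list; a NULL list is fine. */
void toy_free_list(struct toy_list *list)
{
	while (list) {
		struct toy_list *next = list->next;

		free(list);
		list = next;
	}
}

/*
 * Keep the first cnt entries (cnt >= 1) of list and free the rest.
 * An empty list is returned unchanged: without the "if (p)" guard,
 * p->next would dereference NULL in that case.
 */
struct toy_list *toy_truncate(struct toy_list *list, int cnt)
{
	struct toy_list *p = list;
	int i;

	for (i = 0; p && i < cnt - 1; i++)
		p = p->next;
	if (p) {
		toy_free_list(p->next);
		p->next = NULL;
	}
	return list;
}
```

Calling toy_truncate(NULL, 0) corresponds to the list=0x0, nr=0 case
in the reported backtrace.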

Reported-by: Yasushi SHOJI 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 bisect.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 0fca17c02b..2f3008b078 100644
--- a/bisect.c
+++ b/bisect.c
@@ -229,8 +229,10 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
if (i < cnt - 1)
p = p->next;
}
-   free_commit_list(p->next);
-   p->next = NULL;
+   if (p) {
+   free_commit_list(p->next);
+   p->next = NULL;
+   }
strbuf_release(&buf);
free(array);
return list;
-- 
2.15.1.424.g9478a66081



Re: [BUG] v2.16.0-rc0 seg faults when git bisect skip

2018-01-03 Thread Martin Ågren
On 3 January 2018 at 15:21, Ævar Arnfjörð Bjarmason  wrote:
>
> On Wed, Jan 03 2018, Yasushi SHOJI jotted:
>
>> Hi,
>>
>> git version 2.16.0.rc0 seg faults on my machine when I
>> [...]
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x55a73107f900 in best_bisection_sorted (list=0x0, nr=0) at 
>> bisect.c:232
>> 232 free_commit_list(p->next);
>> (gdb) bt
>> #0  0x55a73107f900 in best_bisection_sorted (list=0x0, nr=0) at 
>> bisect.c:232
>> #1  0x55a73107fc0f in do_find_bisection (list=0x0, nr=0,
>> weights=0x55a731b6ffd0, find_all=1) at bisect.c:361
>> #2  0x55a73107fcf4 in find_bisection (commit_list=0x7ffe8750d4d0,
>> reaches=0x7ffe8750d4c4, all=0x7ffe8750d4c0, find_all=1) at
>> bisect.c:400
>> #3  0x55a73108128d in bisect_next_all (prefix=0x0, no_checkout=0)
>> at bisect.c:969
>> #4  0x55a730fd5238 in cmd_bisect__helper (argc=0,
>> argv=0x7ffe8750e230, prefix=0x0) at builtin/bisect--helper.c:140
>> #5  0x55a730fcbc76 in run_builtin (p=0x55a73145c778
>> , argc=2, argv=0x7ffe8750e230) at git.c:346
>> #6  0x55a730fcbf40 in handle_builtin (argc=2, argv=0x7ffe8750e230)
>> at git.c:554
>> #7  0x55a730fcc0e8 in run_argv (argcp=0x7ffe8750e0ec,
>> argv=0x7ffe8750e0e0) at git.c:606
>> #8  0x55a730fcc29b in cmd_main (argc=2, argv=0x7ffe8750e230) at git.c:683
>> #9  0x55a731068d9e in main (argc=3, argv=0x7ffe8750e228) at 
>> common-main.c:43
>> (gdb) p p
>> $1 = (struct commit_list *) 0x0
>>
>> As you can see, the code dereferences to the 'next' while 'p' is NULL.
>>
>> I'm sure I did 'git bisect good' after git _found_ bad commit.  Then I
>> typed 'git bisect skip' on the commit 726804874 of guile repository.
>> If that matters at all.
>>
>> I haven't touched guile repo to preserve the current state.
>
> I can't reproduce this myself, but looking at the backtrace it seems
> pretty obvious that 7c117184d7 ("bisect: fix off-by-one error in
> `best_bisection_sorted()`", 2017-11-05) is the culprit.
>
> That changed more careful code added by Christian in 50e62a8e70
> ("rev-list: implement --bisect-all", 2007-10-22) to free a pointer which
> as you can see can be NULL.
>
> If you can test a patch to see if it works this should fix it:
>
> diff --git a/bisect.c b/bisect.c
> index 0fca17c02b..2f3008b078 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -229,8 +229,10 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
> if (i < cnt - 1)
> p = p->next;
> }
> -   free_commit_list(p->next);
> -   p->next = NULL;
> +   if (p) {
> +   free_commit_list(p->next);
> +   p->next = NULL;
> +   }
> strbuf_release(&buf);
> free(array);
> return list;
>
> But given the commit message by Martin maybe there's some deeper bug here.

I haven't tried to reproduce, or tested the patch, but from the looks of
it, your analysis and fix are both spot on. The special case that yashi
has hit is that `list` is NULL. The old code handled that very well, the
code after my patch ... not so well. The loop-sort-loop pattern reduces
to a no-op, both before and after my patch. But what I failed to realize
was that `list` could be NULL.

This could be fixed by an early return if `list` is NULL, but that would
also need some memory-handling. So I think your patch is just as good or
better, since it can be seen as restoring what was lost in 7c117184d7.

Thanks both, and sorry for this.
Martin


[PATCH 1/2] Allow the test suite to pass in a directory whose name contains spaces

2018-01-03 Thread Johannes Schindelin
It is totally legitimate to clone Git's source code anywhere, including,
say, into a directory whose name (or whose absolute path) contains
spaces.

However, a couple of tests failed to anticipate this, for lack of
quoting (or in one instance, for failure to expect more than one space
in the absolute path of the TEST_DIRECTORY). This can be easily verified
by calling these commands in your current clone:

git clone . with\ spaces
cd with\ spaces
make -j15 test

Let's fix this.

Signed-off-by: Johannes Schindelin 
---
 t/t7500-commit.sh  | 4 ++--
 t/t9020-remote-svn.sh  | 4 ++--
 t/t9107-git-svn-migrate.sh | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t7500-commit.sh b/t/t7500-commit.sh
index 5739d3ed232..1d33c5feb3e 100755
--- a/t/t7500-commit.sh
+++ b/t/t7500-commit.sh
@@ -130,8 +130,8 @@ EOF
 test_expect_success 'commit message from template with whitespace issue' '
echo "content galore" >>foo &&
git add foo &&
-   GIT_EDITOR="$TEST_DIRECTORY"/t7500/add-whitespaced-content git commit \
-   --template "$TEMPLATE" &&
+   GIT_EDITOR=\""$TEST_DIRECTORY"\"/t7500/add-whitespaced-content \
+   git commit --template "$TEMPLATE" &&
commit_msg_is "commit message"
 '
 
diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
index 4d81ba1c2c4..6fca08e5e35 100755
--- a/t/t9020-remote-svn.sh
+++ b/t/t9020-remote-svn.sh
@@ -25,8 +25,8 @@ init_git () {
git init &&
#git remote add svnsim testsvn::sim:///$TEST_DIRECTORY/t9020/example.svnrdump
# let's reuse an existing dump file!?
-   git remote add svnsim testsvn::sim://$TEST_DIRECTORY/t9154/svn.dump
-   git remote add svnfile testsvn::file://$TEST_DIRECTORY/t9154/svn.dump
+   git remote add svnsim "testsvn::sim://$TEST_DIRECTORY/t9154/svn.dump"
+   git remote add svnfile "testsvn::file://$TEST_DIRECTORY/t9154/svn.dump"
 }
 
 if test -e "$GIT_BUILD_DIR/git-remote-testsvn"
diff --git a/t/t9107-git-svn-migrate.sh b/t/t9107-git-svn-migrate.sh
index 9f3ef8f2ef6..ceaa5bad105 100755
--- a/t/t9107-git-svn-migrate.sh
+++ b/t/t9107-git-svn-migrate.sh
@@ -28,7 +28,7 @@ test_expect_success 'git-svn-HEAD is a real HEAD' '
git rev-parse --verify refs/heads/git-svn-HEAD^0
 '
 
-svnrepo_escaped=$(echo $svnrepo | sed 's/ /%20/')
+svnrepo_escaped=$(echo $svnrepo | sed 's/ /%20/g')
 
 test_expect_success 'initialize old-style (v0) git svn layout' '
mkdir -p "$GIT_DIR"/git-svn/info "$GIT_DIR"/svn/info &&
-- 
2.15.1.windows.2.391.g0b42e3c56de
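
The one-character change to t9107 above matters because sed replaces only
the first match on each line unless the `g` flag is given. A minimal
sketch, assuming a made-up path with two spaces (the path and variable
names are illustrative and not taken from the test suite):

```shell
#!/bin/sh
# Hypothetical path containing two spaces; not a real test-suite path.
path='/tmp/with spaces/trash directory'

# Without the g flag, sed substitutes only the first space on the line.
first_only=$(printf '%s\n' "$path" | sed 's/ /%20/')

# With the g flag, every space on the line is substituted.
all_spaces=$(printf '%s\n' "$path" | sed 's/ /%20/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_spaces"
```

With only one space in the path the two commands agree, which is
presumably why the missing flag went unnoticed until the clone directory
itself contained a space.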




[PATCH 0/2] Several fixes for the test suite related to spaces in filenames

2018-01-03 Thread Johannes Schindelin
The second issue was found long ago, and the patch carried in Git for
Windows, although nothing about it is specific to Windows.

The first patch was developed today, when I tried to verify that Git's
test suite passes if Git is cloned to a directory called `with spaces/`.


Johannes Schindelin (2):
  Allow the test suite to pass in a directory whose name contains spaces
  t0302 & t3900: add forgotten quotes

 t/t0302-credential-store.sh | 2 +-
 t/t3900-i18n-commit.sh  | 8 ++++----
 t/t7500-commit.sh   | 4 ++--
 t/t9020-remote-svn.sh   | 4 ++--
 t/t9107-git-svn-migrate.sh  | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)


base-commit: 1eaabe34fc6f486367a176207420378f587d3b48
Published-As: https://github.com/dscho/git/releases/tag/with-spaces-v1
Fetch-It-Via: git fetch https://github.com/dscho/git with-spaces-v1
-- 
2.15.1.windows.2.391.g0b42e3c56de



[PATCH 2/2] t0302 & t3900: add forgotten quotes

2018-01-03 Thread Johannes Schindelin
When cleaning up files in the $HOME directory, it really makes sense to
quote the path, especially in Git's test suite, where the HOME directory
is *guaranteed* to contain spaces in its name.

It would appear that those two tests pass even without cleaning up the
files, but really more by pure chance than by design (the cleanup seems
not actually to be necessary).

However, if anybody had a left-over `trash/` directory in Git's
`t/` directory, these tests would fail, because they would all of a
sudden try to delete that directory, but without the `-r` (recursive)
flag. That is how this issue was found.

Signed-off-by: Johannes Schindelin 
---
 t/t0302-credential-store.sh | 2 +-
 t/t3900-i18n-commit.sh  | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t0302-credential-store.sh b/t/t0302-credential-store.sh
index 1d8d1f210b9..d6b54e8c65a 100755
--- a/t/t0302-credential-store.sh
+++ b/t/t0302-credential-store.sh
@@ -37,7 +37,7 @@ helper_test store
 unset XDG_CONFIG_HOME
 
 test_expect_success 'if custom xdg file exists, home and xdg files not created' '
-   test_when_finished "rm -f $HOME/xdg/git/credentials" &&
+   test_when_finished "rm -f \"$HOME/xdg/git/credentials\"" &&
test -s "$HOME/xdg/git/credentials" &&
test_path_is_missing "$HOME/.git-credentials" &&
test_path_is_missing "$HOME/.config/git/credentials"
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 3b94283e355..9e4e694d939 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -40,7 +40,7 @@ test_expect_success 'UTF-16 refused because of NULs' '
 '
 
 test_expect_success 'UTF-8 invalid characters refused' '
-   test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+   test_when_finished "rm -f \"$HOME/stderr $HOME/invalid\"" &&
echo "UTF-8 characters" >F &&
printf "Commit message\n\nInvalid surrogate:\355\240\200\n" \
>"$HOME/invalid" &&
@@ -49,7 +49,7 @@ test_expect_success 'UTF-8 invalid characters refused' '
 '
 
 test_expect_success 'UTF-8 overlong sequences rejected' '
-   test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+   test_when_finished "rm -f \"$HOME/stderr $HOME/invalid\"" &&
rm -f "$HOME/stderr" "$HOME/invalid" &&
echo "UTF-8 overlong" >F &&
printf "\340\202\251ommit message\n\nThis is not a space:\300\240\n" \
@@ -59,7 +59,7 @@ test_expect_success 'UTF-8 overlong sequences rejected' '
 '
 
 test_expect_success 'UTF-8 non-characters refused' '
-   test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+   test_when_finished "rm -f \"$HOME/stderr $HOME/invalid\"" &&
echo "UTF-8 non-character 1" >F &&
printf "Commit message\n\nNon-character:\364\217\277\276\n" \
>"$HOME/invalid" &&
@@ -68,7 +68,7 @@ test_expect_success 'UTF-8 non-characters refused' '
 '
 
 test_expect_success 'UTF-8 non-characters refused' '
-   test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+   test_when_finished "rm -f \"$HOME/stderr $HOME/invalid\"" &&
echo "UTF-8 non-character 2." >F &&
printf "Commit message\n\nNon-character:\357\267\220\n" \
>"$HOME/invalid" &&
-- 
2.15.1.windows.2.391.g0b42e3c56de
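
As far as I understand the harness, test_when_finished stores its cleanup
command as a string that is evaluated later, so quotes that should protect
a path have to be embedded in that string. A rough sketch of the effect,
using plain eval in place of the harness; count_args and dir are
hypothetical names introduced only for this illustration:

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it got.
count_args () {
	echo "$#"
}

# Stand-in for a HOME that contains a space.
dir='trash directory'

# One path, no embedded quotes: eval re-splits it on the space.
split=$(eval "count_args $dir/stderr")

# One path wrapped in embedded quotes: it stays a single argument.
single=$(eval "count_args \"$dir/stderr\"")

# Two paths inside ONE pair of quotes: they are passed as one argument.
merged=$(eval "count_args \"$dir/stderr $dir/invalid\"")

# Two paths, each in its own pair of quotes: two arguments.
separate=$(eval "count_args \"$dir/stderr\" \"$dir/invalid\"")

echo "split=$split single=$single merged=$merged separate=$separate"
```

The `merged` case is worth keeping in mind when reading the t3900 hunks
above: a single pair of quotes around two paths names one file, not two.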


[PATCH 04/40] fsck: introduce promisor objects

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Currently, Git does not cope well with repos that have very large numbers
of objects, or with repos that wish to minimize manipulation of certain
blobs (for example, because they are very large), even if the user
operates mostly on part of the repo. This is because Git is designed on
the assumption that every referenced object is available somewhere in the
repo storage. In a partial clone arrangement, the full set of objects is
usually available only in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/fsck.c   |  2 +-
 cache.h  |  3 +-
 packfile.c   | 78 --
 packfile.h   | 13 
 t/t0410-partial-clone.sh | 81 
 5 files changed, 172 insertions(+), 5 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 04846d46f9..793d289367 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!is_promisor_object(oid)) {
error("%s: invalid reflog entry %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 7b27abac35..078607ee91 100644
--- a/cache.h
+++ b/cache.h
@@ -1649,7 +1649,8 @@ extern struct packed_git {
unsigned pack_local:1,
 pack_keep:1,
 freshened:1,
-do_not_close:1;
+do_not_close:1,
+pack_promisor:1;
unsigned char sha1[20];
struct revindex_entry *revindex;
/* something like ".git/objects/pack/x.pack" */
diff --git a/packfile.c b/packfile.c
index 4a5fe7ab18..aee6c3d674 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,12 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
+#include "external-odb.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -643,10 +649,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
return NULL;
 
/*
-* ".pack" is long enough to hold any suffix we're adding (and
+* ".promisor" is long enough to hold any suffix we're adding (and
 * the use xsnprintf double-checks that)
 */
-   alloc = st_add3(path_len, strlen(".pack"), 1);
+   alloc = st_add3(path_len, strlen(".promisor"), 1);
p = alloc_packed_git(alloc);
memcpy(p->pack_name, path, path_len);
 
@@ -654,6 +660,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
if (!access(p->pack_name, F_OK))
p->pack_keep = 1;
 
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor");
+   if (!access(p->pack_name, F_OK))
+   p->pack_promisor = 1;
+
xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
free(p);
@@ -781,7 +791,8 @@ static void prepare_packed_git_one(char *objdir, int local)
if (ends_with(de->d_name, ".idx") ||
ends_with(de->d_name, ".pack") ||
ends_with(de->d_name, ".bitmap") ||
-   ends_with(de->d_name, ".keep"))
+   ends_with(de->d_name, ".keep") ||
+   ends_with(de->d_name, ".promisor"))
string_list_append(&garbage, path.buf);
else
report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
@@ -1889,6 +1900,9 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
for (p = packed_git; p; p = p->next) {
if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
continue;
+   if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) &&
+   !p->pack_promisor)
+   continue;
if (open_pack_index(p)) {
pack_errors = 1;
continue;
@@ -1899,3 +1913,61 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
}
return r ? r : 

[PATCH 05/40] fsck: support refs pointing to promisor objects

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/fsck.c   |  8 ++++++++
 t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 793d289367..c6bb29d242 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (is_promisor_object(oid)) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+default_refs++;
+return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 9257b8c885..c4639e1134 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+   HASH=$(git -C repo rev-parse "$1") &&
+   git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+   git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+   git -C repo tag -d my_annotated_tag &&
+   delete_object repo "$HASH"
+}
+
 test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
test_create_repo repo &&
test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension
test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref
+   git -C repo branch my_branch "$A" &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
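
In these tests a pack is marked as coming from a promisor by creating an
empty pack-$HASH.promisor file next to it, and add_packed_git() in
PATCH 04/40 probes for that file with access(). A small sketch of the same
naming convention in shell; is_promisor_pack is a hypothetical helper, the
pack names are made up, and the directory is a throwaway temp dir rather
than a real repository:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a ".promisor" file sits next to the pack.
is_promisor_pack () {
	test -e "${1%.pack}.promisor"
}

packdir=$(mktemp -d) || exit 1
trap 'rm -rf "$packdir"' EXIT

# Two empty stand-in packs; only the second one gets a .promisor marker.
: >"$packdir/pack-1111.pack"
: >"$packdir/pack-2222.pack"
: >"$packdir/pack-2222.promisor"

for pack in "$packdir"/pack-*.pack
do
	if is_promisor_pack "$pack"
	then
		echo "${pack##*/}: promisor"
	else
		echo "${pack##*/}: not promisor"
	fi
done
```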



[PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable

2018-01-03 Thread Christian Couder
This new environment variable will be used to perform git
commands without involving any external odb mechanism.

This makes it possible for example to create new blobs that
will not be sent to an external odb even if the external odb
supports "put_*" instructions.

Signed-off-by: Christian Couder 
---
 cache.h        | 9 +++++++++
 environment.c  | 4 ++++
 external-odb.c | 3 +--
 sha1_file.c    | 3 +++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 21af6442af..7b27abac35 100644
--- a/cache.h
+++ b/cache.h
@@ -437,6 +437,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
 #define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
+#define NO_EXTERNAL_ODB_ENVIRONMENT "GIT_NO_EXTERNAL_ODB"
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
@@ -813,6 +814,14 @@ void reset_shared_repository(void);
 extern int check_replace_refs;
 extern char *git_replace_ref_base;
 
+/*
+ * Do external odbs need to be used this run?  This variable is
+ * initialized to true unless $GIT_NO_EXTERNAL_ODB is set, but it
+ * may be set to false by some commands that do not want external
+ * odbs to be active.
+ */
+extern int use_external_odb;
+
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
diff --git a/environment.c b/environment.c
index 63ac38a46f..b3bd0daae2 100644
--- a/environment.c
+++ b/environment.c
@@ -48,6 +48,7 @@ const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
 int check_replace_refs = 1;
 char *git_replace_ref_base;
+int use_external_odb = 1;
 enum eol core_eol = EOL_UNSET;
 enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
 unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@@ -117,6 +118,7 @@ const char * const local_repo_env[] = {
INDEX_ENVIRONMENT,
NO_REPLACE_OBJECTS_ENVIRONMENT,
GIT_REPLACE_REF_BASE_ENVIRONMENT,
+   NO_EXTERNAL_ODB_ENVIRONMENT,
GIT_PREFIX_ENVIRONMENT,
GIT_SUPER_PREFIX_ENVIRONMENT,
GIT_SHALLOW_FILE_ENVIRONMENT,
@@ -156,6 +158,8 @@ void setup_git_env(void)
free(git_replace_ref_base);
git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
  : "refs/replace/");
+   if (getenv(NO_EXTERNAL_ODB_ENVIRONMENT))
+   use_external_odb = 0;
free(namespace);
namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
diff --git a/external-odb.c b/external-odb.c
index f3ea491333..390958dbfe 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -43,7 +43,7 @@ static void external_odb_init(void)
 {
static int initialized;
 
-   if (initialized)
+   if (initialized || !use_external_odb)
return;
initialized = 1;
 
@@ -69,4 +69,3 @@ int external_odb_has_object(const unsigned char *sha1)
return 1;
return 0;
 }
-
diff --git a/sha1_file.c b/sha1_file.c
index 3f5ff274e2..cba6b2a537 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -675,6 +675,9 @@ void prepare_external_alt_odb(void)
static int linked_external;
const char *path;
 
+   if (!use_external_odb)
+   return;
+
if (linked_external)
return;
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
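
setup_git_env() above only checks whether getenv() returns non-NULL, so it
is the presence of the variable, not its value, that turns external odbs
off; even "0" or an empty value would disable them. A sketch of the
equivalent presence test in shell (external_odb_enabled is a made-up name
for illustration, not something the series provides):

```shell
#!/bin/sh
# Hypothetical shell analogue of the C check: getenv() != NULL means "set",
# regardless of the value. ${VAR+set} expands to "set" only if VAR is set.
external_odb_enabled () {
	test -z "${GIT_NO_EXTERNAL_ODB+set}"
}

unset GIT_NO_EXTERNAL_ODB
external_odb_enabled && echo "unset: enabled"

GIT_NO_EXTERNAL_ODB=0
external_odb_enabled || echo "set to 0: disabled"

GIT_NO_EXTERNAL_ODB=
external_odb_enabled || echo "set but empty: disabled"
```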



[PATCH 06/40] fsck: support referenced promisor objects

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/fsck.c   | 11 +++++++++++
 t/t0410-partial-clone.sh | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index c6bb29d242..b8bcb0e40c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
if (obj->flags & REACHABLE)
return 0;
obj->flags |= REACHABLE;
+
+   if (is_promisor_object(&obj->oid))
+   /*
+* Further recursion does not need to be performed on this
+* object since it is a promisor object (so it does not need to
+* be added to "pending").
+*/
+   return 0;
+
if (!(obj->flags & HAS_OBJ)) {
if (parent && !has_object_file(&obj->oid)) {
printf("broken link from %7s %s\n",
@@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj)
 * do a full fsck
 */
if (!(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&obj->oid))
+   return;
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index c4639e1134..46c88e8dfa 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   test_commit -C repo 2 &&
+   test_commit -C repo 3 &&
+   git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+   C=$(git -C repo rev-parse 1) &&
+   T=$(git -C repo rev-parse 2^{tree}) &&
+   B=$(git hash-object repo/3.t) &&
+   AT=$(git -C repo rev-parse annotated_tag) &&
+
+   promise_and_delete "$C" &&
+   promise_and_delete "$T" &&
+   promise_and_delete "$B" &&
+   promise_and_delete "$AT" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 03/40] external-odb: add has_external_odb()

2018-01-03 Thread Christian Couder
This function will be used to check if the external odb
mechanism is actually used.

Signed-off-by: Christian Couder 
---
 external-odb.c | 7 +++++++
 external-odb.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 390958dbfe..d26e63d8b1 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -50,6 +50,13 @@ static void external_odb_init(void)
git_config(external_odb_config, NULL);
 }
 
+int has_external_odb(void)
+{
+   external_odb_init();
+
+   return !!helpers;
+}
+
 const char *external_odb_root(void)
 {
static const char *root;
diff --git a/external-odb.h b/external-odb.h
index ae2b228792..9a3c2f01b3 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -1,6 +1,7 @@
 #ifndef EXTERNAL_ODB_H
 #define EXTERNAL_ODB_H
 
+extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 15/40] external-odb: add script mode support

2018-01-03 Thread Christian Couder
This adds support for the script command mode where
a helper script or command is called to retrieve or
manage objects.

This implements the 'have' and 'get_git_obj'
instructions for the script mode.

Signed-off-by: Christian Couder 
---
 external-odb.c  |  51 ++-
 external-odb.h  |   1 +
 odb-helper.c| 218 +++-
 odb-helper.h|   4 +
 sha1_file.c |  12 ++-
 t/t0400-external-odb.sh |  44 ++
 6 files changed, 327 insertions(+), 3 deletions(-)
 create mode 100755 t/t0400-external-odb.sh

diff --git a/external-odb.c b/external-odb.c
index 5d0afb9762..81f2aa5fac 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -33,8 +33,14 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
o = find_or_create_helper(name, namelen);
 
-   if (!strcmp(subkey, "promisorremote"))
+   if (!strcmp(subkey, "promisorremote")) {
+   o->type = ODB_HELPER_GIT_REMOTE;
return git_config_string(&o->dealer, var, value);
+   }
+   if (!strcmp(subkey, "scriptcommand")) {
+   o->type = ODB_HELPER_SCRIPT_CMD;
+   return git_config_string(&o->dealer, var, value);
+   }
 
return 0;
 }
@@ -77,6 +83,49 @@ int external_odb_has_object(const unsigned char *sha1)
return 0;
 }
 
+int external_odb_get_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+   const char *path;
+
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   path = sha1_file_name_alt(external_odb_root(), sha1);
+   safe_create_leading_directories_const(path);
+   prepare_external_alt_odb();
+
+   for (o = helpers; o; o = o->next) {
+   struct strbuf tmpfile = STRBUF_INIT;
+   int ret;
+   int fd;
+
+   if (!odb_helper_has_object(o, sha1))
+   continue;
+
+   fd = create_object_tmpfile(&tmpfile, path);
+   if (fd < 0) {
+   strbuf_release(&tmpfile);
+   return -1;
+   }
+
+   if (odb_helper_get_object(o, sha1, fd) < 0) {
+   close(fd);
+   unlink(tmpfile.buf);
+   strbuf_release(&tmpfile);
+   continue;
+   }
+
+   close_sha1_file(fd);
+   ret = finalize_object_file(tmpfile.buf, path);
+   strbuf_release(&tmpfile);
+   if (!ret)
+   return 0;
+   }
+
+   return -1;
+}
+
 int external_odb_get_direct(const unsigned char *sha1)
 {
struct odb_helper *o;
diff --git a/external-odb.h b/external-odb.h
index fd6708163e..fb8b94972f 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,6 +4,7 @@
 extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
+extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 4b70b287af..c1a3443dc7 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -21,13 +21,124 @@ struct odb_helper_cmd {
struct child_process child;
 };
 
+/*
+ * Callers are responsible to ensure that the result of vaddf(fmt, ap)
+ * is properly shell-quoted.
+ */
+static void prepare_helper_command(struct argv_array *argv, const char *cmd,
+  const char *fmt, va_list ap)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   strbuf_addstr(&buf, cmd);
+   strbuf_addch(&buf, ' ');
+   strbuf_vaddf(&buf, fmt, ap);
+
+   argv_array_push(argv, buf.buf);
+   strbuf_release(&buf);
+}
+
+__attribute__((format (printf,3,4)))
+static int odb_helper_start(struct odb_helper *o,
+   struct odb_helper_cmd *cmd,
+   const char *fmt, ...)
+{
+   va_list ap;
+
+   memset(cmd, 0, sizeof(*cmd));
+   argv_array_init(&cmd->argv);
+
+   if (!o->dealer)
+   return -1;
+
+   va_start(ap, fmt);
+   prepare_helper_command(&cmd->argv, o->dealer, fmt, ap);
+   va_end(ap);
+
+   cmd->child.argv = cmd->argv.argv;
+   cmd->child.use_shell = 1;
+   cmd->child.no_stdin = 1;
+   cmd->child.out = -1;
+
+   if (start_command(&cmd->child) < 0) {
+   argv_array_clear(&cmd->argv);
+   return -1;
+   }
+
+   return 0;
+}
+
+static int odb_helper_finish(struct odb_helper *o,
+struct odb_helper_cmd *cmd)
+{
+   int ret = finish_command(&cmd->child);
+   argv_array_clear(&cmd->argv);
+   if (ret) {
+   warning("odb helper '%s' reported failure", o->name);
+   return -1;
+   }
+   return 0;
+}
+
+static int parse_object_line(struct odb_helper_object *o, const char *line)
+{
+   

[PATCH 01/40] Add initial external odb support

2018-01-03 Thread Christian Couder
The external-odb.{c,h} files will contain the functions
that are called by the rest of Git, mostly from
"sha1_file.c", to access the objects managed by the
external odbs.

The odb-helper.{c,h} files will contain the functions to
actually implement communication with either the internal
functions or the external scripts or processes that will
manage and provide external git objects.

For now only infrastructure to create helpers from the
config and to manage a cache for the 'have' command is
implemented.

Helped-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile   |  2 ++
 cache.h|  1 +
 external-odb.c | 72 ++
 external-odb.h |  7 ++
 odb-helper.c   | 54 +++
 odb-helper.h   | 24 
 sha1_file.c| 19 +++-
 7 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h

diff --git a/Makefile b/Makefile
index 2a81ae22e9..07694185c9 100644
--- a/Makefile
+++ b/Makefile
@@ -799,6 +799,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
@@ -834,6 +835,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
 LIB_OBJS += packfile.o
diff --git a/cache.h b/cache.h
index a2ec8c0b55..21af6442af 100644
--- a/cache.h
+++ b/cache.h
@@ -1587,6 +1587,7 @@ extern void prepare_alt_odb(void);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 0000000000..f3ea491333
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,72 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+#include "config.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = &helpers;
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+   struct odb_helper *o;
+
+   for (o = helpers; o; o = o->next)
+   if (!strncmp(o->name, name, len) && !o->name[len])
+   return o;
+
+   o = odb_helper_new(name, len);
+   *helpers_tail = o;
+   helpers_tail = &o->next;
+
+   return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+   struct odb_helper *o;
+   const char *name;
+   int namelen;
+   const char *subkey;
+
+   if (parse_config_key(var, "odb", &name, &namelen, &subkey) < 0)
+   return 0;
+
+   o = find_or_create_helper(name, namelen);
+
+   if (!strcmp(subkey, "promisorremote"))
+   return git_config_string(&o->dealer, var, value);
+
+   return 0;
+}
+
+static void external_odb_init(void)
+{
+   static int initialized;
+
+   if (initialized)
+   return;
+   initialized = 1;
+
+   git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+   static const char *root;
+   if (!root)
+   root = git_pathdup("objects/external");
+   return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next)
+   if (odb_helper_has_object(o, sha1))
+   return 1;
+   return 0;
+}
+
diff --git a/external-odb.h b/external-odb.h
new file mode 100644
index 0000000000..ae2b228792
--- /dev/null
+++ b/external-odb.h
@@ -0,0 +1,7 @@
+#ifndef EXTERNAL_ODB_H
+#define EXTERNAL_ODB_H
+
+extern const char *external_odb_root(void);
+extern int external_odb_has_object(const unsigned char *sha1);
+
+#endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
new file mode 100644
index 0000000000..1404393807
--- /dev/null
+++ b/odb-helper.c
@@ -0,0 +1,54 @@
+#include "cache.h"
+#include "object.h"
+#include "argv-array.h"
+#include "odb-helper.h"
+#include "run-command.h"
+#include "sha1-lookup.h"
+
+struct odb_helper *odb_helper_new(const char *name, int namelen)
+{
+   struct odb_helper *o;
+
+   o = xcalloc(1, sizeof(*o));
+   o->name = xmemdupz(name, namelen);
+
+   return o;
+}
+
+struct odb_helper_cmd {
+   struct argv_array argv;
+   struct child_process child;
+};
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+   if (o->have_valid)
+   return;
+   o->have_valid = 1;
+
+   /* TODO 

[PATCH 16/40] odb-helper: add 'enum odb_helper_type'

2018-01-03 Thread Christian Couder
As there will be different kinds of helpers, let's add
an "enum odb_helper_type" to tell between the different
kinds.

Let's add a field with this type in "struct odb_helper",
and set it when reading the config file.

Signed-off-by: Christian Couder 
---
 odb-helper.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/odb-helper.h b/odb-helper.h
index 90b279c07e..4f2ac5e476 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,9 +1,18 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+enum odb_helper_type {
+   ODB_HELPER_NONE = 0,
+   ODB_HELPER_GIT_REMOTE,
+   ODB_HELPER_SCRIPT_CMD,
+   ODB_HELPER_PROCESS_CMD,
+   OBJ_HELPER_MAX
+};
+
 struct odb_helper {
const char *name;
const char *dealer;
+   enum odb_helper_type type;
 
struct odb_helper_object {
unsigned char sha1[20];
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
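
To illustrate how the new field might be filled in while the config is
read, here is a minimal sketch; it is not code from the series. The
function name helper_type_from_subkey() is hypothetical, and the
"subprocesscommand" key is an assumption: only "promisorremote" and
"scriptcommand" appear in the hunks posted in this thread.

```c
#include <string.h>

/* Same values as the enum added by the patch above. */
enum odb_helper_type {
	ODB_HELPER_NONE = 0,
	ODB_HELPER_GIT_REMOTE,
	ODB_HELPER_SCRIPT_CMD,
	ODB_HELPER_PROCESS_CMD,
	OBJ_HELPER_MAX
};

/*
 * Hypothetical helper: map the last component of an
 * "odb.<name>.<subkey>" config variable to a helper type.
 * "subprocesscommand" is an assumed key name.
 */
static enum odb_helper_type helper_type_from_subkey(const char *subkey)
{
	if (!strcmp(subkey, "promisorremote"))
		return ODB_HELPER_GIT_REMOTE;
	if (!strcmp(subkey, "scriptcommand"))
		return ODB_HELPER_SCRIPT_CMD;
	if (!strcmp(subkey, "subprocesscommand"))
		return ODB_HELPER_PROCESS_CMD;
	/* Unknown subkeys leave the helper untyped. */
	return ODB_HELPER_NONE;
}
```

A config callback could then do something like
"o->type = helper_type_from_subkey(subkey);" before storing the value.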



[PATCH 07/40] fsck: support promisor objects as CLI argument

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/fsck.c   |  2 ++
 t/t0410-partial-clone.sh | 13 +
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b8bcb0e40c..a6fa6d6482 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&oid))
+   continue;
error("%s: object missing", oid_to_hex(&oid));
errors_found |= ERROR_OBJECT;
continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 46c88e8dfa..a0f901fa1d 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+   git -C repo fsck "$A"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction

2018-01-03 Thread Christian Couder
Let's add an odb_helper_init() function to send an 'init'
instruction to the helpers. This 'init' instruction is
especially useful to get the capabilities that are supported
by the helpers.

So while at it, let's also add a parse_capabilities()
function to parse them and a supported_capabilities
variable in struct odb_helper to store them.
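
A minimal sketch of the idea, assuming the helper answers 'init' with
one "capability=<name>" line per capability. The CAP_* bit values and
the function name capability_bit() are made up for illustration; the
series uses its own ODB_HELPER_CAP_* flags.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative bit values only, not the ones from the series. */
#define CAP_GET_GIT_OBJ (1u << 0)
#define CAP_GET_RAW_OBJ (1u << 1)
#define CAP_GET_DIRECT  (1u << 2)
#define CAP_PUT_GIT_OBJ (1u << 3)
#define CAP_PUT_RAW_OBJ (1u << 4)
#define CAP_PUT_DIRECT  (1u << 5)
#define CAP_HAVE        (1u << 6)

/*
 * Return the bit for one "capability=<name>" line, or 0 if the line
 * is not a capability line or names an unknown capability.
 */
static unsigned int capability_bit(const char *line)
{
	static const struct {
		const char *name;
		unsigned int bit;
	} caps[] = {
		{ "get_git_obj", CAP_GET_GIT_OBJ },
		{ "get_raw_obj", CAP_GET_RAW_OBJ },
		{ "get_direct", CAP_GET_DIRECT },
		{ "put_git_obj", CAP_PUT_GIT_OBJ },
		{ "put_raw_obj", CAP_PUT_RAW_OBJ },
		{ "put_direct", CAP_PUT_DIRECT },
		{ "have", CAP_HAVE },
	};
	static const char prefix[] = "capability=";
	size_t i;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return 0;
	line += sizeof(prefix) - 1;

	for (i = 0; i < sizeof(caps) / sizeof(caps[0]); i++)
		if (!strcmp(line, caps[i].name))
			return caps[i].bit;
	return 0;
}
```

Each line read from the helper would then be OR-ed into the helper's
mask, e.g. "mask |= capability_bit(line);".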

Signed-off-by: Christian Couder 
---
 external-odb.c  | 13 +++-
 odb-helper.c| 54 +
 odb-helper.h| 12 +++
 t/t0400-external-odb.sh |  4 
 4 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index 81f2aa5fac..2622c12853 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -35,6 +35,8 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
if (!strcmp(subkey, "promisorremote")) {
o->type = ODB_HELPER_GIT_REMOTE;
+   o->supported_capabilities |= ODB_HELPER_CAP_HAVE;
+   o->supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
return git_config_string(&o->dealer, var, value);
}
if (!strcmp(subkey, "scriptcommand")) {
@@ -48,12 +50,16 @@ static int external_odb_config(const char *var, const char *value, void *data)
 static void external_odb_init(void)
 {
static int initialized;
+   struct odb_helper *o;
 
if (initialized || !use_external_odb)
return;
initialized = 1;
 
git_config(external_odb_config, NULL);
+
+   for (o = helpers; o; o = o->next)
+   odb_helper_init(o);
 }
 
 int has_external_odb(void)
@@ -77,9 +83,12 @@ int external_odb_has_object(const unsigned char *sha1)
 
external_odb_init();
 
-   for (o = helpers; o; o = o->next)
+   for (o = helpers; o; o = o->next) {
+   if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
+   return 1;
if (odb_helper_has_object(o, sha1))
return 1;
+   }
return 0;
 }
 
@@ -133,6 +142,8 @@ int external_odb_get_direct(const unsigned char *sha1)
external_odb_init();
 
for (o = helpers; o; o = o->next) {
+   if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT))
+   continue;
if (odb_helper_get_direct(o, sha1) < 0)
continue;
return 0;
diff --git a/odb-helper.c b/odb-helper.c
index c1a3443dc7..ea642fd438 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -6,6 +6,40 @@
 #include "sha1-lookup.h"
 #include "fetch-object.h"
 
+static void parse_capabilities(char *cap_buf,
+  unsigned int *supported_capabilities,
+  const char *process_name)
+{
+   struct string_list cap_list = STRING_LIST_INIT_NODUP;
+
+   string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+   if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
+   const char *cap_name = cap_list.items[1].string;
+
+   if (!strcmp(cap_name, "get_git_obj")) {
+   *supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
+   } else if (!strcmp(cap_name, "get_raw_obj")) {
+   *supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
+   } else if (!strcmp(cap_name, "get_direct")) {
+   *supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
+   } else if (!strcmp(cap_name, "put_git_obj")) {
+   *supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
+   } else if (!strcmp(cap_name, "put_raw_obj")) {
+   *supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
+   } else if (!strcmp(cap_name, "put_direct")) {
+   *supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
+   } else if (!strcmp(cap_name, "have")) {
+   *supported_capabilities |= ODB_HELPER_CAP_HAVE;
+   } else {
+   warning("external process '%s' requested unsupported read-object capability '%s'",
+   process_name, cap_name);
+   }
+   }
+
+   string_list_clear(&cap_list, 0);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
struct odb_helper *o;
@@ -80,6 +114,26 @@ static int odb_helper_finish(struct odb_helper *o,
return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+   struct odb_helper_cmd cmd;
+   FILE *fh;
+   struct strbuf line = STRBUF_INIT;
+
+   if (odb_helper_start(o, &cmd, "init") < 0)
+   return -1;
+
+   fh = xfdopen(cmd.child.out, "r");
+   while (strbuf_getline(&line, fh) != EOF)
+   parse_capabilities(line.buf, &o->supported_capabilities, o->name);
+
+   strbuf_release(&line);
+   fclose(fh);
+   

[PATCH 14/40] sha1_file: prepare for external odbs

2018-01-03 Thread Christian Couder
In the following commits we will need some functions that were
internal to sha1_file.c, so let's first make them non static
and declare them in "cache.h". While at it, let's rename
'create_tmpfile()' to 'create_object_tmpfile()' to make its
name less generic.

Let's also split out 'sha1_file_name_alt()' from
'sha1_file_name()' and 'open_sha1_file_alt()' from
'open_sha1_file()', as we will need both of these new
functions too.
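
For readers unfamiliar with the loose object layout, here is a rough
sketch of what a function like sha1_file_name_alt() computes: the
object directory, a slash, the first two hex digits of the SHA-1 as a
subdirectory, then the remaining 38 digits. This is an illustration
only. The name loose_object_path() and the caller-supplied buffer are
assumptions; the patch itself uses a shared static strbuf.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<objdir>/xx/yyyy..." for a 20-byte SHA-1 into out.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int loose_object_path(char *out, size_t outlen,
			     const char *objdir, const unsigned char *sha1)
{
	static const char hex[] = "0123456789abcdef";
	size_t pos;
	int i;
	int n = snprintf(out, outlen, "%s/", objdir);

	/* Need room for 40 hex digits, one '/' and the trailing NUL. */
	if (n < 0 || (size_t)n + 42 > outlen)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < 20; i++) {
		out[pos++] = hex[sha1[i] >> 4];
		out[pos++] = hex[sha1[i] & 0xf];
		if (i == 0)
			out[pos++] = '/'; /* fan-out directory boundary */
	}
	out[pos] = '\0';
	return 0;
}
```

Passing get_object_directory() as objdir would give the behaviour of
the plain sha1_file_name(), which is how the patch expresses the old
function in terms of the new one.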

Helped-by: Jeff King 
Signed-off-by: Christian Couder 
---
 cache.h |  8 
 sha1_file.c | 47 +--
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 3fabf998ce..f41c102cb4 100644
--- a/cache.h
+++ b/cache.h
@@ -964,6 +964,12 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
+/*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1);
+
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1251,6 +1257,8 @@ extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 261baf800f..785e8dda03 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -322,17 +322,22 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
}
 }
 
-const char *sha1_file_name(const unsigned char *sha1)
+const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1)
 {
static struct strbuf buf = STRBUF_INIT;
 
strbuf_reset(&buf);
-   strbuf_addf(&buf, "%s/", get_object_directory());
+   strbuf_addf(&buf, "%s/", objdir);
 
fill_sha1_path(&buf, sha1);
return buf.buf;
 }
 
+const char *sha1_file_name(const unsigned char *sha1)
+{
+   return sha1_file_name_alt(get_object_directory(), sha1);
+}
+
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
 {
strbuf_setlen(&alt->scratch, alt->base_len);
@@ -902,24 +907,14 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
return -1;
 }
 
-/*
- * Like stat_sha1_file(), but actually open the object and return the
- * descriptor. See the caveats on the "path" parameter above.
- */
-static int open_sha1_file(const unsigned char *sha1, const char **path)
+static int open_sha1_file_alt(const unsigned char *sha1, const char **path)
 {
-   int fd;
struct alternate_object_database *alt;
-   int most_interesting_errno;
-
-   *path = sha1_file_name(sha1);
-   fd = git_open(*path);
-   if (fd >= 0)
-   return fd;
-   most_interesting_errno = errno;
+   int most_interesting_errno = errno;
 
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
+   int fd;
*path = alt_sha1_path(alt, sha1);
fd = git_open(*path);
if (fd >= 0)
@@ -931,6 +926,22 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
return -1;
 }
 
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
+{
+   int fd;
+
+   *path = sha1_file_name(sha1);
+   fd = git_open(*path);
+   if (fd >= 0)
+   return fd;
+
+   return open_sha1_file_alt(sha1, path);
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -1527,7 +1538,7 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_sha1_file(int fd)
+void close_sha1_file(int fd)
 {
if (fsync_object_files)
fsync_or_die(fd, "sha1 file");
@@ -1551,7 +1562,7 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+int create_object_tmpfile(struct strbuf *tmp, const char *filename)
 {
int fd, dirlen = directory_size(filename);
 
@@ -1591,7 +1602,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
static struct strbuf tmp_file = 

[PATCH 13/40] gc: do not repack promisor packfiles

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.
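
The rule for missing objects during traversal can be sketched as
follows. This is an illustration under assumptions, not the patch's
code: the verdict enum and both function names are invented here, and
the two int flags stand in for has_object_file() and
is_promisor_object(). The real code reports a missing object later,
when it tries to read it, rather than returning a verdict.

```c
#include <string.h>

enum missing_action {
	MA_ERROR = 0,      /* fail if any missing objects are encountered */
	MA_ALLOW_ANY,      /* silently allow ALL missing objects */
	MA_ALLOW_PROMISOR  /* silently allow all missing PROMISOR objects */
};

/* Hypothetical outcome for one object seen during traversal. */
enum verdict { V_SHOW, V_SKIP, V_FAIL };

/* Parse the value given to --missing=<action>; -1 if unknown. */
static int parse_missing_action(const char *arg, enum missing_action *out)
{
	if (!strcmp(arg, "error"))
		*out = MA_ERROR;
	else if (!strcmp(arg, "allow-any"))
		*out = MA_ALLOW_ANY;
	else if (!strcmp(arg, "allow-promisor"))
		*out = MA_ALLOW_PROMISOR;
	else
		return -1;
	return 0;
}

/* Decide what to do with an object given the configured action. */
static enum verdict classify_object(enum missing_action action,
				    int have_object, int promised)
{
	if (have_object)
		return V_SHOW;
	if (action == MA_ALLOW_ANY)
		return V_SKIP;
	if (action == MA_ALLOW_PROMISOR && promised)
		return V_SKIP;
	/* Missing and not excused: treated as an error. */
	return V_FAIL;
}
```

The point of 'allow-promisor' is the last branch: an object that is
missing but was never promised is still an error.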

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-pack-objects.txt | 11 
 builtin/gc.c   |  4 +++
 builtin/pack-objects.c | 37 --
 builtin/prune.c|  7 +
 builtin/repack.c   |  9 +--
 t/t0410-partial-clone.sh   | 54 --
 6 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index aa403d02f3..81bc490ac5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
+
+--exclude-promisor-objects::
+   Omit objects that are known to be in the promisor remote.  (This
+   option has the purpose of operating only on locally created objects,
+   so that when we repack, we still maintain a distinction between
+   locally created objects [without .promisor] and objects from the
+   promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0edf..cef1461d1a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "argv-array.h"
 #include "commit.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -458,6 +459,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(&prune, prune_expire);
if (quiet)
argv_array_push(&prune, "--no-progress");
+   if (has_external_odb())
+   argv_array_push(&prune,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b9cfc289d..6c71552cdf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-   MA_ERROR = 0,/* fail if any missing objects are encountered */
-   MA_ALLOW_ANY,/* silently allow ALL missing objects */
+   MA_ERROR = 0,  /* fail if any missing objects are encountered */
+   MA_ALLOW_ANY,  /* silently allow ALL missing objects */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2578,6 +2581,20 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void
show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char *name, void *data)
+{
+   assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+   /*
+* Quietly ignore EXPECTED missing objects.  This avoids problems with
+* staging them now and getting an odd error later.
+*/
+   if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+   return;
+
+   show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
   const char *arg, int unset)
 {
@@ -2592,10 +2609,18 @@ static int option_parse_missing_action(const struct option *opt,
 
if (!strcmp(arg, "allow-any")) {
arg_missing_action = MA_ALLOW_ANY;
+   fetch_if_missing = 0;
fn_show_object = show_object__ma_allow_any;
return 0;
}
 
+   if (!strcmp(arg, "allow-promisor")) {
+   arg_missing_action = 

[PATCH 00/40] Promisor remotes and external ODB support

2018-01-03 Thread Christian Couder
This is an early patch series that starts to merge the
jh/fsck-promisors patch series (which is currently in pu) with the
external odb patch series.

The merge is not complete and there is still work needed, but all the
tests pass and in my opinion this shows that it is a good way forward
to share the same mechanism to handle (many) remote object stores and
the related fsck and gc problems.

The jh/partial-clone (a separate patch series on top of
jh/fsck-promisors) still needs some work before it can be used on top
of this series. I rebased it on top but the tests do not pass yet. The
result of my rebase and current attempt to fix tests is here:

https://github.com/chriscool/git/commits/gl-partial-clone-rebased

This patch series does not include the last part of the previous
external odb series which was about adding an `--initial-refspec`
option to `git clone`.

A few promisor related links
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

v6 partial clone part 2:
https://public-inbox.org/git/20171205165854.64979-1-...@jeffhostetler.com/

v7 partial clone part 3:
https://public-inbox.org/git/20171208155851.855-1-...@jeffhostetler.com/

External odb related links
~~~~~~~~~~~~~~~~~~~~~~~~~~

Peff started to work on external odbs some years ago:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
http://thread.gmane.org/gmane.comp.version-control.git/247171
http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

His work, which is not compile-tested any more, is still there:

https://github.com/peff/git/commits/jk/external-odb-wip

Initial discussions about external odbs are there:

http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160

Version 1, 2, 3, 4, 5 and 6 of the external odbs series are here:

https://public-inbox.org/git/20160613085546.11784-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20160628181933.24620-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20161130210420.15982-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20170803091926.1755-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20170916080731.13925-1-chrisc...@tuxfamily.org/

Some of the discussions related to Ben Peart's work that is used by
this series are here:

https://public-inbox.org/git/20170113155253.1644-1-benpe...@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpe...@microsoft.com/
https://public-inbox.org/git/20170714132651.170708-1-benpe...@microsoft.com/

The corresponding branches for versions 1, 2, 3, 4, 5 and 6 are on GitHub:

https://github.com/chriscool/git/commits/gl-external-odb12
https://github.com/chriscool/git/commits/gl-external-odb22
https://github.com/chriscool/git/commits/gl-external-odb61
https://github.com/chriscool/git/commits/gl-external-odb239
https://github.com/chriscool/git/commits/gl-external-odb373
https://github.com/chriscool/git/commits/gl-external-odb411

A patch series to add Git/Packet.pm that is now in master is also
related:

https://public-inbox.org/git/20171019123030.17338-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20171105213836.11717-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20171110132200.7871-1-chrisc...@tuxfamily.org/



Ben Peart (1):
  Add t0450 to test 'get_direct' mechanism

Christian Couder (30):
  Add initial external odb support
  Add GIT_NO_EXTERNAL_ODB env variable
  external-odb: add has_external_odb()
  external-odb: implement external_odb_get_direct
  sha1_file: prepare for external odbs
  external-odb: add script mode support
  odb-helper: add 'enum odb_helper_type'
  odb-helper: add odb_helper_init() to send 'init' instruction
  t0400: add 'put_raw_obj' instruction to odb-helper script
  external odb: add 'put_raw_obj' support
  external-odb: accept only blobs for now
  t0400: add test for external odb write support
  Add t0410 to test external ODB transfer
  lib-httpd: pass config file to start_httpd()
  lib-httpd: add upload.sh
  lib-httpd: add list.sh
  lib-httpd: add apache-e-odb.conf
  odb-helper: add odb_helper_get_raw_object()
  pack-objects: don't pack objects in external odbs
  Add t0420 to test transfer to HTTP external odb
  external-odb: add 'get_direct' support
  odb-helper: add 'script_mode' to 'struct odb_helper'
  odb-helper: add init_object_process()
  Add t0460 to test passing git objects
  odb-helper: add put_object_process()
  Add t0470 to test passing raw objects
  odb-helper: add have_object_process()
  Add t0480 to test "have" capability and raw objects
  external-odb: use 'odb=magic' attribute to mark odb blobs
  Add Documentation/technical/external-odb.txt

Jonathan Tan (9):
  fsck: introduce promisor objects
  fsck: support refs pointing to promisor objects
  fsck: support referenced promisor objects
  fsck: support promisor objects as CLI argument
  index-pack: refactor writing of 

[PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs

2018-01-03 Thread Christian Couder
To tell which blobs should be sent to the "magic" external odb,
let's require that the blobs be marked using the 'odb=magic'
attribute.
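
Put differently, a helper only receives a blob when the value of the
'odb' attribute for the blob's path equals the helper's name. A
minimal sketch of that decision, assuming the attribute lookup has
already been done: the function name helper_wants_blob() is
hypothetical, and the odb_attr parameter stands in for the result of
the attribute check (NULL when no value is set).

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the helper called helper_name should be offered the
 * object, 0 otherwise. Only blobs with a known path and a matching
 * 'odb' attribute value qualify.
 */
static int helper_wants_blob(const char *helper_name, const char *type,
			     const char *path, const char *odb_attr)
{
	/* No path means no attributes to consult. */
	if (!path || strcmp(type, "blob"))
		return 0;
	return odb_attr && !strcmp(helper_name, odb_attr);
}
```

With a .gitattributes line such as "*.bin odb=magic", a blob written
for "foo.bin" would be offered to the helper configured under
odb.magic.* and to no other helper.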

Signed-off-by: Christian Couder 
---
 external-odb.c | 25 ++---
 external-odb.h |  3 ++-
 sha1_file.c| 20 +++-
 t/t0400-external-odb.sh|  3 +++
 t/t0410-transfer-e-odb.sh  |  3 +++
 t/t0420-transfer-http-e-odb.sh |  3 +++
 t/t0470-read-object-http-e-odb.sh  |  3 +++
 t/t0480-read-object-have-http-e-odb.sh |  3 +++
 8 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 50c1cec50b..e3a05e24e3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -2,6 +2,7 @@
 #include "external-odb.h"
 #include "odb-helper.h"
 #include "config.h"
+#include "attr.h"
 
 static struct odb_helper *helpers;
 static struct odb_helper **helpers_tail = &helpers;
@@ -165,19 +166,37 @@ int external_odb_get_object(const unsigned char *sha1)
return external_odb_do_get_object(sha1);
 }
 
+static int has_odb_attrs(struct odb_helper *o, const char *path)
+{
+   static struct attr_check *check;
+
+   if (!check)
+   check = attr_check_initl("odb", NULL);
+
+   if (!git_check_attr(path, check)) {
+   const char *value = check->items[0].value;
+   return value ? !strcmp(o->name, value) : 0;
+   }
+   return 0;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
-   const char *type, unsigned char *sha1)
+   const char *type, unsigned char *sha1,
+   const char *path)
 {
struct odb_helper *o;
 
external_odb_init();
 
/* For now accept only blobs */
-   if (strcmp(type, "blob"))
+   if (!path || strcmp(type, "blob"))
return 1;
 
for (o = helpers; o; o = o->next) {
-   int r = odb_helper_put_object(o, buf, len, type, sha1);
+   int r;
+   if (!has_odb_attrs(o, path))
+   continue;
+   r = odb_helper_put_object(o, buf, len, type, sha1);
if (r <= 0)
return r;
}
diff --git a/external-odb.h b/external-odb.h
index 26bb931685..5a8936f417 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -7,6 +7,7 @@ extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
 extern int external_odb_put_object(const void *buf, size_t len,
-  const char *type, unsigned char *sha1);
+  const char *type, unsigned char *sha1,
+  const char *path);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/sha1_file.c b/sha1_file.c
index 300029459f..d3f395e967 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1696,7 +1696,9 @@ static int freshen_packed_object(const unsigned char *sha1)
return 1;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
+static int write_sha1_file_with_path(const void *buf, unsigned long len,
+const char *type, unsigned char *sha1,
+const char *path)
 {
char hdr[32];
int hdrlen = sizeof(hdr);
@@ -1705,13 +1707,19 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 * it out into .git/objects/??/?{38} file.
 */
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-   if (!external_odb_put_object(buf, len, type, sha1))
+   if (!external_odb_put_object(buf, len, type, sha1, path))
return 0;
if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
 
+int write_sha1_file(const void *buf, unsigned long len,
+   const char *type, unsigned char *sha1)
+{
+   return write_sha1_file_with_path(buf, len, type, sha1, NULL);
+}
+
 int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type,
 struct object_id *oid, unsigned flags)
 {
@@ -1832,7 +1840,8 @@ static int index_mem(struct object_id *oid, void *buf, size_t size,
}
 
if (write_object)
-   ret = write_sha1_file(buf, size, typename(type), oid->hash);
+   ret = write_sha1_file_with_path(buf, size, typename(type),
+   oid->hash, path);
else
ret = hash_sha1_file(buf, size, typename(type), oid->hash);
if (re_allocated)
@@ -1854,8 +1863,9 @@ static int index_stream_convert_blob(struct object_id *oid, int fd,
   

[PATCH 20/40] external-odb: accept only blobs for now

2018-01-03 Thread Christian Couder
The mechanism to decide which blobs should be sent to which
external object database will be very simple for now.
If the external odb helper supports any "put_*" instruction,
all the new blobs will be sent to it.

Signed-off-by: Christian Couder 
---
 external-odb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 337bdd2540..93971e9ce4 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -159,6 +159,10 @@ int external_odb_put_object(const void *buf, size_t len,
 
external_odb_init();
 
+   /* For now accept only blobs */
+   if (strcmp(type, "blob"))
+   return 1;
+
for (o = helpers; o; o = o->next) {
int r = odb_helper_put_object(o, buf, len, type, sha1);
if (r <= 0)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 22/40] Add t0410 to test external ODB transfer

2018-01-03 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0410-transfer-e-odb.sh | 144 ++
 1 file changed, 144 insertions(+)
 create mode 100755 t/t0410-transfer-e-odb.sh

diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
new file mode 100755
index 0000000000..065ec7d759
--- /dev/null
+++ b/t/t0410-transfer-e-odb.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+
+test_description='basic tests for transferring external ODBs'
+
+. ./test-lib.sh
+
+ORIG_SOURCE="$PWD/.git"
+export ORIG_SOURCE
+
+ALT_SOURCE1="$PWD/alt-repo1/.git"
+export ALT_SOURCE1
+write_script odb-helper1 <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+GIT_DIR=$ALT_SOURCE1; export GIT_DIR
+case "$1" in
+init)
+   echo "capability=get_git_obj"
+   echo "capability=have"
+   ;;
+have)
+   git cat-file --batch-check --batch-all-objects |
+   awk '{print $1 " " $3 " " $2}'
+   ;;
+get_git_obj)
+   cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+   ;;
+put_raw_obj)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$ORIG_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+   GIT_DIR=$ORIG_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER1="\"$PWD\"/odb-helper1"
+
+OTHER_SOURCE="$PWD/.git"
+export OTHER_SOURCE
+
+ALT_SOURCE2="$PWD/alt-repo2/.git"
+export ALT_SOURCE2
+write_script odb-helper2 <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+GIT_DIR=$ALT_SOURCE2; export GIT_DIR
+case "$1" in
+init)
+   echo "capability=get_git_obj"
+   echo "capability=have"
+   ;;
+have)
+   GIT_DIR=$OTHER_SOURCE git for-each-ref --format='%(objectname)' refs/odbs/magic/ | GIT_DIR=$OTHER_SOURCE xargs git show
+   ;;
+get_git_obj)
+   OBJ_FILE="$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+   if ! test -f "$OBJ_FILE"
+   then
+   # "Download" the missing object by copying it from alt-repo1
+   OBJ_DIR=$(echo $2 | sed 's/\(..\).*/\1/')
+   OBJ_BASE=$(basename "$OBJ_FILE")
+   ALT_OBJ_DIR1="$ALT_SOURCE1/objects/$OBJ_DIR"
+   ALT_OBJ_DIR2="$ALT_SOURCE2/objects/$OBJ_DIR"
+   mkdir -p "$ALT_OBJ_DIR2" || die "Could not mkdir '$ALT_OBJ_DIR2'"
+   OBJ_SRC="$ALT_OBJ_DIR1/$OBJ_BASE"
+   cp "$OBJ_SRC" "$ALT_OBJ_DIR2" ||
+   die "Could not cp '$OBJ_SRC' into '$ALT_OBJ_DIR2'"
+   fi
+   cat "$OBJ_FILE" || die "Could not cat '$OBJ_FILE'"
+   ;;
+put_raw_obj)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$OTHER_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+   GIT_DIR=$OTHER_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER2="\"$PWD\"/odb-helper2"
+
+test_expect_success 'setup first alternate repo' '
+   git init alt-repo1 &&
+   test_commit zero &&
+   git config odb.magic.scriptCommand "$HELPER1"
+'
+
+test_expect_success 'setup other repo and its alternate repo' '
+   git init other-repo &&
+   git init alt-repo2 &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+test_expect_success 'new blobs are put in first object store' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   content=$(cd alt-repo1 && git show "$hash1") &&
+   test "$content" = "one" &&
+   test_commit two &&
+   hash2=$(git ls-tree HEAD | grep two.t | cut -f1 | cut -d\  -f3) &&
+   content=$(cd alt-repo1 && git show "$hash2") &&
+   test "$content" = "two"
+'
+
+test_expect_success 'other repo gets the blobs from object store' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+test_must_fail git cat-file blob "$hash2" &&
+git config odb.magic.scriptCommand "$HELPER2" &&
+git cat-file blob "$hash1" &&
+git cat-file blob "$hash2"
+   )
+'
+
+test_expect_success 'other repo gets everything else' '
+   (cd other-repo &&
+git fetch origin &&
+content=$(git show "$hash1") &&
+test "$content" = "one" &&
+content=$(git show "$hash2") &&
+test "$content" = "two")
+'
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
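
In the first test helper above, the 'have' instruction prints one line
per object in the form "<sha1> <size> <type>" (the awk call reorders
the "<sha1> <type> <size>" output of cat-file --batch-check). A
minimal sketch of parsing such a line on the reading side, assuming
exactly that layout; struct have_entry and parse_have_line() are
hypothetical names, not taken from the series.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of 'have' output. */
struct have_entry {
	char sha1_hex[41];   /* 40 hex digits plus NUL */
	unsigned long size;
	char type[16];       /* "blob", "tree", "commit" or "tag" */
};

/* Parse "<sha1> <size> <type>"; return 0 on success, -1 otherwise. */
static int parse_have_line(const char *line, struct have_entry *e)
{
	if (sscanf(line, "%40[0123456789abcdef] %lu %15s",
		   e->sha1_hex, &e->size, e->type) != 3)
		return -1;
	/* Reject abbreviated object names. */
	if (strlen(e->sha1_hex) != 40)
		return -1;
	return 0;
}
```

The entries could then be kept in a sorted array so that later lookups
by SHA-1 do not need to call the helper again.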



[PATCH 19/40] external odb: add 'put_raw_obj' support

2018-01-03 Thread Christian Couder
Add support for a 'put_raw_obj' capability/instruction to send new
objects to an external odb. Objects will be sent as they are (in
their 'raw' format). They will not be converted to Git objects.

For now any new Git object (blob, tree, commit, ...) would be sent
if 'put_raw_obj' is supported by an odb helper. This is not a great
default, but let's leave it to following commits to tweak that.
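
The return convention used when offering an object to the helpers can
be sketched like this: 0 means a helper stored the object, a negative
value is an error, and a positive value means "not taken, try the next
helper". If no helper takes the object the result is 1 and the caller
writes it locally as usual. The put_fn type and the three sample
helpers are invented for this illustration; they are not part of the
patch.

```c
#include <stddef.h>

/* Hypothetical helper callback: 0 stored, <0 error, >0 declined. */
typedef int (*put_fn)(const void *buf, size_t len);

static int declines(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return 1;
}

static int stores(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return 0;
}

static int fails(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return -1;
}

/*
 * Offer the object to each helper in turn and stop at the first one
 * that either stores it or fails. Returns 1 if every helper declined.
 */
static int put_to_first_willing(put_fn *helpers, size_t nr,
				const void *buf, size_t len)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int r = helpers[i](buf, len);
		if (r <= 0)
			return r;
	}
	return 1;
}
```

Note that with this convention an error from one helper stops the
loop, so later helpers are not tried.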

Signed-off-by: Christian Couder 
---
 external-odb.c | 15 +++
 external-odb.h |  2 ++
 odb-helper.c   | 43 ++-
 odb-helper.h   |  3 +++
 sha1_file.c|  2 ++
 5 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 2622c12853..337bdd2540 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -151,3 +151,18 @@ int external_odb_get_direct(const unsigned char *sha1)
 
return -1;
 }
+
+int external_odb_put_object(const void *buf, size_t len,
+   const char *type, unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   int r = odb_helper_put_object(o, buf, len, type, sha1);
+   if (r <= 0)
+   return r;
+   }
+   return 1;
+}
diff --git a/external-odb.h b/external-odb.h
index fb8b94972f..26bb931685 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -6,5 +6,7 @@ extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
+extern int external_odb_put_object(const void *buf, size_t len,
+  const char *type, unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index ea642fd438..6f56f07b38 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -72,9 +72,10 @@ static void prepare_helper_command(struct argv_array *argv, const char *cmd,
strbuf_release();
 }
 
-__attribute__((format (printf,3,4)))
+__attribute__((format (printf,4,5)))
 static int odb_helper_start(struct odb_helper *o,
struct odb_helper_cmd *cmd,
+   int use_stdin,
const char *fmt, ...)
 {
va_list ap;
@@ -91,7 +92,10 @@ static int odb_helper_start(struct odb_helper *o,
 
cmd->child.argv = cmd->argv.argv;
cmd->child.use_shell = 1;
-   cmd->child.no_stdin = 1;
+   if (use_stdin)
+   cmd->child.in = -1;
+   else
+   cmd->child.no_stdin = 1;
cmd->child.out = -1;
 
if (start_command(&cmd->child) < 0) {
@@ -120,7 +124,7 @@ int odb_helper_init(struct odb_helper *o)
FILE *fh;
struct strbuf line = STRBUF_INIT;
 
-   if (odb_helper_start(o, &cmd, "init") < 0)
+   if (odb_helper_start(o, &cmd, 0, "init") < 0)
return -1;
 
fh = xfdopen(cmd.child.out, "r");
@@ -180,7 +184,7 @@ static void odb_helper_load_have(struct odb_helper *o)
return;
o->have_valid = 1;
 
-   if (odb_helper_start(o, &cmd, "have") < 0)
+   if (odb_helper_start(o, &cmd, 0, "have") < 0)
return;
 
fh = xfdopen(cmd.child.out, "r");
@@ -235,7 +239,7 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
if (!obj)
return -1;
 
-   if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+   if (odb_helper_start(o, &cmd, 0, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
return -1;
 
memset(&stream, 0, sizeof(stream));
@@ -335,3 +339,32 @@ int odb_helper_get_direct(struct odb_helper *o,
 
return res;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+ const void *buf, size_t len,
+ const char *type, unsigned char *sha1)
+{
+   struct odb_helper_cmd cmd;
+
+   if (odb_helper_start(o, &cmd, 1, "put_raw_obj %s %"PRIuMAX" %s",
+sha1_to_hex(sha1), (uintmax_t)len, type) < 0)
+   return -1;
+
+   do {
+   int w = xwrite(cmd.child.in, buf, len);
+   if (w < 0) {
+   error("unable to write to odb helper '%s': %s",
+ o->name, strerror(errno));
+   close(cmd.child.in);
+   close(cmd.child.out);
+   odb_helper_finish(o, &cmd);
+   return -1;
+   }
+   len -= w;
+   } while (len > 0);
+
+   close(cmd.child.in);
+   close(cmd.child.out);
+   odb_helper_finish(o, &cmd);
+   return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index f8eac7f44c..4a9cc7f07b 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -46,5 +46,8 @@ extern int odb_helper_get_object(struct odb_helper *o,
 

[PATCH 10/40] external-odb: implement external_odb_get_direct

2018-01-03 Thread Christian Couder
This is implemented only in the promisor remote mode
for now by calling fetch_object().

Signed-off-by: Christian Couder 
---
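
Note: external_odb_get_direct() below is written to fall through to
the next helper when one reports failure, and to stop at the first one
that succeeds. As added in this patch, odb_helper_get_direct() always
returns 0, so only the first configured helper is ever asked. A
minimal sketch of the first-success pattern on its own; struct
helper_sketch and the two toy callbacks are hypothetical stand-ins,
not types from this series:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct odb_helper; only what the sketch needs. */
struct helper_sketch {
	struct helper_sketch *next;
	int (*get_direct)(const unsigned char *sha1); /* < 0 means failure */
};

/* Try helpers in configured order: 0 on first success, -1 if none succeed. */
int get_direct_first_success(struct helper_sketch *helpers,
			     const unsigned char *sha1)
{
	struct helper_sketch *h;

	assert(sha1 != NULL);
	for (h = helpers; h; h = h->next) {
		if (h->get_direct(sha1) < 0)
			continue;
		return 0;
	}
	return -1;
}

/* Two toy backends for the usage example. */
int always_fail(const unsigned char *sha1) { (void)sha1; return -1; }
int always_ok(const unsigned char *sha1) { (void)sha1; return 0; }
```

With a list of always_fail followed by always_ok the function returns
0; with an empty list, or only failing helpers, it returns -1.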
 external-odb.c | 15 +++
 external-odb.h |  1 +
 odb-helper.c   | 13 +
 odb-helper.h   |  3 ++-
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index d26e63d8b1..5d0afb9762 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -76,3 +76,18 @@ int external_odb_has_object(const unsigned char *sha1)
return 1;
return 0;
 }
+
+int external_odb_get_direct(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   if (odb_helper_get_direct(o, sha1) < 0)
+   continue;
+   return 0;
+   }
+
+   return -1;
+}
diff --git a/external-odb.h b/external-odb.h
index 9a3c2f01b3..fd6708163e 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,5 +4,6 @@
 extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
+extern int external_odb_get_direct(const unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 1404393807..4b70b287af 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -4,6 +4,7 @@
 #include "odb-helper.h"
 #include "run-command.h"
 #include "sha1-lookup.h"
+#include "fetch-object.h"
 
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
@@ -52,3 +53,15 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
return !!odb_helper_lookup(o, sha1);
 }
 
+int odb_helper_get_direct(struct odb_helper *o,
+ const unsigned char *sha1)
+{
+   int res = 0;
+   uint64_t start = getnanotime();
+
+   fetch_object(o->dealer, sha1);
+
+   trace_performance_since(start, "odb_helper_get_direct");
+
+   return res;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 9395e606ce..f4bc66b0ef 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -20,5 +20,6 @@ struct odb_helper {
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
 extern int odb_helper_has_object(struct odb_helper *o,
 const unsigned char *sha1);
-
+extern int odb_helper_get_direct(struct odb_helper *o,
+const unsigned char *sha1);
 #endif /* ODB_HELPER_H */
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 36/40] Add t0470 to test passing raw objects

2018-01-03 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0470-read-object-http-e-odb.sh | 109 ++
 t/t0470/read-object-plain |  83 +
 2 files changed, 192 insertions(+)
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain

diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
new file mode 100755
index 00..774528c04f
--- /dev/null
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process passing plain objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0470"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain"
+
+test_expect_success 'setup repo with a root commit' '
+   test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+   git init other-repo &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+   git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+   echo "Hello Apache World!" >hello_to_send.txt &&
+   echo "How are you?" >>hello_to_send.txt &&
+   curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+   curl --include "$LIST_URL" >out_list &&
+   grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+   curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+   curl --include "$LIST_URL" >out_list2 &&
+   ! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   echo "$hash1-4-blob" >expected &&
+   ls "$FILES_DIR" >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+   git cat-file blob "$hash1" &&
+   git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+git config odb.magic.subprocesscommand "$HELPER" &&
+git cat-file blob "$hash1" &&
+git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone .. . &&
+git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+test_must_fail git clone --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0470/read-object-plain b/t/t0470/read-object-plain
new file mode 100755
index 00..0766e16032
--- /dev/null
+++ b/t/t0470/read-object-plain
@@ -0,0 +1,83 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj");
+packet_check_and_write_capabilities(\%remote_caps, "get_raw_obj", "put_raw_obj");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+   my ($res, $command) = packet_txt_read();
+
+   if ( $res == -1 ) {
+   exit 0;
+   }
+
+   $command =~ s/^command=//;
+
+   if ( $command eq "init" ) {
+   packet_bin_read();
+
+   packet_txt_write("status=success");
+   packet_flush();
+   } elsif ( $command eq "get_raw_obj" ) {
+   my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+   packet_bin_read();
+
+   my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+   my $userAgent = LWP::UserAgent->new();
+
+   my $response = 

[PATCH 40/40] Add Documentation/technical/external-odb.txt

2018-01-03 Thread Christian Couder
This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart 
Signed-off-by: Christian Couder 
---
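
Note: the "Process Mode" section of the document added below relies
on the pkt-line framing from technical/protocol-common.txt: each
packet starts with four hex digits giving the total length, header
included, and a flush packet is the literal "0000". A minimal sketch
of that framing applied to the welcome message; pkt_line_sketch() is
a hypothetical name, not git's packet_write_fmt():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one pkt-line into `out`: four hex digits holding the total
 * length (payload plus the 4-byte header), followed by the payload.
 * Returns the number of bytes written (not counting the NUL), or -1
 * if the packet is too large or does not fit in `out`.
 */
int pkt_line_sketch(char *out, size_t outsz, const char *payload)
{
	size_t total;

	assert(out != NULL && payload != NULL);
	total = strlen(payload) + 4;
	if (total > 65520 || total + 1 > outsz)
		return -1;
	return snprintf(out, outsz, "%04zx%s", total, payload);
}
```

For the payload "git-read-object-client\n" this yields
"001bgit-read-object-client\n", and "version=1\n" becomes
"000eversion=1\n".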
 Documentation/technical/external-odb.txt | 342 +++
 1 file changed, 342 insertions(+)
 create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 00..58ec8a8145
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,342 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects, only
+blobs for now though, to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store as long as there is a helper
+program called an "odb helper" that can communicate with Git to
+transfer objects to/from the external odb and to retrieve information
+about available objects in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make it possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file. These existing
+object stores cannot be easily optimized for many different kinds of
+content.
+
+So the current stores are not flexible enough for some important use
+cases like handling really big binary files or handling a really big
+number of files that are fetched only as needed. And it is not
+realistic to expect that Git could fully natively handle many of such
+use cases. Git would need to natively implement different internal
+stores which would be a huge burden and which could lead to
+re-implement things like HTTP servers, Docker registries or artifact
+stores that already exist outside Git.
+
+Furthermore many improvements that are dependent on specific setups
+could be implemented in the way Git objects are managed if it was
+possible to customize how the Git objects are handled. For example a
+restartable clone using the bundle mechanism has often been requested,
+but implementing that would go against the current strict rules under
+which the Git objects are currently handled.
+
+What Git needs is a mechanism to make it possible to customize in a
+lot of different ways how the Git objects are handled. Though this
+mechanism should try as much as possible to avoid interfering with the
+usual way in which Git handles its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+
+The communication happens through instructions that are sent by Git
+and that the commands should answer. If it makes sense, Git can send
+the same instruction to many commands in the order in which they are
+configured.
+
+There are 2 kinds of commands. Commands registered using the
+"odb.<odbname>.subprocessCommand" config variable are called "process
+commands" and the associated mode is called "process mode". Commands
+registered using the "odb.<odbname>.scriptCommand" config variable
+are called "script commands" and the associated mode is called "script
+mode".
+
+Early on, Git commands send an 'init' instruction to the registered
+commands. A capability negotiation will take place during this
+request/response exchange which will let Git and the helpers know how
+they can further collaborate. The attribute system can also be used to
+tell Git which objects should be handled by which helper.
+
+Process Mode
+
+
+In process mode the command is started as a single process invocation
+that should last for the entire life of the single Git command that
+started it.
+
+A packet format (pkt-line, see technical/protocol-common.txt) based
+protocol over standard input and standard output is used for
+communication between Git and the helper command.
+
+After the process command is started, Git sends a welcome message
+("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version
+number from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities
+that it supports and a flush packet. Git expects to read a list of
+desired capabilities, which must be a subset of the supported
+capabilities list, and a flush packet as 

[PATCH 38/40] Add t0480 to test "have" capability and raw objects

2018-01-03 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0480-read-object-have-http-e-odb.sh | 109 +
 t/t0480/read-object-plain-have | 103 +++
 2 files changed, 212 insertions(+)
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have

diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
new file mode 100755
index 00..056a40f2bb
--- /dev/null
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process with "have" cap and plain objects'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0480"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain-have"
+
+test_expect_success 'setup repo with a root commit' '
+   test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+   git init other-repo &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+   git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+   echo "Hello Apache World!" >hello_to_send.txt &&
+   echo "How are you?" >>hello_to_send.txt &&
+   curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+   curl --include "$LIST_URL" >out_list &&
+   grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+   curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+   curl --include "$LIST_URL" >out_list2 &&
+   ! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   echo "$hash1-4-blob" >expected &&
+   ls "$FILES_DIR" >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+   git cat-file blob "$hash1" &&
+   git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+git config odb.magic.subprocessCommand "$HELPER" &&
+git cat-file blob "$hash1" &&
+git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone .. . &&
+git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+test_must_fail git clone --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0480/read-object-plain-have b/t/t0480/read-object-plain-have
new file mode 100755
index 00..f230cbd5eb
--- /dev/null
+++ b/t/t0480/read-object-plain-have
@@ -0,0 +1,103 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj", "have");
+packet_check_and_write_capabilities(\%remote_caps, "get_raw_obj", "put_raw_obj", "have");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+   my ($res, $command) = packet_txt_read();
+
+   if ( $res == -1 ) {
+   exit 0;
+   }
+
+   $command =~ s/^command=//;
+
+   if ( $command eq "init" ) {
+   packet_bin_read();
+
+   packet_txt_write("status=success");
+   packet_flush();
+   } elsif ( $command eq "have" ) {
+   # read the flush after the command
+   packet_bin_read();
+
+   my $have_url = $http_url . "/list/";
+
+   my $userAgent = LWP::UserAgent->new();
+   my $response 

[PATCH 37/40] odb-helper: add have_object_process()

2018-01-03 Thread Christian Couder
This adds the infrastructure to handle 'have' instructions in
process mode.

The answer from the helper sub-process should be like the
output in script mode, that is lines like this:

sha1 SPACE size SPACE type NEWLINE

Signed-off-by: Christian Couder 
---
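
Note: a minimal sketch of parsing one line of the 'have' answer in
the "sha1 SPACE size SPACE type NEWLINE" format described above.
struct have_entry_sketch and parse_have_line_sketch() are hypothetical
names and are not the add_have_entry() used in the diff below. The
sketch is also more lenient than the format: sscanf() accepts any run
of whitespace between fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one 'have' line. */
struct have_entry_sketch {
	char sha1_hex[41];   /* 40 hex digits plus NUL */
	unsigned long size;
	char type[16];
};

/* Parse "sha1 size type"; returns 0 on success, -1 on malformed input. */
int parse_have_line_sketch(const char *line, struct have_entry_sketch *e)
{
	assert(line != NULL && e != NULL);
	if (sscanf(line, "%40[0123456789abcdef] %lu %15s",
		   e->sha1_hex, &e->size, e->type) != 3)
		return -1;
	if (strlen(e->sha1_hex) != 40)
		return -1;
	return 0;
}
```

A line such as "<40 hex digits> 4 blob" fills in size 4 and type
"blob"; a short or non-hex first field is rejected.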
 odb-helper.c | 76 ++--
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index d901f6d0bc..d8902a9541 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -635,6 +635,70 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
return hashcmp(a->sha1, b->sha1);
 }
 
+static int send_have_packets(struct odb_helper *o,
+struct object_process *entry,
+struct strbuf *status)
+{
+   int packet_len;
+   int total_got = 0;
+   struct child_process *process = &entry->subprocess.process;
+   int err = packet_write_fmt_gently(process->in, "command=have\n");
+
+   if (err)
+   return err;
+
+   err = packet_flush_gently(process->in);
+   if (err)
+   return err;
+
+   for (;;) {
+   /* packet_read() writes a '\0' extra byte at the end */
+   char buf[LARGE_PACKET_DATA_MAX + 1];
+   char *p = buf;
+   int more;
+
+   packet_len = packet_read(process->out, NULL, NULL,
+   buf, LARGE_PACKET_DATA_MAX + 1,
+   PACKET_READ_GENTLE_ON_EOF);
+
+   if (packet_len <= 0)
+   break;
+
+   total_got += packet_len;
+
+   /* 'have' packets should end with '\n' or '\0' */
+   do {
+   char *eol = strchrnul(p, '\n');
+   more = (*eol == '\n');
+   *eol = '\0';
+   if (add_have_entry(o, p))
+   break;
+   p = eol + 1;
+   } while (more && *p);
+   }
+
+   if (packet_len < 0)
+   return packet_len;
+
+   return check_object_process_status(process->out, status);
+}
+
+static int have_object_process(struct odb_helper *o)
+{
+   int err;
+   struct object_process *entry;
+   struct strbuf status = STRBUF_INIT;
+
+   entry = launch_object_process(o, ODB_HELPER_CAP_HAVE);
+   if (!entry)
+   return -1;
+
+   err = send_have_packets(o, entry, &status);
+
+   return check_object_process_error(err, status.buf, entry, o->dealer,
+ ODB_HELPER_CAP_HAVE);
+}
+
 static void have_object_script(struct odb_helper *o)
 {
struct odb_helper_cmd cmd;
@@ -656,12 +720,20 @@ static void have_object_script(struct odb_helper *o)
 
 static void odb_helper_load_have(struct odb_helper *o)
 {
+   uint64_t start;
+
if (o->have_valid)
return;
o->have_valid = 1;
 
+   start = getnanotime();
+
if (o->type == ODB_HELPER_SCRIPT_CMD)
have_object_script(o);
+   else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+   have_object_process(o);
+
+   trace_performance_since(start, "odb_helper_load_have");
 
qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
@@ -923,7 +995,7 @@ int odb_helper_get_direct(struct odb_helper *o,
fetch_object(o->dealer, sha1);
else if (o->type == ODB_HELPER_SCRIPT_CMD)
res = get_direct_script(o, sha1);
-   else
+   else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
res = get_object_process(o, sha1, -1);
 
trace_performance_since(start, "odb_helper_get_direct");
@@ -993,7 +1065,7 @@ int odb_helper_put_object(struct odb_helper *o,
  const void *buf, size_t len,
  const char *type, unsigned char *sha1)
 {
-   int res;
+   int res = 0;
uint64_t start = getnanotime();
 
if (o->type == ODB_HELPER_SCRIPT_CMD)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 35/40] odb-helper: add put_object_process()

2018-01-03 Thread Christian Couder
This adds the infrastructure to send objects to a sub-process
handling the communication with an external odb.

For now we only handle sending raw blobs using the 'put_raw_obj'
instruction.

Signed-off-by: Christian Couder 
---
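
Note: after the put_raw_obj header packets and a flush, the object
content is sent with write_packetized_from_buf(). To my understanding
that function splits the buffer into packets of at most
LARGE_PACKET_DATA_MAX (65516) bytes and ends with a flush packet. A
minimal sketch of the resulting packet arithmetic; both function names
below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Same value as LARGE_PACKET_DATA_MAX in git's pkt-line.h (65520 - 4). */
#define DATA_MAX_SKETCH 65516

/* Number of data packets needed for `len` bytes (final flush not counted). */
size_t data_packets_needed(size_t len)
{
	return (len + DATA_MAX_SKETCH - 1) / DATA_MAX_SKETCH;
}

/* Payload size carried by data packet number `idx` (0-based). */
size_t data_packet_payload(size_t len, size_t idx)
{
	size_t off = idx * DATA_MAX_SKETCH;

	assert(off < len);
	return (len - off < DATA_MAX_SKETCH) ? len - off : DATA_MAX_SKETCH;
}
```

The 4-byte blob from the tests fits in one data packet; a blob of
65517 bytes would need two, the second carrying a single byte.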
 odb-helper.c | 75 +---
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index a67dfddca0..d901f6d0bc 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -434,6 +434,58 @@ static int get_object_process(struct odb_helper *o, const unsigned char *sha1, i
  o->dealer, cur_cap);
 }
 
+static int send_put_packets(struct object_process *entry,
+   const unsigned char *sha1,
+   const void *buf,
+   size_t len,
+   struct strbuf *status)
+{
+   struct child_process *process = &entry->subprocess.process;
+   int err = packet_write_fmt_gently(process->in, "command=put_raw_obj\n");
+   if (err)
+   return err;
+
+   err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+   if (err)
+   return err;
+
+   err = packet_write_fmt_gently(process->in, "size=%"PRIuMAX"\n", (uintmax_t)len);
+   if (err)
+   return err;
+
+   err = packet_write_fmt_gently(process->in, "kind=blob\n");
+   if (err)
+   return err;
+
+   err = packet_flush_gently(process->in);
+   if (err)
+   return err;
+
+   err = write_packetized_from_buf(buf, len, process->in);
+   if (err)
+   return err;
+
+   return check_object_process_status(process->out, status);
+}
+
+static int put_object_process(struct odb_helper *o,
+ const void *buf, size_t len,
+ const char *type, unsigned char *sha1)
+{
+   int err;
+   struct object_process *entry;
+   struct strbuf status = STRBUF_INIT;
+
+   entry = launch_object_process(o, ODB_HELPER_CAP_PUT_RAW_OBJ);
+   if (!entry)
+   return -1;
+
+   err = send_put_packets(entry, sha1, buf, len, &status);
+
+   return check_object_process_error(err, status.buf, entry, o->dealer,
+ ODB_HELPER_CAP_PUT_RAW_OBJ);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
struct odb_helper *o;
@@ -908,9 +960,9 @@ int odb_helper_get_object(struct odb_helper *o,
return res;
 }
 
-int odb_helper_put_object(struct odb_helper *o,
- const void *buf, size_t len,
- const char *type, unsigned char *sha1)
+static int put_raw_object_script(struct odb_helper *o,
+const void *buf, size_t len,
+const char *type, unsigned char *sha1)
 {
struct odb_helper_cmd cmd;
 
@@ -936,3 +988,20 @@ int odb_helper_put_object(struct odb_helper *o,
odb_helper_finish(o, &cmd);
return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+ const void *buf, size_t len,
+ const char *type, unsigned char *sha1)
+{
+   int res;
+   uint64_t start = getnanotime();
+
+   if (o->type == ODB_HELPER_SCRIPT_CMD)
+   res = put_raw_object_script(o, buf, len, type, sha1);
+   else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+   res = put_object_process(o, buf, len, type, sha1);
+
+   trace_performance_since(start, "odb_helper_put_object");
+
+   return res;
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 34/40] Add t0460 to test passing git objects

2018-01-03 Thread Christian Couder
Signed-off-by: Christian Couder 
---
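
Note: the read-object-git helper in this patch finds a loose object
in the host repo by inserting a slash after the first two hex digits
of the SHA-1 and prefixing the objects directory. A minimal sketch of
that mapping; loose_object_path_sketch() is a hypothetical name, not
a git function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<objects_dir>/ab/cdef..." from a 40-character hex SHA-1.
 * Returns 0 on success, -1 if the SHA-1 has the wrong length or the
 * result does not fit in `out`.
 */
int loose_object_path_sketch(char *out, size_t outsz,
			     const char *objects_dir, const char *sha1_hex)
{
	int n;

	assert(out != NULL && objects_dir != NULL && sha1_hex != NULL);
	if (strlen(sha1_hex) != 40)
		return -1;
	n = snprintf(out, outsz, "%s/%.2s/%s",
		     objects_dir, sha1_hex, sha1_hex + 2);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}
```

Note that this only works for objects stored loose in the host repo;
packed objects have no such path.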
 t/t0460-read-object-git.sh | 28 +
 t/t0460/read-object-git| 78 ++
 2 files changed, 106 insertions(+)
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git

diff --git a/t/t0460-read-object-git.sh b/t/t0460-read-object-git.sh
new file mode 100755
index 00..2873b445f3
--- /dev/null
+++ b/t/t0460-read-object-git.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process passing git objects'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0460"
+
+test_expect_success 'setup host repo with a root commit' '
+   test_commit zero &&
+   hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object-git"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+   git init guest-repo &&
+   (cd guest-repo &&
+git config odb.magic.subprocessCommand "$HELPER" &&
+git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+   cd guest-repo &&
+   test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0460/read-object-git b/t/t0460/read-object-git
new file mode 100755
index 00..4b3ca0948b
--- /dev/null
+++ b/t/t0460/read-object-git
@@ -0,0 +1,78 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#  cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard 
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_git_obj");
+packet_check_and_write_capabilities(\%remote_caps, "get_git_obj");
+
+while (1) {
+   my ($res, $command) = packet_txt_read();
+
+   if ( $res == -1 ) {
+   exit 0;
+   }
+
+   $command =~ s/^command=//;
+
+   if ( $command eq "init" ) {
+   packet_bin_read();
+
+   packet_txt_write("status=success");
+   packet_flush();
+   } elsif ( $command eq "get_git_obj" ) {
+   my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+   packet_bin_read();
+
+   my $path = $sha1;
+   $path =~ s{..}{$&/};
+   $path = $DIR . "/objects/" . $path;
+
+   my $contents = do {
+   local $/;
+   open my $fh, $path or die "Can't open '$path': $!";
+   <$fh>
+   };
+
+   packet_bin_write($contents);
+   packet_flush();
+   packet_txt_write("status=success");
+   packet_flush();
+   } else {
+   die "bad command '$command'";
+   }
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 33/40] Add t0450 to test 'get_direct' mechanism

2018-01-03 Thread Christian Couder
From: Ben Peart 

Signed-off-by: Ben Peart 
Signed-off-by: Christian Couder 
---
 t/t0450-read-object.sh | 28 +
 t/t0450/read-object| 68 ++
 2 files changed, 96 insertions(+)
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object

diff --git a/t/t0450-read-object.sh b/t/t0450-read-object.sh
new file mode 100755
index 00..6b97305452
--- /dev/null
+++ b/t/t0450-read-object.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0450"
+
+test_expect_success 'setup host repo with a root commit' '
+   test_commit zero &&
+   hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+   git init guest-repo &&
+   (cd guest-repo &&
+git config odb.magic.subprocessCommand "$HELPER" &&
+git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+   cd guest-repo &&
+   test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0450/read-object b/t/t0450/read-object
new file mode 100755
index 00..004e9368c9
--- /dev/null
+++ b/t/t0450/read-object
@@ -0,0 +1,68 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#  cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard 
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_direct");
+packet_check_and_write_capabilities(\%remote_caps, "get_direct");
+
+while (1) {
+   my ($res, $command) = packet_txt_read();
+
+   if ( $res == -1 ) {
+   exit 0;
+   }
+
+   $command =~ s/^command=//;
+
+   if ( $command eq "init" ) {
+   packet_bin_read();
+
+   packet_txt_write("status=success");
+   packet_flush();
+   } elsif ( $command eq "get_direct" ) {
+   my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+   packet_bin_read();
+
+   system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | GIT_NO_EXTERNAL_ODB=1 git hash-object -w --stdin >/dev/null 2>&1');
+
+   packet_txt_write(($?) ? "status=error" : "status=success");
+   packet_flush();
+   } else {
+   die "bad command '$command'";
+   }
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
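
The helper above exchanges pkt-lines with Git through Git::Packet. As a
rough illustration of that framing only (not of the Perl module itself),
here is a small shell sketch. It assumes ASCII text payloads; the
function names and the object name are made up for the example.

```shell
#!/bin/sh
# Sketch of pkt-line text framing: 4 hex digits give the total length,
# counting the 4 length bytes, the payload and the trailing newline.
pkt_line () {
    payload=$1
    printf '%04x%s\n' $(( ${#payload} + 5 )) "$payload"
}

# A flush packet is the special length 0000 with no payload.
pkt_flush () {
    printf '0000'
}

# The request the helper loop would read for one blob
# (the object name is a placeholder, not a real object).
pkt_line "command=get_direct"
pkt_line "sha1=0123456789abcdef0123456789abcdef01234567"
pkt_flush
echo
```

The helper would answer such a request with a "status=success" or
"status=error" line followed by a flush, as the loop in the script does.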



[PATCH 26/40] lib-httpd: add apache-e-odb.conf

2018-01-03 Thread Christian Couder
This is an Apache config file for testing external object databases.
It uses the upload.sh and list.sh CGI scripts that were added
previously to make Apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd/apache-e-odb.conf | 214 ++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 00..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+
+   LoadModule log_config_module modules/mod_log_config.so
+
+
+   LoadModule alias_module modules/mod_alias.so
+
+
+   LoadModule cgi_module modules/mod_cgi.so
+
+
+   LoadModule env_module modules/mod_env.so
+
+
+   LoadModule rewrite_module modules/mod_rewrite.so
+
+
+   LoadModule version_module modules/mod_version.so
+
+
+   LoadModule headers_module modules/mod_headers.so
+
+
+
+LockFile accept.lock
+
+
+
+
+   LoadModule auth_module modules/mod_auth.so
+
+
+
+= 2.1>
+
+   LoadModule auth_basic_module modules/mod_auth_basic.so
+
+
+   LoadModule authn_file_module modules/mod_authn_file.so
+
+
+   LoadModule authz_user_module modules/mod_authz_user.so
+
+
+   LoadModule authz_host_module modules/mod_authz_host.so
+
+
+
+= 2.4>
+
+   LoadModule authn_core_module modules/mod_authn_core.so
+
+
+   LoadModule authz_core_module modules/mod_authz_core.so
+
+
+   LoadModule access_compat_module modules/mod_access_compat.so
+
+
+   LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+
+
+   LoadModule unixd_module modules/mod_unixd.so
+
+
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_COMMITTER_NAME "Custom User"
+   SetEnv GIT_COMMITTER_EMAIL cus...@example.com
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_NAMESPACE ns
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   Header set Set-Cookie name=value
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+
+   Options FollowSymlinks
+
+
+  Options ExecCGI
+
+
+  Options ExecCGI
+
+
+   Options ExecCGI
+
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 
[R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand , so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+
+
+
+   LoadModule dav_module modules/mod_dav.so
+   LoadModule dav_fs_module 

[PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper'

2018-01-03 Thread Christian Couder
This prepares for having a long-running odb helper sub-process
handle the communication between Git and an external odb.

We introduce "odb..subprocesscommand" to make it
possible to define such a sub-process, and we mark such odb
helpers with the new 'script_mode' field set to 0.

Helpers defined using the existing "odb..scriptcommand"
are marked with the 'script_mode' field set to 1.

Implementation of the different capabilities/instructions in
the new (sub-)process mode is left for following commits.

Signed-off-by: Christian Couder 
---
 external-odb.c |  4 
 odb-helper.c   | 19 ++-
 odb-helper.h   |  2 +-
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 3ce3d111f3..f38c2c2fe3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -43,6 +43,10 @@ static int external_odb_config(const char *var, const char *value, void *data)
o->type = ODB_HELPER_SCRIPT_CMD;
return git_config_string(>dealer, var, value);
}
+   if (!strcmp(subkey, "subprocesscommand")) {
+   o->type = ODB_HELPER_SUBPROCESS_CMD;
+   return git_config_string(>dealer, var, value);
+   }
 
return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index 0fa7af0348..91b4de1a05 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -124,6 +124,9 @@ int odb_helper_init(struct odb_helper *o)
FILE *fh;
struct strbuf line = STRBUF_INIT;
 
+   if (o->type != ODB_HELPER_SCRIPT_CMD)
+   return 0;
+
if (odb_helper_start(o, , 0, "init") < 0)
return -1;
 
@@ -174,16 +177,12 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
return hashcmp(a->sha1, b->sha1);
 }
 
-static void odb_helper_load_have(struct odb_helper *o)
+static void have_object_script(struct odb_helper *o)
 {
struct odb_helper_cmd cmd;
FILE *fh;
struct strbuf line = STRBUF_INIT;
 
-   if (o->have_valid)
-   return;
-   o->have_valid = 1;
-
if (odb_helper_start(o, , 0, "have") < 0)
return;
 
@@ -195,6 +194,16 @@ static void odb_helper_load_have(struct odb_helper *o)
strbuf_release();
fclose(fh);
odb_helper_finish(o, );
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+   if (o->have_valid)
+   return;
+   o->have_valid = 1;
+
+   if (o->type == ODB_HELPER_SCRIPT_CMD)
+   have_object_script(o);
 
qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
diff --git a/odb-helper.h b/odb-helper.h
index 4a9cc7f07b..9a33adbf0b 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -7,7 +7,7 @@ enum odb_helper_type {
ODB_HELPER_NONE = 0,
ODB_HELPER_GIT_REMOTE,
ODB_HELPER_SCRIPT_CMD,
-   ODB_HELPER_PROCESS_CMD,
+   ODB_HELPER_SUBPROCESS_CMD,
OBJ_HELPER_MAX
 };
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
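
To make the distinction concrete, here is a toy shell model of how the
config subkey selects the helper type. It is only a sketch: the pairing
of "scriptcommand" with ODB_HELPER_SCRIPT_CMD is inferred from the
context lines of the diff, the fallback to ODB_HELPER_NONE is a
simplification, and both function names are hypothetical.

```shell
#!/bin/sh
# Map the last component of an "odb.*" config key to a helper type.
# Simplification: the real callback leaves the type untouched for other keys.
helper_type_for_subkey () {
    case "$1" in
    scriptcommand)     echo ODB_HELPER_SCRIPT_CMD ;;
    subprocesscommand) echo ODB_HELPER_SUBPROCESS_CMD ;;
    *)                 echo ODB_HELPER_NONE ;;
    esac
}

# Only script helpers are launched once per instruction ("init", "have").
launches_script_for () {
    test "$(helper_type_for_subkey "$1")" = ODB_HELPER_SCRIPT_CMD
}
```

This mirrors the early return added to odb_helper_init() and the type
check in odb_helper_load_have(): for the sub-process type, neither
script is run in this patch.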



[PATCH 23/40] lib-httpd: pass config file to start_httpd()

2018-01-03 Thread Christian Couder
This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+   APACHE_CONF_FILE=${1-apache.conf}
+
prepare_httpd >&3 2>&4
 
trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
>&3 2>&4
if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
trap 'die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
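
For readers less familiar with the "${1-apache.conf}" form used above,
a minimal sketch of how the default behaves. The function name is made
up for the example.

```shell
#!/bin/sh
# "${1-word}" substitutes word only when $1 is unset; an explicitly
# passed empty string is kept as is (unlike "${1:-word}").
conf_file_for () {
    echo "${1-apache.conf}"
}

conf_file_for                       # no argument: falls back to the default
conf_file_for apache-e-odb.conf     # explicit config file name
```

Note that start_httpd() assigns APACHE_CONF_FILE as a plain (global)
variable, which is what lets stop_httpd() reuse the same file name.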



[PATCH 09/40] introduce fetch-object: fetch one promisor object

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 Documentation/gitremote-helpers.txt |  7 ++
 Makefile|  1 +
 builtin/fetch-pack.c|  8 +++
 builtin/index-pack.c| 14 ---
 fetch-object.c  | 28 ++
 fetch-object.h  |  6 +
 fetch-pack.c| 48 +
 fetch-pack.h|  8 +++
 remote-curl.c   | 14 ++-
 transport.c |  8 +++
 transport.h | 11 +
 11 files changed, 128 insertions(+), 25 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 4a584f3c5d..4b8c93ec59 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -466,6 +466,13 @@ set by Git if the remote helper has the 'option' capability.
Transmit  as a push option. As the push option
must not contain LF or NUL characters, the string is not encoded.
 
+'option from-promisor' {'true'|'false'}::
+   Indicate that these objects are being fetched from a promisor.
+
+'option no-dependents' {'true'|'false'}::
+   Indicate that only the objects wanted need to be fetched, not
+   their dependents.
+
 SEE ALSO
 
 linkgit:git-remote[1]
diff --git a/Makefile b/Makefile
index 07694185c9..25b878279e 100644
--- a/Makefile
+++ b/Makefile
@@ -800,6 +800,7 @@ LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += external-odb.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f9..02abe7211e 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
args.update_shallow = 1;
continue;
}
+   if (!strcmp("--from-promisor", arg)) {
+   args.from_promisor = 1;
+   continue;
+   }
+   if (!strcmp("--no-dependents", arg)) {
+   args.no_dependents = 1;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 47515d8977..9dffaf20ae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, const char *msg,
if (close(fd) != 0)
die_errno(_("cannot close written %s file '%s'"),
  suffix, filename);
-   *report = suffix;
+   if (report)
+   *report = suffix;
}
strbuf_release(_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_msg, unsigned char *sha1)
+ const char *keep_msg, const char *promisor_msg,
+ unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
if (keep_msg)
write_special_file("keep", keep_msg, final_pack_name, sha1,
   );
+   if (promisor_msg)
+   write_special_file("promisor", promisor_msg, 

[PATCH 11/40] sha1_file: support lazily fetching missing objects

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/cat-file.c   |  3 +++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 28 ++
 t/t0410-partial-clone.sh | 51 
 8 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd75a..1e4edd81a0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "sha1-array.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 struct batch_options {
int enabled;
@@ -475,6 +476,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, , 0);
for_each_packed_object(batch_packed_object, , 0);
+   if (has_external_odb())
+   warning("This repository uses an odb. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe7211e..15eeed7b17 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index a6fa6d6482..7a8a679d4f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9dffaf20ae..54c921fa71 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+

[PATCH 28/40] pack-objects: don't pack objects in external odbs

2018-01-03 Thread Christian Couder
Objects managed by an external ODB should not be put into
pack files. They should be transferred using another mechanism
that can be specific to the external odb.

Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6c71552cdf..4ed66c7677 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -28,6 +28,7 @@
 #include "argv-array.h"
 #include "mru.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -1026,6 +1027,9 @@ static int want_object_in_pack(const struct object_id *oid,
return want;
}
 
+   if (external_odb_has_object(oid->hash))
+   return 0;
+
for (entry = packed_git_mru.head; entry; entry = entry->next) {
struct packed_git *p = entry->item;
off_t offset;
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



[PATCH 30/40] external-odb: add 'get_direct' support

2018-01-03 Thread Christian Couder
This implements the 'get_direct' capability/instruction that makes
it possible for external odb helper scripts to pass blobs to Git
by directly writing them as loose object files.

It is better to call this a "direct" mode rather than a "fault-in"
mode as we could have the same kind of mechanism to "put" objects
into an external odb, where the odb helper would access blobs it
wants to send to an external odb directly from files, but it
would be strange to call that a fault-in mode too.

Signed-off-by: Christian Couder 
---
 external-odb.c |  3 ++-
 odb-helper.c   | 28 +++-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 93971e9ce4..3ce3d111f3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -109,7 +109,8 @@ int external_odb_get_object(const unsigned char *sha1)
int ret;
int fd;
 
-   if (!odb_helper_has_object(o, sha1))
+   if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) &&
+   !(o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ))
continue;
 
fd = create_object_tmpfile(, path);
diff --git a/odb-helper.c b/odb-helper.c
index fc30c2fa57..0fa7af0348 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -429,24 +429,42 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 int odb_helper_get_direct(struct odb_helper *o,
  const unsigned char *sha1)
 {
-   int res = 0;
uint64_t start = getnanotime();
 
-   fetch_object(o->dealer, sha1);
+   if (o->type == ODB_HELPER_GIT_REMOTE) {
+   fetch_object(o->dealer, sha1);
+   } else {
+   struct odb_helper_object *obj;
+   struct odb_helper_cmd cmd;
+
+   obj = odb_helper_lookup(o, sha1);
+   if (!obj)
+   return -1;
+
+   if (odb_helper_start(o, , 0, "get_direct %s", 
sha1_to_hex(sha1)) < 0)
+   return -1;
+
+   if (odb_helper_finish(o, ))
+   return -1;
+   }
 
trace_performance_since(start, "odb_helper_get_direct");
 
-   return res;
+   return 0;
 }
 
 int odb_helper_get_object(struct odb_helper *o,
  const unsigned char *sha1,
  int fd)
 {
+   if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+   return odb_helper_get_git_object(o, sha1, fd);
if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
return odb_helper_get_raw_object(o, sha1, fd);
-   else
-   return odb_helper_get_git_object(o, sha1, fd);
+   if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+   return 0;
+
+   BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
 int odb_helper_put_object(struct odb_helper *o,
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
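
The order in which odb_helper_get_object() now tests the capability
bits can be sketched in shell as follows. The bit values and the
function name are made up for this sketch; only the order of the tests
is taken from the patch.

```shell
#!/bin/sh
# Bit values are hypothetical; only the test order mirrors the patch.
CAP_GET_RAW_OBJ=1
CAP_GET_GIT_OBJ=2
CAP_GET_DIRECT=4

# Print which "get" path would be taken for a capability bitmask.
get_mode () {
    caps=$1
    if test $(( $caps & $CAP_GET_GIT_OBJ )) -ne 0
    then
        echo git_object
    elif test $(( $caps & $CAP_GET_RAW_OBJ )) -ne 0
    then
        echo raw_object
    elif test $(( $caps & $CAP_GET_DIRECT )) -ne 0
    then
        # nothing to copy: the helper wrote the loose object itself
        echo direct
    else
        echo "BUG: invalid get capability ($caps)" >&2
        return 1
    fi
}
```

A helper that advertises only the direct capability ends up in the
third branch, where Git has nothing to write because the helper already
stored the object.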



[PATCH 24/40] lib-httpd: add upload.sh

2018-01-03 Thread Christian Couder
This cgi will be used to upload objects to, or to delete
objects from, an apache web server.

This way the apache server can work as an external object
database.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh|  1 +
 t/lib-httpd/upload.sh | 45 +
 2 files changed, 46 insertions(+)
 create mode 100644 t/lib-httpd/upload.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 2e659a8ee2..d80b004549 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
install_script error.sh
+   install_script upload.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/upload.sh b/t/lib-httpd/upload.sh
new file mode 100644
index 00..64d3f31c31
--- /dev/null
+++ b/t/lib-httpd/upload.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# In part from http://codereview.stackexchange.com/questions/79549/bash-cgi-upload-file
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+   key=${1%%=*}
+   val=${1#*=}
+
+   case "$key" in
+   "sha1") sha1="$val" ;;
+   "type") type="$val" ;;
+   "size") size="$val" ;;
+   "delete") delete=1 ;;
+   *) echo >&2 "unknown key '$key'" ;;
+   esac
+
+   shift
+done
+
+case "$REQUEST_METHOD" in
+POST)
+   if test "$delete" = "1"
+   then
+   rm -f "$FILES_DIR/$sha1-$size-$type"
+   else
+   mkdir -p "$FILES_DIR"
+   cat >"$FILES_DIR/$sha1-$size-$type"
+   fi
+
+   echo 'Status: 204 No Content'
+   echo
+   ;;
+
+*)
+   echo 'Status: 405 Method Not Allowed'
+   echo
+esac
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
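
The query-string handling in upload.sh can be tried in isolation with a
small sketch like the one below. The function name is hypothetical, and
the "set -f" guard is an addition of this sketch that the script itself
does not have.

```shell
#!/bin/sh
# Split a CGI query string on '&' and print one "key value" pair per line.
# Like upload.sh, this does no URL-decoding.
parse_query () {
    oldifs=$IFS
    set -f                  # keep '*' or '?' in the input from globbing
    IFS='&'
    set -- $1
    IFS=$oldifs
    set +f                  # note: re-enables globbing unconditionally
    for pair
    do
        key=${pair%%=*}     # everything before the first '='
        val=${pair#*=}      # everything after the first '='
        echo "$key $val"
    done
}

parse_query 'sha1=abc&size=3&type=blob'
```

A bare key such as "delete" has no '=', so both expansions return the
whole word. That is harmless in upload.sh, which only looks at the key
in that case.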



[PATCH 25/40] lib-httpd: add list.sh

2018-01-03 Thread Christian Couder
This cgi script can list Git objects that have been uploaded as
files to an apache web server. This script can also retrieve
the content of each of these files.

This will help make apache work as an external object database.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh  |  1 +
 t/lib-httpd/list.sh | 41 +
 2 files changed, 42 insertions(+)
 create mode 100644 t/lib-httpd/list.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index d80b004549..f31ea261f5 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -133,6 +133,7 @@ prepare_httpd() {
install_script broken-smart-http.sh
install_script error.sh
install_script upload.sh
+   install_script list.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/list.sh b/t/lib-httpd/list.sh
new file mode 100644
index 00..b6d6c29a2f
--- /dev/null
+++ b/t/lib-httpd/list.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+   key=${1%%=*}
+   val=${1#*=}
+
+   case "$key" in
+   "sha1") sha1="$val" ;;
+   *) echo >&2 "unknown key '$key'" ;;
+   esac
+
+   shift
+done
+
+if test -d "$FILES_DIR"
+then
+   if test -z "$sha1"
+   then
+   echo 'Status: 200 OK'
+   echo
+   ls "$FILES_DIR" | tr '-' ' '
+   else
+   if test -f "$FILES_DIR/$sha1"-*
+   then
+   echo 'Status: 200 OK'
+   echo
+   cat "$FILES_DIR/$sha1"-*
+   else
+   echo 'Status: 404 Not Found'
+   echo
+   fi
+   fi
+fi
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
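
The two operations of list.sh can be sketched as shell functions that
work on any directory using the "<sha1>-<size>-<type>" file naming from
upload.sh. The function names are made up. The lookup loops over the
glob instead of passing it to "test -f", which is a variation of this
sketch rather than what the script does.

```shell
#!/bin/sh
# List stored objects as "sha1 size type", one per line.
list_objects () {
    ls "$1" | tr '-' ' '
}

# Print the first file whose name starts with "<sha1>-"; fail if none.
# Looping over the glob avoids handing "test -f" more than one word
# when several files match.
find_object () {
    for f in "$1/$2"-*
    do
        test -f "$f" && { echo "$f"; return 0; }
    done
    return 1
}
```

When nothing matches, the pattern stays unexpanded, the "test -f" fails
and the function returns non-zero, which corresponds to the 404 branch
of the script.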



[PATCH 12/40] rev-list: support termination at promisor objects

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the cases where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
Signed-off-by: Junio C Hamano 
---
 Documentation/rev-list-options.txt |  11 
 builtin/rev-list.c |  69 ++---
 list-objects.c |  29 ++-
 object.c   |   2 +-
 revision.c |  33 +++-
 revision.h |   5 +-
 t/t0410-partial-clone.sh   | 101 +
 7 files changed, 239 insertions(+), 11 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 22f5c9b43d..7b273635de 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -750,10 +750,21 @@ The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+   (For internal use only.)  Prefilter object traversal at
+   promisor boundary.  This is used with partial clone.  This is
+   stronger than `--missing=allow-promisor` because it limits the
+   traversal, rather than just silencing errors about missing
+   objects.
+
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index d5345b6a2e..e27aa1fc07 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
MA_ERROR = 0,/* fail if any missing objects are encountered */
MA_ALLOW_ANY,/* silently allow ALL missing objects */
MA_PRINT,/* print ALL missing objects in special section */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void *data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+   /*
+* Whether or not we try to dynamically fetch missing objects
+* from the server, we currently DO NOT have the object.  We
+* can either print, allow (ignore), or conditionally allow
+* (ignore) them.
+*/
switch (arg_missing_action) {
case MA_ERROR:
die("missing blob object '%s'", oid_to_hex(>oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
oidset_insert(_objects, >oid);
return;
 
+   case MA_ALLOW_PROMISOR:
+   if (is_promisor_object(>oid))
+   return;
+   die("unexpected missing blob object '%s'",
+   oid_to_hex(>oid));
+   return;
+
default:
BUG("unhandled missing_action");
return;
}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
finish_object__ma(obj);
+   return 1;
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
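
The decision made in finish_object__ma() for a missing blob can be
summarised as a small table. Here it is as a shell sketch, with a
hypothetical function name and made-up outcome words ("die", "skip",
"record").

```shell
#!/bin/sh
# $1: value given to --missing
# $2: "yes" if the missing object is a promisor object
# Prints what the traversal does with a missing blob.
missing_blob_outcome () {
    case "$1" in
    error)          echo die ;;
    allow-any)      echo skip ;;
    print)          echo record ;;   # printed later with a '?' prefix
    allow-promisor)
        if test "$2" = yes
        then
            echo skip
        else
            echo die
        fi ;;
    *)
        echo "unknown action '$1'" >&2
        return 1 ;;
    esac
}
```

Only 'allow-promisor' looks at whether the object is a promisor object.
The other forms treat all missing blobs alike.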

[PATCH 08/40] index-pack: refactor writing of .keep files

2018-01-03 Thread Christian Couder
From: Jonathan Tan 

In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan 
Signed-off-by: Junio C Hamano 
---
 builtin/index-pack.c | 99 
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4c51aec81f..47515d8977 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+  struct strbuf *buf)
+{
+   size_t len;
+   if (!strip_suffix(pack_name, ".pack", ))
+   die(_("packfile name '%s' does not end with '.pack'"),
+   pack_name);
+   strbuf_add(buf, pack_name, len);
+   strbuf_addch(buf, '.');
+   strbuf_addstr(buf, suffix);
+   return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+  const char *pack_name, const unsigned char *sha1,
+  const char **report)
+{
+   struct strbuf name_buf = STRBUF_INIT;
+   const char *filename;
+   int fd;
+   int msg_len = strlen(msg);
+
+   if (pack_name)
+   filename = derive_filename(pack_name, suffix, _buf);
+   else
+   filename = odb_pack_name(_buf, sha1, suffix);
+
+   fd = odb_pack_keep(filename);
+   if (fd < 0) {
+   if (errno != EEXIST)
+   die_errno(_("cannot write %s file '%s'"),
+ suffix, filename);
+   } else {
+   if (msg_len > 0) {
+   write_or_die(fd, msg, msg_len);
+   write_or_die(fd, "\n", 1);
+   }
+   if (close(fd) != 0)
+   die_errno(_("cannot close written %s file '%s'"),
+ suffix, filename);
+   *report = suffix;
+   }
+   strbuf_release(_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_name, const char *keep_msg,
- unsigned char *sha1)
+ const char *keep_msg, unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
struct strbuf index_name = STRBUF_INIT;
-   struct strbuf keep_name_buf = STRBUF_INIT;
int err;
 
if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
die_errno(_("error while closing pack file"));
}
 
-   if (keep_msg) {
-   int keep_fd, keep_msg_len = strlen(keep_msg);
-
-   if (!keep_name)
-   keep_name = odb_pack_name(_name_buf, sha1, "keep");
-
-   keep_fd = odb_pack_keep(keep_name);
-   if (keep_fd < 0) {
-   if (errno != EEXIST)
-   die_errno(_("cannot write keep file '%s'"),
- keep_name);
-   } else {
-   if (keep_msg_len > 0) {
-   write_or_die(keep_fd, keep_msg, keep_msg_len);
-   write_or_die(keep_fd, "\n", 1);
-   }
-   if (close(keep_fd) != 0)
-   die_errno(_("cannot close written keep file '%s'"),
- keep_name);
-   report = "keep";
-   }
-   }
+   if (keep_msg)
+   write_special_file("keep", keep_msg, final_pack_name, sha1,
+  &report);
 
if (final_pack_name != curr_pack_name) {
if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 
strbuf_release(&index_name);
strbuf_release(&pack_name);
-   strbuf_release(&keep_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-  struct strbuf *buf)
-{
-   size_t len;
-   if (!strip_suffix(pack_name, ".pack", &len))
-   die(_("packfile name '%s' does not end with '.pack'"),
-   pack_name);
-   strbuf_add(buf, pack_name, 

[PATCH 32/40] odb-helper: add init_object_process()

2018-01-03 Thread Christian Couder
This adds the infrastructure to launch and use long running
sub-processes as external odb helpers.

For now only the 'init' and 'get_direct' capabilities are
supported with sub-processes.

Signed-off-by: Christian Couder 
---
 external-odb.c |  46 +++---
 odb-helper.c   | 474 ++---
 sha1_file.c|  12 +-
 3 files changed, 491 insertions(+), 41 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index f38c2c2fe3..50c1cec50b 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -81,29 +81,11 @@ const char *external_odb_root(void)
return root;
 }
 
-int external_odb_has_object(const unsigned char *sha1)
-{
-   struct odb_helper *o;
-
-   external_odb_init();
-
-   for (o = helpers; o; o = o->next) {
-   if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
-   return 1;
-   if (odb_helper_has_object(o, sha1))
-   return 1;
-   }
-   return 0;
-}
-
-int external_odb_get_object(const unsigned char *sha1)
+static int external_odb_do_get_object(const unsigned char *sha1)
 {
struct odb_helper *o;
const char *path;
 
-   if (!external_odb_has_object(sha1))
-   return -1;
-
path = sha1_file_name_alt(external_odb_root(), sha1);
safe_create_leading_directories_const(path);
prepare_external_alt_odb();
@@ -157,6 +139,32 @@ int external_odb_get_direct(const unsigned char *sha1)
return -1;
 }
 
+int external_odb_has_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE)) {
+   if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+   return 1;
+   return !external_odb_do_get_object(sha1);
+   }
+   if (odb_helper_has_object(o, sha1))
+   return 1;
+   }
+   return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   return external_odb_do_get_object(sha1);
+}
+
 int external_odb_put_object(const void *buf, size_t len,
const char *type, unsigned char *sha1)
 {
diff --git a/odb-helper.c b/odb-helper.c
index 91b4de1a05..a67dfddca0 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -5,6 +5,22 @@
 #include "run-command.h"
 #include "sha1-lookup.h"
 #include "fetch-object.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "sigchain.h"
+
+struct object_process {
+   struct subprocess_entry subprocess;
+   unsigned int supported_capabilities;
+};
+
+static struct hashmap subprocess_map;
+
+static int check_object_process_status(int fd, struct strbuf *status)
+{
+   subprocess_read_status(fd, status);
+   return strcmp(status->buf, "success");
+}
 
 static void parse_capabilities(char *cap_buf,
   unsigned int *supported_capabilities,
@@ -40,6 +56,384 @@ static void parse_capabilities(char *cap_buf,
string_list_clear(_list, 0);
 }
 
+static int start_object_process_fn(struct subprocess_entry *subprocess)
+{
+   static int versions[] = {1, 0};
+   static struct subprocess_capability capabilities[] = {
+   { "get_git_obj", ODB_HELPER_CAP_GET_GIT_OBJ },
+   { "get_raw_obj", ODB_HELPER_CAP_GET_RAW_OBJ },
+   { "get_direct",  ODB_HELPER_CAP_GET_DIRECT  },
+   { "put_git_obj", ODB_HELPER_CAP_PUT_GIT_OBJ },
+   { "put_raw_obj", ODB_HELPER_CAP_PUT_RAW_OBJ },
+   { "put_direct",  ODB_HELPER_CAP_PUT_DIRECT  },
+   { "have",ODB_HELPER_CAP_HAVE },
+   { NULL, 0 }
+   };
+   struct object_process *entry = (struct object_process *)subprocess;
+   return subprocess_handshake(subprocess, "git-read-object", versions, NULL,
+   capabilities,
+   &entry->supported_capabilities);
+}
+
+static struct object_process *launch_object_process(struct odb_helper *o,
+   unsigned int capability)
+{
+   struct object_process *entry = NULL;
+
+   if (!subprocess_map.tablesize)
+   hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, NULL, 0);
+   else
+   entry = (struct object_process *)subprocess_find_entry(&subprocess_map, o->dealer);
+
+   fflush(NULL);
+
+   if (!entry) {
+   entry = xmalloc(sizeof(*entry));
+   entry->supported_capabilities = 0;
+
+   if (subprocess_start(&subprocess_map, &entry->subprocess, o->dealer, start_object_process_fn)) {
+   error("Could not launch process for cmd '%s'", o->dealer);
+   free(entry);
+ 

[PATCH 29/40] Add t0420 to test transfer to HTTP external odb

2018-01-03 Thread Christian Couder
This tests that an apache web server can be used as an
external object database and store files in their native
format instead of converting them to a Git object.

Signed-off-by: Christian Couder 
---
 t/t0420-transfer-http-e-odb.sh | 142 +
 1 file changed, 142 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 00..f84fe950ec
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+
+test_description='tests for transferring external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+init)
+   echo "capability=get_raw_obj"
+   echo "capability=put_raw_obj"
+   echo "capability=have"
+   ;;
+have)
+   list_url="$HTTPD_URL/list/"
+   curl "$list_url" ||
+   die "curl '$list_url' failed"
+   ;;
+get_raw_obj)
+   get_url="$HTTPD_URL/list/?sha1=$2"
+   curl "$get_url" ||
+   die "curl '$get_url' failed"
+   ;;
+put_raw_obj)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   upload_url="$HTTPD_URL/upload/?sha1=$sha1=$size=$kind"
+   curl --data-binary @- --include "$upload_url" >out ||
+   die "curl '$upload_url' failed"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+   git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+test_expect_success 'setup repo with a root commit and the helper' '
+   test_commit zero &&
+   git config odb.magic.scriptCommand "$HELPER"
+'
+
+test_expect_success 'setup another repo from the first one' '
+   git init other-repo &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME=123=blob"
+
+test_expect_success 'can upload a file' '
+   echo "Hello Apache World!" >hello_to_send.txt &&
+   echo "How are you?" >>hello_to_send.txt &&
+   curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+   curl --include "$LIST_URL" >out_list &&
+   grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+   curl --data "delete" --include "$UPLOAD_URL=1" >out_delete &&
+   curl --include "$LIST_URL" >out_list2 &&
+   ! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   echo "$hash1-4-blob" >expected &&
+   ls "$FILES_DIR" >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+   git cat-file blob "$hash1" &&
+   git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+git config odb.magic.scriptCommand "$HELPER" &&
+git cat-file blob "$hash1" &&
+git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone .. . &&
+git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+test_must_fail git clone --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+git clone -c odb.magic.scriptCommand="$HELPER" \
+   --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
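
A note on the port choice in t0420 above: the comment says the bare test number would land on a privileged port (below 1024), so the script adds 12000 to it. A minimal sketch of that arithmetic follows. It assumes `this_test` holds the short test name; here it is hard-coded to `t0420` purely for illustration, whereas in the test suite the variable is provided by test-lib.sh.

```shell
# Assumption: hard-coded stand-in for the value test-lib.sh would provide.
this_test=t0420

# ${this_test#t} strips the leading "t", leaving "0420".
# expr reads its operands as decimal integers, so the leading zero
# does not turn the number into octal.
LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)

echo "$LIB_HTTPD_PORT"   # prints 12420
```

The resulting port is above 1024, so an unprivileged user can bind it.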



[PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script

2018-01-03 Thread Christian Couder
To properly test passing objects from Git to an external odb
we need an odb-helper script that supports a 'put'
capability/instruction.

For now we will support only sending raw blobs, so the
supported capability/instruction will be 'put_raw_obj'.

While at it let's add a test to check that our odb-helper
script works well.

Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 24 
 1 file changed, 24 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 977fea852d..4ccca1e965 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,10 +7,15 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 init)
echo "capability=get_git_obj"
+   echo "capability=put_raw_obj"
echo "capability=have"
;;
 have)
@@ -20,6 +25,16 @@ have)
 get_git_obj)
cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
;;
+put_raw_obj)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   written=$(git hash-object -w -t "$kind" --stdin)
+   test "$written" = "$sha1" || die "bad sha1 passed '$sha1' vs written '$written'"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
@@ -45,4 +60,13 @@ test_expect_success 'helper can retrieve alt objects' '
test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+   hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+   test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+   size=$(git cat-file -s "$hash") &&
+   git cat-file blob "$hash" | ./odb-helper put_raw_obj "$hash" "$size" blob &&
+   alt_size=$(git -C alt-repo cat-file -s "$hash") &&
+   test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty
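
Both the helper script and the new test above locate a loose object file with `sed 's#..#&/#'`. A small sketch of what that expression does to an object name; the function name `loose_object_path` is hypothetical and not part of the patch.

```shell
# Hypothetical helper name, for illustration only.
loose_object_path () {
	# Without the "g" flag, sed rewrites only the first match of "..",
	# so just the first two hex digits get a "/" appended.
	echo "$1" | sed 's#..#&/#'
}

loose_object_path 0123456789abcdef0123456789abcdef01234567
# prints 01/23456789abcdef0123456789abcdef01234567
```

The output is the path of the object relative to an `objects` directory, which is why the test can check it with `test -f .git/objects/...`.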



[PATCH 27/40] odb-helper: add odb_helper_get_raw_object()

2018-01-03 Thread Christian Couder
The existing odb_helper_get_object() is renamed
odb_helper_get_git_object() and a new odb_helper_get_raw_object()
is introduced to deal with external objects that are not in Git format.

Signed-off-by: Christian Couder 
---
 odb-helper.c | 113 +--
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 6f56f07b38..fc30c2fa57 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -222,8 +222,107 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
-   int fd)
+static int odb_helper_get_raw_object(struct odb_helper *o,
+const unsigned char *sha1,
+int fd)
+{
+   struct odb_helper_object *obj;
+   struct odb_helper_cmd cmd;
+   unsigned long total_got = 0;
+
+   char hdr[32];
+   int hdrlen;
+
+   int ret = Z_STREAM_END;
+   unsigned char compressed[4096];
+   git_zstream stream;
+   git_SHA_CTX hash;
+   unsigned char real_sha1[20];
+
+   obj = odb_helper_lookup(o, sha1);
+   if (!obj)
+   return -1;
+
+   if (odb_helper_start(o, &cmd, 0, "get_raw_obj %s", sha1_to_hex(sha1)) < 0)
+   return -1;
+
+   /* Set it up */
+   git_deflate_init(&stream, zlib_compression_level);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   git_SHA1_Init(&hash);
+
+   /* First header.. */
+   hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), obj->size) + 1;
+   stream.next_in = (unsigned char *)hdr;
+   stream.avail_in = hdrlen;
+   while (git_deflate(&stream, 0) == Z_OK)
+   ; /* nothing */
+   git_SHA1_Update(&hash, hdr, hdrlen);
+
+   for (;;) {
+   unsigned char buf[4096];
+   int r;
+
+   r = xread(cmd.child.out, buf, sizeof(buf));
+   if (r < 0) {
+   error("unable to read from odb helper '%s': %s",
+ o->name, strerror(errno));
+   close(cmd.child.out);
+   odb_helper_finish(o, &cmd);
+   git_deflate_end(&stream);
+   return -1;
+   }
+   if (r == 0)
+   break;
+
+   total_got += r;
+
+   /* Then the data itself.. */
+   stream.next_in = (void *)buf;
+   stream.avail_in = r;
+   do {
+   unsigned char *in0 = stream.next_in;
+   ret = git_deflate(&stream, Z_FINISH);
+   git_SHA1_Update(&hash, in0, stream.next_in - in0);
+   write_or_die(fd, compressed, stream.next_out - compressed);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   } while (ret == Z_OK);
+   }
+
+   close(cmd.child.out);
+   if (ret != Z_STREAM_END) {
+   warning("bad zlib data from odb helper '%s' for %s",
+   o->name, sha1_to_hex(sha1));
+   return -1;
+   }
+   ret = git_deflate_end_gently(&stream);
+   if (ret != Z_OK) {
+   warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+   sha1_to_hex(sha1), o->name, ret);
+   return -1;
+   }
+   git_SHA1_Final(real_sha1, &hash);
+   if (hashcmp(sha1, real_sha1)) {
+   warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+   o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+   return -1;
+   }
+   if (odb_helper_finish(o, &cmd))
+   return -1;
+   if (total_got != obj->size) {
+   warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+   o->name, sha1_to_hex(sha1), total_got, obj->size);
+   return -1;
+   }
+
+   return 0;
+}
+
+static int odb_helper_get_git_object(struct odb_helper *o,
+const unsigned char *sha1,
+int fd)
 {
struct odb_helper_object *obj;
struct odb_helper_cmd cmd;
@@ -340,6 +439,16 @@ int odb_helper_get_direct(struct odb_helper *o,
return res;
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+ const unsigned char *sha1,
+ int fd)
+{
+   if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+   return odb_helper_get_raw_object(o, sha1, fd);
+   else
+   return odb_helper_get_git_object(o, sha1, fd);
+}
+
 int odb_helper_put_object(struct odb_helper *o,
  const void *buf, size_t len,

[PATCH 21/40] t0400: add test for external odb write support

2018-01-03 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 4ccca1e965..f924de870f 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -69,4 +69,12 @@ test_expect_success 'helper can add objects to alt repo' '
test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+   test_config odb.magic.scriptCommand "$HELPER" &&
+   test_commit three &&
+   hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+   content=$(git -C alt-repo show "$hash3") &&
+   test "$content" = "three"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



Re: [PATCH] http: fix v1 protocol tests with apache httpd < 2.4

2018-01-03 Thread Todd Zullinger
Jeff King wrote:
> On Tue, Jan 02, 2018 at 07:39:46PM -0500, Todd Zullinger wrote:
>> I don't know if there's a clean way to do that
>> automatically, short of parsing the output of 'httpd -v'
>> should we ever need to add such a prereq.
> 
> In the general case, we could probably define an endpoint within an
> <IfVersion> block, and then try to access the endpoint from the test script.
> 
> E.g., something like:
> 
> <IfVersion >= 2.4>
> Alias /have-2.4.txt www/yes.txt
> </IfVersion>
> 
> in the apache config, and then:
> 
>   test_lazy_prereq APACHE24 '
> echo yes >"$HTTPD_DOCUMENT_ROOT_PATH/yes.txt" &&
> curl -f "$HTTPD_URL/have-2.4.txt"
>   '
> 
> in the test script (of course we may not want to depend on having
> command-line curl, but we could replace that with "git ls-remote" or
> similar).
> 
> One nice thing about that approach is that it can be extended to other
> "If" blocks, like if we have a particular module available, or if ssl is
> configured.

That's quite elegant.  I even modified an IfVersion block
and didn't think about using it that way to create a prereq.
Neat!

-- 
Todd
~~
You're not drunk if you can lie on the floor without holding on.
-- Dean Martin



Misleading documentation for git-diff-files (diff-filter)

2018-01-03 Thread John Cheng
I originally asked this question on stackoverflow
(https://stackoverflow.com/q/48039277).

I wanted to know if git diff-files shows files that are not in the
index but are in the working tree. The documentation says you can
supply --diff-filter=A, which will select files "that are added".
However, git-diff-files appears never to show any files with the
status of "A".

It seems like the cause is that git diff-files includes
diff-options.txt which uses a standard template for --diff-filter
which includes the "A" option. Perhaps a clarification can be added?

Compares the files in the working tree and the index.  When paths
are specified, compares only those named paths.  Otherwise all
entries in the index are compared.  The output format is the
same as for 'git diff-index' and 'git diff-tree'. Files not in the index are
not compared.
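
The behaviour described above can be checked with a short script. This is a sketch only: it assumes `git` and `mktemp` are available, uses a throwaway repository, and covers just the untracked-file case from the question. Whether other situations can produce an "A" status from diff-files is not examined here.

```shell
# Scratch repository in a temporary directory (illustrative path).
repo=$(mktemp -d)
git init -q "$repo"

# An untracked file: present in the working tree, absent from the index.
echo hello >"$repo/untracked.txt"

# diff-files compares index entries against the working tree, so a file
# that was never added to the index is not listed, even with the A filter.
added=$(git -C "$repo" diff-files --diff-filter=A --name-only)
echo "files reported as added: '$added'"

rm -rf "$repo"
```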





-- 
---
John L Cheng


Re: [PATCH v2 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-03 Thread Adam Dinwoodie
On Wednesday 03 January 2018 at 02:31 pm +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Jan 03 2018, Adam Dinwoodie jotted:
> 
> > On Monday 25 December 2017 at 12:28 am +, Ævar Arnfjörð Bjarmason wrote:
> >> There has never been any full roundtrip testing of what git-ls-files
> >> and other functions that use wildmatch() actually do, rather we've
> >> been satisfied with just testing the underlying C function.
> >>
> >> Due to git-ls-files and friends having their own codepaths before they
> >> call wildmatch() there's sometimes differences in the behavior between
> >> the two, and even when we test for those (as with
> >> 9e4e8a64c2 ("pathspec: die on empty strings as pathspec", 2017-06-06))
> >> there was no one place where you can review how these two modes
> >> differ.
> >>
> >> Now there is. We now attempt to create a file called $haystack and
> >> match $needle against it for each pair of $needle and $haystack that
> >> we were passing to test-wildmatch.
> >>
> >> If we can't create the file we skip the test. This ensures that we can
> >> run this on all platforms and not maintain some infinitely growing
> >> whitelist of e.g. platforms that don't support certain characters in
> >> filenames.
> >>
> >> As a result of doing this we can now see the cases where these two
> >> ways of testing wildmatch differ:
> >>
> >>  * Creating a file called 'a[]b' and running ls-files 'a[]b' will show
> >>that file, but wildmatch("a[]b", "a[]b") will not match
> >>
> >>  * wildmatch() won't match a file called \ against \, but ls-files
> >>will.
> >>
> >>  * `git --glob-pathspecs ls-files 'foo**'` will match a file
> >>'foo/bba/arr', but wildmatch won't, however pathmatch will.
> >>
> >>This seems like a bug to me, the two are otherwise equivalent as
> >>these tests show.
> >>
> >> This also reveals the case discussed in 9e4e8a64c2 above, where '' is
> >> now an error as far as ls-files is concerned, but wildmatch() itself
> >> happily accepts it.
> >>
> >> Signed-off-by: Ævar Arnfjörð Bjarmason 
> >
> > I'm seeing this test script failing on the pu branch as a result of this
> > commit when building on Cygwin.  Specifically, the test fails at
> > 9d45e1ca4 ("Merge branch 'bw/oidmap-autoinit' into pu", 2017-12-28), and
> > bisecting points the blame at 2ee0c785a ("wildmatch test: create & test
> > files on disk in addition to in-memory", 2017-12-25).
> >
> > I've copied the verbose error output for the first error below, and
> > uploaded the full output, including verbose and trace output for the
> > unexpectedly failing tests, at [0].  (With 42 failures among 1512 tests,
> > there's a lot of it, so I didn't want to include it in an email.)
> 
> Does the fixup above in <878tdm8k2d@evledraar.gmail.com> work for
> you, i.e. changing $10 in the script to ${10}?

This fixes some but not all of the failures: I'm now down from 42 to 24
failures.
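
For readers wondering why `$10` versus `${10}` matters: in POSIX shell, `$10` is read as `$1` followed by a literal `0`, and positional parameters above 9 need braces. A minimal sketch, with a hypothetical function name that does not come from the test script:

```shell
# Hypothetical function, for illustration only.
show_tenth () {
	# "$10" expands to the first argument followed by a literal "0".
	wrong="$10"
	# Braces are required to reach the tenth positional parameter.
	right="${10}"
}

show_tenth a b c d e f g h i j
echo "wrong=$wrong right=$right"   # prints wrong=a0 right=j
```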

Updated verbose test output is at
https://gist.github.com/me-and/04443bcb00e12436f0eacce079b56d02

Thanks!

Adam

