Re: Fetch-hooks
On 02/20/2018 08:42 AM, Jeff King wrote:>> [...] >> >> Is there a way for “pre-receive” to individually filter [refs]? I was >> under the impression that the only way to do that was to use the >> “update” hook, which was the reason I wanted to model it after “update” >> rather than “pre-receive” (my use case being a check independent for >> each pushed ref) > > No, pre-receive is always atomic. However, that may actually be what you > want, if the point is to keep untrusted data out of the repository. By > rejecting the whole thing, we could in theory keep the objects from > entering the repository at all. This is how the push side works via the > "quarantine" system (which is explained in git-receive-pack(1)). > Fetching doesn't currently quarantine objects, but it probably wouldn't > be very hard to make it do so. Oh, I didn't think about quarantining behavior, indeed! So I guess, following your answer as well as Jake's, I'll try to implement a pre-receive-like hook, and will come back to this list when I'll have a tentative implementation. Thanks for the advice! :)
Re: Fetch-hooks
On 02/19/2018 10:23 PM, Jeff King wrote: > [...] > If you do go this route, please model it after "pre-receive" rather than > "update". We had "update" originally but found it was too limiting for > hooks to see only one ref at a time. So we introduced pre-receive. The > "update" hook remains for historical reasons, but I don't think we'd > want to reproduce the mistake. :) Hmm, what bothered me with “pre-receive” was that it was an all-or-nothing decision, without the ability to allow some references through and not others. Is there a way for “pre-receive” to individually filter hooks? I was under the impression that the only way to do that was to use the “update” hook, which was the reason I wanted to model it after “update” rather than “pre-receive” (my use case being a check independent for each pushed ref)
Re: Fetch-hooks
On 02/14/2018 02:35 AM, Jeff King wrote: > On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote: >> [...] > I think there may have been some more concrete proposals after that, but > that's what I was able to dig up quickly. Thanks! Though it looks way above my knowledge of git internals as well as time available, it looks like a great project, and I hope it someday succeeds! >> That said, actually I just noticed an issue in the “add a >> remote..fetch option to fetch to refs/quarantine then have the >> post-fetch hook do the work”: it means if I run `git pull`, then: >> 1. The remote references will be pulled to refs/quarantine/... >> 2. The post-fetch hook will copy the accepted ones to refs/remotes/... >> 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into >> local branches... and so merge from refs/quarantine. > > Good point. You can't munge FETCH_HEAD by playing with refspecs. > > I am starting to come around to the idea that "pre-fetch" might be the > best way to do what you want. Not to rewrite refs, but perhaps to simply > reject them. In the same way that we allow pre-receive to reject pushed > refs (both are, after all, the final check on admitting new history into > the repository, just in opposite directions). > >> So, when thinking about it, I'm back to thinking the proper hook >> interface should be the one of the tweak-fetch hook, but its >> implementation should make it not go crazy on remote servers. And so >> that the implementation should do all this refs/quarantine wizardry >> inside git itself. > > So does anybody actually want to be able to adjust the refs as they pass > through? It really sounds like you just want to be able to reject or not > reject the fetch. And that rejecting would be the uncommon case, so it's > OK to just abort the whole operation and expect the user to figure it > out. This completely fits my use case (modulo the fact that it's more similar to the `update` hook than to `pre-receive` I think, as verifying the signature requires the objects to already have been downloaded), indeed, though I'm not sure it would have fit Joey's (based on my understanding, adding a merge was what was originally asked for). Actually, I'm wondering if the existing semantics of `update` could not be reused for the `pre-fetch`. Would it make sense to just call `update` during a fetch in the same way as during a receive-pack? That said it likely makes this a breaking change, though it's maybe unlikely that a repository is used both for fetching and for receive-pack'ing, it could happen. So the proposal as I understand it would currently be adding a `fetch-update` hook that does exactly the same thing as `update` but for `fetch`. This solves my use case in a nice way, though it likely doesn't solve Joey's original one (which has been taken care of by a wrapper since then). What do you all think about it?
Re: Fetch-hooks
On 02/12/2018 08:23 PM, Brandon Williams wrote:> Maybe this isn't helpful but you may be able to implement this by using > a remote-helper. The helper could perform any sort of caching it needed > to prevent re-downloading large amounts of data that is potentially > thrown away, while only sending through the relevant commits which > satisfy some criteria (signed, etc.). This looks like a possibility, thanks! I'll likely try to implement it this way if there can be no consensus on the features wanted for a fetch-hook (as the current state of discussion looks like to me).
Re: Fetch-hooks
On 02/10/2018 01:21 PM, Jeff King wrote: > On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote: > >>> Yeah, tag-following may be a little tricky, because it usually wants to >>> write to refs/tags/. One workaround would be to have your config look >>> like this: >>> >>> [remote "origin"] >>> fetch = +refs/heads/*:refs/quarantine/origin/heads/* >>> fetch = +refs/tags/*:refs/quarantine/origin/tags/* >>> tagOpt = --no-tags >>> >>> That's not exactly the same thing, because it would fetch all tags, not >>> just those that point to the history on the branches. But in most >>> repositories and workflows the distinction doesn't matter. >> >> Hmm... apart from the implementation complexity (of which I have no >> idea), is there an argument against the idea of adding a >> remote..fetchTagsTo refmap similar to remote..fetch but used >> every time a tag is fetched? (well, maybe not exactly similar to >> remote..fetch because we know the source is going to be >> refs/tags/*; so just having the part of .fetch past the ':' would be >> more like what's needed I guess) > > I don't think it would be too hard to implement, and is at least > logically consistent with the way we handle tags. > > If we were designing from scratch, I do think this would all be helped > immensely by having more separation of refs fetched from remotes. There > was a proposal in the v1.8 era to fetch everything for a remote, > including tags, into "refs/remotes/origin/heads/", > "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to > look for tags in each of the remote-tag namespaces. > > I still think that would be a good direction to go, but it would be a > big project (which is why the original stalled). Hmm... would this also drown the remote..fetch map? Also, I think this behavior could be emulated with fetch and fetchTagsTo, and it would look like: [remote "my-remote"] fetch = +refs/heads/*:refs/remotes/my-remote/heads/* fetchTagsTo = refs/remotes/my-remote/tags/* The remaining issue being to teach the lookup side to look for tags in all the remote-tag namespaces (and the fact it's a breaking change). That said, actually I just noticed an issue in the “add a remote..fetch option to fetch to refs/quarantine then have the post-fetch hook do the work”: it means if I run `git pull`, then: 1. The remote references will be pulled to refs/quarantine/... 2. The post-fetch hook will copy the accepted ones to refs/remotes/... 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into local branches... and so merge from refs/quarantine. A solution would be to also update FETCH_HEAD in the post-fetch hook, but then we're back to the issue raised by Junio after the “*HOWEVER*” of [1]: the hook writer has to maintain consistency between the “copied” references and FETCH_HEAD. So, when thinking about it, I'm back to thinking the proper hook interface should be the one of the tweak-fetch hook, but its implementation should make it not go crazy on remote servers. And so that the implementation should do all this refs/quarantine wizardry inside git itself. So basically what I'm getting at at the moment is that git fetch should: 1. fetch all the references to refs/real-remotes//{insert here a copy of the refs/ tree of } 2. figure out what the “expected” names for these references will by, by looking at remote..fetch (and at whether --tags was passed) 3. for each “expected” name, 1. if a tweak-fetch hook is present, call it with the refs/real-remotes/... refname and the “expected end-name” from remote..fetch 2. based on the hook's result, potentially to move the “expected end-name” to another commit than the one referenced by refs/real-remotes/... 3. move the “expected” name to the commit referenced in refs/real-remotes Which would make the tweak-fetch hook interface simpler (though more restrictive, but I don't have any real use case for the other change possibilities) than it is now: 1. feed the hook with lines of “refs/real-remotes/my-remote/heads/my-branch refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is “normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag” (if my-tag is being fetched from my-remote) 2. read lines of “ refs/remotes/my-remote/my-branch”, that will re-point my-branch to instead of refs/real-remotes/my-remote/heads/my-branch. In order to avoid any issue, the hook is not allowed to pass as second output a reference that was not passed as second input. This way, the behavior of the tweak-fetch hook is reasonably preserved (at least for my use case), and there is no additional load on the servers thanks to the up-to-date references being stored in refs/real-remotes// Does this reasoning make any sense? [1] https://marc.info/?l=git=132478296309094=2
Re: Fetch-hooks
On 02/10/2018 02:33 AM, Leo Gaspard wrote:> I guess the very early part of the discussion you're speaking of is what > I was assuming after reading > https://marc.info/?l=git=132478296309094=2 > > [...] > > So the reason for a post-fetch in my opinion is the same as for a > pre-push / pre-commit: not changing the user's workflow, while providing > additional features. Well, re-reading my email, it looks aggressive to me now... sorry about that! What I meant was basically, that in my mind pre-push or pre-commit are also local-only things, and they are really useful in that they allow to block the push or the commit if some conditions are not met (eg. block commit with trailing whitespace, or block push of unsigned commits). In pretty much the same way, what I'm really looking for is a way to “block the fetch” (from a user-visible standpoint) -- like pre-push but in the opposite direction. I hope such a goal is not an anti-pattern for hook design? In order to do this, I first tried updating this tweak-fetch hook patch from late 2011, and then learned that it would cause too high a load on servers. Then, while brainstorming another solution, this idea of a notification-only post-fetch hook arose. But, as I noticed while writing my previous answer, this suffers the same the-hook-writer-must-correctly-update-FETCH_HEAD-concurrently-with-remote-branch issue that you pointed out just after the “*HOWEVER*” in [1]. So that solution is likely a bad solution too. I guess we'll continue the search for a good solution in the side-thread with Peff, hoping to figure one out. That being said about what I meant in my last email, obviously you're the one who has the say on what goes in or not! And if it's an anti-feature I'd much rather know it now than after spending a few nights coding :) So, what do you think about this use case I'm thinking of? (ie. “blocking the fetch” like pre-push “blocks the push” and pre-commit “blocks the commit”?) [1] https://marc.info/?l=git=132478296309094=2
Re: Fetch-hooks
On 02/10/2018 02:08 AM, Junio C Hamano wrote: > Leo Gaspard <l...@gaspard.io> writes: > >> On 02/10/2018 01:13 AM, Jeff King wrote: >>> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: >>>> So the changes that are required are: >>>> * Adding a notification-only post-fetch hook > > Maybe I missed a very early part of the discussion, but why does > this even need a hook? There are some numbers [*1*] of classes of > valid reasons we may want to have hooks triggered by operations, but > "always do something locally after doing something else locally, > regardless of the outcome of that something else" feels like the > most typical anti-pattern that we do not want a hook for. If you > are doing "git fetch" (or "git pull"), you already know you are > doing that and you donot need a notification. You just create a > workflow specific script that calls fetch or pull, followed by > whatever you want to do and use that, instead of doing "git pull", > and that is not any extra work than writing a hook and installing > it. > > Unlike something like post-receive, which happens on the remote side > where you may not even have an interactive access to, in response to > the operation you locally do (i.e. "git push"), fetching and then > doing something else in a repository you fetch into has no reason to > be done as a hook. I guess the very early part of the discussion you're speaking of is what I was assuming after reading https://marc.info/?l=git=132478296309094=2 Then, it's always possible to just write workflow-specific scripts that manually run git fetch then copy the refs (resp. run git fetch then copy the refs then run git merge) to get git myfetch (resp. git mypull) [1]. But then, the question is, why is there a pre-push hook? it's already possible to have a custom script that first runs the hook then runs git push, for most if not all of the use cases of the pre-push hook. Yet the possibility to not change the end-user's workflow is what makes pre-push (or pre-commit, with which the similarity is perhaps even more obvious) so useful. So the reason for a post-fetch in my opinion is the same as for a pre-push / pre-commit: not changing the user's workflow, while providing additional features. [1] Which makes me notice that actually the post-fetch hook technique we were discussing with Peff suffers the same not-updating-FETCH_HEAD issue that was discussed in this early thread: a `git pull` would try to merge from refs/quarantine, I guess. So things are a bit harder than we thought. I guess the tweak-fetch hook could be left “as-is”, but with git automatically doing the “quarantine-ing” thing transparently so that the references that the end-user (or the hook for that matter) see are the “curated” ones? Then it's too late for me to think efficiently right now, so that idea may be stupid or over-complex.
Re: Fetch-hooks
On 02/10/2018 01:13 AM, Jeff King wrote: > On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: >> So the changes that are required are: >> * Adding a notification-only post-fetch hook >> * For handling tags, there is a need to have a refmap for tags. Maybe >> adding a remote.my-remote.fetchTags refmap, that would be used when >> running with --tags, and having it default to “refs/tags/*:refs/tags/*” >> to keep the current behavior by default? > > Yeah, tag-following may be a little tricky, because it usually wants to > write to refs/tags/. One workaround would be to have your config look > like this: > > [remote "origin"] > fetch = +refs/heads/*:refs/quarantine/origin/heads/* > fetch = +refs/tags/*:refs/quarantine/origin/tags/* > tagOpt = --no-tags > > That's not exactly the same thing, because it would fetch all tags, not > just those that point to the history on the branches. But in most > repositories and workflows the distinction doesn't matter. Hmm... apart from the implementation complexity (of which I have no idea), is there an argument against the idea of adding a remote..fetchTagsTo refmap similar to remote..fetch but used every time a tag is fetched? (well, maybe not exactly similar to remote..fetch because we know the source is going to be refs/tags/*; so just having the part of .fetch past the ':' would be more like what's needed I guess) The issue with your solution is that if the user runs 'git fetch --tags', he will get the (potentially compromised) tags directly in his refs/tags/. > (By the way, the I specifically chose the name "refs/quarantine" instead > of anything in "refs/remotes" because we'd want to make sure that the > "git checkout" DWIM behavior cannot accidentally pull from quarantine). (Indeed, I understood after reading it, and would likely not have thought of it otherwise, thanks!) >> The only remaining issue I can think of is: How do we avoid the issue >> of the >> trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification >> raised in the side-thread that Junio wrote in [1]? Maybe just writing >> in the documentation that the hook should use a quarantine-like >> approach if it wants modification would be enough to not have hook >> authors try to modify the ref in the post-fetch hook? > > I don't have a silver bullet there. Documenting the "right" way at least > seems like a good first step. So long as it's not a merge-blocker it's good with me! (but then I'm likely not the one who's going to be pointed at when things go wrong in a hook, so I'm clearly biased on this matter)
Re: Fetch-hooks
On 02/09/2018 11:30 PM, Jeff King wrote: > On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote: >> One thing that's not discussed yet, and I know just enough about for it >> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & >> Brandon who know more) is that this feature once shipped might cause >> higher load on git hosting providers. >> >> This is because people will inevitably use it in popular projects for >> some custom filtering, and because you're continually re-fetching and >> inspecting stuff what used to be a really cheap no-op "pull" most of the >> time is a more expensive negotiation every time before the client >> rejects the refs again, and worse for hosting providers because you have >> bespoke ref fetching strategies you have less odds of being able to >> cache both the negotiation and the pack you serve. > > Most of the discussion so far seems to be about "accept this ref or > don't accept this ref", which seems OK. But if you are going to do > custom tweaking like rewriting objects, or making it common to refuse > some refs, then I think things get pretty inefficient for _everybody_. > > The negotiation for future fetches uses the existing refs as the > starting point. And if we don't know that we have the objects because > there are no refs pointing at them, they're going to get transferred > again. That's extra load no the server, and extra time for the user > waiting on the network. Oh. I thought the protocol git used was something like client: I want to fetch refs A and B server: so you'll need objects 12345678 and 90ABCDEF, A and B both point to 12345678 client: please give me object 12345678 server: here it is [...] I was clearly wrong, thanks! (and thanks Ævar for your explanation in the side-thread, too!) > I tend to agree with the direction of thinking you outlined: you're > generally better off completing the fetch to a local namespace that > tracks the other side completely, and then manipulating the local refs > as you see fit (e.g., fetching into refs/quarantine, and then migrating > "good" refs over to refs/remotes/origin). Hmm... so do I understand it correctly when I say the process you're thinking about works like this? * User installs hook for my-remote by running [something] * User runs git fetch * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so I guess [something] changes the remote.my-remote.fetch refmap) * When this is done `git fetch` runs a notification-only post-fetch hook (that would need to be added) * The post-fetch hook then performs whatever it wants and updates the references in refs/remotes/my-remote/* So the changes that are required are: * Adding a notification-only post-fetch hook * For handling tags, there is a need to have a refmap for tags. Maybe adding a remote.my-remote.fetchTags refmap, that would be used when running with --tags, and having it default to “refs/tags/*:refs/tags/*” to keep the current behavior by default? The only remaining issue I can think of is: How do we avoid the issue of the trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification raised in the side-thread that Junio wrote in [1]? Maybe just writing in the documentation that the hook should use a quarantine-like approach if it wants modification would be enough to not have hook authors try to modify the ref in the post-fetch hook? Thanks for all your thoughts, and hope we're getting somewhere! Leo PS: I'll read over the reviews once I'm all clear as to what exactly is wanted for this patch, as most likely they'll just be dumped, given the current state of affairs. [1] https://marc.info/?l=git=132480559712592=2
Re: Fetch-hooks
On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also have some intermediate step between these two, where >>> e.g. your refspec for "origin" is >>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default >>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that >>> location, then you move them (with some alias/hook) to >>> "refs/remotes/origin/*" once they're seen to be "OK". >> >> That is indeed another possibility, but then the idea is to make things >> as transparent as possible for the end-user, not to completely change >> their git workflow. As such, considering only signed commits to be part >> of the upstream seems to make sense to me? > > I mean this would be something that would be part of a post-fetch hook, > so it would be as transparent as what you're doing to the user, with the > difference that it doesn't need to involve changes to what you slurp > down from the server. > > I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run > your post-fetch hook and you go over the new refs, and copy what you > like (usually everything) to refs/remotes/origin/*. Hmm... but that would then require a post-fetch hook, wouldn't it? And about a post-fetch hook, if I understood correctly, Junio in [1] had a quite nice argument against it: Although I do not deeply care between such a "trigger to only notify, no touching" hook and a full-blown "allow hook writers to easily lie about what happened in the fetch" hook, I was hoping that we would get this right and useful if we were spending our brain bandwidth on it. I am not very fond of an easier "trigger to only notify" hook because people are bound to misuse the interface and try updating the refs anyway, making it easy to introduce inconsistencies between refs and FETCH_HEAD that will confuse the later "merge" step. Otherwise, if it doesn't require a post-fetch hook, then it would require the end-user to first fetch, then run the `copy-trusted-refs-over` script, which would add stuff to the user's workflow. Did I miss another possibility? > [...] > > One thing that's not discussed yet, and I know just enough about for it > to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & > Brandon who know more) is that this feature once shipped might cause > higher load on git hosting providers. > > This is because people will inevitably use it in popular projects for > some custom filtering, and because you're continually re-fetching and > inspecting stuff what used to be a really cheap no-op "pull" most of the > time is a more expensive negotiation every time before the client > rejects the refs again, and worse for hosting providers because you have > bespoke ref fetching strategies you have less odds of being able to > cache both the negotiation and the pack you serve. > > I.e. you want this for some security feature where 99.99% of the time > you accept all refs, but most people will probably use this to implement > dynamic Turing-complete refspecs. > > Maybe that's worrying about nothing, but worth thinking about. Well... First, I must say I didn't really understand your last paragraph about Turing-complete refspecs. But my understanding of how the fetch-hook patchset I sent this evening works is that it first receives all the objects from the hosting provider, then locally moves the refs, but never actually discards the downloaded objects (well, until a `git gc` I guess). So I don't think the network traffic with the provider would be any different wrt. what it is now, even if a tweak-fetch hook rejects some commits? Then again I don't know git's internals enough to be have even a bit of certainty about what I'm saying right now, so... [1] https://marc.info/?l=git=132480559712592=2
[PATCH 1/2] fetch: preparations for tweak-fetch hook
From: Léo Gaspard <l...@gaspard.io> No behavior changes yet, only some groundwork for the next change. The refs_result structure combines a status code with a ref map, which can be NULL even on success. This will be needed when there's a tweak-fetch hook, because it can filter out all refs, while still succeeding. fetch_refs returns a refs_result, so that it can modify the ref_map. Based-on-patch-by: Joey Hess <j...@kitenet.net> Signed-off-by: Leo Gaspard <l...@gaspard.io> --- builtin/fetch.c | 68 + 1 file changed, 44 insertions(+), 24 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 7bbcd26fa..76dc05f61 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -34,6 +34,11 @@ enum { TAGS_SET = 2 }; +struct refs_result { + struct ref *new_refs; + int status; +}; + static int fetch_prune_config = -1; /* unspecified */ static int prune = -1; /* unspecified */ #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */ @@ -57,6 +62,18 @@ static int shown_url = 0; static int refmap_alloc, refmap_nr; static const char **refmap_array; +static int add_existing(const char *refname, const struct object_id *oid, + int flag, void *cbdata) +{ + struct string_list *list = (struct string_list *)cbdata; + struct string_list_item *item = string_list_insert(list, refname); + struct object_id *old_oid = xmalloc(sizeof(*old_oid)); + + oidcpy(old_oid, oid); + item->util = old_oid; + return 0; +} + static int git_fetch_config(const char *k, const char *v, void *cb) { if (!strcmp(k, "fetch.prune")) { @@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head, } } -static int add_existing(const char *refname, const struct object_id *oid, - int flag, void *cbdata) -{ - struct string_list *list = (struct string_list *)cbdata; - struct string_list_item *item = string_list_insert(list, refname); - struct object_id *old_oid = xmalloc(sizeof(*old_oid)); - - oidcpy(old_oid, oid); - item->util = old_oid; - return 0; -} - static int will_fetch(struct ref **head, const unsigned char *sha1) { struct ref *rm = *head; @@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map) return check_connected(iterate_ref_map, , ); } -static int fetch_refs(struct transport *transport, struct ref *ref_map) +static struct refs_result fetch_refs(struct transport *transport, + struct ref *ref_map) { - int ret = quickfetch(ref_map); - if (ret) - ret = transport_fetch_refs(transport, ref_map); - if (!ret) - ret |= store_updated_refs(transport->url, + struct refs_result ret; + ret.status = quickfetch(ref_map); + if (ret.status) { + ret.status = transport_fetch_refs(transport, ref_map); + } + if (!ret.status) { + ret.new_refs = ref_map; + ret.status |= store_updated_refs(transport->url, transport->remote->name, - ref_map); + ret.new_refs); + } transport_unlock_pack(transport); return ret; } @@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen) return transport; } -static void backfill_tags(struct transport *transport, struct ref *ref_map) +static struct refs_result backfill_tags(struct transport *transport, + struct ref *ref_map) { int cannot_reuse; + struct refs_result res; /* * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it @@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL); transport_set_option(transport, TRANS_OPT_DEPTH, "0"); transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL); - fetch_refs(transport, ref_map); + res = fetch_refs(transport, ref_map); if (gsecondary) { transport_disconnect(gsecondary); gsecondary = NULL; } + + return res; } static int do_fetch(struct transport *transport, @@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport, struct string_list existing_refs = STRING_LIST_INIT_DUP; struct ref *ref_map; struct ref *rm; + struct refs_result res; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; @@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport, transport->url); } } - if (fetch_refs(transport, ref_map)) { + + res = fetch_refs(transport,
[PATCH 2/2] fetch: add tweak-fetch hook
From: Léo Gaspard <l...@gaspard.io> The tweak-fetch hook is fed lines on stdin for all refs that were fetched, and outputs on stdout possibly modified lines. Its output is then parsed and used when `git fetch` updates the remote tracking refs, records the entries in FETCH_HEAD, and produces its report. The modifications here are heavily based on prior work by Joey Hess. Based-on-patch-by: Joey Hess <j...@kitenet.net> Signed-off-by: Leo Gaspard <l...@gaspard.io> --- Documentation/githooks.txt | 37 +++ builtin/fetch.c | 210 +++- t/t5574-fetch-tweak-fetch-hook.sh | 90 templates/hooks--tweak-fetch.sample | 24 + 4 files changed, 359 insertions(+), 2 deletions(-) create mode 100755 t/t5574-fetch-tweak-fetch-hook.sh create mode 100755 templates/hooks--tweak-fetch.sample diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index f877f7b7c..1b4a18bf0 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -177,6 +177,43 @@ This hook can be used to perform repository validity checks, auto-display differences from the previous HEAD if different, or set working dir metadata properties. +tweak-fetch +~~~ + +This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs +have been fetched from the remote repository. It is not executed, if nothing was +fetched. + +The output of the hook is used to update the remote-tracking branches, and +`.git/FETCH_HEAD`, in preparation for a later merge operation done by 'git +merge'. + +It takes no arguments, but is fed a line of the following format on its standard +input for each ref that was fetched. + + SP not-for-merge|merge|ignore SP SP LF + +Where the "not-for-merge" flag indicates the ref is not to be merged into the +current branch, and the "merge" flag indicates that 'git merge' should later +merge it. + +The `` is the remote's name for the ref that was fetched, and +`` is a name of a remote-tracking branch, like +"refs/remotes/origin/master". `` can be undefined if the fetched +ref is not being stored in a local refname. In this case, it will be set to `@`, +an invalide refspec, so that scripts can be written more easily. + +TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not +really sure I get what this does or what invariants it is supposed to maintain +(eg. all “ignore” updates at the end of the refs list?), so this may also +require code changes. + +The hook must consume all of its standard input, and output back lines of the +same format. It can modify its input as desired, including adding or removing +lines, updating the sha1 (i.e. re-point the remote-tracking branch), changing +the merge flag, and changing the `` (i.e. use different +remote-tracking branch). + post-merge ~~ diff --git a/builtin/fetch.c b/builtin/fetch.c index 76dc05f61..1bb394530 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -28,6 +28,8 @@ static const char * const builtin_fetch_usage[] = { NULL }; +static const char tweak_fetch_hook[] = "tweak-fetch"; + enum { TAGS_UNSET = 0, TAGS_DEFAULT = 1, @@ -181,6 +183,206 @@ static struct option builtin_fetch_options[] = { OPT_END() }; +static int feed_tweak_fetch_hook(int in, int out, void *data) +{ + struct ref *ref; + struct strbuf buf = STRBUF_INIT; + const char *kw, *peer_ref; + char oid_buf[GIT_SHA1_HEXSZ + 1]; + int ret; + + for (ref = data; ref; ref = ref->next) { + if (ref->fetch_head_status == FETCH_HEAD_MERGE) + kw = "merge"; + else if (ref->fetch_head_status == FETCH_HEAD_IGNORE) + kw = "ignore"; + else + kw = "not-for-merge"; + if (!ref->name) + die("trying to fetch an inexistant ref"); + if (ref->peer_ref && ref->peer_ref->name) + peer_ref = ref->peer_ref->name; + else + peer_ref = "@"; + strbuf_addf(, "%s %s %s %s\n", + oid_to_hex_r(oid_buf, >old_oid), kw, + ref->name, peer_ref); + } + + ret = write_in_full(out, buf.buf, buf.len) != buf.len; + if (ret) + warning("%s hook failed to consume all its input", + tweak_fetch_hook); + close(out); + strbuf_release(); + return ret; +} + +static struct ref *parse_tweak_fetch_hook_line(char *l, + struct string_list *existing_refs) +{ + struct ref *ref = NULL, *peer_ref = NULL; + struct string_list_item *peer_item = NULL; + c
[PATCH 0/2] fetch: add tweak-fetch hook
On 02/09/2018 09:20 PM, Joey Hess wrote:> Yes; my patches are under the same GPL-2 as the rest of git. Thanks! So here comes my patch series, heavily based on yours. There are some things to bear in mind while reviewing it: * This is my first real attempt at contributing to git, which means I could be very very far off-track * Most of it is based on trying to make the 6-year-old patch series work and pass all the tests, so if a new feature has been added since then I likely didn't notice it or don't know how to handle it correctly There are still three TODO's in the code: * In the documentation, one stating that I don't really get what this “ignore” parameter exactly does, and whether it should be handled specially (a prime example of a new feature I'm not really sure how to handle, somewhere in the code it's written all the “ignore” references are at the end of the list, but I'm already not self-confident enough about the difference between “merge” and “not-for-merge” to even consider making a good choice about how to handle “ignore”) * In `builtins/fetch.c`, function `do_fetch`, there is a conflict of interest between placing the `prune` before the `fetch` (as done by commit 10a6cc889 ("fetch --prune: Run prune before fetching", 2014-01-02)), and placing the `fetch` before the `prune` (which would allow hooks that rename the local-ref to not be prune'd and then re-fetched when doing a `git fetch --prune` -- without that a hook that would want to both read the old commit information and rename the local-ref would not be able to). Or maybe this question means actually there should be a third solution? but I don't really know what. Maybe also hooking into the prune operation? * In `templates/hooks--tweak-fetch.sample`, the “check this actually works” todo, as I'd rather first check this patch series is not too far off-topic before doing non-essential work -- anyway another version of the patch series will be required for the other two TODO's, so I can fix it at this point. That being said, what do you think about these patches? Thanks for your time! Leo Gaspard
Re: Fetch-hooks
On 02/08/2018 06:02 PM, Leo Gaspard wrote: > On 02/08/2018 04:30 PM, Joey Hess wrote: >> [...] > > Hmm, OK, so I guess I'll try to update the patch when I get some time to > delve into git's internals, as my use case (forbidding some fetches) > couldn't afaik be covered by a wrapper hook. Joey, I just wanted to check, you did not put the Signed-off-by line in patches in https://marc.info/?l=git=132491485901482=2 Could you confirm that the patches you sent are “covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file”, so that I could send the patch I wrote based on yours with a Signed-off-by line of my own without breaking the DCO? Thanks! Leo
Re: Fetch-hooks
On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I guess I'll try to update the patch when I get some time to >> delve into git's internals, as my use case (forbidding some fetches) >> couldn't afaik be covered by a wrapper hook. > > Per my reading of > https://public-inbox.org/git/20111224234212.ga21...@gnu.kitenet.net/ > what Joey implemented is not what you described in your initial mail. > > His is a *post*-fetch hook, we've already done the fetch and are just > telling you as a courtesy what refs changed. You could also implement > this as some cronjob that polls git for-each-ref but it's easier as a > hook, fine. I was thinking along the lines of https://marc.info/?l=git=132486687023893=2 with high-level description at https://marc.info/?l=git=132480559712592=2 With the high-level description given here, I'm pretty sure I can hack a hook together to make things work as I want them to. > What you're describing is something like a pre-fetch hook analogous to > the pre-receive hooks, where you're fed refs updated on the remote on > stdin, and can say you don't want some of those to be updated. > > This may just be a lack of imagination on my part, but I don't see how > that's sensible at all. > > The refs we fetch are our *copy* of the remote refs, why would you not > want to track the upstream remote. You're going to refuse some branches > and what? Be further behind until some point in the future where the tip > is GPG-signed and you accept it, at which poich you'll need to do more > work than if you were up-to-date with the almost-GPG-signed version? That's about it. I want all fetching to be blocked in case of the tip not being signed. As there is a pre-push hook ensuring committers don't forget to sign before pushing, the only case the tip could not be signed is in case of an attack, which means it's better to just force-push master because any git repo that fetched it is doomed anyway. Definitely would not want to allow an untrusted revision get into anything that could even remotely be taken as “endorsed” by the user. (BTW, in order to avoid the case of someone forgetting to sign the commit and not having installed the pre-push hook, there can be holes in the commit-signing chain, the drawback being that the committer pushing a signed commit takes responsibility for all unsigned commits directly preceding his -- allowing them to recover in case of a mistaken push) > I think you're confusing two things here. One is the reasonable concern > of wanting to not have your local copy of remote refs have undesirable > content, but a pre-fetch hook is not the way to accomplish that. Well, a pre-fetch hook is a possible way of accomplishing that, and I don't know of any better one? > The other is e.g. to ensure that you never locally check out some "bad" > ref, we don't have hook support for that, but could add it, > e.g. git-checkout and git reset --hard could be taught about some > pre-checkout hook. Issue is, once we have to fix checkout and reset, all other commands that potentially touch the worktree also have to be fixed (eg. I don't know whether worktree add triggers pre-checkout?) Also, this requires the hook to store a database of all the paths that have been checked, because there is no logic in how one may choose to checkout the repo. While having a tweak-fetch hook would make the implementation straightforward, because at the time of invoking the hook the “refname at remote” commit is already trusted, and the “object name” is the commit whose validity we want to check, so we just have to check the path between those two. (I don't know if you checked my current scripts, but basically as the set of allowed PGP keys can change at any commit, it's only possible to check a commit path, not a single commit out-of-nowhere) The only issue that could arise with a tweak-fetch hook is in case of a force-fetch (and even maybe it's not even an actual issue, I haven't given it real thought yet), but this can reasonably be banned, as once a commit is signed it enters the “real” master branch, that should never be moved backward, as it can't be the sign of an attack. > You could also have some intermediate step between these two, where > e.g. your refspec for "origin" is > "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default > "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that > location, then you move them (with some alias/hook) to > "refs/remotes/origin/*" once they're seen to be "OK". That is indeed another possibility, but then the idea is to make things as transparent as possible for the end-user, not to completely change their git workflow. As such, considering only signed commits to be part of the upstream seems to make sense to me? Cheers, Leo
Re: Fetch-hooks
On 02/08/2018 04:30 PM, Joey Hess wrote: > Leo Gaspard wrote: >> That said, I just came upon [1] (esp. the description [2] and the patch >> [3]), and wondered: it looks like the patch was abandoned midway in >> favor of a hook refactoring. Would you happen to know whether the hook >> refactoring eventually took place, and/or whether this patch was >> resubmitted later, and/or whether it would still be possible to merge >> this now? (not having any experience with git's internals yet, I don't >> really know whether these are stupid questions or not) >> >> PS: Cc'ing Joey, as you most likely know best what eventually happened, >> if you can remember it? > > I don't remember it well, but reviewing the thread, I think it foundered > on this comment by Junio: > >> That use case sounds like that "git fetch" is called as a first class UI, >> which is covered by "git myfetch" (you can call it "git annex fetch") >> wrapper approach, the canonical example of a hook that we explicitly do > ^ >> not want to add. > ^^^ > > While I still think a fetch hook would be a good idea for reasons of > composability, I then just went off and implemented such a wrapper for > my own particular use case, and the wrapper program then grew to cover > use cases that a hook would not have been able to cover, so ... Hmm, OK, so I guess I'll try to update the patch when I get some time to delve into git's internals, as my use case (forbidding some fetches) couldn't afaik be covered by a wrapper hook. Thanks for the feedback! Leo
Re: Fetch-hooks
On 02/07/2018 11:51 PM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Feb 07 2018, Leo Gaspard jotted: > >> Hello, >> >> tl;dr: Is there currently a way to have fetch hooks, and if not do you >> think it could be a nice feature? >> >> I was in the process of implementing hooks for git that ensure the >> repository is always cleanly signed by someone allowed to by the >> repository itself. I think I've completed the signature-checking part >> [1] and the push hook [2] (even though it isn't really configurable at >> the moment). >> >> However, I was starting to think about handling the fetch step, and >> couldn't find any fetch hook. Is there one? >> >> If not, would you think it is would be a good idea to add one, that >> would eg. be passed the commit-before, commit-after and could block the >> changing of the reference if it failed? >> >> The only other solution I could think of is using a separate script for >> fetching, but that would be fragile, as the user could always not think >> about it well and run a git fetch, breaking the objective that after the >> first clone all commits were correctly signature-checked. >> >> Thanks for reading me! >> Leo >> >> PS1: I am not subscribed to the ML. >> >> PS2: I've tried asking freenode#git, without success so far. >> >> >> [1] >> https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh >> >> [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push > > There is no fetch hook, however you may find that the > post-{checkout,merge} hooks are suitable for what you want to do. > > Setting those to some custom comand is a common pattern for > e.g. compiling some assets on "git pull", so you could similarly check > the commits from HEAD, of course those are post-* hooks, so they won't > stop the checkout. Hmm, I don't think these would fit the bill. For post-merge, simply because I spend my life rebasing stuff around, and very rarely merge. For post-checkout, it could work, but then I'd need to keep track manually of up to where the commits have been checked and to search the git graph for the latest checked ancestor (as otherwise checking-out another branch then checking-out the first branch again would likely trigger a failure, due to the keyring being dynamic), so it would likely be a dealbreaker, due to the hook becoming too complex to be trusted. (Just in case you wonder, by “the keyring being dynamic” I mean the PGP keys allowed to sign commits are stored directly inside the git repository) That said, I just came upon [1] (esp. the description [2] and the patch [3]), and wondered: it looks like the patch was abandoned midway in favor of a hook refactoring. Would you happen to know whether the hook refactoring eventually took place, and/or whether this patch was resubmitted later, and/or whether it would still be possible to merge this now? (not having any experience with git's internals yet, I don't really know whether these are stupid questions or not) Thanks! Leo PS: Cc'ing Joey, as you most likely know best what eventually happened, if you can remember it? [1] https://marc.info/?t=13247704151=1=2 [2] https://marc.info/?l=git=132483581218382=2 [3] https://marc.info/?l=git=132486687023893=2
Fetch-hooks
Hello, tl;dr: Is there currently a way to have fetch hooks, and if not do you think it could be a nice feature? I was in the process of implementing hooks for git that ensure the repository is always cleanly signed by someone allowed to by the repository itself. I think I've completed the signature-checking part [1] and the push hook [2] (even though it isn't really configurable at the moment). However, I was starting to think about handling the fetch step, and couldn't find any fetch hook. Is there one? If not, would you think it is would be a good idea to add one, that would eg. be passed the commit-before, commit-after and could block the changing of the reference if it failed? The only other solution I could think of is using a separate script for fetching, but that would be fragile, as the user could always not think about it well and run a git fetch, breaking the objective that after the first clone all commits were correctly signature-checked. Thanks for reading me! Leo PS1: I am not subscribed to the ML. PS2: I've tried asking freenode#git, without success so far. [1] https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push
Re: Migrating away from SHA-1?
First, sorry for not having this message threaded: I'm not subscribed to the list and haven't found a way to get a Message-Id from gmane. I just wanted to ask, as an end-user highly relying on commit signatures, a few questions as to the migration away from SHA-1. SHA-1 already suffers from a freestart collision attack. Based on what I understand of the object model of git, a chosen-prefix collision attack (perhaps somewhat improved) is enough to make reviewers accept a patch, sign it, and then swap the innocuous-looking patch for an evil-doing one -- which *will be signed*. As for the issue about code checking being an easier entrypoint (Theodore Ts'o, 2016-04-14 22:40:51 GMT), in a use case of mine there is a repo with my dotfiles on an untrusted server. Yet I download them and am able to execute them without fear because each commit is PGP-signed with my key. The point being that code checking is not even a possible entrypoint in some cases, so SHA-1 seems to be(come) the weakest link. So, I don't think it is possible to disagree with Jeff King when he wrote his 2016-04-12 23:15:19 GMT email. Peter Anvin (2016-04-14 17:28:50 GMT) gets a point in that there is no need to hurry (chosen-prefix collisions may be still quite a long way, even though there is no guesswork in these matters), and quality is important. Yet Jeff King's proposal (2016-04-12 23:42:52 GMT), amended by Junio Hamano (2016-04-13 01:03:02 GMT) and himself (2016-04-13 01:36:32 GMT) seem to have met no opposition. So, my questions to the git team: * Is there a consensus, that git should migrate away from SHA-1 before it gets a collision attack, because it would mean chosen-prefix collision isn't far away and people wouldn't have the time to upgrade? * Is there a consensus, that Peter Anvin's amended transition plan is the way to go? * If the two conditions above are fulfilled, has work started on it yet? (I guess as Brian Carlson had started his work 9 weeks ago and he was speaking about working on it on the week-end he should have finished it now, so excluding this) * If the two first conditions are fulfilled, is there anything I could do to help this transition? (including helping Brian if his work hasn't actually ended yet) Sorry for bringing up again a subject that seems to be quite recurrent, and for this long block of text, Leo Gaspard signature.asc Description: OpenPGP digital signature