from:"Leo Gaspard"

Re: Fetch-hooks

2018-02-20 Thread Leo Gaspard

On 02/20/2018 08:42 AM, Jeff King wrote:>> [...]
>>
>> Is there a way for “pre-receive” to individually filter [refs]? I was
>> under the impression that the only way to do that was to use the
>> “update” hook, which was the reason I wanted to model it after “update”
>> rather than “pre-receive” (my use case being a check independent for
>> each pushed ref)
> 
> No, pre-receive is always atomic. However, that may actually be what you
> want, if the point is to keep untrusted data out of the repository. By
> rejecting the whole thing, we could in theory keep the objects from
> entering the repository at all. This is how the push side works via the
> "quarantine" system (which is explained in git-receive-pack(1)).
> Fetching doesn't currently quarantine objects, but it probably wouldn't
> be very hard to make it do so.

Oh, I didn't think about quarantining behavior, indeed!

So I guess, following your answer as well as Jake's, I'll try to
implement a pre-receive-like hook, and will come back to this list when
I'll have a tentative implementation. Thanks for the advice! :)

Re: Fetch-hooks

2018-02-19 Thread Leo Gaspard

On 02/19/2018 10:23 PM, Jeff King wrote:
> [...]
> If you do go this route, please model it after "pre-receive" rather than
> "update". We had "update" originally but found it was too limiting for
> hooks to see only one ref at a time. So we introduced pre-receive. The
> "update" hook remains for historical reasons, but I don't think we'd
> want to reproduce the mistake. :)

Hmm, what bothered me with “pre-receive” was that it was an
all-or-nothing decision, without the ability to allow some references
through and not others.

Is there a way for “pre-receive” to individually filter hooks? I was
under the impression that the only way to do that was to use the
“update” hook, which was the reason I wanted to model it after “update”
rather than “pre-receive” (my use case being a check independent for
each pushed ref)

Re: Fetch-hooks

2018-02-13 Thread Leo Gaspard

On 02/14/2018 02:35 AM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote:
>> [...]
> I think there may have been some more concrete proposals after that, but
> that's what I was able to dig up quickly.

Thanks! Though it looks way above my knowledge of git internals as well
as time available, it looks like a great project, and I hope it someday
succeeds!

>> That said, actually I just noticed an issue in the “add a
>> remote..fetch option to fetch to refs/quarantine then have the
>> post-fetch hook do the work”: it means if I run `git pull`, then:
>>  1. The remote references will be pulled to refs/quarantine/...
>>  2. The post-fetch hook will copy the accepted ones to refs/remotes/...
>>  3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
>> local branches... and so merge from refs/quarantine.
> 
> Good point. You can't munge FETCH_HEAD by playing with refspecs.
> 
> I am starting to come around to the idea that "pre-fetch" might be the
> best way to do what you want. Not to rewrite refs, but perhaps to simply
> reject them. In the same way that we allow pre-receive to reject pushed
> refs (both are, after all, the final check on admitting new history into
> the repository, just in opposite directions).
> 
>> So, when thinking about it, I'm back to thinking the proper hook
>> interface should be the one of the tweak-fetch hook, but its
>> implementation should make it not go crazy on remote servers. And so
>> that the implementation should do all this refs/quarantine wizardry
>> inside git itself.
> 
> So does anybody actually want to be able to adjust the refs as they pass
> through? It really sounds like you just want to be able to reject or not
> reject the fetch. And that rejecting would be the uncommon case, so it's
> OK to just abort the whole operation and expect the user to figure it
> out.

This completely fits my use case (modulo the fact that it's more similar
to the `update` hook than to `pre-receive` I think, as verifying the
signature requires the objects to already have been downloaded), indeed,
though I'm not sure it would have fit Joey's (based on my understanding,
adding a merge was what was originally asked for).

Actually, I'm wondering if the existing semantics of `update` could not
be reused for the `pre-fetch`. Would it make sense to just call `update`
during a fetch in the same way as during a receive-pack? That said it
likely makes this a breaking change, though it's maybe unlikely that a
repository is used both for fetching and for receive-pack'ing, it could
happen.

So the proposal as I understand it would currently be adding a
`fetch-update` hook that does exactly the same thing as `update` but for
`fetch`. This solves my use case in a nice way, though it likely doesn't
solve Joey's original one (which has been taken care of by a wrapper
since then).

What do you all think about it?

Re: Fetch-hooks

2018-02-13 Thread Leo Gaspard

On 02/12/2018 08:23 PM, Brandon Williams wrote:> Maybe this isn't
helpful but you may be able to implement this by using
> a remote-helper.  The helper could perform any sort of caching it needed
> to prevent re-downloading large amounts of data that is potentially
> thrown away, while only sending through the relevant commits which
> satisfy some criteria (signed, etc.).

This looks like a possibility, thanks! I'll likely try to implement it
this way if there can be no consensus on the features wanted for a
fetch-hook (as the current state of discussion looks like to me).

Re: Fetch-hooks

2018-02-10 Thread Leo Gaspard

On 02/10/2018 01:21 PM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote:
> 
>>> Yeah, tag-following may be a little tricky, because it usually wants to
>>> write to refs/tags/. One workaround would be to have your config look
>>> like this:
>>>
>>>   [remote "origin"]
>>>   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
>>>   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
>>>   tagOpt = --no-tags
>>>
>>> That's not exactly the same thing, because it would fetch all tags, not
>>> just those that point to the history on the branches. But in most
>>> repositories and workflows the distinction doesn't matter.
>>
>> Hmm... apart from the implementation complexity (of which I have no
>> idea), is there an argument against the idea of adding a
>> remote..fetchTagsTo refmap similar to remote..fetch but used
>> every time a tag is fetched? (well, maybe not exactly similar to
>> remote..fetch because we know the source is going to be
>> refs/tags/*; so just having the part of .fetch past the ':' would be
>> more like what's needed I guess)
> 
> I don't think it would be too hard to implement, and is at least
> logically consistent with the way we handle tags.
> 
> If we were designing from scratch, I do think this would all be helped
> immensely by having more separation of refs fetched from remotes. There
> was a proposal in the v1.8 era to fetch everything for a remote,
> including tags, into "refs/remotes/origin/heads/",
> "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to
> look for tags in each of the remote-tag namespaces.
> 
> I still think that would be a good direction to go, but it would be a
> big project (which is why the original stalled).

Hmm... would this also drown the remote..fetch map? Also, I think
this behavior could be emulated with fetch and fetchTagsTo, and it would
look like:
[remote "my-remote"]
fetch = +refs/heads/*:refs/remotes/my-remote/heads/*
fetchTagsTo = refs/remotes/my-remote/tags/*
The remaining issue being to teach the lookup side to look for tags in
all the remote-tag namespaces (and the fact it's a breaking change).

That said, actually I just noticed an issue in the “add a
remote..fetch option to fetch to refs/quarantine then have the
post-fetch hook do the work”: it means if I run `git pull`, then:
 1. The remote references will be pulled to refs/quarantine/...
 2. The post-fetch hook will copy the accepted ones to refs/remotes/...
 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
local branches... and so merge from refs/quarantine.

A solution would be to also update FETCH_HEAD in the post-fetch hook,
but then we're back to the issue raised by Junio after the “*HOWEVER*”
of [1]: the hook writer has to maintain consistency between the “copied”
references and FETCH_HEAD.

So, when thinking about it, I'm back to thinking the proper hook
interface should be the one of the tweak-fetch hook, but its
implementation should make it not go crazy on remote servers. And so
that the implementation should do all this refs/quarantine wizardry
inside git itself.

So basically what I'm getting at at the moment is that git fetch should:
 1. fetch all the references to refs/real-remotes//{insert here a
copy of the refs/ tree of }
 2. figure out what the “expected” names for these references will by,
by looking at remote..fetch (and at whether --tags was passed)
 3. for each “expected” name,
 1. if a tweak-fetch hook is present, call it with the
refs/real-remotes/... refname and the “expected end-name” from
remote..fetch
 2. based on the hook's result, potentially to move the “expected
end-name” to another commit than the one referenced by refs/real-remotes/...
 3. move the “expected” name to the commit referenced in
refs/real-remotes

Which would make the tweak-fetch hook interface simpler (though more
restrictive, but I don't have any real use case for the other change
possibilities) than it is now:
 1. feed the hook with lines of
“refs/real-remotes/my-remote/heads/my-branch
refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is
“normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag”
(if my-tag is being fetched from my-remote)
 2. read lines of “ refs/remotes/my-remote/my-branch”, that
will re-point my-branch to  instead of
refs/real-remotes/my-remote/heads/my-branch. In order to avoid any
issue, the hook is not allowed to pass as second output a reference that
was not passed as second input.

This way, the behavior of the tweak-fetch hook is reasonably preserved
(at least for my use case), and there is no additional load on the
servers thanks to the up-to-date references being stored in
refs/real-remotes//

Does this reasoning make any sense?

[1] https://marc.info/?l=git=132478296309094=2

Re: Fetch-hooks

2018-02-10 Thread Leo Gaspard

On 02/10/2018 02:33 AM, Leo Gaspard wrote:> I guess the very early part
of the discussion you're speaking of is what
> I was assuming after reading
> https://marc.info/?l=git=132478296309094=2
> 
> [...]
> 
> So the reason for a post-fetch in my opinion is the same as for a
> pre-push / pre-commit: not changing the user's workflow, while providing
> additional features.
Well, re-reading my email, it looks aggressive to me now... sorry about
that!

What I meant was basically, that in my mind pre-push or pre-commit are
also local-only things, and they are really useful in that they allow to
block the push or the commit if some conditions are not met (eg. block
commit with trailing whitespace, or block push of unsigned commits).

In pretty much the same way, what I'm really looking for is a way to
“block the fetch” (from a user-visible standpoint) -- like pre-push but
in the opposite direction. I hope such a goal is not an anti-pattern for
hook design?

In order to do this, I first tried updating this tweak-fetch hook patch
from late 2011, and then learned that it would cause too high a load on
servers. Then, while brainstorming another solution, this idea of a
notification-only post-fetch hook arose. But, as I noticed while writing
my previous answer, this suffers the same
the-hook-writer-must-correctly-update-FETCH_HEAD-concurrently-with-remote-branch
issue that you pointed out just after the “*HOWEVER*” in [1]. So that
solution is likely a bad solution too. I guess we'll continue the search
for a good solution in the side-thread with Peff, hoping to figure one out.

That being said about what I meant in my last email, obviously you're
the one who has the say on what goes in or not! And if it's an
anti-feature I'd much rather know it now than after spending a few
nights coding :)

So, what do you think about this use case I'm thinking of? (ie.
“blocking the fetch” like pre-push “blocks the push” and pre-commit
“blocks the commit”?)

[1] https://marc.info/?l=git=132478296309094=2

Re: Fetch-hooks

2018-02-09 Thread Leo Gaspard

On 02/10/2018 02:08 AM, Junio C Hamano wrote:
> Leo Gaspard <l...@gaspard.io> writes:
> 
>> On 02/10/2018 01:13 AM, Jeff King wrote:
>>> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:
>>>> So the changes that are required are:
>>>>  * Adding a notification-only post-fetch hook
> 
> Maybe I missed a very early part of the discussion, but why does
> this even need a hook?  There are some numbers [*1*] of classes of
> valid reasons we may want to have hooks triggered by operations, but
> "always do something locally after doing something else locally,
> regardless of the outcome of that something else" feels like the
> most typical anti-pattern that we do not want a hook for.  If you
> are doing "git fetch" (or "git pull"), you already know you are
> doing that and you donot need a notification.  You just create a
> workflow specific script that calls fetch or pull, followed by
> whatever you want to do and use that, instead of doing "git pull",
> and that is not any extra work than writing a hook and installing
> it.
> 
> Unlike something like post-receive, which happens on the remote side
> where you may not even have an interactive access to, in response to
> the operation you locally do (i.e. "git push"), fetching and then
> doing something else in a repository you fetch into has no reason to
> be done as a hook.

I guess the very early part of the discussion you're speaking of is what
I was assuming after reading
https://marc.info/?l=git=132478296309094=2

Then, it's always possible to just write workflow-specific scripts that
manually run git fetch then copy the refs (resp. run git fetch then copy
the refs then run git merge) to get git myfetch (resp. git mypull) [1].

But then, the question is, why is there a pre-push hook? it's already
possible to have a custom script that first runs the hook then runs git
push, for most if not all of the use cases of the pre-push hook. Yet the
possibility to not change the end-user's workflow is what makes pre-push
(or pre-commit, with which the similarity is perhaps even more obvious)
so useful.

So the reason for a post-fetch in my opinion is the same as for a
pre-push / pre-commit: not changing the user's workflow, while providing
additional features.

[1]

Which makes me notice that actually the post-fetch hook technique we
were discussing with Peff suffers the same not-updating-FETCH_HEAD issue
that was discussed in this early thread: a `git pull` would try to merge
from refs/quarantine, I guess. So things are a bit harder than we thought.

I guess the tweak-fetch hook could be left “as-is”, but with git
automatically doing the “quarantine-ing” thing transparently so that the
references that the end-user (or the hook for that matter) see are the
“curated” ones? Then it's too late for me to think efficiently right
now, so that idea may be stupid or over-complex.

Re: Fetch-hooks

2018-02-09 Thread Leo Gaspard

On 02/10/2018 01:13 AM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:
>> So the changes that are required are:
>>  * Adding a notification-only post-fetch hook
>>  * For handling tags, there is a need to have a refmap for tags. Maybe
>> adding a remote.my-remote.fetchTags refmap, that would be used when
>> running with --tags, and having it default to “refs/tags/*:refs/tags/*”
>> to keep the current behavior by default?
> 
> Yeah, tag-following may be a little tricky, because it usually wants to
> write to refs/tags/. One workaround would be to have your config look
> like this:
> 
>   [remote "origin"]
>   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
>   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
>   tagOpt = --no-tags
> 
> That's not exactly the same thing, because it would fetch all tags, not
> just those that point to the history on the branches. But in most
> repositories and workflows the distinction doesn't matter.

Hmm... apart from the implementation complexity (of which I have no
idea), is there an argument against the idea of adding a
remote..fetchTagsTo refmap similar to remote..fetch but used
every time a tag is fetched? (well, maybe not exactly similar to
remote..fetch because we know the source is going to be
refs/tags/*; so just having the part of .fetch past the ':' would be
more like what's needed I guess)

The issue with your solution is that if the user runs 'git fetch
--tags', he will get the (potentially compromised) tags directly in his
refs/tags/.

> (By the way, the I specifically chose the name "refs/quarantine" instead
> of anything in "refs/remotes" because we'd want to make sure that the
> "git checkout" DWIM behavior cannot accidentally pull from quarantine).

(Indeed, I understood after reading it, and would likely not have
thought of it otherwise, thanks!)

>> The only remaining issue I can think of is: How do we avoid the issue
>> of the
>> trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification
>> raised in the side-thread that Junio wrote in [1]? Maybe just writing
>> in the documentation that the hook should use a quarantine-like
>> approach if it wants modification would be enough to not have hook
>> authors try to modify the ref in the post-fetch hook?
> 
> I don't have a silver bullet there. Documenting the "right" way at least
> seems like a good first step.

So long as it's not a merge-blocker it's good with me! (but then I'm
likely not the one who's going to be pointed at when things go wrong in
a hook, so I'm clearly biased on this matter)

Re: Fetch-hooks

2018-02-09 Thread Leo Gaspard

On 02/09/2018 11:30 PM, Jeff King wrote:
> On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> One thing that's not discussed yet, and I know just enough about for it
>> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
>> Brandon who know more) is that this feature once shipped might cause
>> higher load on git hosting providers.
>>
>> This is because people will inevitably use it in popular projects for
>> some custom filtering, and because you're continually re-fetching and
>> inspecting stuff what used to be a really cheap no-op "pull" most of the
>> time is a more expensive negotiation every time before the client
>> rejects the refs again, and worse for hosting providers because you have
>> bespoke ref fetching strategies you have less odds of being able to
>> cache both the negotiation and the pack you serve.
> 
> Most of the discussion so far seems to be about "accept this ref or
> don't accept this ref", which seems OK. But if you are going to do
> custom tweaking like rewriting objects, or making it common to refuse
> some refs, then I think things get pretty inefficient for _everybody_.
> 
> The negotiation for future fetches uses the existing refs as the
> starting point. And if we don't know that we have the objects because
> there are no refs pointing at them, they're going to get transferred
> again. That's extra load no the server, and extra time for the user
> waiting on the network.

Oh. I thought the protocol git used was something like
client: I want to fetch refs A and B
server: so you'll need objects 12345678 and 90ABCDEF, A and B both point
to 12345678
client: please give me object 12345678
server: here it is
[...]

I was clearly wrong, thanks! (and thanks Ævar for your explanation in
the side-thread, too!)

> I tend to agree with the direction of thinking you outlined: you're
> generally better off completing the fetch to a local namespace that
> tracks the other side completely, and then manipulating the local refs
> as you see fit (e.g., fetching into refs/quarantine, and then migrating
> "good" refs over to refs/remotes/origin).

Hmm... so do I understand it correctly when I say the process you're
thinking about works like this?
 * User installs hook for my-remote by running [something]
 * User runs git fetch
 * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so
I guess [something] changes the remote.my-remote.fetch refmap)
 * When this is done `git fetch` runs a notification-only post-fetch
hook (that would need to be added)
 * The post-fetch hook then performs whatever it wants and updates the
references in refs/remotes/my-remote/*

So the changes that are required are:
 * Adding a notification-only post-fetch hook
 * For handling tags, there is a need to have a refmap for tags. Maybe
adding a remote.my-remote.fetchTags refmap, that would be used when
running with --tags, and having it default to “refs/tags/*:refs/tags/*”
to keep the current behavior by default?

The only remaining issue I can think of is: How do we avoid the issue of
the
trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification
raised in the side-thread that Junio wrote in [1]? Maybe just writing in
the documentation that the hook should use a quarantine-like approach if
it wants modification would be enough to not have hook authors try to
modify the ref in the post-fetch hook?

Thanks for all your thoughts, and hope we're getting somewhere!
Leo

PS: I'll read over the reviews once I'm all clear as to what exactly is
wanted for this patch, as most likely they'll just be dumped, given the
current state of affairs.

[1] https://marc.info/?l=git=132480559712592=2

Re: Fetch-hooks

2018-02-09 Thread Leo Gaspard

On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also
have some intermediate step between these two, where
>>> e.g. your refspec for "origin" is
>>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
>>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
>>> location, then you move them (with some alias/hook) to
>>> "refs/remotes/origin/*" once they're seen to be "OK".
>>
>> That is indeed another possibility, but then the idea is to make things
>> as transparent as possible for the end-user, not to completely change
>> their git workflow. As such, considering only signed commits to be part
>> of the upstream seems to make sense to me?
> 
> I mean this would be something that would be part of a post-fetch hook,
> so it would be as transparent as what you're doing to the user, with the
> difference that it doesn't need to involve changes to what you slurp
> down from the server.
> 
> I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run
> your post-fetch hook and you go over the new refs, and copy what you
> like (usually everything) to refs/remotes/origin/*.

Hmm... but that would then require a post-fetch hook, wouldn't it? And
about a post-fetch hook, if I understood correctly, Junio in [1] had a
quite nice argument against it:

Although I do not deeply care between such a "trigger to only
notify, no touching" hook and a full-blown "allow hook writers to
easily lie about what happened in the fetch" hook, I was hoping that
we would get this right and useful if we were spending our brain
bandwidth on it. I am not very fond of an easier "trigger to only
notify" hook because people are bound to misuse the interface and
try updating the refs anyway, making it easy to introduce
inconsistencies between refs and FETCH_HEAD that will confuse the
later "merge" step.

Otherwise, if it doesn't require a post-fetch hook, then it would
require the end-user to first fetch, then run the
`copy-trusted-refs-over` script, which would add stuff to the user's
workflow.

Did I miss another possibility?

> [...]
> 
> One thing that's not discussed yet, and I know just enough about for it
> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
> Brandon who know more) is that this feature once shipped might cause
> higher load on git hosting providers.
> 
> This is because people will inevitably use it in popular projects for
> some custom filtering, and because you're continually re-fetching and
> inspecting stuff what used to be a really cheap no-op "pull" most of the
> time is a more expensive negotiation every time before the client
> rejects the refs again, and worse for hosting providers because you have
> bespoke ref fetching strategies you have less odds of being able to
> cache both the negotiation and the pack you serve.
> 
> I.e. you want this for some security feature where 99.99% of the time
> you accept all refs, but most people will probably use this to implement
> dynamic Turing-complete refspecs.
> 
> Maybe that's worrying about nothing, but worth thinking about.

Well... First, I must say I didn't really understand your last paragraph
about Turing-complete refspecs.

But my understanding of how the fetch-hook patchset I sent this evening
works is that it first receives all the objects from the hosting
provider, then locally moves the refs, but never actually discards the
downloaded objects (well, until a `git gc` I guess).

So I don't think the network traffic with the provider would be any
different wrt. what it is now, even if a tweak-fetch hook rejects some
commits? Then again I don't know git's internals enough to be have even
a bit of certainty about what I'm saying right now, so...

[1] https://marc.info/?l=git=132480559712592=2

[PATCH 1/2] fetch: preparations for tweak-fetch hook

2018-02-09 Thread Leo Gaspard

From: Léo Gaspard <l...@gaspard.io>

No behavior changes yet, only some groundwork for the next change.

The refs_result structure combines a status code with a ref map, which
can be NULL even on success. This will be needed when there's a
tweak-fetch hook, because it can filter out all refs, while still
succeeding.

fetch_refs returns a refs_result, so that it can modify the ref_map.

Based-on-patch-by: Joey Hess <j...@kitenet.net>
Signed-off-by: Leo Gaspard <l...@gaspard.io>
---
 builtin/fetch.c | 68 +
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7bbcd26fa..76dc05f61 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -34,6 +34,11 @@ enum {
TAGS_SET = 2
 };
 
+struct refs_result {
+   struct ref *new_refs;
+   int status;
+};
+
 static int fetch_prune_config = -1; /* unspecified */
 static int prune = -1; /* unspecified */
 #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */
@@ -57,6 +62,18 @@ static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
 
+static int add_existing(const char *refname, const struct object_id *oid,
+   int flag, void *cbdata)
+{
+   struct string_list *list = (struct string_list *)cbdata;
+   struct string_list_item *item = string_list_insert(list, refname);
+   struct object_id *old_oid = xmalloc(sizeof(*old_oid));
+
+   oidcpy(old_oid, oid);
+   item->util = old_oid;
+   return 0;
+}
+
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
if (!strcmp(k, "fetch.prune")) {
@@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head,
}
 }
 
-static int add_existing(const char *refname, const struct object_id *oid,
-   int flag, void *cbdata)
-{
-   struct string_list *list = (struct string_list *)cbdata;
-   struct string_list_item *item = string_list_insert(list, refname);
-   struct object_id *old_oid = xmalloc(sizeof(*old_oid));
-
-   oidcpy(old_oid, oid);
-   item->util = old_oid;
-   return 0;
-}
-
 static int will_fetch(struct ref **head, const unsigned char *sha1)
 {
struct ref *rm = *head;
@@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map)
return check_connected(iterate_ref_map, , );
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static struct refs_result fetch_refs(struct transport *transport,
+   struct ref *ref_map)
 {
-   int ret = quickfetch(ref_map);
-   if (ret)
-   ret = transport_fetch_refs(transport, ref_map);
-   if (!ret)
-   ret |= store_updated_refs(transport->url,
+   struct refs_result ret;
+   ret.status = quickfetch(ref_map);
+   if (ret.status) {
+   ret.status = transport_fetch_refs(transport, ref_map);
+   }
+   if (!ret.status) {
+   ret.new_refs = ref_map;
+   ret.status |= store_updated_refs(transport->url,
transport->remote->name,
-   ref_map);
+   ret.new_refs);
+   }
transport_unlock_pack(transport);
return ret;
 }
@@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote 
*remote, int deepen)
return transport;
 }
 
-static void backfill_tags(struct transport *transport, struct ref *ref_map)
+static struct refs_result backfill_tags(struct transport *transport,
+   struct ref *ref_map)
 {
int cannot_reuse;
+   struct refs_result res;
 
/*
 * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it
@@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   fetch_refs(transport, ref_map);
+   res = fetch_refs(transport, ref_map);
 
if (gsecondary) {
transport_disconnect(gsecondary);
gsecondary = NULL;
}
+
+   return res;
 }
 
 static int do_fetch(struct transport *transport,
@@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport,
struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct ref *ref_map;
struct ref *rm;
+   struct refs_result res;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
 
@@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map)) {
+
+   res = fetch_refs(transport,

[PATCH 2/2] fetch: add tweak-fetch hook

2018-02-09 Thread Leo Gaspard

From: Léo Gaspard <l...@gaspard.io>

The tweak-fetch hook is fed lines on stdin for all refs that were
fetched, and outputs on stdout possibly modified lines. Its output is
then parsed and used when `git fetch` updates the remote tracking refs,
records the entries in FETCH_HEAD, and produces its report.

The modifications here are heavily based on prior work by Joey Hess.

Based-on-patch-by: Joey Hess <j...@kitenet.net>
Signed-off-by: Leo Gaspard <l...@gaspard.io>
---
 Documentation/githooks.txt  |  37 +++
 builtin/fetch.c | 210 +++-
 t/t5574-fetch-tweak-fetch-hook.sh   |  90 
 templates/hooks--tweak-fetch.sample |  24 +
 4 files changed, 359 insertions(+), 2 deletions(-)
 create mode 100755 t/t5574-fetch-tweak-fetch-hook.sh
 create mode 100755 templates/hooks--tweak-fetch.sample

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index f877f7b7c..1b4a18bf0 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -177,6 +177,43 @@ This hook can be used to perform repository validity 
checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+tweak-fetch
+~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs
+have been fetched from the remote repository. It is not executed, if nothing 
was
+fetched.
+
+The output of the hook is used to update the remote-tracking branches, and
+`.git/FETCH_HEAD`, in preparation for a later merge operation done by 'git
+merge'.
+
+It takes no arguments, but is fed a line of the following format on its 
standard
+input for each ref that was fetched.
+
+   SP not-for-merge|merge|ignore SP  SP  
LF
+
+Where the "not-for-merge" flag indicates the ref is not to be merged into the
+current branch, and the "merge" flag indicates that 'git merge' should later
+merge it.
+
+The `` is the remote's name for the ref that was fetched, and
+`` is a name of a remote-tracking branch, like
+"refs/remotes/origin/master". `` can be undefined if the fetched
+ref is not being stored in a local refname. In this case, it will be set to 
`@`,
+an invalide refspec, so that scripts can be written more easily.
+
+TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not
+really sure I get what this does or what invariants it is supposed to maintain
+(eg. all “ignore” updates at the end of the refs list?), so this may also
+require code changes.
+
+The hook must consume all of its standard input, and output back lines of the
+same format. It can modify its input as desired, including adding or removing
+lines, updating the sha1 (i.e. re-point the remote-tracking branch), changing
+the merge flag, and changing the `` (i.e. use different
+remote-tracking branch).
+
 post-merge
 ~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 76dc05f61..1bb394530 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -28,6 +28,8 @@ static const char * const builtin_fetch_usage[] = {
NULL
 };
 
+static const char tweak_fetch_hook[] = "tweak-fetch";
+
 enum {
TAGS_UNSET = 0,
TAGS_DEFAULT = 1,
@@ -181,6 +183,206 @@ static struct option builtin_fetch_options[] = {
OPT_END()
 };
 
+static int feed_tweak_fetch_hook(int in, int out, void *data)
+{
+   struct ref *ref;
+   struct strbuf buf = STRBUF_INIT;
+   const char *kw, *peer_ref;
+   char oid_buf[GIT_SHA1_HEXSZ + 1];
+   int ret;
+
+   for (ref = data; ref; ref = ref->next) {
+   if (ref->fetch_head_status == FETCH_HEAD_MERGE)
+   kw = "merge";
+   else if (ref->fetch_head_status == FETCH_HEAD_IGNORE)
+   kw = "ignore";
+   else
+   kw = "not-for-merge";
+   if (!ref->name)
+   die("trying to fetch an inexistant ref");
+   if (ref->peer_ref && ref->peer_ref->name)
+   peer_ref = ref->peer_ref->name;
+   else
+   peer_ref = "@";
+   strbuf_addf(, "%s %s %s %s\n",
+   oid_to_hex_r(oid_buf, >old_oid), kw,
+   ref->name, peer_ref);
+   }
+
+   ret = write_in_full(out, buf.buf, buf.len) != buf.len;
+   if (ret)
+   warning("%s hook failed to consume all its input",
+   tweak_fetch_hook);
+   close(out);
+   strbuf_release();
+   return ret;
+}
+
+static struct ref *parse_tweak_fetch_hook_line(char *l,
+   struct string_list *existing_refs)
+{
+   struct ref *ref = NULL, *peer_ref = NULL;
+   struct string_list_item *peer_item = NULL;
+   c

[PATCH 0/2] fetch: add tweak-fetch hook

2018-02-09 Thread Leo Gaspard

On 02/09/2018 09:20 PM, Joey Hess wrote:> Yes; my patches are under the
same GPL-2 as the rest of git.

Thanks! So here comes my patch series, heavily based on yours.

There are some things to bear in mind while reviewing it:

 * This is my first real attempt at contributing to git, which means I
could be very very far off-track

 * Most of it is based on trying to make the 6-year-old patch series
work and pass all the tests, so if a new feature has been added since
then I likely didn't notice it or don't know how to handle it correctly

There are still three TODO's in the code:

 * In the documentation, one stating that I don't really get what this
“ignore” parameter exactly does, and whether it should be handled
specially (a prime example of a new feature I'm not really sure how to
handle, somewhere in the code it's written all the “ignore” references
are at the end of the list, but I'm already not self-confident enough
about the difference between “merge” and “not-for-merge” to even
consider making a good choice about how to handle “ignore”)

 * In `builtins/fetch.c`, function `do_fetch`, there is a conflict of
interest between placing the `prune` before the `fetch` (as done by
commit 10a6cc889 ("fetch --prune: Run prune before fetching",
2014-01-02)), and placing the `fetch` before the `prune` (which would
allow hooks that rename the local-ref to not be prune'd and then
re-fetched when doing a `git fetch --prune` -- without that a hook that
would want to both read the old commit information and rename the
local-ref would not be able to). Or maybe this question means actually
there should be a third solution? but I don't really know what. Maybe
also hooking into the prune operation?

 * In `templates/hooks--tweak-fetch.sample`, the “check this actually
works” todo, as I'd rather first check this patch series is not too far
off-topic before doing non-essential work -- anyway another version of
the patch series will be required for the other two TODO's, so I can fix
it at this point.

That being said, what do you think about these patches?

Thanks for your time!
Leo Gaspard

Re: Fetch-hooks

2018-02-09 Thread Leo Gaspard

On 02/08/2018 06:02 PM, Leo Gaspard wrote:
> On 02/08/2018 04:30 PM, Joey Hess wrote:
>> [...]
> 
> Hmm, OK, so I guess I'll try to update the patch when I get some time to
> delve into git's internals, as my use case (forbidding some fetches)
> couldn't afaik be covered by a wrapper hook.

Joey,

I just wanted to check, you did not put the Signed-off-by line in
patches in https://marc.info/?l=git=132491485901482=2

Could you confirm that the patches you sent are “covered under an
appropriate open source license and I have the right under that license
to submit that work with modifications, whether created in whole or in
part by me, under the same open source license (unless I am permitted to
submit under a different license), as indicated in the file”, so that I
could send the patch I wrote based on yours with a Signed-off-by line of
my own without breaking the DCO?

Thanks!
Leo

Re: Fetch-hooks

2018-02-08 Thread Leo Gaspard

On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I
guess I'll try to update the patch when I get some time to
>> delve into git's internals, as my use case (forbidding some fetches)
>> couldn't afaik be covered by a wrapper hook.
> 
> Per my reading of
> https://public-inbox.org/git/20111224234212.ga21...@gnu.kitenet.net/
> what Joey implemented is not what you described in your initial mail.
> 
> His is a *post*-fetch hook, we've already done the fetch and are just
> telling you as a courtesy what refs changed. You could also implement
> this as some cronjob that polls git for-each-ref but it's easier as a
> hook, fine.

I was thinking along the lines of
https://marc.info/?l=git=132486687023893=2
with high-level description at
https://marc.info/?l=git=132480559712592=2

With the high-level description given here, I'm pretty sure I can hack a
hook together to make things work as I want them to.

> What you're describing is something like a pre-fetch hook analogous to
> the pre-receive hooks, where you're fed refs updated on the remote on
> stdin, and can say you don't want some of those to be updated.
> 
> This may just be a lack of imagination on my part, but I don't see how
> that's sensible at all.
> 
> The refs we fetch are our *copy* of the remote refs, why would you not
> want to track the upstream remote. You're going to refuse some branches
> and what? Be further behind until some point in the future where the tip
> is GPG-signed and you accept it, at which poich you'll need to do more
> work than if you were up-to-date with the almost-GPG-signed version?

That's about it. I want all fetching to be blocked in case of the tip
not being signed. As there is a pre-push hook ensuring committers don't
forget to sign before pushing, the only case the tip could not be signed
is in case of an attack, which means it's better to just force-push
master because any git repo that fetched it is doomed anyway. Definitely
would not want to allow an untrusted revision get into anything that
could even remotely be taken as “endorsed” by the user.

(BTW, in order to avoid the case of someone forgetting to sign the
commit and not having installed the pre-push hook, there can be holes in
the commit-signing chain, the drawback being that the committer pushing
a signed commit takes responsibility for all unsigned commits directly
preceding his -- allowing them to recover in case of a mistaken push)

> I think you're confusing two things here. One is the reasonable concern
> of wanting to not have your local copy of remote refs have undesirable
> content, but a pre-fetch hook is not the way to accomplish that.

Well, a pre-fetch hook is a possible way of accomplishing that, and I
don't know of any better one?

> The other is e.g. to ensure that you never locally check out some "bad"
> ref, we don't have hook support for that, but could add it,
> e.g. git-checkout and git reset --hard could be taught about some
> pre-checkout hook.

Issue is, once we have to fix checkout and reset, all other commands
that potentially touch the worktree also have to be fixed (eg. I don't
know whether worktree add triggers pre-checkout?)

Also, this requires the hook to store a database of all the paths that
have been checked, because there is no logic in how one may choose to
checkout the repo. While having a tweak-fetch hook would make the
implementation straightforward, because at the time of invoking the hook
the “refname at remote” commit is already trusted, and the “object name”
is the commit whose validity we want to check, so we just have to check
the path between those two. (I don't know if you checked my current
scripts, but basically as the set of allowed PGP keys can change at any
commit, it's only possible to check a commit path, not a single commit
out-of-nowhere)

The only issue that could arise with a tweak-fetch hook is in case of a
force-fetch (and even maybe it's not even an actual issue, I haven't
given it real thought yet), but this can reasonably be banned, as once a
commit is signed it enters the “real” master branch, that should never
be moved backward, as it can't be the sign of an attack.

> You could also have some intermediate step between these two, where
> e.g. your refspec for "origin" is
> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
> location, then you move them (with some alias/hook) to
> "refs/remotes/origin/*" once they're seen to be "OK".

That is indeed another possibility, but then the idea is to make things
as transparent as possible for the end-user, not to completely change
their git workflow. As such, considering only signed commits to be part
of the upstream seems to make sense to me?

Cheers,
Leo

Re: Fetch-hooks

2018-02-08 Thread Leo Gaspard

On 02/08/2018 04:30 PM, Joey Hess wrote:
> Leo Gaspard wrote:
>> That said, I just came upon [1] (esp. the description [2] and the patch
>> [3]), and wondered: it looks like the patch was abandoned midway in
>> favor of a hook refactoring. Would you happen to know whether the hook
>> refactoring eventually took place, and/or whether this patch was
>> resubmitted later, and/or whether it would still be possible to merge
>> this now? (not having any experience with git's internals yet, I don't
>> really know whether these are stupid questions or not)
>>
>> PS: Cc'ing Joey, as you most likely know best what eventually happened,
>> if you can remember it?
> 
> I don't remember it well, but reviewing the thread, I think it foundered
> on this comment by Junio:
> 
>> That use case sounds like that "git fetch" is called as a first class UI,
>> which is covered by "git myfetch" (you can call it "git annex fetch")
>> wrapper approach, the canonical example of a hook that we explicitly do
> ^
>> not want to add.
>   ^^^
> 
> While I still think a fetch hook would be a good idea for reasons of
> composability, I then just went off and implemented such a wrapper for
> my own particular use case, and the wrapper program then grew to cover
> use cases that a hook would not have been able to cover, so ...

Hmm, OK, so I guess I'll try to update the patch when I get some time to
delve into git's internals, as my use case (forbidding some fetches)
couldn't afaik be covered by a wrapper hook.

Thanks for the feedback!
Leo

Re: Fetch-hooks

2018-02-07 Thread Leo Gaspard

On 02/07/2018 11:51 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 07 2018, Leo Gaspard jotted:
> 
>> Hello,
>>
>> tl;dr: Is there currently a way to have fetch hooks, and if not do you
>> think it could be a nice feature?
>>
>> I was in the process of implementing hooks for git that ensure the
>> repository is always cleanly signed by someone allowed to by the
>> repository itself. I think I've completed the signature-checking part
>> [1] and the push hook [2] (even though it isn't really configurable at
>> the moment).
>>
>> However, I was starting to think about handling the fetch step, and
>> couldn't find any fetch hook. Is there one?
>>
>> If not, would you think it is would be a good idea to add one, that
>> would eg. be passed the commit-before, commit-after and could block the
>> changing of the reference if it failed?
>>
>> The only other solution I could think of is using a separate script for
>> fetching, but that would be fragile, as the user could always not think
>> about it well and run a git fetch, breaking the objective that after the
>> first clone all commits were correctly signature-checked.
>>
>> Thanks for reading me!
>> Leo
>>
>> PS1: I am not subscribed to the ML.
>>
>> PS2: I've tried asking freenode#git, without success so far.
>>
>>
>> [1]
>> https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh
>>
>> [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push
> 
> There is no fetch hook, however you may find that the
> post-{checkout,merge} hooks are suitable for what you want to do.
> 
> Setting those to some custom comand is a common pattern for
> e.g. compiling some assets on "git pull", so you could similarly check
> the commits from HEAD, of course those are post-* hooks, so they won't
> stop the checkout.

Hmm, I don't think these would fit the bill. For post-merge, simply
because I spend my life rebasing stuff around, and very rarely merge.
For post-checkout, it could work, but then I'd need to keep track
manually of up to where the commits have been checked and to search the
git graph for the latest checked ancestor (as otherwise checking-out
another branch then checking-out the first branch again would likely
trigger a failure, due to the keyring being dynamic), so it would likely
be a dealbreaker, due to the hook becoming too complex to be trusted.

(Just in case you wonder, by “the keyring being dynamic” I mean the PGP
keys allowed to sign commits are stored directly inside the git repository)

That said, I just came upon [1] (esp. the description [2] and the patch
[3]), and wondered: it looks like the patch was abandoned midway in
favor of a hook refactoring. Would you happen to know whether the hook
refactoring eventually took place, and/or whether this patch was
resubmitted later, and/or whether it would still be possible to merge
this now? (not having any experience with git's internals yet, I don't
really know whether these are stupid questions or not)

Thanks!
Leo

PS: Cc'ing Joey, as you most likely know best what eventually happened,
if you can remember it?

[1] https://marc.info/?t=13247704151=1=2

[2] https://marc.info/?l=git=132483581218382=2

[3] https://marc.info/?l=git=132486687023893=2

Fetch-hooks

2018-02-07 Thread Leo Gaspard

Hello,

tl;dr: Is there currently a way to have fetch hooks, and if not do you
think it could be a nice feature?

I was in the process of implementing hooks for git that ensure the
repository is always cleanly signed by someone allowed to by the
repository itself. I think I've completed the signature-checking part
[1] and the push hook [2] (even though it isn't really configurable at
the moment).

However, I was starting to think about handling the fetch step, and
couldn't find any fetch hook. Is there one?

If not, would you think it is would be a good idea to add one, that
would eg. be passed the commit-before, commit-after and could block the
changing of the reference if it failed?

The only other solution I could think of is using a separate script for
fetching, but that would be fragile, as the user could always not think
about it well and run a git fetch, breaking the objective that after the
first clone all commits were correctly signature-checked.

Thanks for reading me!
Leo

PS1: I am not subscribed to the ML.

PS2: I've tried asking freenode#git, without success so far.


[1]
https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh

[2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push

Re: Migrating away from SHA-1?

2016-06-17 Thread Leo Gaspard

First, sorry for not having this message threaded: I'm not subscribed to
the list and haven't found a way to get a Message-Id from gmane.

I just wanted to ask, as an end-user highly relying on commit
signatures, a few questions as to the migration away from SHA-1.

SHA-1 already suffers from a freestart collision attack. Based on what I
understand of the object model of git, a chosen-prefix collision attack
(perhaps somewhat improved) is enough to make reviewers accept a patch,
sign it, and then swap the innocuous-looking patch for an evil-doing one
-- which *will be signed*.

As for the issue about code checking being an easier entrypoint
(Theodore Ts'o, 2016-04-14 22:40:51 GMT), in a use case of mine there is
a repo with my dotfiles on an untrusted server. Yet I download them and
am able to execute them without fear because each commit is PGP-signed
with my key. The point being that code checking is not even a possible
entrypoint in some cases, so SHA-1 seems to be(come) the weakest link.

So, I don't think it is possible to disagree with Jeff King when he
wrote his 2016-04-12 23:15:19 GMT email.

Peter Anvin (2016-04-14 17:28:50 GMT) gets a point in that there is no
need to hurry (chosen-prefix collisions may be still quite a long way,
even though there is no guesswork in these matters), and quality is
important. Yet Jeff King's proposal (2016-04-12 23:42:52 GMT), amended
by Junio Hamano (2016-04-13 01:03:02 GMT) and himself (2016-04-13
01:36:32 GMT) seem to have met no opposition.

So, my questions to the git team:
 * Is there a consensus, that git should migrate away from SHA-1 before
it gets a collision attack, because it would mean chosen-prefix
collision isn't far away and people wouldn't have the time to upgrade?
 * Is there a consensus, that Peter Anvin's amended transition plan is
the way to go?
 * If the two conditions above are fulfilled, has work started on it
yet? (I guess as Brian Carlson had started his work 9 weeks ago and he
was speaking about working on it on the week-end he should have finished
it now, so excluding this)
 * If the two first conditions are fulfilled, is there anything I could
do to help this transition? (including helping Brian if his work hasn't
actually ended yet)

Sorry for bringing up again a subject that seems to be quite recurrent,
and for this long block of text,
Leo Gaspard



signature.asc
Description: OpenPGP digital signature

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

[PATCH 1/2] fetch: preparations for tweak-fetch hook

[PATCH 2/2] fetch: add tweak-fetch hook

[PATCH 0/2] fetch: add tweak-fetch hook

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Re: Fetch-hooks

Fetch-hooks

Re: Migrating away from SHA-1?

19 matches

Site Navigation

Mail list logo

Footer information