[PATCH] commit: abort before commit-msg if empty message

2018-12-07 Thread Jonathan Tan
() upwards, which seems more complicated.) Signed-off-by: Jonathan Tan --- This was noticed with the commit-msg hook that comes with Gerrit, which basically just calls git interpret-trailers to add a Change-Id trailer. --- builtin/commit.c | 43 -

[PATCH on master v2] revision: use commit graph in get_reference()

2018-12-07 Thread Jonathan Tan
ng the object signature of the returned object. Signed-off-by: Jonathan Tan --- This patch is now on master. v2 makes use of the optimization Stolee describes in [1], except that I have arranged the functions slightly differently. In particular, I didn't want to add even more ways to obtain obje

Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-06 Thread Jonathan Tan
Also CC-ing Stolee since I mention multi-pack indices at the end. > This seems like a reasonable thing to do, but I have sort of a > meta-comment. In several places we've started doing this kind of "if > it's this type of object, do X, otherwise do Y" optimization (e.g., > handling large blobs

Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-06 Thread Jonathan Tan
> > This is on sb/more-repo-in-api because I'm using the repo_parse_commit() > > function. > > This is a mere nicety, not strictly required. > Before we had parse_commit(struct commit *) which would accomplish the > same, (and we'd still have that afterwards as a #define falling back onto >

Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line

2018-12-06 Thread Jonathan Tan
> Jonathan Tan writes: > > > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct > > output_state *os) > > } > > os->used += readsz; > > > > + if (!os->packfile_started) { > > + os->

Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-06 Thread Jonathan Tan
> > +This feature allows servers to serve part of their packfile response as > > URIs. > > +This allows server designs that improve scalability in bandwidth and CPU > > usage > > +(for example, by serving some data through a CDN), and (in the future) > > provides > > +some measure of

Re: [WIP RFC 1/5] Documentation: order protocol v2 sections

2018-12-06 Thread Jonathan Tan
> > The git command line expects Git servers to follow a specific order of > > "Command line"? It sounds like you are talking about the order of > command line arguments and options, but apparently that is not what > you are doing. Is it "The git over-the-wire protocol"? I meant to say the

Re: [PATCH 9/9] fetch: try fetching submodules if needed objects were not fetched

2018-12-04 Thread Jonathan Tan
> Try fetching a submodule by object id if the object id that the > superproject points to, cannot be found. Mention here the consequences of what happens when this attempt to fetch fails. Also, this seems to be a case of "do or do not, there is no try" - maybe it's better to say "Fetch commits

Re: [PATCH 8/9] submodule.c: fetch in submodules git directory instead of in worktree

2018-12-04 Thread Jonathan Tan
> Keep the properties introduced in 10f5c52656 (submodule: avoid > auto-discovery in prepare_submodule_repo_env(), 2016-09-01), by fixating > the git directory of the submodule. This is to avoid the autodetection of the Git repository, making it less error-prone; looks good to me.

Re: [PATCH 7/9] submodule: migrate get_next_submodule to use repository structs

2018-12-04 Thread Jonathan Tan
> We used to recurse into submodules, even if they were broken having > only an objects directory. The child process executed in the submodule > would fail though if the submodule was broken. This is tested via > "fetching submodule into a broken repository" in t5526. > > This patch tightens the

Re: [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it

2018-12-04 Thread Jonathan Tan
> We can string_list_insert() to maintain sorted-ness of the > list as we find new items, or we can string_list_append() to > build an unsorted list and sort it at the end just once. > > As we do not rely on the sortedness while building the > list, we pick the "append and sort at the end" as it

[PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-04 Thread Jonathan Tan
ng the object signature of the returned object. Signed-off-by: Jonathan Tan --- This is on sb/more-repo-in-api because I'm using the repo_parse_commit() function. A colleague noticed this issue when handling a mirror clone. Looking at the bigger picture, the speed of the connectivity check during

Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-04 Thread Jonathan Tan
> Some thoughts here: > > First, I'd like to see a section (and a bit in the implementation) > requiring HTTPS if the original protocol is secure (SSH or HTTPS). > Allowing the server to downgrade to HTTP, even by accident, would be a > security problem. > > Second, this feature likely should be

[WIP RFC 0/5] Design for offloading part of packfile response to CDN

2018-12-03 Thread Jonathan Tan
the whole fetch, not just at the end. Jonathan Tan (5): Documentation: order protocol v2 sections Documentation: add Packfile URIs design doc upload-pack: refactor reading of pack-objects out upload-pack: refactor writing of "packfile" line upload-pack: send part of packfile respo

[WIP RFC 1/5] Documentation: order protocol v2 sections

2018-12-03 Thread Jonathan Tan
The git command line expects Git servers to follow a specific order of sections when transmitting protocol v2 responses, but this is not explicit in the documentation. Make the order explicit. Signed-off-by: Jonathan Tan --- Documentation/technical/protocol-v2.txt | 18 -- 1

[WIP RFC 5/5] upload-pack: send part of packfile response as uri

2018-12-03 Thread Jonathan Tan
ed to show that the appropriate URI is indeed transmitted, and that the returned packfile is lacking exactly the expected object. Signed-off-by: Jonathan Tan --- builtin/pack-objects.c | 48 ++ fetch-pack.c | 9 t/t5702-protocol-

[WIP RFC 4/5] upload-pack: refactor writing of "packfile" line

2018-12-03 Thread Jonathan Tan
he writing of the "packfile" section header to read_pack_objects_stdout(). Unfortunately, this also means that we cannot send keepalive packets until pack-objects starts sending out the packfile, since the sideband is only established when the "packfile" section header is sent. Signed-off

[WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-03 Thread Jonathan Tan
Signed-off-by: Jonathan Tan --- Documentation/technical/packfile-uri.txt | 83 Documentation/technical/protocol-v2.txt | 6 +- 2 files changed, 88 insertions(+), 1 deletion(-) create mode 100644 Documentation/technical/packfile-uri.txt diff --git a/Documentation

[WIP RFC 3/5] upload-pack: refactor reading of pack-objects out

2018-12-03 Thread Jonathan Tan
buffered" local variable with a buffer array. Signed-off-by: Jonathan Tan --- upload-pack.c | 80 +-- 1 file changed, 46 insertions(+), 34 deletions(-) diff --git a/upload-pack.c b/upload-pack.c index 5e81f1ff24..cec43e0a46 100644 --- a/upload

Re: [PATCHv3 00/23] Bring more repository handles into our code base

2018-11-15 Thread Jonathan Tan
> Please have a look at the last 4 patches specifically as they were new in > the last iteration (but did not receive any comment), as they demonstrate > and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1 > for the test suite. Thanks. I only reviewed patches 18 and 20-23,

Re: [PATCH 18/23] submodule: use submodule repos for object lookup

2018-11-15 Thread Jonathan Tan
> +/* > + * Initialize 'out' based on the provided submodule path. > + * > + * Unlike repo_submodule_init, this tolerates submodules not present > + * in .gitmodules. This function exists only to preserve historical behavior, > + * > + * Returns 0 on success, -1 when the submodule is not present.

Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-12 Thread Jonathan Tan
> +index.recordOffsetTable:: > + Specifies whether the index file should include an "Index Entry > + Offset Table" section. This reduces index load time on > + multiprocessor machines but produces a message "ignoring IEOT > + extension" when reading the index using Git versions

Re: [PATCH 1/2] ls-remote: do not send ref prefixes for patterns

2018-11-08 Thread Jonathan Tan
> Jeff King writes: > > > Since b4be74105f (ls-remote: pass ref prefixes when requesting a > > remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo", > > "refs/tags/foo", etc to the transport code in an attempt to let the > > other side reduce the size of its advertisement. > >

Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-26 Thread Jonathan Tan
> Thanks for confirming. > > So even better would be to use `is_promisor_object(oid) || > has_object_file(oid)`, right? > > This is something that is probably not even needed: as I mentioned, the > shallow commits are *expected* to be local. It should not ever happen that > they are fetched.

Re: [PATCH 10/10] builtin/fetch: check for submodule updates in any ref update

2018-10-26 Thread Jonathan Tan
> Gerrit, the code review tool, has a different workflow than our mailing > list based approach. Usually users upload changes to a Gerrit server and > continuous integration and testing happens by bots. Sometimes however a > user wants to checkout a change locally and look at it locally. For this

Re: [PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched

2018-10-26 Thread Jonathan Tan
> But this default fetch is not sufficient, as a newly fetched commit in > the superproject could point to a commit in the submodule that is not > in the default refspec. This is common in workflows like Gerrit's. > When fetching a Gerrit change under review (from refs/changes/??), the > commits

Re: [PATCH 07/10] submodule: migrate get_next_submodule to use repository structs

2018-10-26 Thread Jonathan Tan
> We used to recurse into submodules, even if they were broken having > only an objects directory. The child process executed in the submodule > would fail though if the submodule was broken. > > This patch tightens the check upfront, such that we do not need > to spawn a child process to find

Re: [PATCH 06/10] repository: repo_submodule_init to take a submodule struct

2018-10-26 Thread Jonathan Tan
> diff --git a/builtin/grep.c b/builtin/grep.c > index 7da8fef31a..ba7634258a 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -418,7 +418,10 @@ static int grep_submodule(struct grep_opt *opt, struct > repository *superproject, > const struct object_id *oid, >

Re: [PATCH 05/10] submodule: store OIDs in changed_submodule_names

2018-10-26 Thread Jonathan Tan
> Reviewed-by: Jonathan Tan Probably better not to include such lines, especially since the review by me is not yet complete. Having said that, patches 1-5 look good to me. Patches 1-3 are identical to the previous version, which I have already reviewed. In patch 4, Stefan made the code cha

Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-25 Thread Jonathan Tan
> On Wed, 24 Oct 2018, Johannes Schindelin wrote: > > > On Wed, 24 Oct 2018, Junio C Hamano wrote: > > > > > Jonathan, do you see any issues with the use of lookup_commit() in > > > this change wrt lazy clone? I am wondering what happens when the > > > commit in question is at, an immediate

Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched

2018-10-23 Thread Jonathan Tan
> > Another thing you need to clarify is what happens if the fetch-by-commit > > fails. Right now, it seems that it will make the whole thing fail, which > > might be a surprising change in behavior. > > But a positive surprise, I would assume? Whether positive or negative, I think that this

Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-23 Thread Jonathan Tan
> > Why does GIT_DIR need to be set? Is it to avoid subcommands recursively > > checking the parent directories in case the CWD is a malformed Git > > repository? If yes, maybe it's worth adding a comment. > > It is copying the structure from prepare_submodule_repo_env, > specifically 10f5c52656

[PATCH] fetch-pack: be more precise in parsing v2 response

2018-10-19 Thread Jonathan Tan
eceived in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" me

Re: [PATCH 19/19] submodule: don't add submodule as odb for push

2018-10-19 Thread Jonathan Tan
> In push_submodule(), because we do not actually need access to objects > in the submodule, do not invoke add_submodule_odb(). > (for_each_remote_ref_submodule() does not require access to those > objects, and the actual push is done by spawning another process, > which handles object access by

Re: [PATCH 18/19] submodule: use submodule repos for object lookup

2018-10-19 Thread Jonathan Tan
> This converts the 'show_submodule_header' function to use > the repository API properly, such that the submodule objects > are not added to the main object store. There is also a side effect in that the submodule now needs to pass all the checks done by repo_init() instead of merely having the

Re: [RFC PATCH 0/2] Bring the_repository into cmd_foo

2018-10-18 Thread Jonathan Tan
> > Or instead we could accelerate the long term plan of removing a > > hard coded the_repository and have each cmd builtin take an additional > > repository pointer from the init code, such that we'd bring all of Git to > > work on arbitrary repositories. Then the standard test suite should be >

[PATCH v2 2/3] upload-pack: make want_obj not global

2018-10-18 Thread Jonathan Tan
in upload_pack_v2() is static to preserve existing behavior; this is not necessary in upload_pack() because upload_pack() is only invoked once per process. Signed-off-by: Jonathan Tan --- upload-pack.c | 116 -- 1 file changed, 66 insertions(+), 50 deletions

[PATCH v2 0/3] Clear flags before each v2 request

2018-10-18 Thread Jonathan Tan
t does not pass even > with s/seq/test_&/, so there may be something else wrong with it, > though. Thanks - there was a copy-and-paste error (should have grepped for "fetch< version 2", not "git< version 2"). Jonathan Tan (3): upload-pack: make have_obj no

[PATCH v2 1/3] upload-pack: make have_obj not global

2018-10-18 Thread Jonathan Tan
in upload_pack_v2() is static to preserve existing behavior; this is not necessary in upload_pack() because upload_pack() is only invoked once per process. Signed-off-by: Jonathan Tan --- upload-pack.c | 58 --- 1 file changed, 32 insertions(+), 26 deletions

[PATCH v2 3/3] upload-pack: clear flags before each v2 request

2018-10-18 Thread Jonathan Tan
only used in non-v2. Signed-off-by: Jonathan Tan --- t/t5702-protocol-v2.sh | 25 + upload-pack.c | 13 + 2 files changed, 34 insertions(+), 4 deletions(-) diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index 88a886975d..4adc4b00e3

Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches

2018-10-17 Thread Jonathan Tan
> @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, > const char *remote_name, > rc |= update_local_ref(ref, what, rm, , > summary_width); > free(ref); > -

Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched

2018-10-17 Thread Jonathan Tan
> Currently when git-fetch is asked to recurse into submodules, it dispatches > a plain "git-fetch -C " (with some submodule related options > such as prefix and recusing strategy, but) without any information of the > remote or the tip that should be fetched. > > This works surprisingly well in

Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-17 Thread Jonathan Tan
> This patch started as a refactoring to make 'get_next_submodule' more > readable, but upon doing so, I realized that "git fetch" of the submodule > actually doesn't need to be run in the submodules worktree. So let's run > it in its git dir instead. The commit message needs to be updated, I

Re: [PATCH 6/9] repository: repo_submodule_init to take a submodule struct

2018-10-17 Thread Jonathan Tan
> When constructing a struct repository for a submodule for some revision > of the superproject where the submodule is not contained in the index, > it may not be present in the working tree currently either. In that > siutation giving a 'path' argument is not useful. Upgrade the >

Re: [PATCH 5/9] submodule.c: do not copy around submodule list

2018-10-17 Thread Jonathan Tan
> By doing so we'll have access to the util pointer for longer that > contains the commits that we need to fetch, which will be > useful in a later patch. It seems that the main point of this patch is so that the OIDs be stored in changed_submodule_names, because you need them in a later patch.

Re: [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct

2018-10-17 Thread Jonathan Tan
> The `changed_submodule_names` are only used for fetching, so let's make it > part of the struct that is passed around for fetching submodules. Keep the titles of commit messages to 50 characters or under. > +static void calculate_changed_submodule_paths( > + struct submodule_parallel_fetch

Re: [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it

2018-10-17 Thread Jonathan Tan
> We can string_list_insert() to maintain sorted-ness of the > list as we find new items, or we can string_list_append() to > build an unsorted list and sort it at the end just once. This confused me at first, because neither function is mentioned in the patch. > As we do not rely on the

Re: Git wire protocol v2 fails fetching shallow changes with `pack has XXX unresolved deltas` on large repos with lots of tags Inbox x

2018-10-17 Thread Jonathan Tan
> No changes are needed in mirrored repository. Crash happens both with > 2.18.0 and 2.19.1 git versions. Having repository locally is not > required but reduces test runtime, you can quite reliably reproduce > issue when fetching over net directly from chromium.orgbypassing > mirroring step. Do

Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Jonathan Tan
> | Implementation | Queries | Maybe | FP # | FP %  | > ||-|---|--|---| > | Szeder | 66095   | 1142  | 256  | 0.38% | > | Jonathan   | 66459   | 107   | 89   | 0.16% | > | Stolee | 53025   | 492   | 479  | 0.90% | > > (Note that we must have

Re: [PATCH 17/19] submodule: use submodule repos for object lookup

2018-10-16 Thread Jonathan Tan
> Thanks for the review of the whole series! > > I have redone this series, addressing all your comments. I addressed > this comment differently than suggested, and put the submodule > repository argument at the end of the parameter list, such that it > goes with all the other arguments to be

Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-16 Thread Jonathan Tan
> 1. Teaching partial clone to attempt to fetch missing objects from > multiple remotes instead of only one. This is useful because you > can have a server that is nearby and cheaper to serve from (some > kind of local cache server) that you make requests to first before >

[PATCH] upload-pack: clear flags before each v2 request

2018-10-16 Thread Jonathan Tan
rt of each invocation. (One alternative is to reduce or eliminate usage of object flags in protocol v2, but that doesn't seem feasible because almost all 8 flags are used pervasively in v2 code.) Signed-off-by: Jonathan Tan --- This was noticed by Arturas Moskvinas in [1]. The reproduction steps

Re: Git wire protocol v2 fails fetching shallow changes with `pack has XXX unresolved deltas` on large repos with lots of tags Inbox x

2018-10-12 Thread Jonathan Tan
> On large repositories with lots of tags - git wire protocol v2 fails > to fetch shallow changes, it fails with error `pack has XXX unresolved > deltas`. Unfortunately I didn't find easy way to reproduce it except > cloning+fetching chromium repository, the way jenkins does. > Reproduction steps:

Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-11 Thread Jonathan Tan
> This series takes another approach as it doesn't change the signature of > functions, but introduces new functions that can deal with arbitrary > repositories, keeping the old function signature around using a shallow > wrapper. > > Additionally each patch adds a semantic patch, that would

Re: [PATCH 18/19] submodule: don't add submodule as odb for push

2018-10-11 Thread Jonathan Tan
> The submodule was added as an alternative in eb21c732d6 (push: teach > --recurse-submodules the on-demand option, 2012-03-29), but was > not explained, why. > > In similar code, submodule_has_commits, the submodule is added as an > alternative to perform a quick check if we need to dive into

Re: [PATCH 17/19] submodule: use submodule repos for object lookup

2018-10-11 Thread Jonathan Tan
> +/* > + * Initialize 'out' based on the provided submodule path. > + * > + * Unlike repo_submodule_init, this tolerates submodules not present > + * in .gitmodules. NEEDSWORK: The repo_submodule_init behavior is > + * preferrable. This function exists only to preserve historical behavior. What

Re: [PATCH 16/19] pretty: prepare format_commit_message to handle arbitrary repositories

2018-10-11 Thread Jonathan Tan
Patches 6-16 are all quite straightforward, and are reviewed-by: me.

Re: [PATCH 05/19] object: parse_object to honor its repository argument

2018-10-11 Thread Jonathan Tan
> In 8e4b0b6047 (object.c: allow parse_object to handle > arbitrary repositories, 2018-06-28), we forgot to pass the > repository down to the read_object_file. [snip] > @@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const > struct object_id *oid) > return

Re: [PATCH 04/19] object-store: prepare read_object_file to deal with arbitrary repositories

2018-10-11 Thread Jonathan Tan
> Introduce repo_read_object_file which takes the repository argument, and > hide the original read_object_file as a macro behind > NO_THE_REPOSITORY_COMPATIBILITY_MACROS, which we planned for in > e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21) That commit didn't seem to

Re: [PATCH 03/19] object-store: allow read_object_file_extended to read from arbitrary repositories

2018-10-11 Thread Jonathan Tan
> @@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct > object_id *oid, > const char *path; > struct stat st; > const struct object_id *repl = lookup_replace ? > - lookup_replace_object(the_repository, oid) : oid; > +

[PATCH] Per-commit and per-parent filters for 2 parents

2018-10-11 Thread Jonathan Tan
Signed-off-by: Jonathan Tan --- > One main benefit of storing on Bloom filter per commit is to avoid > recomputing hashes at every commit. Currently, this patch only improves > locality when checking membership at the cost of taking up more space. > Drop the dependence on th

[PATCH 2/2] Only make bloom filter for first parent

2018-10-10 Thread Jonathan Tan
Signed-off-by: Jonathan Tan --- commit-graph.c | 4 ++-- revision.c | 20 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 90b0b3df90..d21d555611 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -782,9 +782,9

[PATCH 1/2] One filter per commit

2018-10-10 Thread Jonathan Tan
Signed-off-by: Jonathan Tan --- bloom-filter.c | 31 ++- bloom-filter.h | 12 commit-graph.c | 26 -- revision.c | 9 +++-- 4 files changed, 37 insertions(+), 41 deletions(-) diff --git a/bloom-filter.c b/bloom-filter.c

[PATCH 0/2] Per-commit filter proof of concept

2018-10-10 Thread Jonathan Tan
. There are more possibilities like dynamic filter sizing, different hashing, and hashing to support wildcard matches, which I haven't looked into. Jonathan Tan (2): One filter per commit Only make bloom filter for first parent bloom-filter.c | 31 ++- bloom

Re: [PATCH] builtin/grep.c: remote superflous submodule code

2018-10-09 Thread Jonathan Tan
> It claimed that grep would still need some explicit handling, but that is > not the call to repo_read_gitmodules (applying this patch on top of > ff6f1f564c4 still keep the test suite happy, specifically > t7814-grep-recurse-submodules, which contains a test > "grep history with moved

[PATCH v2] cache-tree: skip some blob checks in partial clone

2018-10-09 Thread Jonathan Tan
the repository, it can be done using fsck (which will notify if a tree points to a missing and non-promised blob, whether the blob is included or excluded by the sparse-checkout specification). Signed-off-by: Jonathan Tan --- Changes from v1: After feedback, I restricted this to partial clone. Once

[PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs

2018-10-08 Thread Jonathan Tan
flag, and teach unpack_trees() to invoke cache_tree_update() with this new flag. Signed-off-by: Jonathan Tan --- cache-tree.c | 6 +- cache-tree.h | 4 t/t1090-sparse-checkout-scope.sh | 33 unpack-trees.c

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jonathan Tan
> Yes, I agree with you that the loops are still entwined. They're at > least now in a single function, though, which IMHO is a slight > improvement. Hmm...originally, the main functionality was in a single loop in a single function. But I say that because I consider the lazy loading in

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jonathan Tan
> Determine if the oidset needs to be populated upfront and then do that > instead. This duplicates a loop, but simplifies the existing one by > separating concerns between the two. [snip] > + if (strict) { > + for (i = 0; i < nr_sought; i++) { > + ref =

[PATCH v2 0/2] Lazy fetch bug fix (and feature that reveals it)

2018-10-03 Thread Jonathan Tan
eals the bug that patch 1 fixes, and to make it easier for others to verify that these patches together pass all tests when rebased on [master + md/filter-trees] or 'pu' (at least, as of the time of writing). [1] https://public-inbox.org/git/20180927183718.89804-1-jonathanta...@google.com/ Jonathan Tan

[PATCH v2 2/2] fetch-pack: exclude blobs when lazy-fetching trees

2018-10-03 Thread Jonathan Tan
tocol versions to benefit from this optimization. Signed-off-by: Jonathan Tan --- fetch-pack.c | 14 ++ fetch-pack.h | 7 +++ t/t0410-partial-clone.sh | 41 3 files changed, 62 insertions(+) diff --git a/fetch-pack.c b/fetch-pack.

[PATCH v2 1/2] fetch-pack: avoid object flags if no_dependents

2018-10-03 Thread Jonathan Tan
if the Git project plans to support those. Signed-off-by: Jonathan Tan --- fetch-pack.c | 101 ++- 1 file changed, 59 insertions(+), 42 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index 75047a4b2a..b9b1211dda 100644 --- a/fetch-pack.c +

Re: [RFC PATCH v2 4/4] fetch: do not list refs if fetching only hashes

2018-09-28 Thread Jonathan Tan
> > + /* > > +* We can avoid listing refs if all of them are exact > > +* OIDs > > +*/ > > + must_list_refs = 0; > > + for (i = 0; i < rs->nr; i++) { > > + if (!rs->items[i].exact_sha1)

[PATCH] negotiator/skipping: parse commit before queueing

2018-09-27 Thread Jonathan Tan
The skipping negotiator pushes entries onto the priority queue without ensuring that the commit is parsed, resulting in the entry ending up in the wrong position due to a lack of commit time. Fix this by parsing the commit whenever we push an entry onto the priority queue. Signed-off-by: Jonathan

Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-09-27 Thread Jonathan Tan
> I get: > > warning: Ignoring --negotiation-tip because the protocol does not support > it. When I implemented --negotiation-tip, I only implemented it for protocols that support connect or stateless-connect, because implementing it fully would have required expanding the protocol helper

Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-09-27 Thread Jonathan Tan
> > If you wanted to do this, it seems better to me to just declare a "null" > > negotiation algorithm that does not perform any negotiation at all. > > I think such an algorithm is a good idea in general, especially for > testing, and yeah, maybe that's the best way out of this, i.e. to do: > >

[RFC PATCH v2 4/4] fetch: do not list refs if fetching only hashes

2018-09-27 Thread Jonathan Tan
If only hash literals are given on a "git fetch" command-line, tag following is not requested, and the fetch is done using protocol v2, a list of refs is not required from the remote. Therefore, optimize by invoking transport_get_remote_refs() only if we need the refs. Signed-off-by: Jo

[RFC PATCH v2 3/4] transport: list refs before fetch if necessary

2018-09-27 Thread Jonathan Tan
. Instead, fix this by having transport_fetch_refs() call transport_get_remote_refs() to ensure that the latter is always called at least once, unless the transport explicitly states that it supports fetching without listing refs. Signed-off-by: Jonathan Tan --- transport-helper.c | 1

[RFC PATCH v2 2/4] transport: do not list refs if possible

2018-09-27 Thread Jonathan Tan
fetch " will be done in a subsequent patch. Signed-off-by: Jonathan Tan --- fetch-pack.c | 2 +- t/t5702-protocol-v2.sh | 5 + transport.c| 13 +++-- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index

[RFC PATCH v2 1/4] transport: allow skipping of ref listing

2018-09-27 Thread Jonathan Tan
The get_refs_via_connect() function both performs the handshake (including determining the protocol version) and obtaining the list of remote refs. However, the fetch protocol v2 supports fetching objects without the listing of refs, so make it possible for the user to skip the listing by creating

[RFC PATCH v2 0/4] Avoid ls-refs when possible in protocol v2

2018-09-27 Thread Jonathan Tan
hen fetch-refs is done"? > > I dunno. I'm not sure I understand - transports generally don't care if get-refs-list is called after fetch-refs. Also, this already happens when fetching with tag following from a server that does not support tag following, using a transport that su

Re: [PATCH] fetch-pack: approximate no_dependents with filter

2018-09-27 Thread Jonathan Tan
> It is very clear how you are churning the code, but it is utterly > unclear from the description what you perceived as a problem and why > this change is a good (if not the best) solution for that problem, > at least to me. Firstly, thanks for your comments and questions - it's sometimes hard

[RFC PATCH] transport: list refs before fetch if necessary

2018-09-25 Thread Jonathan Tan
without listing refs. Signed-off-by: Jonathan Tan --- I discovered this while investigating the possibility of taking advantage of the fact that protocol v2 allows us to fetch without first invoking ls-refs. This is useful both when lazily fetching to a partial clone, and when invoking "git

[PATCH] fetch-pack: approximate no_dependents with filter

2018-09-24 Thread Jonathan Tan
here closer to where the "filter" instruction is written to the wire so that only one part of the code needs to be changed in order for users of all protocol versions to benefit from this optimization. Signed-off-by: Jonathan Tan --- This was prompted by a user at $DAY_JOB who had a partial

[PATCH v2 0/2] Check presence of targets when fetching to partial clone

2018-09-21 Thread Jonathan Tan
in a few places, so such a change would cause some churn in the codebase. So I left this function alone. [1] https://public-inbox.org/git/xmqqy3bvycie@gitster-ct.c.googlers.com/ Jonathan Tan (2): connected: document connectivity in partial clones fetch: in partial clone, check presence

[PATCH v2 2/2] fetch: in partial clone, check presence of targets

2018-09-21 Thread Jonathan Tan
despite "filter"", 2018-07-09). Therefore, update quickfetch() to also directly check for the presence of all objects to be fetched. Its documentation and name are also updated to better reflect what it does. Signed-off-by: Jonathan Tan --- builtin/fetch.c | 15 +--

[PATCH v2 1/2] connected: document connectivity in partial clones

2018-09-21 Thread Jonathan Tan
reflected in the documentation of that function. Update the documentation accordingly. Signed-off-by: Jonathan Tan --- connected.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/connected.h b/connected.h index e4c961817..8d5a6b3ad 100644 --- a/connected.h +++ b/connected.h @@ -

Re: [PATCH] fetch: in partial clone, check presence of targets

2018-09-20 Thread Jonathan Tan
> Jonathan Tan writes: > > > + if (repository_format_partial_clone) { > > + /* > > +* For the purposes of the connectivity check, > > +* check_connected() considers all objects promised by > > +* promi

[PATCH] fetch: in partial clone, check presence of targets

2018-09-20 Thread Jonathan Tan
despite "filter"", 2018-07-09). Therefore, update quickfetch() to also directly check for the presence of all objects to be fetched. Signed-off-by: Jonathan Tan --- In this patch, I have striven to avoid piping from Git commands that may fail, following the guidelines in [1]. [1]

Re: [PATCH 1/2] fetch-object: provide only one fetching function

2018-09-13 Thread Jonathan Tan
> Instead of explaining why the new convention is better to justify > (2), the above three lines handwave by saying "more flexible" > twice. We should do better. > > fetch-object: unify fetch_object[s] functions > > There are fetch_object() and fetch_objects() helpers in >

[PATCH 0/2] Bugfix for partial clones and ref-in-want servers

2018-09-12 Thread Jonathan Tan
need only be applied once, and the second patch contains the actual bugfix. Jonathan Tan (2): fetch-object: provide only one fetching function fetch-object: set exact_oid when fetching fetch-object.c | 17 ++--- fetch-object.h | 8 ++-- s

[PATCH 2/2] fetch-object: set exact_oid when fetching

2018-09-12 Thread Jonathan Tan
setting of exact_oid, the wrong line will be sent. Set exact_oid, so that the correct line is sent. Signed-off-by: Jonathan Tan --- fetch-object.c | 1 + t/t0410-partial-clone.sh | 12 2 files changed, 13 insertions(+) diff --git a/fetch-object.c b/fetch-object.c index 1af1bf8

[PATCH 1/2] fetch-object: provide only one fetching function

2018-09-12 Thread Jonathan Tan
fetch-object.h currently provides two functions (fetch_object() and fetch_objects()) that could be replaced by a single, more flexible function. Replace those two functions with the more flexible function. Signed-off-by: Jonathan Tan --- fetch-object.c | 16 +--- fetch-object.h | 8

Re: Is it possible to git clone --filter= without any objects?

2018-09-11 Thread Jonathan Tan
On Tue, Sep 11, 2018 at 10:15 AM, Stefan Beller wrote: > You might be pleased to hear about a series floating on the mailing list, > that started at > https://public-inbox.org/git/cover.1533854545.git.matv...@google.com/ > and promised to filter trees away, and its latest version can be found at

Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> > So we don't want to die in list-objects.c. If we > > fail to fetch, then we will die on line 213 in rev-list.c. > > Why don't we want to die in list-objects.c? When --missing=error is > passed, fetch_if_missing retains its default value of 1, so > parse_tree_gently() will attempt to fetch it

Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> > > @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const > > > char *prefix) > > > init_revisions(, prefix); > > > revs.abbrev = DEFAULT_ABBREV; > > > revs.commit_format = CMIT_FMT_UNSPECIFIED; > > > + revs.do_not_die_on_missing_tree = 1; > > > > Is this

Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> - grep -E "tree|blob" objs >trees_and_blobs && > - test_line_count = 1 trees_and_blobs > + grep -E "tree|blob" objs \ > + | awk -f print_1.awk >trees_and_blobs && > + git -C r1 rev-parse HEAD: >expected && > + test_cmp trees_and_blobs expected Indent "| awk" (and similar lines in this patch) -

Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> @@ -743,6 +743,9 @@ specification contained in . > A debug option to help with future "partial clone" development. > This option specifies how missing objects are handled. > + > +The form '--filter=tree:' omits all blobs and trees deeper than > + from the root tree. Currently, only

Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> Previously, we assumed only blob objects could be missing. This patch > makes rev-list handle missing trees like missing blobs. A missing tree > will cause an error if --missing indicates an error should be caused, > and the hash is printed even if the tree is missing. The last sentence is

Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> I don't understand about the ">= 0". What should I replace it with? > Maybe you mean the return is never positive so I can change: > > parse_tree_gently(tree, 1) >= 0 > > to: > !parse_tree_gently(tree, 1) > > ? Sorry for the lack of clarity - that is what I meant. > > The missing mechanism

  1   2   3   4   5   6   7   8   9   10   >