Re: test bare repository for unit tests
Hi. git-fast-export would work too, but it creates an additional step. I don't commit to the model repo during tests, but I do commit when I want to modify the tests. So far, I configured core.compression=0 and gc.auto=0, created the .gitattributes inside the bare repo dir containing one line: * binary I also created two empty .gitignore files inside refs/ and objects/ I still haven't found a way to force prune without pack after each push. On 22.02.2018 1:53, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Feb 21 2018, Basin Ilya jotted: > >> Hi. >> I want to the test-repo-git under >> https://github.com/apache/maven-wagon/tree/master/wagon-providers/wagon-scm/src/test/resources/ >> just like test-repo-cvs and test-repo-svn >> >> Which configuation options would suit that? >> I think core.compression 0 for human readable diffs. >> also, I need to force loose, gc after each push. > > It looks like you have unit tests that are going to do integration tests > of some SVN/CVS repos as used by some other tool, and want to add git to > that. > > Since you have git already, the most straightforward thing to do would > be to ship the output of git-fast-export in the repo, and have the test > setup code create a repo locally out of that, then test it. > > Or do you really need to commit the raw repo files as-is for some reason > I've missed? >
Re: bug in HTTP protocol spec
> On Feb 21, 2018, at 9:37 PM, Jonathan Niederwrote: > > Thanks for writing it. > > Do you mind if we forge your sign-off? (See Documentation/SubmittingPatches > item '(5) Certify your work' for details about what this means.) Sure, or I can just re-paste: Signed-off-by: Dorian Taylor --- Documentation/technical/http-protocol.txt | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt index a0e45f2889e6e..19d73f7efb338 100644 --- a/Documentation/technical/http-protocol.txt +++ b/Documentation/technical/http-protocol.txt @@ -214,14 +214,17 @@ smart server reply: S: Cache-Control: no-cache S: S: 001e# service=git-upload-pack\n + S: S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n + S: The client may send Extra Parameters (see Documentation/technical/pack-protocol.txt) as a colon-separated string -in the Git-Protocol HTTP header. +in the Git-Protocol HTTP header. Note as well that there is *no* newline +after the ``. Dumb Server Response @@ -264,8 +267,8 @@ Servers MUST set $servicename to be the request parameter value. Servers SHOULD include an LF at the end of this line. Clients MUST ignore an LF at the end of the line. -Servers MUST terminate the response with the magic `` end -pkt-line marker. +Servers MUST follow the first pkt-line, as well as terminate the +response, with the magic `` end pkt-line marker. The returned response is a pkt-line stream describing each ref and its known value. The stream SHOULD be sorted by name according to @@ -278,6 +281,7 @@ Extra Parameter. smart_reply = PKT-LINE("# service=$servicename" LF) *1("version 1") +"" ref_list "" ref_list= empty_list / non_empty_list --- > >> Note I am not sure what the story is behind that `version 1` >> element, whether it's supposed to go before or after the null packet >> or if there should be another null packet or what. Perhaps somebody >> more fluent with the smart protocol can advise. > > I believe the 'version 1' goes after the flush-packet. I took a traipse through the code and couldn’t determine it one way or another, but my money is on that looking something like `000aversion 1\n` on the wire. -- Dorian Taylor Make things. Make sense. https://doriantaylor.com signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [PATCH 26/27] sha1_file: allow map_sha1_file to handle arbitrary repositories
Stefan Beller wrote: > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder > --- > object-store.h | 3 +-- > sha1_file.c| 5 +++-- > 2 files changed, 4 insertions(+), 4 deletions(-) Thanks: this is short and simple, making my life as a reviewer very easy. Reviewed-by: Jonathan Nieder
Re: [PATCH 21/27] sha1_file: add repository argument to sha1_loose_object_info
Hi, Stefan Beller wrote: > Add a repository argument to allow the sha1_loose_object_info caller > to be more specific about which repository to act on. This is a small > mechanical change; it doesn't change the implementation to handle > repositories other than the_repository yet. > > As with the previous commits, use a macro to catch callers passing a > repository other than the_repository at compile time. > > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder > --- > sha1_file.c | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) Simple and obviously correct. Reviewed-by: Jonathan Nieder
Re: [PATCH 17/27] sha1_file: add repository argument to stat_sha1_file
Hi, Stefan Beller wrote: > Add a repository argument to allow the stat_sha1_file caller to be > more specific about which repository to act on. This is a small > mechanical change; it doesn't change the implementation to handle > repositories other than the_repository yet. > > As with the previous commits, use a macro to catch callers passing a > repository other than the_repository at compile time. > > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder > --- > sha1_file.c | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) Simple and obviously correct. Reviewed-by: Jonathan Nieder
Re: [PATCH 13/27] sha1_file: add repository argument to prepare_alt_odb
Hi, Stefan Beller wrote: > See previous patch for explanation. > > While at it, move the declaration to object-store.h, > where it should be easier to find. Which declaration? It looks like prepare_alt_odb is already in object-store.h. [...] > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -717,7 +717,7 @@ int cmd_fsck(int argc, const char **argv, const char > *prefix) > } else { > fsck_object_dir(get_object_directory()); > > - prepare_alt_odb(); > + prepare_alt_odb(the_repository); Patch 2 added a #include of "repository.h". Good. (I checked because with the definition of prepare_alt_odb as a macro, the function call would compile correctly even if the_repository weren't in scope, but we want to include what we use as a matter of style/maintainability.) [...] > --- a/packfile.c > +++ b/packfile.c > @@ -890,7 +890,7 @@ void prepare_packed_git(void) > if (the_repository->objects.packed_git_initialized) > return; > prepare_packed_git_one(get_object_directory(), 1); > - prepare_alt_odb(); > + prepare_alt_odb(the_repository); Also good, since patch 3 added a #include of "repository.h". [...] > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -23,6 +23,7 @@ > #include "sha1-lookup.h" > #include "bulk-checkin.h" > #include "repository.h" > +#include "object-store.h" Should this #include have been added in an earlier patch, since the file both uses and defines prepare_alt_odb, which is declared there? The rest looks good. Thanks, Jonathan
[PATCH] t: send verbose test-helper output to fd 4
This is a repost of the two patches from: https://public-inbox.org/git/20180209185710.ga23...@sigill.intra.peff.net/ (now just one patch, since sg/test-i18ngrep graduated and we can do it all in one step). The idea got positive feedback, but nobody commented on patches and I didn't see them in "What's cooking". -- >8 -- Test helper functions like test_must_fail may produce messages to stderr when they see a problem. When the tests are run with "--verbose", this ends up on the test script's stderr, and the user can read it. But there's a problem. Some tests record stderr as part of the test, like: test_must_fail git foo 2>output && test_i18ngrep expected.message output In this case the error text goes into "output". This makes the --verbose output less useful (it also means we might accidentally match it in the second, though in practice we tend to produce these messages only on error, so we'd abort the test when the first command fails). Let's instead send this user-facing output directly to descriptor 4, which always points to the original stderr (or /dev/null in non-verbose mode). And it's already forbidden to redirect descriptor 4, since we use it for BASH_XTRACEFD, as explained in 9be795fbce (t5615: avoid re-using descriptor 4, 2017-12-08). Signed-off-by: Jeff King--- t/test-lib-functions.sh | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 67b5994afb..aabee13e5d 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -625,22 +625,22 @@ test_must_fail () { exit_code=$? if test $exit_code -eq 0 && ! list_contains "$_test_ok" success then - echo >&2 "test_must_fail: command succeeded: $*" + echo >&4 "test_must_fail: command succeeded: $*" return 1 elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe then return 0 elif test $exit_code -gt 129 && test $exit_code -le 192 then - echo >&2 "test_must_fail: died by signal $(($exit_code - 128)): $*" + echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*" return 1 elif test $exit_code -eq 127 then - echo >&2 "test_must_fail: command not found: $*" + echo >&4 "test_must_fail: command not found: $*" return 1 elif test $exit_code -eq 126 then - echo >&2 "test_must_fail: valgrind error: $*" + echo >&4 "test_must_fail: valgrind error: $*" return 1 fi return 0 @@ -678,7 +678,7 @@ test_expect_code () { return 0 fi - echo >&2 "test_expect_code: command exited with $exit_code, we wanted $want_code $*" + echo >&4 "test_expect_code: command exited with $exit_code, we wanted $want_code $*" return 1 } @@ -742,18 +742,18 @@ test_i18ngrep () { shift ! grep "$@" && return 0 - echo >&2 "error: '! grep $@' did find a match in:" + echo >&4 "error: '! grep $@' did find a match in:" else grep "$@" && return 0 - echo >&2 "error: 'grep $@' didn't find a match in:" + echo >&4 "error: 'grep $@' didn't find a match in:" fi if test -s "$last_arg" then - cat >&2 "$last_arg" + cat >&4 "$last_arg" else - echo >&2 "" + echo >&4 "" fi return 1 @@ -764,7 +764,7 @@ test_i18ngrep () { # not output anything when they fail. verbose () { "$@" && return 0 - echo >&2 "command failed: $(git rev-parse --sq-quote "$@")" + echo >&4 "command failed: $(git rev-parse --sq-quote "$@")" return 1 } -- 2.16.2.580.g650ee5408b
Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store
Hi, Stefan Beller wrote: > In a process with multiple repositories open, packfile accessors > should be associated to a single repository and not shared globally. > Move packed_git and packed_git_mru into the_repository and adjust > callers to reflect this. > > Patch generated by > > 1. Moving the struct packed_git declaration to object-store.h > and packed_git, packed_git_mru globals to struct object_store. > > 2. Applying the semantic patch > contrib/coccinelle/refactoring/packed_git.cocci to adjust callers. > This semantic patch is placed in a sub directory of the coccinelle > contrib dir, as this semantic patch is not expected to be of general > usefulness; it is only useful during developing this series and > merging it with other topics in flight. At a later date, just > delete that semantic patch. Can the semantic patch go in the commit message instead? It is very brief. Actually, I don't see this semantic patch in the diffstat. Is the commit message stale? > 3. Applying line wrapping fixes from "make style" to break the > resulting long lines. > > 4. Adding missing #includes of repository.h and object-store.h > where needed. Is there a way to automate this step? (I'm asking for my own reference when writing future patches, not because of any concern about the correctness of this one.) > > 5. As the packfiles are now owned by an objectstore/repository, which > is ephemeral unlike globals, we introduce memory leaks. So address > them in raw_object_store_clear(). The compound words are confusing me. What is an objectstore/repository? Are these referring to particular identifiers or something else? Would some wording like the following work? 5. Freeing packed_git and packed_git_mru in raw_object_store_clear to avoid a per-repository memory leak. Previously they were global singletons, so code to free them did not exist. [...] > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -12,6 +12,7 @@ > #include "exec_cmd.h" > #include "streaming.h" > #include "thread-utils.h" > +#include "object-store.h" > #include "packfile.h" > > static const char index_pack_usage[] = Change from a different patch leaked into this one? [...] > +++ b/builtin/pack-objects.c [...] > @@ -1044,7 +1045,7 @@ static int want_object_in_pack(const struct object_id > *oid, > } > want = want_found_object(exclude, p); > if (!exclude && want > 0) > - list_move(>mru, _git_mru); > + list_move(>mru, > _repository->objects.packed_git_mru); Long line. Can "make style" catch this? [...] > +++ b/builtin/receive-pack.c > @@ -7,6 +7,7 @@ > #include "sideband.h" > #include "run-command.h" > #include "exec_cmd.h" > +#include "object-store.h" > #include "commit.h" > #include "object.h" > #include "remote.h" Another change leaked in? [...] > --- a/cache.h > +++ b/cache.h > @@ -1585,35 +1585,6 @@ struct pack_window { > unsigned int inuse_cnt; > }; > > -extern struct packed_git { [...] > -} *packed_git; Move detecting diff confirms that this wasn't modified. Thanks for creating it. [...] > +++ b/fast-import.c [...] > @@ -1110,7 +1112,7 @@ static int store_object( > if (e->idx.offset) { > duplicate_count_by_type[type]++; > return 1; > - } else if (find_sha1_pack(oid.hash, packed_git)) { > + } else if (find_sha1_pack(oid.hash, > the_repository->objects.packed_git)) { Long line. (I'll refrain from commenting about any further ones.) [...] > +++ b/http-push.c > @@ -1,4 +1,5 @@ > #include "cache.h" > +#include "object-store.h" > #include "commit.h" > #include "tag.h" > #include "blob.h" Stray change? > diff --git a/http-walker.c b/http-walker.c > index 07c2b1af82..8bb5d991bb 100644 > --- a/http-walker.c > +++ b/http-walker.c > @@ -4,6 +4,7 @@ > #include "http.h" > #include "list.h" > #include "transport.h" > +#include "object-store.h" > #include "packfile.h" > > struct alt_base { Same question. > diff --git a/http.c b/http.c > index 31755023a4..a4a9e583c7 100644 > --- a/http.c > +++ b/http.c > @@ -1,6 +1,7 @@ > #include "git-compat-util.h" > #include "http.h" > #include "config.h" > +#include "object-store.h" > #include "pack.h" > #include "sideband.h" > #include "run-command.h" Likewise. > diff --git a/object-store.h b/object-store.h > index e78eea1dde..1de9e07102 100644 > --- a/object-store.h > +++ b/object-store.h > @@ -52,6 +52,30 @@ void add_to_alternates_memory(const char *dir); > */ > struct strbuf *alt_scratch_buf(struct alternate_object_database *alt); > > +struct packed_git { > + struct packed_git *next; > + struct list_head mru; > + struct pack_window *windows; > + off_t pack_size; > + const void *index_data; > + size_t index_size; > + uint32_t num_objects; > + uint32_t num_bad_objects; > +
Re: [PATCH 01/27] repository: introduce raw object store field
Hi, Stefan Beller wrote: > The raw object store field will contain any objects needed for > access to objects in a given repository. > > This patch introduces the raw object store and populates it with the > `objectdir`, which used to be part of the repository struct. > > As the struct gains members, we'll also populate the function to clear > the memory for these members. > > In a later we'll introduce a struct object_parser, that will complement > the object parsing in a repository struct: The raw object parser is the > layer that will provide access to raw object content, while the higher > level object parser code will parse raw objects and keeps track of > parenthood and other object relationships using 'struct object'. > For now only add the lower level to the repository struct. > > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder > --- Heh, I suppose that sign-off makes me not a great candidate for reviewing this. It's been long enough since I looked at the patch that I feel okay trying anyway. [...] > --- /dev/null > +++ b/object-store.h > @@ -0,0 +1,15 @@ > +#ifndef OBJECT_STORE_H > +#define OBJECT_STORE_H > + > +struct raw_object_store { > + /* > + * Path to the repository's object store. > + * Cannot be NULL after initialization. > + */ > + char *objectdir; > +}; > +#define RAW_OBJECT_STORE_INIT { NULL } Who owns and is responsible for freeing objectdir? > + > +void raw_object_store_clear(struct raw_object_store *o); Oh, that answers that. It could be worth a note in the doc comment anyway, but I don't mind not having one. [...] > + > +void raw_object_store_clear(struct raw_object_store *o) > +{ > + free(o->objectdir); > +} Should this use FREE_AND_NULL? [...] > --- a/repository.c > +++ b/repository.c [...] > @@ -42,8 +49,8 @@ static void repo_setup_env(struct repository *repo) > !repo->ignore_env); > free(repo->commondir); > repo->commondir = strbuf_detach(, NULL); > - free(repo->objectdir); > - repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir, > + free(repo->objects.objectdir); Should this call raw_object_store_clear instead of calling free directly? > + repo->objects.objectdir = git_path_from_env(DB_ENVIRONMENT, > repo->commondir, > "objects", !repo->ignore_env); Long line. One way to break it would be repo->objects.objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir, "objects", !repo->ignore_env); > free(repo->graft_file); > repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir, > @@ -209,12 +216,14 @@ void repo_clear(struct repository *repo) > { > FREE_AND_NULL(repo->gitdir); > FREE_AND_NULL(repo->commondir); > - FREE_AND_NULL(repo->objectdir); > FREE_AND_NULL(repo->graft_file); > FREE_AND_NULL(repo->index_file); > FREE_AND_NULL(repo->worktree); > FREE_AND_NULL(repo->submodule_prefix); > > + raw_object_store_clear(>objects); > + memset(>objects, 0, sizeof(repo->objects)); If raw_object_store_clear uses FREE_AND_NULL, this memset won't be necessary. [...] > --- a/repository.h > +++ b/repository.h > @@ -1,6 +1,8 @@ > #ifndef REPOSITORY_H > #define REPOSITORY_H > > +#include "object-store.h" > + > struct config_set; > struct index_state; > struct submodule_cache; > @@ -21,10 +23,9 @@ struct repository { > char *commondir; > > /* > - * Path to the repository's object store. > - * Cannot be NULL after initialization. > + * Holds any information related to the object store. >*/ This comment doesn't make it clear to me what the field represents. Can it be replaced with some of the description from the commit message? > - char *objectdir; > + struct raw_object_store objects; > Thanks, Jonathan
Re: bug in HTTP protocol spec
(+cc: bmwill@, who is working on protocol v2) Hi, Dorian Taylor wrote: > On Feb 21, 2018, at 2:15 PM, Jeff Kingwrote: >> Thanks, I agree the document is buggy. Do you want to submit a patch? > > Will this do? Thanks for writing it. Do you mind if we forge your sign-off? (See Documentation/SubmittingPatches item '(5) Certify your work' for details about what this means.) > Note I am not sure what the story is behind that `version 1` > element, whether it's supposed to go before or after the null packet > or if there should be another null packet or what. Perhaps somebody > more fluent with the smart protocol can advise. I believe the 'version 1' goes after the flush-packet. Thanks, Jonathan
Re: [PATCH 2/2] ref-filter: get rid of goto
2018-02-21 20:41 GMT+03:00 Junio C Hamano: > Olga Telezhnaya writes: > >> Get rid of goto command in ref-filter for better readability. >> >> Signed-off-by: Olga Telezhnaia >> Mentored-by: Christian Couder >> Mentored by: Jeff King >> --- >> ref-filter.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) > > It looks like this is the same change as the bottom-most change on > your "cat-file --batch" series (and is obviously correct). > > I am puzzled by your intention---are you re-organizing and rebooting > the series? Either 'Yes' or 'No' is an acceptable answer, and so is > anything else. I just want to know what you want to happen to the > merge conflicts if I queued this while still keeping your "cat-file > --batch" thing I have on 'pu'). Thanks for the question, I needed to mention that. We (with Peff) decided that it's better and easier to remake whole previous patch. Before that, I want to make some refactorings in ref-filter so after that migrating should go much easier. I want to do that by small separate patches, so this is first in the series. I hope it would be both easier to review for you and easier to fix for me. So, if everything is fine, you could merge it to master. I will rewrite most parts of previous patch anyway. Thanks! > >> >> diff --git a/ref-filter.c b/ref-filter.c >> index 83ffd84affe52..28df6e21fb996 100644 >> --- a/ref-filter.c >> +++ b/ref-filter.c >> @@ -1494,11 +1494,11 @@ static void populate_value(struct ref_array_item >> *ref) >> for (i = 0; i < used_atom_cnt; i++) { >> struct atom_value *v = >value[i]; >> if (v->s == NULL) >> - goto need_obj; >> + break; >> } >> - return; >> + if (used_atom_cnt <= i) >> + return; >> >> - need_obj: >> get_object(ref, >objectname, 0, ); >> >> /* >> >> -- >> https://github.com/git/git/pull/460
Re: Bug Report: git status triggers a file change event
+git-for-windows Hi, Raining Chain wrote: > On Windows 10, git version 2.16.2.windows.1, running the command > > git status > > will trigger a file change event to file C:\myPath\.git "Attributes changed." > > This causes problems when using scripts that detect file changes such > as tsc -w (Typescript compiler) and using softwares that regularly > call git status such as VisualStudioCode. > > Thanks. Can you say more about how "tsc -w" reacts to the file change? Is there a way to tell it to exclude particular files from the files it watches? Thanks, Jonathan
Bug Report: git status triggers a file change event
On Windows 10, git version 2.16.2.windows.1, running the command git status will trigger a file change event to file C:\myPath\.git "Attributes changed." This causes problems when using scripts that detect file changes such as tsc -w (Typescript compiler) and using softwares that regularly call git status such as VisualStudioCode. Thanks.
Re: Git should preserve modification times at least on request
On Wed, Feb 21, 2018 at 06:58:34PM -0500, Randall S. Becker wrote: > May I suggest storing the date/time in UTC+0 in all cases. I can see > potential issues a couple of times a year where holes exist. I cannot even > fathom what would happen on a merge or edit of history. I consider storing the timestamp simply in the traditional seconds-since-epoch UNIX timestamp format. But I'm not entirely sure yet (see below). If a timestamp includes the offset, there shouldn't be any issue with holes. UTC+0 is nice, too, of course, though some might want to preserve the timezone in which the timestamp was actually created. The bigger issue is usually to copy with those pesky leap seconds. It makes a difference whether one uses solar seconds ("posix" style; those are more commonly seen) or atomic seconds ("right" style) for the UNIX timestamp. Those differences accumulate over time, so you can have almost half a minute delta if you are not careful with timestamp conversion. If I remember correctly, rcs uses some rather awkward interative convergence algorithm to portably convert from human-readable date and time to UNIX timestamps. Thus I'm still not sure whether it will be a UNIX-format timestamp or whether a human-readable date/time might be preferrable. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On 2/21/2018 6:13 PM, Jeff King wrote: On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote: The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loadead and was not freed. Two places use this to inform how to output information about commits. log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line. rev-list uses this method for two reasons: - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content. I'm not sure after my earlier digging if there is even a way to trigger this (and if so, it is probably accidental, since those lines were added explicitly for --show-all). And actually after re-reading the commit message for 3131b7130 again, I think the current behavior is definitely not something that was carefully planned. So I'd propose a commit message like below. I only submitted my patch to avoid making you do the work of writing the commit message. My messages still don't have quite the right amount of detail (or the correct details, in this case). Junio: please add Reported-by: Derrick StoleeThanks, -Stolee -- >8 -- Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() The "--show-all" revision option shows UNINTERESTING commits. Some of these commits may be unparsed when we try to show them (since we may or may not need to walk their parents to fulfill the request). Commit 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09) resolved this by just skipping pretty-printing for commits without their object contents cached, saying: Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That was the easy fix to avoid the pretty-printer segfaulting, but: 1. It doesn't work for all formats. E.g., --oneline prints the oid for each such commit but not a trailing newline, leading to jumbled output. 2. It only affects some commits, depending on whether we happened to parse them or not (so if they were at the tip of an UNINTERESTING starting point, or if we happened to traverse over them, you'd see more data). 3. It unncessarily ties the decision to show the verbose header to whether the commit buffer was cached. That makes it harder to change the logic around caching (e.g., if we could traverse without actually loading the full commit objects). These days it's safe to feed such a commit to the pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily load missing commit buffers, 2013-01-26), we'll load it on demand in such a case. So let's just always show the verbose headers. This does change the behavior of plumbing, but: a. The --show-all option was explicitly introduced as a debugging aid, and was never documented (and has rarely even been mentioned on the list by git devs). b. Avoiding the commits was already not deterministic due to (2) above. So the caller might have seen full headers for these commits anyway, and would need to be prepared for it. Signed-off-by: Jeff King --- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9e11..d320b6f1e3 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d6d1..22b2fb6c58 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT;
Re: What's cooking in git.git (Feb 2018, #03; Wed, 21)
On 02/21, Junio C Hamano wrote: > * bw/c-plus-plus (2018-02-14) 38 commits > - fixup! diff: rename 'this' variables > - replace: rename 'new' variables > - trailer: rename 'template' variables > - tempfile: rename 'template' variables > - wrapper: rename 'template' variables > - environment: rename 'namespace' variables > - diff: rename 'template' variables > - environment: rename 'template' variables > - init-db: rename 'template' variables > - unpack-trees: rename 'new' variables > - trailer: rename 'new' variables > - submodule: rename 'new' variables > - split-index: rename 'new' variables > - remote: rename 'new' variables > - ref-filter: rename 'new' variables > - read-cache: rename 'new' variables > - line-log: rename 'new' variables > - imap-send: rename 'new' variables > - http: rename 'new' variables > - entry: rename 'new' variables > - diffcore-delta: rename 'new' variables > - diff: rename 'new' variables > - diff-lib: rename 'new' variable > - commit: rename 'new' variables > - combine-diff: rename 'new' variables > - remote: rename 'new' variables > - reflog: rename 'new' variables > - pack-redundant: rename 'new' variables > - help: rename 'new' variables > - checkout: rename 'new' variables > - apply: rename 'new' variables > - apply: rename 'try' variables > - diff: rename 'this' variables > - rev-parse: rename 'this' variable > - pack-objects: rename 'this' variables > - blame: rename 'this' variables > - object: rename function 'typename' to 'type_name' > - object_info: change member name from 'typename' to 'type_name' > > Avoid using identifiers that clash with C++ keywords. Even though > it is not a goal to compile Git with C++ compilers, changes like > this help use of code analysis tools that targets C++ on our > codebase. > > Is the 'fixup!' cleanly squashable to the problematic one, or does > this series require another reroll to get it in a good enough shape? Yeah the fixup patch looks good to me. I don't think there was anything else that needed attention so it should be in good shape with the fixup patch. -- Brandon Williams
Re: [PATCH v3 0/3] Add "git rebase --show-current-patch"
Junio C Hamanowrote: >> Compared to v2: >> - the potential loss of errno before it's printed out in builtin/am.c >> is fixed. >> - keep update_ref() in sequencer.c non-fatal like this rest >> - rename ORIG_COMMIT to REBASE_HEAD >> Interdiff: > This round hasn't seen any comments. Is everybody happy with it? > I personally do not have strong opinion for the feature but didn't > spot anything against the execution, either, so... Sorry for the late reply: I dislike REBASE_/HEAD/ because ORIG_/HEAD/ refers to the tip of the original branch, and /ORIG/_HEAD refers to the original branch, so /REBASE/_/HEAD/ is doubly confusing IMHO. I consider ORIG_COMMIT easier to understand because ORIG_HEAD refers to the tip of the original branch, and ORIG_COMMIT would refer to one of the commits making up that original branch, but as I suggested it myself I might not be very objective in that regard :-). (BTW, thanks, Duy, for doing the actual work!) Tim
Re: [PATCH 16/27] sha1_file: add repository argument to sha1_file_name
On 02/20, Stefan Beller wrote: > Add a repository argument to allow sha1_file_name callers to be more > specific about which repository to handle. This is a small mechanical > change; it doesn't change the implementation to handle repositories > other than the_repository yet. > > As with the previous commits, use a macro to catch callers passing a > repository other than the_repository at compile time. > > While at it, move the declaration to object-store.h, where it should > be easier to find. Seems like we may want to make a sha1-file.h or an oid-file.h or something like that at some point as that seems like a better place for the function than in the object-store.h file? -- Brandon Williams
Re: [PATCH 14/27] sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
On 02/20, Stefan Beller wrote: > Actually this also allows read_info_alternates and link_alt_odb_entry to > handle arbitrary repositories, but link_alt_odb_entries is the most > interesting function in this set of functions, hence the commit subject. > > These functions span a strongly connected component in the function > graph, i.e. the recursive call chain might look like > > -> link_alt_odb_entries > -> link_alt_odb_entry > -> read_info_alternates > -> link_alt_odb_entries > > That is why we need to convert all these functions at the same time. > > Signed-off-by: Jonathan Nieder> Signed-off-by: Stefan Beller > --- Looks good. > object-store.h | 4 > sha1_file.c| 36 > 2 files changed, 20 insertions(+), 20 deletions(-) > > diff --git a/object-store.h b/object-store.h > index 514ad94287..f283fbdba9 100644 > --- a/object-store.h > +++ b/object-store.h > @@ -18,6 +18,10 @@ struct alternate_object_database { > char loose_objects_subdir_seen[256]; > struct oid_array loose_objects_cache; > > + /* > + * Path to the alternative object store. If this is a relative path, > + * it is relative to the current working directory. > + */ Thanks for adding the comment. > char path[FLEX_ARRAY]; > }; > #define prepare_alt_odb(r) prepare_alt_odb_##r() > diff --git a/sha1_file.c b/sha1_file.c > index c26921df4a..6e5105a252 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -390,11 +390,10 @@ static int alt_odb_usable(struct raw_object_store *o, > * SHA1, an extra slash for the first level indirection, and the > * terminating NUL. > */ > -#define read_info_alternates(r, rb, d) read_info_alternates_##r(rb, d) > -static void read_info_alternates_the_repository(const char *relative_base, > - int depth); > -#define link_alt_odb_entry(r, e, rb, d, n) link_alt_odb_entry_##r(e, rb, d, > n) > -static int link_alt_odb_entry_the_repository(const char *entry, > +static void read_info_alternates(struct repository *r, > + const char *relative_base, > + int depth); > +static int link_alt_odb_entry(struct repository *r, const char *entry, > const char *relative_base, int depth, const char *normalized_objdir) > { > struct alternate_object_database *ent; > @@ -420,7 +419,7 @@ static int link_alt_odb_entry_the_repository(const char > *entry, > while (pathbuf.len && pathbuf.buf[pathbuf.len - 1] == '/') > strbuf_setlen(, pathbuf.len - 1); > > - if (!alt_odb_usable(_repository->objects, , > normalized_objdir)) { > + if (!alt_odb_usable(>objects, , normalized_objdir)) { > strbuf_release(); > return -1; > } > @@ -428,12 +427,12 @@ static int link_alt_odb_entry_the_repository(const char > *entry, > ent = alloc_alt_odb(pathbuf.buf); > > /* add the alternate entry */ > - *the_repository->objects.alt_odb_tail = ent; > - the_repository->objects.alt_odb_tail = &(ent->next); > + *r->objects.alt_odb_tail = ent; > + r->objects.alt_odb_tail = &(ent->next); > ent->next = NULL; > > /* recursively add alternates */ > - read_info_alternates(the_repository, pathbuf.buf, depth + 1); > + read_info_alternates(r, pathbuf.buf, depth + 1); > > strbuf_release(); > return 0; > @@ -468,12 +467,8 @@ static const char *parse_alt_odb_entry(const char > *string, > return end; > } > > -#define link_alt_odb_entries(r, a, s, rb, d) \ > - link_alt_odb_entries_##r(a, s, rb, d) > -static void link_alt_odb_entries_the_repository(const char *alt, > - int sep, > - const char *relative_base, > - int depth) > +static void link_alt_odb_entries(struct repository *r, const char *alt, > + int sep, const char *relative_base, int depth) > { > struct strbuf objdirbuf = STRBUF_INIT; > struct strbuf entry = STRBUF_INIT; > @@ -487,7 +482,7 @@ static void link_alt_odb_entries_the_repository(const > char *alt, > return; > } > > - strbuf_add_absolute_path(, get_object_directory()); > + strbuf_add_absolute_path(, r->objects.objectdir); > if (strbuf_normalize_path() < 0) > die("unable to normalize object directory: %s", > objdirbuf.buf); > @@ -496,15 +491,16 @@ static void link_alt_odb_entries_the_repository(const > char *alt, > alt = parse_alt_odb_entry(alt, sep, ); > if (!entry.len) > continue; > - link_alt_odb_entry(the_repository, entry.buf, > + link_alt_odb_entry(r, entry.buf, > relative_base, depth, objdirbuf.buf); > } >
Re: [PATCH 08/27] pack: move approximate object count to object store
On 02/20, Stefan Beller wrote: > The approximate_object_count() function maintains a rough count of > objects in a repository to estimate how long object name abbreviates > should be. Object names are scoped to a repository and the > appropriate length may differ by repository, so the object count > should not be global. > > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder > --- > object-store.h | 10 +- > packfile.c | 11 +-- > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/object-store.h b/object-store.h > index 6cecba3951..bd1e4fcd8b 100644 > --- a/object-store.h > +++ b/object-store.h > @@ -93,6 +93,14 @@ struct raw_object_store { > struct alternate_object_database *alt_odb_list; > struct alternate_object_database **alt_odb_tail; > > + /* > + * A fast, rough count of the number of objects in the repository. > + * These two fields are not meant for direct access. Use > + * approximate_object_count() instead. > + */ > + unsigned long approximate_object_count; > + unsigned approximate_object_count_valid : 1; Patch looks fine and is effectively a no-op, though what is the need for both of these variables? Maybe it can be simplified down to just use one? Just musing as its out of the scope of this patch and we probably shouldn't try to change that in this series. > + > /* >* Whether packed_git has already been populated with this repository's >* packs. > @@ -107,7 +115,7 @@ struct raw_object_store { > * that macro on member variables. Use NULL instead as that is defined > * and accepted, deferring the real init to prepare_packed_git_mru(). */ > #define __MRU_INIT { NULL, NULL } > -#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL, 0 } > +#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL, 0, 0, 0 } > > void raw_object_store_clear(struct raw_object_store *o); > > diff --git a/packfile.c b/packfile.c > index a8a2e7fe43..693bafbc98 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -803,8 +803,6 @@ static void prepare_packed_git_one(char *objdir, int > local) > strbuf_release(); > } > > -static int approximate_object_count_valid; > - > /* > * Give a fast, rough count of the number of objects in the repository. This > * ignores loose objects completely. If you have a lot of them, then either > @@ -814,8 +812,8 @@ static int approximate_object_count_valid; > */ > unsigned long approximate_object_count(void) > { > - static unsigned long count; > - if (!approximate_object_count_valid) { > + if (!the_repository->objects.approximate_object_count_valid) { > + unsigned long count; > struct packed_git *p; > > prepare_packed_git(); > @@ -825,8 +823,9 @@ unsigned long approximate_object_count(void) > continue; > count += p->num_objects; > } > + the_repository->objects.approximate_object_count = count; > } > - return count; > + return the_repository->objects.approximate_object_count; > } > > static void *get_next_packed_git(const void *p) > @@ -901,7 +900,7 @@ void prepare_packed_git(void) > > void reprepare_packed_git(void) > { > - approximate_object_count_valid = 0; > + the_repository->objects.approximate_object_count_valid = 0; > the_repository->objects.packed_git_initialized = 0; > prepare_packed_git(); > } > -- > 2.16.1.291.g4437f3f132-goog > -- Brandon Williams
Re: [PATCH 24/27] sha1_file: allow open_sha1_file to handle arbitrary repositories
On Tue, 20 Feb 2018 17:54:27 -0800 Stefan Bellerwrote: > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Reviewed-by: Jonathan Tan
Re: [PATCH 22/27] sha1_file: allow sha1_file_name to handle arbitrary repositories
On Tue, 20 Feb 2018 17:54:25 -0800 Stefan Bellerwrote: > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Reviewed-by: Jonathan Tan > -void sha1_file_name_the_repository(struct strbuf *buf, const unsigned char > *sha1) > +void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned > char *sha1) > { > - strbuf_addstr(buf, get_object_directory()); > + strbuf_addstr(buf, r->objects.objectdir); > strbuf_addch(buf, '/'); > fill_sha1_path(buf, sha1); > } In the future, we should probably have: - a function to get the object store out of a repo (so that it can lazily initialize the object store struct if necessary) - when the object store is obtained, its objectdir field is guaranteed to be populated - sha1_file_name should take the object store struct, not the repo struct but this is outside the scope of this patch.
Re: [PATCH 06/27] object-store: close all packs upon clearing the object store
On 02/20, Stefan Beller wrote: > Signed-off-by: Stefan BellerStraight forward change, looks good. > --- > builtin/am.c | 2 +- > builtin/clone.c| 2 +- > builtin/fetch.c| 2 +- > builtin/merge.c| 2 +- > builtin/receive-pack.c | 2 +- > object.c | 6 ++ > packfile.c | 4 ++-- > packfile.h | 2 +- > 8 files changed, 10 insertions(+), 12 deletions(-) > > diff --git a/builtin/am.c b/builtin/am.c > index 5bdd2d7578..4762a702e3 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -1859,7 +1859,7 @@ static void am_run(struct am_state *state, int resume) >*/ > if (!state->rebasing) { > am_destroy(state); > - close_all_packs(); > + close_all_packs(_repository->objects); > run_command_v_opt(argv_gc_auto, RUN_GIT_CMD); > } > } > diff --git a/builtin/clone.c b/builtin/clone.c > index 101c27a593..13cfaa6488 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -1217,7 +1217,7 @@ int cmd_clone(int argc, const char **argv, const char > *prefix) > transport_disconnect(transport); > > if (option_dissociate) { > - close_all_packs(); > + close_all_packs(_repository->objects); > dissociate_from_references(); > } > > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 8ee998ea2e..4d72efca78 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -1478,7 +1478,7 @@ int cmd_fetch(int argc, const char **argv, const char > *prefix) > > string_list_clear(, 0); > > - close_all_packs(); > + close_all_packs(_repository->objects); > > argv_array_pushl(_gc_auto, "gc", "--auto", NULL); > if (verbosity < 0) > diff --git a/builtin/merge.c b/builtin/merge.c > index 30264cfd7c..907ae44ab5 100644 > --- a/builtin/merge.c > +++ b/builtin/merge.c > @@ -411,7 +411,7 @@ static void finish(struct commit *head_commit, >* We ignore errors in 'gc --auto', since the >* user should see them. >*/ > - close_all_packs(); > + close_all_packs(_repository->objects); > run_command_v_opt(argv_gc_auto, RUN_GIT_CMD); > } > } > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c > index b2eac79a6e..954fc72c7c 100644 > --- a/builtin/receive-pack.c > +++ b/builtin/receive-pack.c > @@ -2027,7 +2027,7 @@ int cmd_receive_pack(int argc, const char **argv, const > char *prefix) > proc.git_cmd = 1; > proc.argv = argv_gc_auto; > > - close_all_packs(); > + close_all_packs(_repository->objects); > if (!start_command()) { > if (use_sideband) > copy_to_sideband(proc.err, -1, NULL); > diff --git a/object.c b/object.c > index c76b62572a..34daaf37b3 100644 > --- a/object.c > +++ b/object.c > @@ -4,6 +4,7 @@ > #include "tree.h" > #include "commit.h" > #include "tag.h" > +#include "packfile.h" > > static struct object **obj_hash; > static int nr_objs, obj_hash_size; > @@ -469,8 +470,5 @@ void raw_object_store_clear(struct raw_object_store *o) > > while (!list_empty(>packed_git_mru)) > list_del(>packed_git_mru); > - /* > - * TODO: call close_all_packs once migrated to > - * take an object store argument > - */ > + close_all_packs(o); > } > diff --git a/packfile.c b/packfile.c > index d41e4c83d0..511a2b0cdf 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -311,11 +311,11 @@ static void close_pack(struct packed_git *p) > close_pack_index(p); > } > > -void close_all_packs(void) > +void close_all_packs(struct raw_object_store *o) > { > struct packed_git *p; > > - for (p = the_repository->objects.packed_git; p; p = p->next) > + for (p = o->packed_git; p; p = p->next) > if (p->do_not_close) > die("BUG: want to close pack marked 'do-not-close'"); > else > diff --git a/packfile.h b/packfile.h > index a7fca598d6..6a2c57045c 100644 > --- a/packfile.h > +++ b/packfile.h > @@ -63,7 +63,7 @@ extern void close_pack_index(struct packed_git *); > > extern unsigned char *use_pack(struct packed_git *, struct pack_window **, > off_t, unsigned long *); > extern void close_pack_windows(struct packed_git *); > -extern void close_all_packs(void); > +extern void close_all_packs(struct raw_object_store *o); > extern void unuse_pack(struct pack_window **); > extern void clear_delta_base_cache(void); > extern struct packed_git *add_packed_git(const char *path, size_t path_len, > int local); > -- > 2.16.1.291.g4437f3f132-goog > -- Brandon Williams
Re: [PATCH 04/27] object-store: free alt_odb_list
On 02/20, Stefan Beller wrote: > Free the memory and reset alt_odb_{list, tail} to NULL. Good to see memory leaks being avoided (well they will be on other repository objects) > > Signed-off-by: Stefan Beller> --- > object.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/object.c b/object.c > index 11d904c033..17b1da6d6b 100644 > --- a/object.c > +++ b/object.c > @@ -446,7 +446,24 @@ void clear_commit_marks_all(unsigned int flags) > } > } > > +static void free_alt_odb(struct alternate_object_database *alt) > +{ > + strbuf_release(>scratch); > + oid_array_clear(>loose_objects_cache); > +} > + > +static void free_alt_odbs(struct raw_object_store *o) > +{ > + while (o->alt_odb_list) { > + free_alt_odb(o->alt_odb_list); > + o->alt_odb_list = o->alt_odb_list->next; > + } > +} > + > void raw_object_store_clear(struct raw_object_store *o) > { > free(o->objectdir); > + > + free_alt_odbs(o); > + o->alt_odb_tail = NULL; > } > -- > 2.16.1.291.g4437f3f132-goog > -- Brandon Williams
Re: [PATCH 03/27] object-store: move alt_odb_list and alt_odb_tail to object store
On 02/20, Stefan Beller wrote: > In a process with multiple repositories open, alternates should be > associated to a single repository and not shared globally. Move > alt_odb_list and alt_odb_tail into the_repository and adjust callers > to reflect this. > > Now that the alternative object data base is per repository, we're > leaking its memory upon freeing a repository. The next patch plugs > this hole. > > No functional change intended. > > Signed-off-by: Stefan Beller> Signed-off-by: Jonathan Nieder Looks good aside from the one spacing issue (though it's not that big of a deal). > --- > builtin/fsck.c | 4 +++- > object-store.h | 9 ++--- > packfile.c | 3 ++- > sha1_file.c| 25 - > sha1_name.c| 3 ++- > 5 files changed, 25 insertions(+), 19 deletions(-) > > diff --git a/builtin/fsck.c b/builtin/fsck.c > index 7a8a679d4f..908e4f092a 100644 > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -1,5 +1,6 @@ > #include "builtin.h" > #include "cache.h" > +#include "repository.h" > #include "config.h" > #include "commit.h" > #include "tree.h" > @@ -716,7 +717,8 @@ int cmd_fsck(int argc, const char **argv, const char > *prefix) > fsck_object_dir(get_object_directory()); > > prepare_alt_odb(); > - for (alt = alt_odb_list; alt; alt = alt->next) > + for (alt = the_repository->objects.alt_odb_list; > + alt; alt = alt->next) Indentation on this line looks odd. -- Brandon Williams
Re: [PATCH 02/27] object-store: migrate alternates struct and functions from cache.h
On 02/20, Stefan Beller wrote: > Migrate the struct alternate_object_database and all its related > functions to the object store as these functions are easier found in > that header. The migration is just a verbatim copy, no need to > include the object store header at any C file, because cache.h includes > repository.h which in turn includes the object-store.h Nice. Always love seeing cache.h get smaller -- Brandon Williams
Re: [PATCH 19/27] sha1_file: add repository argument to map_sha1_file_1
On Tue, 20 Feb 2018 17:54:22 -0800 Stefan Bellerwrote: > Add a repository argument to allow the map_sha1_file_1 caller to be > more specific about which repository to act on. This is a small > mechanical change; it doesn't change the implementation to handle > repositories other than the_repository yet. > > As with the previous commits, use a macro to catch callers passing a > repository other than the_repository at compile time. > > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Reviewed-by: Jonathan Tan
Re: [PATCH 15/27] sha1_file: allow prepare_alt_odb to handle arbitrary repositories
On Tue, 20 Feb 2018 17:54:18 -0800 Stefan Bellerwrote: > -void prepare_alt_odb_the_repository(void) > +void prepare_alt_odb(struct repository *r) > { > - const char *alt; > - > - if (the_repository->objects.alt_odb_tail) > + if (r->objects.alt_odb_tail) > return; > > - alt = getenv(ALTERNATE_DB_ENVIRONMENT); > + r->objects.alt_odb_tail = >objects.alt_odb_list; > + > + if (!r->ignore_env) { > + const char *alt = getenv(ALTERNATE_DB_ENVIRONMENT); > + if (!alt) > + alt = ""; alt can be NULL, just like in the existing code, so these 2 lines are unnecessary. (link_alt_odb_entries() will just do nothing in either case.) (I also think that the check of absence of alt should be done by the caller of link_alt_odb_entries(), not by link_alt_odb_entries() itself, but that is much beyond the scope of this patch set.) > > - the_repository->objects.alt_odb_tail = > - _repository->objects.alt_odb_list; > - link_alt_odb_entries(the_repository, alt, > - PATH_SEP, NULL, 0); > + link_alt_odb_entries(r, alt, PATH_SEP, NULL, 0); > + } > > - read_info_alternates(the_repository, get_object_directory(), 0); > + read_info_alternates(r, r->objects.objectdir, 0); > }
What's cooking in git.git (Feb 2018, #03; Wed, 21)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to "master"] * ab/sha1dc-build (2017-12-08) 3 commits (merged to 'next' on 2018-02-08 at ba9ff2b836) + sha1dc_git.h: re-arrange an ifdef chain for a subsequent change + Makefile: under "make dist", include the sha1collisiondetection submodule + Makefile: don't error out under DC_SHA1_EXTERNAL if DC_SHA1_SUBMODULE=auto Push the submodule version of collision-detecting SHA-1 hash implementation a bit harder on builders. * ab/wildmatch-tests (2018-01-30) 10 commits (merged to 'next' on 2018-02-08 at f999a3d732) + wildmatch test: mark test as EXPENSIVE_ON_WINDOWS + test-lib: add an EXPENSIVE_ON_WINDOWS prerequisite + wildmatch test: create & test files on disk in addition to in-memory + wildmatch test: perform all tests under all wildmatch() modes + wildmatch test: use test_must_fail, not ! for test-wildmatch + wildmatch test: remove dead fnmatch() test code + wildmatch test: use a paranoia pattern from nul_match() + wildmatch test: don't try to vertically align our output + wildmatch test: use more standard shell style + wildmatch test: indent with tabs, not spaces More tests for wildmatch functions. * bc/hash-algo (2018-02-09) 13 commits (merged to 'next' on 2018-02-09 at 4437f3f132) + hash: update obsolete reference to SHA1_HEADER (merged to 'next' on 2018-02-08 at 18f36d12ed) + bulk-checkin: abstract SHA-1 usage + csum-file: abstract uses of SHA-1 + csum-file: rename sha1file to hashfile + read-cache: abstract away uses of SHA-1 + pack-write: switch various SHA-1 values to abstract forms + pack-check: convert various uses of SHA-1 to abstract forms + fast-import: switch various uses of SHA-1 to the_hash_algo + sha1_file: switch uses of SHA-1 to the_hash_algo + builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo + builtin/index-pack: improve hash function abstraction + hash: create union for hash context allocation + hash: move SHA-1 macros to hash.h More abstraction of hash function from the codepath. * cc/perf-aggregate (2018-02-02) 3 commits (merged to 'next' on 2018-02-08 at d8f074e6fb) + perf/aggregate: sort JSON fields in output + perf/aggregate: add --reponame option + perf/aggregate: add --subsection option "make perf" enhancement. * en/merge-recursive-fixes (2018-01-19) 3 commits (merged to 'next' on 2018-02-08 at c254292070) + merge-recursive: add explanation for src_entry and dst_entry + merge-recursive: fix logic ordering issue + Tighten and correct a few testcases for merging and cherry-picking (this branch is used by en/rename-directory-detection.) * gs/rebase-allow-empty-message (2018-02-07) 1 commit (merged to 'next' on 2018-02-08 at 9d81a2496c) + rebase: add --allow-empty-message option "git rebase" learned to take "--allow-empty-message" option. * jc/worktree-add-short-help (2018-01-17) 1 commit (merged to 'next' on 2018-02-08 at 9f59ca72ab) + worktree: say that "add" takes an arbitrary commit in short-help Error message fix. * jt/fsck-code-cleanup (2018-01-23) 1 commit (merged to 'next' on 2018-02-08 at 199ad41486) + fsck: fix leak when traversing trees Plug recently introduced leaks in fsck. * kg/packed-ref-cache-fix (2018-01-24) 6 commits (merged to 'next' on 2018-02-08 at 370f06a565) + packed_ref_cache: don't use mmap() for small files + load_contents(): don't try to mmap an empty file + packed_ref_iterator_begin(): make optimization more general + find_reference_location(): make function safe for empty snapshots + create_snapshot(): use `xmemdupz()` rather than a strbuf + struct snapshot: store `start` rather than `header_len` Avoid mmapping small files while using packed refs (especially ones with zero size, which would cause later munmap() to fail). A change to a binsearch loop to work around picky complers was unnecessarily hard to reason about, but it should do. * lw/daemon-log-destination (2018-02-05) 1 commit (merged to 'next' on 2018-02-08 at da91bd56f4) + daemon: add --log-destination=(stderr|syslog|none) The log from "git daemon" can be redirected with a new option; one relevant use case is to send the log to standard error (instead of syslog) when running it from inetd. * nd/format-patch-stat-width (2018-02-02) 2 commits (merged to 'next' on 2018-02-08 at c03e8a084e) + format-patch: reduce patch diffstat width to 72 + format-patch: keep cover-letter diffstat wrapped in 72 columns "git format-patch" learned to give 72-cols to diffstat, which is consistent with other
Re: [PATCH 09/27] sha1_file: add raw_object_store argument to alt_odb_usable
On Tue, 20 Feb 2018 17:54:12 -0800 Stefan Bellerwrote: > Add a raw_object_store to alt_odb_usable to be more specific about which > repository to act on. The choice of the repository is delegated to its > only caller link_alt_odb_entry. > > Signed-off-by: Stefan Beller I checked that alt_odb_usable no longer depends on any repository-like globals. Reviewed-by: Jonathan Tan
Re: [PATCHv3 00/27] Moving global state into the repository object (part 1)
On Tue, Feb 20, 2018 at 5:54 PM, Stefan Bellerwrote: > v3: > * reverted back to use the repository for most of the functions > (including unduplicating 'ignore_env') > * rebased on master again (I lost that state when doing v2, as > I did both rebase as well as conversion to object store in one go) > * one additional patch, that moves the alternates related things to > object-store.h, such that inclusion of cache.h (in object-store.h) > is not required any more. This is also available at https://github.com/stefanbeller/git/tree/object-store-part1-v3
Re: [PATCH 00/36] object_id part 12
On Wed, Feb 21, 2018 at 10:47:19AM -0800, Junio C Hamano wrote: > "brian m. carlson"writes: > > > This is the twelfth in a series of patches to convert from unsigned char > > [20] to struct object_id. This series is based on next. > > > > Included in this series are conversions for find_unique_abbrev and > > lookup_replace_object, as well as parts of the sha1_file code. > > > > Conflicts with pu are average in number but minor, mostly because of the > > type_name conversion. None of them are tricky, except that the > > introduction of get_tree_entry_if_blob requires a conversion of that > > function. > > And the reason why this is based on 'next' is...? Which topic(s) do > we have to wait for until we can queue this series, in other words? > > Thanks for working on this, though. It was waiting on the hash_algo changes I had submitted, and I don't believe they'd made it into master. When they have, I'll rebase and send a v2. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v5 00/17] document & test fetch pruning & add fetch.pruneTags
Ævar Arnfjörð Bjarmasonwrites: > Here's a v5 (correct subject line this time!). Many thanks to Eric for > a thorough review. We haven't seen any comments on this round. Is everybody happy? I do not have a strong opinion on the new feature, either for or against. I didn't find anything majorly questionable in the execution, though, so...
Re: [PATCH v3 0/3] Add "git rebase --show-current-patch"
Nguyễn Thái Ngọc Duywrites: > Compared to v2: > > - the potential loss of errno before it's printed out in builtin/am.c > is fixed. > - keep update_ref() in sequencer.c non-fatal like this rest > - rename ORIG_COMMIT to REBASE_HEAD > > Interdiff: This round hasn't seen any comments. Is everybody happy with it? I personally do not have strong opinion for the feature but didn't spot anything against the execution, either, so...
Re: [PATCH 07/27] pack: move prepare_packed_git_run_once to object store
On Tue, 20 Feb 2018 17:54:10 -0800 Stefan Bellerwrote: > Each repository's object store can be initialized independently, so > they must not share a run_once variable. > > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Reviewed-by: Jonathan Tan > -#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL } > +#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL, 0 } Optional: Trailing zeros are not needed in struct initializers.
Re: [PATCH v3 35/35] remote-curl: don't request v2 when pushing
On Tue, 6 Feb 2018 17:13:12 -0800 Brandon Williamswrote: > +test_expect_success 'push with http:// and a config of v2 does not request > v2' ' > + # Till v2 for push is designed, make sure that if a client has > + # protocol.version configured to use v2, that the client instead falls > + # back and uses v0. > + > + test_commit -C http_child three && > + > + # Push to another branch, as the target repository has the > + # master branch checked out and we cannot push into it. > + GIT_TRACE_PACKET=1 git -C http_child -c protocol.version=1 \ > + push origin HEAD:client_branch && 2>log && Should it be protocol.version=2? Also, two double ampersands? Also, optionally, it might be better to do GIT_TRACE_PACKET="$(pwd)/log", so that it does not get mixed with other stderr output.
Re: [PATCH v3 32/35] http: allow providing extra headers for http requests
On Tue, 6 Feb 2018 17:13:09 -0800 Brandon Williamswrote: > @@ -172,6 +172,8 @@ struct http_get_options { >* for details. >*/ > struct strbuf *base_url; > + > + struct string_list *extra_headers; Document this? For example: If not NULL, additional HTTP headers to be sent with the request. The strings in the list must not be freed until after the request.
Re: [PATCH v3 30/35] remote-curl: create copy of the service name
On Tue, 6 Feb 2018 17:13:07 -0800 Brandon Williamswrote: > Make a copy of the service name being requested instead of relying on > the buffer pointed to by the passed in 'const char *' to remain > unchanged. > > Signed-off-by: Brandon Williams Probably worth mentioning in the commit message: Currently, all service names are string constants, but a subsequent patch will introduce service names from external sources. Other than that, Reviewed-by: Jonathan Tan
Re: [PATCH v3 28/35] transport-helper: introduce stateless-connect
On Tue, 6 Feb 2018 17:13:05 -0800 Brandon Williamswrote: > Introduce the transport-helper capability 'stateless-connect'. This > capability indicates that the transport-helper can be requested to run > the 'stateless-connect' command which should attempt to make a > stateless connection with a remote end. Once established, the > connection can be used by the git client to communicate with > the remote end natively in a stateless-rpc manner as supported by > protocol v2. This means that the client must send everything the server > needs in a single request as the client must not assume any > state-storing on the part of the server or transport. Maybe it's worth mentioning that support in the actual remote helpers will be added in a subsequent patch. > If a stateless connection cannot be established then the remote-helper > will respond in the same manner as the 'connect' command indicating that > the client should fallback to using the dumb remote-helper commands. This makes sense, but there doesn't seem to be any code in this patch that implements this. > @@ -612,6 +615,11 @@ static int process_connect_service(struct transport > *transport, > if (data->connect) { > strbuf_addf(, "connect %s\n", name); > ret = run_connect(transport, ); > + } else if (data->stateless_connect) { > + strbuf_addf(, "stateless-connect %s\n", name); > + ret = run_connect(transport, ); > + if (ret) > + transport->stateless_rpc = 1; Why is process_connect_service() falling back to stateless_connect if connect doesn't work? I don't think this fallback would work, as a client that needs "connect" might need its full capabilities.
RE: Git should preserve modification times at least on request
On February 21, 2018 6:13 PM, Peter Backes wrote: > On Wed, Feb 21, 2018 at 11:44:13PM +0100, Ævar Arnfjörð Bjarmason wrote: > > If it were added as a first-level feature to git it would present a > > lot of UX confusion. E.g. you run "git add" and it'll be showing the > > mtime somehow, or you get a formatted patch over E-Mail and it doesn't > > only include the commit time but also times for individual files. > > But that's pretty standard. patch format has timestamp fields for precisely > this purpose: > > % echo a > x > % echo b > y > % diff -u x y > --- x 2018-02-21 23:56:29.574029523 +0100 > +++ y 2018-02-21 23:56:31.430003389 +0100 May I suggest storing the date/time in UTC+0 in all cases. I can see potential issues a couple of times a year where holes exist. I cannot even fathom what would happen on a merge or edit of history. Cheers, Randall
Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order) 2
When using git log, boundary commits (ie, those commits added by specifying --boundary) do not respect the order (e.g., --date-order, --topo-order). Consider the following commit history, where number indicates the order of the commit timestamps: 0125 <--A \ \ 346 <--B Executing the following command: $ git log --boundary --date-order ^A B Should produce the following order (boundary commits shown with dashes): 6 -5 4 3 -1 However, it in fact produces: 6 4 3 -5 -1 Please advise. Best, ~Josh
Re: Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order)
TYPO IN EXPECTED OUTPUT. To avoid inevitable confusion, creating new thread "Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order) 2". DON'T REPLY TO THIS MESSAGE. Instead reply to the new message On Wed, Feb 21, 2018 at 6:28 PM, Josh Tepperwrote: > When using git log, boundary commits (ie, those commits added by > specifying --boundary) do not respect the order (e.g., --date-order, > --topo-order). Consider the following commit history, where number > indicates the order of the commit timestamps: > > > 0125 <--A >\ \ > 346 <--B > > > Executing the following command: > > $ git log --boundary --date-order ^A B > > Should produce the following order (boundary commits shown with dashes): > 6 -5 4 3 -1 > > However, it in fact produces: > 6 4 3 -7 -1 > > Please advise. > > Best, > ~Josh
Re: [PATCH v3 24/35] connect: refactor git_connect to only get the protocol version once
On Tue, 6 Feb 2018 17:13:01 -0800 Brandon Williamswrote: > Instead of having each builtin transport asking for which protocol > version the user has configured in 'protocol.version' by calling > `get_protocol_version_config()` multiple times, factor this logic out > so there is just a single call at the beginning of `git_connect()`. > > This will be helpful in the next patch where we can have centralized > logic which determines if we need to request a different protocol > version than what the user has configured. > > Signed-off-by: Brandon Williams Reviewed-by: Jonathan Tan
Re: bug in HTTP protocol spec
> On Feb 21, 2018, at 2:15 PM, Jeff Kingwrote: > > Thanks, I agree the document is buggy. Do you want to submit a patch? Will this do? Note I am not sure what the story is behind that `version 1` element, whether it's supposed to go before or after the null packet or if there should be another null packet or what. Perhaps somebody more fluent with the smart protocol can advise. --- Documentation/technical/http-protocol.txt | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt index a0e45f2889e6e..19d73f7efb338 100644 --- a/Documentation/technical/http-protocol.txt +++ b/Documentation/technical/http-protocol.txt @@ -214,14 +214,17 @@ smart server reply: S: Cache-Control: no-cache S: S: 001e# service=git-upload-pack\n + S: S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n + S: The client may send Extra Parameters (see Documentation/technical/pack-protocol.txt) as a colon-separated string -in the Git-Protocol HTTP header. +in the Git-Protocol HTTP header. Note as well that there is *no* newline +after the ``. Dumb Server Response @@ -264,8 +267,8 @@ Servers MUST set $servicename to be the request parameter value. Servers SHOULD include an LF at the end of this line. Clients MUST ignore an LF at the end of the line. -Servers MUST terminate the response with the magic `` end -pkt-line marker. +Servers MUST follow the first pkt-line, as well as terminate the +response, with the magic `` end pkt-line marker. The returned response is a pkt-line stream describing each ref and its known value. The stream SHOULD be sorted by name according to @@ -278,6 +281,7 @@ Extra Parameter. smart_reply = PKT-LINE("# service=$servicename" LF) *1("version 1") +"" ref_list "" ref_list= empty_list / non_empty_list -- Dorian Taylor Make things. Make sense. https://doriantaylor.com
Re: [PATCH v3 20/35] upload-pack: introduce fetch server command
On Tue, 6 Feb 2018 17:12:57 -0800 Brandon Williamswrote: > +want > + Indicates to the server an object which the client wants to > + retrieve. Mention that the client can "want" anything even if not advertised by the server (like uploadpack.allowanysha1inwant). > +output = *section > +section = (acknowledgments | packfile) > + (flush-pkt | delim-pkt) > + > +acknowledgments = PKT-LINE("acknowledgments" LF) > + *(ready | nak | ack) Can this part be described more precisely in the BNF section? I see that you describe later that there can be multiple ACKs or one NAK (but not both), and "ready" can be sent regardless of whether ACKs or a NAK is sent. > +ready = PKT-LINE("ready" LF) > +nak = PKT-LINE("NAK" LF) > +ack = PKT-LINE("ACK" SP obj-id LF) > + > +packfile = PKT-LINE("packfile" LF) > +[PACKFILE] > + > + > +acknowledgments section > + * Always begins with the section header "acknowledgments" > + > + * The server will respond with "NAK" if none of the object ids sent > + as have lines were common. > + > + * The server will respond with "ACK obj-id" for all of the > + object ids sent as have lines which are common. > + > + * A response cannot have both "ACK" lines as well as a "NAK" > + line. > + > + * The server will respond with a "ready" line indicating that > + the server has found an acceptable common base and is ready to > + make and send a packfile (which will be found in the packfile > + section of the same response) > + > + * If the client determines that it is finished with negotiations > + by sending a "done" line, the acknowledgments sections can be > + omitted from the server's response as an optimization. Should this be changed to "must"? The current implementation does not support it (on the client side). > +#define UPLOAD_PACK_DATA_INIT { OBJECT_ARRAY_INIT, OID_ARRAY_INIT, 0, 0, 0, > 0, 0, 0 } Optional: the trailing zeroes can be omitted. (That's shorter, and also easier to maintain when we add new fields.) > +int upload_pack_v2(struct repository *r, struct argv_array *keys, > +struct argv_array *args) > +{ > + enum fetch_state state = FETCH_PROCESS_ARGS; > + struct upload_pack_data data = UPLOAD_PACK_DATA_INIT; > + use_sideband = LARGE_PACKET_MAX; > + > + while (state != FETCH_DONE) { > + switch (state) { > + case FETCH_PROCESS_ARGS: > + process_args(args, ); > + > + if (!want_obj.nr) { > + /* > + * Request didn't contain any 'want' lines, > + * guess they didn't want anything. > + */ > + state = FETCH_DONE; > + } else if (data.haves.nr) { > + /* > + * Request had 'have' lines, so lets ACK them. > + */ > + state = FETCH_SEND_ACKS; > + } else { > + /* > + * Request had 'want's but no 'have's so we can > + * immedietly go to construct and send a pack. > + */ > + state = FETCH_SEND_PACK; > + } > + break; > + case FETCH_READ_HAVES: > + read_haves(); > + state = FETCH_SEND_ACKS; > + break; This branch seems to never be taken? > + case FETCH_SEND_ACKS: > + if (process_haves_and_send_acks()) > + state = FETCH_SEND_PACK; > + else > + state = FETCH_DONE; > + break; > + case FETCH_SEND_PACK: > + packet_write_fmt(1, "packfile\n"); > + create_pack_file(); > + state = FETCH_DONE; > + break; > + case FETCH_DONE: > + continue; > + } > + } > + > + upload_pack_data_clear(); > + return 0; > +}
Re: [PATCH] revision: drop --show-all option
On Wed, Feb 21, 2018 at 3:27 PM, Jeff Kingwrote: > > We'll skip the usual deprecation period because this was > explicitly a debugging aid that was never documented. Ack. I don't think I've used it since, and probably nobody else ever used it. Linus
Re: [PATCH 04/26] upload-pack: convert to a builtin
Jonathan Niederwrites: > For defense in depth, it would be comforting if the git wrapper had > some understanding of "don't support --help in handle_builtin when > invoked as a dashed command". That is, I don't expect that anyone has > been relying on > > git-add --help > > acting like > > git help add > > instead of printing the usage message from > > git add -h Sounds like a neat trick. > It's a little fussy because today we rewrite "git add --help" to > "git-add --help" before rewriting it to "git help add"; we'd have to > skip that middle hop for this to work. I do not quite get this part. "git add --help" goes through run_argv() and then to handle_builtin() which is what does this "git help add" swapping. "git-add --help" does get thrown into the same codepath by pretending as if we got "add --help" as an argument to "git" command, and that happens without going through run_argv(), so presumably we can add another perameter to handle_builtin() so that the callee can tell these two invocation sites apart, no? > I don't think that has to block this patch or series, though --- it's > just a separate thought about hardening. Yeah, I agree with this assessment.
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 02:19:17PM -0500, Derrick Stolee wrote: > > These behaviors are undocumented, untested, and unlikely to be > > expected by users or other software attempting to parse this output. > > > > Helped-by: Jeff King> > Signed-off-by: Derrick Stolee > > This would be a good time to allow multiple authors, or to just change the > author, since this is exactly the diff you (Peff) provided in an earlier > email. The commit message hopefully summarizes our discussion, but I welcome > edits. The point is moot if we take the revision I just sent (though in retrospect I really ought to have put in a Reported-by: for you there). But some communities are settling on Co-authored-by as a trailer for this case. And GitHub has started parsing and showing that along with author information: https://github.com/blog/2496-commit-together-with-co-authors -Peff
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 03:22:02PM -0800, Stefan Beller wrote: > > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() > > --- > > builtin/rev-list.c | 2 +- > > log-tree.c | 3 --- > > 2 files changed, 1 insertion(+), 4 deletions(-) > > Now if we'd get around to rewrite pretty.c as well, we could make it static, > giving a stronger reason of not using that function. But it looks a bit > complicated to me, who is not familiar in that area of the code. > > Thanks for making less use of this suboptimal API, I'm not sure the API is suboptimal. It's not wrong to ask "do you have a cached copy of this?". It was just being used poorly here. :) See the discussion in https://public-inbox.org/git/20180221184811.gd4...@sigill.intra.peff.net/ -Peff
Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order)
When using git log, boundary commits (ie, those commits added by specifying --boundary) do not respect the order (e.g., --date-order, --topo-order). Consider the following commit history, where number indicates the order of the commit timestamps: 0125 <--A \ \ 346 <--B Executing the following command: $ git log --boundary --date-order ^A B Should produce the following order (boundary commits shown with dashes): 6 -5 4 3 -1 However, it in fact produces: 6 4 3 -7 -1 Please advise. Best, ~Josh
[PATCH] revision: drop --show-all option
On Wed, Feb 21, 2018 at 03:03:18PM -0800, Junio C Hamano wrote: > > So what I'm wondering is whether we should consider just ripping it out > > (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, > > it's probably not hurting anybody). > > I see no problem in removing it. With more "interesting" features > relying on post-processing (like 'simplify-merges'), show_all whose > primary focus was how limit_list() behaves soft of outlived its > usefulness, I would think. So here's a patch to do that (textually independent but conceptually on top of the earlier one to fix the commit-buffer thing). It actually doesn't remove that much code, so I could go either way on it. +cc Linus in case he's secretly in love with this feature. -- >8 -- Subject: [PATCH] revision: drop --show-all option This was an undocumented debugging aid that does not seem to have come in handy in the past decade, judging from its lack of mentions on the mailing list. Let's drop it in the name of simplicity. This is morally a revert of 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09), but note that I did leave in the mapping of UNINTERESTING to "^" in get_revision_mark(). I don't think this would be possible to trigger with the current code, but it's the only sensible marker. We'll skip the usual deprecation period because this was explicitly a debugging aid that was never documented. Signed-off-by: Jeff King--- revision.c | 10 - revision.h | 1 - t/t6015-rev-list-show-all-parents.sh | 31 3 files changed, 42 deletions(-) delete mode 100755 t/t6015-rev-list-show-all-parents.sh diff --git a/revision.c b/revision.c index 5ce9b93baa..5c1cb7277c 100644 --- a/revision.c +++ b/revision.c @@ -1065,14 +1065,9 @@ static int limit_list(struct rev_info *revs) return -1; if (obj->flags & UNINTERESTING) { mark_parents_uninteresting(commit); - if (revs->show_all) - p = _list_insert(commit, p)->next; slop = still_interesting(list, date, slop, _cache); if (slop) continue; - /* If showing all, add the whole pending list to the end */ - if (revs->show_all) - *p = list; break; } if (revs->min_age != -1 && (commit->date > revs->min_age)) @@ -1864,8 +1859,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->dense = 1; } else if (!strcmp(arg, "--sparse")) { revs->dense = 0; - } else if (!strcmp(arg, "--show-all")) { - revs->show_all = 1; } else if (!strcmp(arg, "--in-commit-order")) { revs->tree_blobs_in_commit_order = 1; } else if (!strcmp(arg, "--remove-empty")) { @@ -3094,8 +3087,6 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi return commit_ignore; if (revs->unpacked && has_sha1_pack(commit->object.oid.hash)) return commit_ignore; - if (revs->show_all) - return commit_show; if (commit->object.flags & UNINTERESTING) return commit_ignore; if (revs->min_age != -1 && @@ -3194,7 +3185,6 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit) enum commit_action action = get_commit_action(revs, commit); if (action == commit_show && - !revs->show_all && revs->prune && revs->dense && want_ancestry(revs)) { /* * --full-diff on simplified parents is no good: it diff --git a/revision.h b/revision.h index 3dee97bfb9..b8c47b98e2 100644 --- a/revision.h +++ b/revision.h @@ -90,7 +90,6 @@ struct rev_info { unsigned intdense:1, prune:1, no_walk:2, - show_all:1, remove_empty_trees:1, simplify_history:1, topo_order:1, diff --git a/t/t6015-rev-list-show-all-parents.sh b/t/t6015-rev-list-show-all-parents.sh deleted file mode 100755 index 3c73c93ba6..00 --- a/t/t6015-rev-list-show-all-parents.sh +++ /dev/null @@ -1,31 +0,0 @@ -#!/bin/sh - -test_description='--show-all --parents does not rewrite TREESAME commits' - -. ./test-lib.sh - -test_expect_success 'set up --show-all --parents test' ' - test_commit one foo.txt && - commit1=$(git rev-list -1 HEAD) && - test_commit two bar.txt && - commit2=$(git rev-list -1 HEAD) && - test_commit three foo.txt && - commit3=$(git rev-list -1 HEAD) - ' - -test_expect_success '--parents rewrites
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
> Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() > --- > builtin/rev-list.c | 2 +- > log-tree.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) Now if we'd get around to rewrite pretty.c as well, we could make it static, giving a stronger reason of not using that function. But it looks a bit complicated to me, who is not familiar in that area of the code. Thanks for making less use of this suboptimal API, Stefan
Re: Git should preserve modification times at least on request
On Wed, Feb 21, 2018 at 11:44:13PM +0100, Ævar Arnfjörð Bjarmason wrote: > If it were added as a first-level feature to git it would present a lot > of UX confusion. E.g. you run "git add" and it'll be showing the mtime > somehow, or you get a formatted patch over E-Mail and it doesn't only > include the commit time but also times for individual files. But that's pretty standard. patch format has timestamp fields for precisely this purpose: % echo a > x % echo b > y % diff -u x y --- x 2018-02-21 23:56:29.574029523 +0100 +++ y 2018-02-21 23:56:31.430003389 +0100 @@ -1 +1 @@ -a +b At present, git simply leaves those fields blank... > The VC systems that had this feature in the past were centralized, so > they could (in theory anyway) ensure that timestamps were monotonically > increasing. This won't be the case with git, we have plenty of timestamp > drift in e.g. linux.git and other git repos. I don't see where monotonicity would be an issue any more than it is for centralized version control systems. Even in the centralized setting, monotonicity is not guaranteed, since you might have local timestamps deviating from the repository; you might have added a line, compiled, and removed it again later on, without running make again. Now if you checkout changes from the repository, and it sets the timestamp, that timestamp might be older than before the compile, and the file would not be rebuilt if you run make. So you cannot avoid those issues in centralized setttings either. > So if these mtimes were used by default they'd interact badly with stuff > like "make" in those cases, because you might check out a modified > version with a timestamp in the past. That's very clearly the case, and I have stressed in my initial email that I fully agree with the reasoning of the FAQ in this regard. It is, however, merely an argument against *restoring* the timestamps *by default*, to comply with the principle of least astonishment. It is, by itself, not an argument against *storing* the timestamps, let alone against restoring them *on request*. For the initial checkout, it should not even be harmful to restore the timestamps by default. > any case, I just wanted to point out a workaround (but then digressed > into critiquing the idea above...). Well, Johannes's proposed solution seems pretty reasonable and realistic to me. Thanks to Phillip's hint about unquote_path() in Git.pm it seems I now have all the needed ingredients to implement this feature. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote: > The get_cached_commit_buffer() method provides access to the buffer > loaded for a struct commit, if it was ever loadead and was not freed. > > Two places use this to inform how to output information about commits. > > log-tree.c uses this method to short-circuit the output of commit > information when the buffer is not cached. However, this leads to > incorrect output in 'git log --oneline' where the short-OID is written > but then the rest of the commit information is dropped and the next > commit is written on the same line. > > rev-list uses this method for two reasons: > > - First, if the revision walk visits a commit twice, the buffer was > freed by rev-list in the first write. The output then does not > match the format expectations, since the OID is written without the > rest of the content. I'm not sure after my earlier digging if there is even a way to trigger this (and if so, it is probably accidental, since those lines were added explicitly for --show-all). And actually after re-reading the commit message for 3131b7130 again, I think the current behavior is definitely not something that was carefully planned. So I'd propose a commit message like below. -- >8 -- Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() The "--show-all" revision option shows UNINTERESTING commits. Some of these commits may be unparsed when we try to show them (since we may or may not need to walk their parents to fulfill the request). Commit 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09) resolved this by just skipping pretty-printing for commits without their object contents cached, saying: Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That was the easy fix to avoid the pretty-printer segfaulting, but: 1. It doesn't work for all formats. E.g., --oneline prints the oid for each such commit but not a trailing newline, leading to jumbled output. 2. It only affects some commits, depending on whether we happened to parse them or not (so if they were at the tip of an UNINTERESTING starting point, or if we happened to traverse over them, you'd see more data). 3. It unncessarily ties the decision to show the verbose header to whether the commit buffer was cached. That makes it harder to change the logic around caching (e.g., if we could traverse without actually loading the full commit objects). These days it's safe to feed such a commit to the pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily load missing commit buffers, 2013-01-26), we'll load it on demand in such a case. So let's just always show the verbose headers. This does change the behavior of plumbing, but: a. The --show-all option was explicitly introduced as a debugging aid, and was never documented (and has rarely even been mentioned on the list by git devs). b. Avoiding the commits was already not deterministic due to (2) above. So the caller might have seen full headers for these commits anyway, and would need to be prepared for it. Signed-off-by: Jeff King--- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9e11..d320b6f1e3 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d6d1..22b2fb6c58 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT; -- 2.16.2.555.g885a024879
Investment Project
Hello, I have a business proposal i would like to discuss with you, reply if you are interested for more info. Thank You. Ms. Melisa Mehmet My Skype ID: ms.melisameh...@hotmail.com WhatsApp": +90 534 716 2603
Re: Question about get_cached_commit_buffer()
Jeff Kingwrites: > Out of curiosity, do you actually use --show-all for anything? Absolutely not. I'd actually love it if I could say "not anymore" instead, but I haven't had an opportunity to debug the revision traversal code for quite some time so I do not even remember when was the last time I used it, which disqualifies me from saying even that. > So what I'm wondering is whether we should consider just ripping it out > (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, > it's probably not hurting anybody). I see no problem in removing it. With more "interesting" features relying on post-processing (like 'simplify-merges'), show_all whose primary focus was how limit_list() behaves soft of outlived its usefulness, I would think.
Re: [PATCH v3 16/35] transport: convert transport_get_remote_refs to take a list of ref patterns
On Tue, 6 Feb 2018 17:12:53 -0800 Brandon Williamswrote: > -const struct ref *transport_get_remote_refs(struct transport *transport) > +const struct ref *transport_get_remote_refs(struct transport *transport, > + const struct argv_array > *ref_patterns) > { > if (!transport->got_remote_refs) { > - transport->remote_refs = > transport->vtable->get_refs_list(transport, 0, NULL); > + transport->remote_refs = > + transport->vtable->get_refs_list(transport, 0, > + ref_patterns); > transport->got_remote_refs = 1; > } Should we do our own client-side filtering if the server side cannot do it for us (because it doesn't support protocol v2)? Either way, this decision should be mentioned in the commit message.
Re: [PATCH v3 15/35] transport: convert get_refs_list to take a list of ref patterns
On Tue, 6 Feb 2018 17:12:52 -0800 Brandon Williamswrote: > @@ -21,7 +22,8 @@ struct transport_vtable { >* the ref without a huge amount of effort, it should store it >* in the ref's old_sha1 field; otherwise it should be all 0. >**/ > - struct ref *(*get_refs_list)(struct transport *transport, int for_push); > + struct ref *(*get_refs_list)(struct transport *transport, int for_push, > + const struct argv_array *ref_patterns); Also mention in the documentation that this function is allowed to return refs that do not match the ref patterns.
Proposal
Hello Greetings to you and everyone around you please did you get my previous email regarding my proposal ? please let me know if we can work together on this. Best Reagrds
Proposal
Hello Greetings to you and everyone around you please did you get my previous email regarding my proposal ? please let me know if we can work together on this. Best Reagrds
Re: [PATCH v3 14/35] connect: request remote refs using v2
On Tue, 6 Feb 2018 17:12:51 -0800 Brandon Williamswrote: > +extern struct ref **get_remote_refs(int fd_out, struct packet_reader *reader, > + struct ref **list, int for_push, > + const struct argv_array *ref_patterns); I haven't looked at the rest of this patch in detail, but the type of ref_patterns is probably better as struct string_list, since this is not a true argument array (e.g. with flags starting with --). Same comment for the next few patches that deal with ref patterns.
Re: test bare repository for unit tests
On Wed, Feb 21 2018, Basin Ilya jotted: > Hi. > I want to the test-repo-git under > https://github.com/apache/maven-wagon/tree/master/wagon-providers/wagon-scm/src/test/resources/ > just like test-repo-cvs and test-repo-svn > > Which configuation options would suit that? > I think core.compression 0 for human readable diffs. > also, I need to force loose, gc after each push. It looks like you have unit tests that are going to do integration tests of some SVN/CVS repos as used by some other tool, and want to add git to that. Since you have git already, the most straightforward thing to do would be to ship the output of git-fast-export in the repo, and have the test setup code create a repo locally out of that, then test it. Or do you really need to commit the raw repo files as-is for some reason I've missed?
Re: Question about get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 02:22:05PM -0800, Junio C Hamano wrote: > > I think that repeating the oid is intentional; the point is to dump how > > the traversal code is hitting the endpoints, even if we do so multiple > > times. > > > > The --oneline behavior just looks like a bug. I think --format is broken > > with --show-all, too (it does not show anything!). > > I do not know about the --format thing,[...] Hmm, maybe it is fine. I thought before that I got funny output out of "git log --show-all --format", but I can't seem to reproduce it now. > being a bug is correct. I've known about the oneline that does not > show anything other than the oid (not even end-of-line) for unparsed > commits for a long time---I just didn't bother looking into fixing > it exactly because this is only a debugging aid ;-) Out of curiosity, do you actually use --show-all for anything? I didn't even know it existed until this thread, and AFAICT nobody but Linus has ever recommended its use. And it does not even seem that useful as a general debugging aid. So what I'm wondering is whether we should consider just ripping it out (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, it's probably not hurting anybody). -Peff
Re: [PATCH v3 12/35] serve: introduce git-serve
On Tue, 6 Feb 2018 17:12:49 -0800 Brandon Williamswrote: > .gitignore | 1 + > Documentation/technical/protocol-v2.txt | 114 +++ > Makefile| 2 + > builtin.h | 1 + > builtin/serve.c | 30 > git.c | 1 + > serve.c | 250 > > serve.h | 15 ++ > t/t5701-git-serve.sh| 60 > 9 files changed, 474 insertions(+) > create mode 100644 Documentation/technical/protocol-v2.txt > create mode 100644 builtin/serve.c > create mode 100644 serve.c > create mode 100644 serve.h > create mode 100755 t/t5701-git-serve.sh As someone who is implementing the server side of protocol V2 in JGit, I now have a bit more insight into this :-) First of all, I used to not have a strong opinion on the existence of a new endpoint, but now I think that it's better to *not* have git-serve. As it is, as far as I can tell, upload-pack also needs to support (and does support, as of the end of this patch set) protocol v2 anyway, so it might be better to merely upgrade upload-pack. > +A client then responds to select the command it wants with any particular > +capabilities or arguments. There is then an optional section where the > +client can provide any command specific parameters or queries. > + > +command-request = command > + capability-list > + (command-args) If you are stating that this is optional, write "*1command-args". (RFC 5234 also supports square brackets, but "*1" is already used in pack-protocol.txt and http-protocol.txt.) > + flush-pkt > +command = PKT-LINE("command=" key LF) > +command-args = delim-pkt > +*arg > +arg = 1*CHAR arg should be wrapped in PKT-LINE, I think, and terminated by an LF.
Re: Git should preserve modification times at least on request
On Wed, Feb 21 2018, Peter Backes jotted: > On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote: >> This sounds like a sensible job for a git import tool, i.e. import a >> target directory into git, and instead of 'git add'-ing the whole thing >> it would look at the mtimes, sort files by mtime, then add them in order >> and only commit those files that had the same mtime in the same commit >> (or within some boundary). > > I think that this would be The Wrong Thing to do. I'm merely pointing out that if you have the use-case Derek Fawcus describes you can get per-file mtimes via something similar to the the hook method Theodore Ts'o described today with a simple import tool with no changes to git or its object format required. To the extent that there's a convention for this in git that's the convention, e.g. if you use github or gitlab they'll render the modification time of a file in the tree view, and that time is the time of the commit that last touched it: https://github.com/git/git > The commit time is just that: The time the commit was done. The commit > is an atomic group of changes to a number of files that hopefully bring > the tree from one usable state into the next. > > The mtime, in contrast, tells us when a file was most recently modified. > > It may well be that main.c was most recently modified yesterday, and > feature.c was modified this morning, and that only both changes taken > together make sense as a commit, despite the long time in between. > > Even worse, it may be that feature A took a long time to implement, so > we have huge gaps in between the mtimes, but feature B was quickly done > after A was finished. ... > Such an algorithm would probably split feature A > incorrectly into several commits, and group the more recently changed > files of feature A with those of feature B. Right, but that's a trade-off you can pick at import time in this hypothetical tar-to-commits tool, you could decide to do no merging and suffer to signal loss. > And if Feature A and Feature B were developed in parallel, things get > completely messy. > >> The advantage of doing this via such a tool is that you could tweak it >> to commit by any criteria you wanted, e.g. not mtime but ctime or even >> atime. > > Maybe, but it would be rather useless to commit by ctime or atime. You > do one grep -r and the atime is different. You do one chmod or chown > and the ctime is different. Those timestamps are really only useful for > very limited purposes. > > That ctime exists seems reasonable, since it's only ever updated when > the inode is written anyway. > > atime, in contrast, was clearly one of the rather nonsensical > innovations of UNIX: Do one write to the disk for each read from the > disk. C'mon, really? It would have been a lot more reasonable to simply > provide a generic way for tracing read() system calls instead; then > userspace could decide what to do with that information and which of it > is useful and should be kept and perhaps stored on disk. Now we have > this ugly hack called relatime to deal with the problem. Yes, that [ac]time example was a stretch. A better example would be committing the file mode, or extended attributes, or "this is on a different FS", or whatever other per-file/dir attribute we're not currently capturing. >> You'd get the same thing as you'd get if git's tree format would change >> to include mtimes (which isn't going to happen), but with a lot more >> flexibility. > > Well, from basic logic, I don't see how a decision not to implement a > feature could possibly increase flexility. The opposite seems to be the > case. I'm not trying to argue the usefulness of this mtime-per-file thing in theory, just providing Derek Fawcus with a suggestion for a viable workaround. What I meant by this offhand comment, and which you may or may not know (and I see no references to it from skimming the thread) is that there's simply no space in the tree objects to add *anything* without breaking the object format and requiring a major upgrade, although the plan to switch to a new hash function is relevant to this. Even if we suppose that git was being implemented today I don't think this would make any sense as a first-level feature. Empirical evidence suggests that people use git on a massive scale largely without caring about this, and the users who do have a workaround. If it were added as a first-level feature to git it would present a lot of UX confusion. E.g. you run "git add" and it'll be showing the mtime somehow, or you get a formatted patch over E-Mail and it doesn't only include the commit time but also times for individual files. The VC systems that had this feature in the past were centralized, so they could (in theory anyway) ensure that timestamps were monotonically increasing. This won't be the case with git, we have plenty of timestamp drift in e.g. linux.git and other git repos. So if these mtimes were used by default they'd
Proposal
Hello Greetings to you and everyone around you please did you get my previous email regarding my proposal ? please let me know if we can work together on this. Best Reagrds
Re: [PATCH v2 0/3] Re: t7006: add tests for how git config paginates
Thanks. The entire thing looked reasonable. Will replace.
Proposal
Hello Greetings to you and everyone around you please did you get my previous email regarding my proposal ? please let me know if we can work together on this. Best Reagrds
Proposal
Hello Greetings to you and everyone around you please did you get my previous email regarding my proposal ? please let me know if we can work together on this. Best Reagrds
Re: [PATCH v4 12/13] commit-graph: read only from specific pack-indexes
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stoleewrote: > > Teach git-commit-graph to inspect the objects only in a certain list > of pack-indexes within the given pack directory. This allows updating > the commit graph iteratively, since we add all commits stored in a > previous commit graph. > > Signed-off-by: Derrick Stolee > --- > Documentation/git-commit-graph.txt | 11 +++ > builtin/commit-graph.c | 32 +--- > commit-graph.c | 26 -- > commit-graph.h | 4 +++- > packfile.c | 4 ++-- > packfile.h | 2 ++ > t/t5318-commit-graph.sh| 16 > 7 files changed, 87 insertions(+), 8 deletions(-) > > diff --git a/Documentation/git-commit-graph.txt > b/Documentation/git-commit-graph.txt > index b9b4031..93d50d1 100644 > --- a/Documentation/git-commit-graph.txt > +++ b/Documentation/git-commit-graph.txt > @@ -42,6 +42,10 @@ With the `--delete-expired` option, delete the graph files > in the pack > directory that are not referred to by the graph-latest file. To avoid race > conditions, do not delete the file previously referred to by the > graph-latest file if it is updated by the `--set-latest` option. > ++ > +With the `--stdin-packs` option, generate the new commit graph by > +walking objects only in the specified packfiles and any commits in > +the existing graph-head. A general question on this series: How do commit graph buildups deal with garbage collected commits? (my personal workflow is heavy on rebase, which generates lots of dangling commits, to be thrown out later) The second half of the sentence makes it sound like once a commit is in the graph it cannot be pulled out easily again, hence the question on the impact of graphs on a long living repository which is garbage collected frequently. AFAICT you could just run git commit-graph write --set-latest [--delete-expired] as that actually looks up objects from outside the existing graph files, such that lost objects are ignored? > + const char **lines = NULL; > + int nr_lines = 0; > + int alloc_lines = 0; (nit:) I had the impression that these triplet-variables, that are used in ALLOC_GROW are allo X, X_nr and X_allow, but I might be wrong. > @@ -170,7 +178,25 @@ static int graph_write(int argc, const char **argv) > > old_graph_name = get_graph_latest_contents(opts.obj_dir); > > - graph_name = write_commit_graph(opts.obj_dir); > + if (opts.stdin_packs) { > + struct strbuf buf = STRBUF_INIT; > + nr_lines = 0; > + alloc_lines = 128; alloc_lines has been initialized before, so why redo it here again? Also what is the rationale for choosing 128 as a good default? I would guess 0 is just as fine, because ALLOC_GROW makes sure that it growth fast in the first couple entries by having an additional offset. (no need to fine tune the starting allocation IMHO) > + ALLOC_ARRAY(lines, alloc_lines); > + > + while (strbuf_getline(, stdin) != EOF) { > + ALLOC_GROW(lines, nr_lines + 1, alloc_lines); > + lines[nr_lines++] = buf.buf; > + strbuf_detach(, NULL); strbuf_detach returns its previous buf.buf, such that you can combine these two lines as lines[nr_lines++] = strbuf_detach(, NULL); > + } > + > + pack_indexes = lines; > + nr_packs = nr_lines; Technically we do not need to strbuf_release() here, because strbuf_detach is always called, and by knowing its implementation, it is just as good. > @@ -579,7 +581,27 @@ char *write_commit_graph(const char *obj_dir) > oids.alloc = 1024; > ALLOC_ARRAY(oids.list, oids.alloc); > > - for_each_packed_object(if_packed_commit_add_to_list, , 0); > + if (pack_indexes) { > + struct strbuf packname = STRBUF_INIT; > + int dirlen; > + strbuf_addf(, "%s/pack/", obj_dir); > + dirlen = packname.len; > + for (i = 0; i < nr_packs; i++) { > + struct packed_git *p; > + strbuf_setlen(, dirlen); > + strbuf_addstr(, pack_indexes[i]); > + p = add_packed_git(packname.buf, packname.len, 1); > + if (!p) > + die("error adding pack %s", packname.buf); > + if (open_pack_index(p)) > + die("error opening index for %s", > packname.buf); > + for_each_object_in_pack(p, > if_packed_commit_add_to_list, ); > + close_pack(p); > + } strbuf_release(); > + } > + else (micro style nit) } else
Re: Question about get_cached_commit_buffer()
Jeff Kingwrites: > I think that repeating the oid is intentional; the point is to dump how > the traversal code is hitting the endpoints, even if we do so multiple > times. > > The --oneline behavior just looks like a bug. I think --format is broken > with --show-all, too (it does not show anything!). I do not know about the --format thing, but the part about --oneline being a bug is correct. I've known about the oneline that does not show anything other than the oid (not even end-of-line) for unparsed commits for a long time---I just didn't bother looking into fixing it exactly because this is only a debugging aid ;-) > Though I think it would be equally correct to have set_commit_buffer() > just throw away the existing cache entry and replace it with this one. I > don't think there's a real reason to prefer the old to the new. And that > might be worth doing if it would let us drop get_cached_commit_buffer() > as a public function. But... > ... > In my opinion it's not really worth trying to make it private. The > confusion you're fixing in the first two calls is not due to a bad API, > but due to some subtly confusing logic in that code's use of the API. ;) Yup. > So I'd probably do this: > ... Makes sense to me.
Re: bug in HTTP protocol spec
On Wed, Feb 21, 2018 at 10:29:35AM -0800, Dorian Taylor wrote: > I didn’t get an insight until I ran with GIT_TRACE_PACKET=true on a > known-good remote (i.e. GitHub), that the null packet-line `` has to > follow the service line. This is not reflected in the example here: > > https://github.com/git/git/blob/6464679d9620d91b639e2681b9cc6473f3856d09/Documentation/technical/http-protocol.txt#L216 This is missing the trailing flush after the ref advertisement, too. > It is also not reflected in the BNF: > > https://github.com/git/git/blob/6464679d9620d91b639e2681b9cc6473f3856d09/Documentation/technical/http-protocol.txt#L279 > > (Note these links are from the most recent commit of this file as of this > writing.) > > Just thought somebody would like to know. Thanks, I agree the document is buggy. Do you want to submit a patch? -Peff
Re: Git should preserve modification times at least on request
On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote: > This sounds like a sensible job for a git import tool, i.e. import a > target directory into git, and instead of 'git add'-ing the whole thing > it would look at the mtimes, sort files by mtime, then add them in order > and only commit those files that had the same mtime in the same commit > (or within some boundary). I think that this would be The Wrong Thing to do. The commit time is just that: The time the commit was done. The commit is an atomic group of changes to a number of files that hopefully bring the tree from one usable state into the next. The mtime, in contrast, tells us when a file was most recently modified. It may well be that main.c was most recently modified yesterday, and feature.c was modified this morning, and that only both changes taken together make sense as a commit, despite the long time in between. Even worse, it may be that feature A took a long time to implement, so we have huge gaps in between the mtimes, but feature B was quickly done after A was finished. Such an algorithm would probably split feature A incorrectly into several commits, and group the more recently changed files of feature A with those of feature B. And if Feature A and Feature B were developed in parallel, things get completely messy. > The advantage of doing this via such a tool is that you could tweak it > to commit by any criteria you wanted, e.g. not mtime but ctime or even > atime. Maybe, but it would be rather useless to commit by ctime or atime. You do one grep -r and the atime is different. You do one chmod or chown and the ctime is different. Those timestamps are really only useful for very limited purposes. That ctime exists seems reasonable, since it's only ever updated when the inode is written anyway. atime, in contrast, was clearly one of the rather nonsensical innovations of UNIX: Do one write to the disk for each read from the disk. C'mon, really? It would have been a lot more reasonable to simply provide a generic way for tracing read() system calls instead; then userspace could decide what to do with that information and which of it is useful and should be kept and perhaps stored on disk. Now we have this ugly hack called relatime to deal with the problem. > You'd get the same thing as you'd get if git's tree format would change > to include mtimes (which isn't going to happen), but with a lot more > flexibility. Well, from basic logic, I don't see how a decision not to implement a feature could possibly increase flexility. The opposite seems to be the case. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: [PATCH v3 08/35] connect: discover protocol version outside of get_remote_heads
On Tue, 6 Feb 2018 17:12:45 -0800 Brandon Williamswrote: > - get_remote_heads(fd[0], NULL, 0, , 0, NULL, ); > + > + packet_reader_init(, fd[0], NULL, 0, > +PACKET_READ_CHOMP_NEWLINE | > +PACKET_READ_GENTLE_ON_EOF); > + > + switch (discover_version()) { > + case protocol_v1: > + case protocol_v0: > + get_remote_heads(, , 0, NULL, ); > + break; > + case protocol_unknown_version: > + BUG("unknown protocol version"); > + } This inlining is repeated a few times, which raises the question: if the intention is to keep the v0/1 logic separately from v2, why not have a single function that wraps them all? Looking at the end result (after all the patches in this patch set are applied), it seems that the v2 version does not have extra_have or shallow parameters, which is a good enough reason for me (I don't think functions that take in many arguments and then selectively use them is a good idea). I think that other reviewers will have this question too, so maybe discuss this in the commit message. > diff --git a/remote.h b/remote.h > index 1f6611be2..2016461df 100644 > --- a/remote.h > +++ b/remote.h > @@ -150,10 +150,11 @@ int check_ref_type(const struct ref *ref, int flags); > void free_refs(struct ref *ref); > > struct oid_array; > -extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, > +struct packet_reader; > +extern struct ref **get_remote_heads(struct packet_reader *reader, >struct ref **list, unsigned int flags, >struct oid_array *extra_have, > - struct oid_array *shallow); > + struct oid_array *shallow_points); This change probably does not belong in this patch, especially since remote.c is unchanged.
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On 2/21/2018 2:17 PM, Derrick Stolee wrote: The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loadead and was not freed. Two places use this to inform how to output information about commits. log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line. rev-list uses this method for two reasons: - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content. - Second, if the revision walk visits a commit that was marked UNINTERESTING, the walk may never load a buffer and hence rev-list will not output the verbose information. These behaviors are undocumented, untested, and unlikely to be expected by users or other software attempting to parse this output. Helped-by: Jeff KingSigned-off-by: Derrick Stolee This would be a good time to allow multiple authors, or to just change the author, since this is exactly the diff you (Peff) provided in an earlier email. The commit message hopefully summarizes our discussion, but I welcome edits. --- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9..d320b6f 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d..22b2fb6 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT;
Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store
Stefan Bellerwrites: > + > +/* > + * The mru list_head is supposed to be initialized using > + * the LIST_HEAD macro, assigning prev/next to itself. > + * However this doesn't work in this case as some compilers dislike > + * that macro on member variables. Use NULL instead as that is defined > + * and accepted, deferring the real init to prepare_packed_git_mru(). */ > +#define __MRU_INIT { NULL, NULL } > +#define RAW_OBJECT_STORE_INIT { NULL, NULL, __MRU_INIT, NULL, NULL } I do not think it has to be this way to abuse two NULLs, if you faithfully mimicked how LIST_HEAD() macro is constructed. The reason why it does not try to introduce struct list_head x = LIST_HEAD_INIT; and instead, uses LIST_HEAD(x); is because of the need for self referral. If we learn from it, we can do the same, i.e. instead of doing struct raw_object_store x = RAW_OBJECT_STORE_INIT; we can do RAW_OBJECT_STORE(x); that expands to struct raw_object_store x = { ... { _git_mru, _git_mru }, ... }; no? Then we do not need such a lengthy comment that reads only like an excuse ;-)
Re: [PATCH 04/26] upload-pack: convert to a builtin
Brandon Williams wrote: > On 01/03, Stefan Beller wrote: > > On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williamswrote: >>> In order to allow for code sharing with the server-side of fetch in >>> protocol-v2 convert upload-pack to be a builtin. >> >> What is the security aspect of this patch? >> >> By making upload-pack builtin, it gains additional abilities, >> such as answers to '-h' or '--help' (which would start a pager). >> Is there an easy way to sooth my concerns? (best put into the >> commit message) > > receive-pack is already a builtin, so theres that. *nod* Since v2.4.12~1^2 (shell: disallow repo names beginning with dash, 2017-04-29), git-shell refuses to pass --help to upload-pack, limiting the security impact in configurations that use git-shell (e.g. gitolite installations). If you're not using git-shell, then hopefully you have some other form of filtering preventing arbitrary options being passed to git-upload-pack. If you don't, then you're in trouble, for the reasons described in that commit. Since some installations may be allowing access to git-upload-pack (for read-only access) without allowing access to git-receive-pack, this does increase the chance of attack. On the other hand, I suspect the maintainability benefit is worth it. For defense in depth, it would be comforting if the git wrapper had some understanding of "don't support --help in handle_builtin when invoked as a dashed command". That is, I don't expect that anyone has been relying on git-add --help acting like git help add instead of printing the usage message from git add -h It's a little fussy because today we rewrite "git add --help" to "git-add --help" before rewriting it to "git help add"; we'd have to skip that middle hop for this to work. I don't think that has to block this patch or series, though --- it's just a separate thought about hardening. Cc-ing Sitaram Chamarty since he tends to be wiser about this kind of thing than I am. What do you think? Thanks, Jonathan
Re: [PATCH v3 04/35] upload-pack: convert to a builtin
On Tue, 6 Feb 2018 17:12:41 -0800 Brandon Williamswrote: > In order to allow for code sharing with the server-side of fetch in > protocol-v2 convert upload-pack to be a builtin. > > Signed-off-by: Brandon Williams As Stefan mentioned in [1], also mention in the commit message that this means that the "git-upload-pack" invocation gains additional capabilities (for example, invoking a pager for --help). Having said that, the main purpose of this patch seems to be to libify upload-pack, and the move to builtin is just a way of putting the program somewhere - we could have easily renamed upload-pack.c and created a new upload-pack.c containing the main(), preserving the non-builtin-ness of upload-pack, while still gaining the benefits of libifying upload-pack. If the community does want to make upload-pack a builtin, I would write the commit message this way: upload-pack: libify Libify upload-pack. The main() function is moved to builtin/upload-pack.c, thus making upload-pack a builtin. Note that this means that "git-upload-pack" gains functionality such as the ability to invoke a pager when passed "--help". And if not: upload-pack: libify Libify upload-pack by moving most of the functionality in upload-pack.c into a file upload-pack-lib.c (or some other name), to be used in subsequent patches. [1] https://public-inbox.org/git/CAGZ79kb2=uu0_k8wr27gndnx-t+p+7gvdgc5ebdyc3zbobs...@mail.gmail.com/ > -static void upload_pack(void) > -{ > - struct string_list symref = STRING_LIST_INIT_DUP; > - > - head_ref_namespaced(find_symref, ); > - > - if (advertise_refs || !stateless_rpc) { > - reset_timeout(); > - head_ref_namespaced(send_ref, ); > - for_each_namespaced_ref(send_ref, ); > - advertise_shallow_grafts(1); > - packet_flush(1); > - } else { > - head_ref_namespaced(check_ref, NULL); > - for_each_namespaced_ref(check_ref, NULL); > - } > - string_list_clear(, 1); > - if (advertise_refs) > - return; > - > - receive_needs(); > - if (want_obj.nr) { > - get_common_commits(); > - create_pack_file(); > - } > -} I see that this function had to be moved to the bottom because it now also needs to make use of functions like upload_pack_config() - that's fine. > +struct upload_pack_options { > + int stateless_rpc; > + int advertise_refs; > + unsigned int timeout; > + int daemon_mode; > +}; I would have expected "unsigned stateless_rpc : 1" etc., but I see that this makes it easier to use with OPT_BOOL (which needs us to pass it a pointer-to-int). As for what existing code does, files like fetch-pack and diff use "unsigned : 1", but they also process arguments without OPT_, so I don't think they are relevant. I think that we should decide if we're going to prefer "unsigned : 1" or "int" for flags in new code. Personally, I prefer "unsigned : 1" (despite the slight inconvenience in that argument parsers will need to declare their own temporary "int" and then assign its contents to the options struct) because of the stronger type, but I'm OK either way. Whatever the decision, I don't think it needs to block the review of this patch set.
Congratulations i willed to you
My Dear friend, I am Mr. Micheal Pascal Anderson I got your contact on my personal search of the person.. I willed the only funds left in my account to you. If you want to know why I have willed the only funds left in my account to you please contact the bank manager whose name and address I will give to you as soon as you reply to this mail. Right now I am in the hospital. Please, Mr. Micheal Pascal Anderson
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stoleewrote: > graph_name = write_commit_graph(opts.obj_dir); > > if (graph_name) { > if (opts.set_latest) > set_latest_file(opts.obj_dir, graph_name); > > + if (opts.delete_expired) > + do_delete_expired(opts.obj_dir, > + old_graph_name, > + graph_name); > + So this only allows to delete expired things and setting the latest when writing a new graph. Would we ever envision a user to produce a new graph (e.g. via obtaining a graph that they got from a server) and then manually rerouting the latest to that new graph file without writing that graph file in the same process? The same for expired. I guess these operations are just available via editing the latest or deleting files manually, which slightly contradicts e.g. "git update-ref", which in olden times was just a fancy way of rewriting the refs file manually. (though it claims to be less prone to errors as it takes lock files) > > extern char *get_graph_latest_filename(const char *obj_dir); > +extern char *get_graph_latest_contents(const char *obj_dir); Did https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/ ever make it into tree? (It is sort of new, but I feel we'd want to strive for consistency in the code base, eventually.)
Re: Git should preserve modification times at least on request
On Wed, Feb 21 2018, Derek Fawcus jotted: > On Mon, Feb 19, 2018 at 10:22:36PM +0100, Peter Backes wrote: >> >> It is pretty annoying that git cannot, even if I know what I am doing, >> and explicitly want it to, preserve the modification time. > > The use case I've come across where it would be of value is for code > archeology, either importing a bunch of tar files, or importing a > repo from some other VCS. > > There preserving the mod times can be useful when one is subsequently > figuring out what changed, and the scope of the 'commits' is too big > (i.e. the granularity of the tar files themselves). > > e.g. initial commits are done on tar boundaries, but one may try to > figure out individual changes from a ChangeLog file. I've done this > a couple of times, but to date it has required keeping the untarred > trees around (or a timestamp list file from each tree), in addition > to the git repro in to which one is then synthesizing smaller commits. This sounds like a sensible job for a git import tool, i.e. import a target directory into git, and instead of 'git add'-ing the whole thing it would look at the mtimes, sort files by mtime, then add them in order and only commit those files that had the same mtime in the same commit (or within some boundary). The advantage of doing this via such a tool is that you could tweak it to commit by any criteria you wanted, e.g. not mtime but ctime or even atime. You'd get the same thing as you'd get if git's tree format would change to include mtimes (which isn't going to happen), but with a lot more flexibility.
Re: Git should preserve modification times at least on request
On 20/02/18 22:48, Peter Backes wrote: On Tue, Feb 20, 2018 at 11:32:23PM +0100, Johannes Schindelin wrote: Hi Peter, On Tue, 20 Feb 2018, Peter Backes wrote: On Tue, Feb 20, 2018 at 11:46:38AM +0100, Johannes Schindelin wrote: I would probably invent a file format (``) I'm stuck there because of being munged. From which command do you want to get it? If you are looking at `git diff`, you may want to use the `-z --name-only` options to avoid munging the paths. I plan to use "git diff-tree --name-only $w_tree HEAD" and subtract all lines from "git diff-index --name-only HEAD" to get the files for which the timestamp should be stored.. If I use "-z" I get the non-munged path, but I cannot safely store such paths in the proposed file format; they might contain newlines (sigh). So at one point I have to munge. Then the same question arises when I have to get the actual path from the munged path when restoring the timestamps. If there's no ready-made functionality to munge and unmunge paths, I have to write some awk for this. At first I thought this might add one more dependency to git, but it seems that awk is already used in git-mergetool.sh, so I suppose it's okay to use in git-stash.sh etc, too. In recent versions of git there's unquote_path() in Git.pm, you could possibly use that with perl -e from your script Best Wishes Phillip Best wishes Peter
Re: Git should preserve modification times at least on request
On Mon, Feb 19, 2018 at 10:22:36PM +0100, Peter Backes wrote: > > It is pretty annoying that git cannot, even if I know what I am doing, > and explicitly want it to, preserve the modification time. The use case I've come across where it would be of value is for code archeology, either importing a bunch of tar files, or importing a repo from some other VCS. There preserving the mod times can be useful when one is subsequently figuring out what changed, and the scope of the 'commits' is too big (i.e. the granularity of the tar files themselves). e.g. initial commits are done on tar boundaries, but one may try to figure out individual changes from a ChangeLog file. I've done this a couple of times, but to date it has required keeping the untarred trees around (or a timestamp list file from each tree), in addition to the git repro in to which one is then synthesizing smaller commits. DF
Re: [PATCH] t/known-leaky: add list of known-leaky test scripts
On Wed, Feb 21, 2018 at 08:53:16AM -0800, Junio C Hamano wrote: > Even though keeping track of list of known-leaky tests may not be so > interesting, we can still salvage useful pieces from the discussion > and make them available to developers, e.g. something like > > prove --dry --state=failed | > perl -lne '/^(t[0-9]{4})-.*\.sh$/ and print $1' | sort >$@+ > if cmp >/dev/null $@ $@+; then rm $@+; else mv $@+ $@; fi > > could be made into a target to stash away the list of failing tests > after a test run? Unfortunately there are some caveats in that snippet: 1. You are using prove. 2. You are using --state=save in the initial run. I think we might be better off having the test scripts write to test-results/*.counts even when run under a TAP harness, and then we can have a consistent way to get the list of failed tests (we already have a "make failed" that works this way). -Peff
[PATCH v2 3/3] config: change default of `pager.config` to "on"
This is similar to ff1e72483 (tag: change default of `pager.tag` to "on", 2017-08-02) and is safe now that we do not consider `pager.config` at all when we are not listing or getting configuration. This change will help with listing large configurations, but will not hurt users of `git config --edit` as it would have before the previous commit. Signed-off-by: Martin Ågren--- Documentation/git-config.txt | 1 + t/t7006-pager.sh | 12 ++-- builtin/config.c | 2 +- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt index 249090ac84..e09ed5d7d5 100644 --- a/Documentation/git-config.txt +++ b/Documentation/git-config.txt @@ -237,6 +237,7 @@ CONFIGURATION - `pager.config` is only respected when listing configuration, i.e., when using `--list` or any of the `--get-*` which may return multiple results. +The default is to use a pager. [[FILES]] FILES diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh index 95bd26f0b2..7541ba5edb 100755 --- a/t/t7006-pager.sh +++ b/t/t7006-pager.sh @@ -267,23 +267,23 @@ test_expect_success TTY 'git config --get ignores pager.config' ' ! test -e paginated.out ' -test_expect_success TTY 'git config --get-urlmatch defaults to not paging' ' +test_expect_success TTY 'git config --get-urlmatch defaults to paging' ' rm -f paginated.out && test_terminal git -c http."https://foo.com/".bar=foo \ config --get-urlmatch http https://foo.com && - ! test -e paginated.out + test -e paginated.out ' test_expect_success TTY 'git config --get-all respects pager.config' ' rm -f paginated.out && - test_terminal git -c pager.config config --get-all foo.bar && - test -e paginated.out + test_terminal git -c pager.config=false config --get-all foo.bar && + ! test -e paginated.out ' -test_expect_success TTY 'git config --list defaults to not paging' ' +test_expect_success TTY 'git config --list defaults to paging' ' rm -f paginated.out && test_terminal git config --list && - ! test -e paginated.out + test -e paginated.out ' diff --git a/builtin/config.c b/builtin/config.c index a732d9b56c..01169dd628 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -602,7 +602,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) } if (actions & PAGING_ACTIONS) - setup_auto_pager("config", 0); + setup_auto_pager("config", 1); if (actions == ACTION_LIST) { check_argc(argc, 0, 0); -- 2.16.2.246.ga4ee8f
Re: Question about get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 09:13:22AM -0500, Derrick Stolee wrote: > > So there it is. It does show commits multiple times, but suppresses the > > verbose header after the first showing. If we do something like this: > > > >git rev-list --show-all --pretty --boundary c93150cfb0^- > > > > you'll see some boundary commits that _don't_ have their pretty headers > > shown. And with your proposed patch, we'd show them again. To keep the > > same behavior we need to store that "we've already seen this" boolean > > somewhere else (e.g., in an object flag; possibly SEEN, but that might > > run afoul of other logic). > > What confuses me about this behavior is that the OID is still shown on the > repeat (and in the case of `git log --oneline` will not actually have a line > break between two short-OIDs). I don't believe this behavior is something to > preserve. I think that repeating the oid is intentional; the point is to dump how the traversal code is hitting the endpoints, even if we do so multiple times. The --oneline behavior just looks like a bug. I think --format is broken with --show-all, too (it does not show anything!). > Unless I am misunderstanding, the current behavior on a repeated commit is > already incorrect: some amount of output occurs before checking the buffer, > so the output includes repeated records but with formatting that violates > the expectation. By doing the simple change of swapping > get_cached_commit_buffer() with get_commit_buffer(), we correct that format > violation but have duplicate copies. Yeah, I'd agree with that assessment. > The most-correct thing to do (in my opinion) is to put the requirement of > "no repeats" into the revision walk logic and stop having the formatting > methods expect them. Then, however we change this boolean setting of "we > have seen this before" it will not require the formatting methods to change. But then you wouldn't show repeats at all. If I'm understanding you correctly. TBH, I do not think it is worth spending a lot of effort on this --show-all feature. It seems mostly like forgotten debugging cruft to me. That's why I'd be OK with showing the whole header as the simplest fix (i.e., just removing those calls entirely, not even converting them to get_commit_buffer). > I can start working on a patch to move the duplicate-removal logic into > revision.c instead of these three callers: > > builtin/rev-list.c: if (revs->verbose_header && > get_cached_commit_buffer(commit, NULL)) { > log-tree.c: if (!get_cached_commit_buffer(commit, NULL)) > object.c: if (!get_cached_commit_buffer(commit, NULL)) Those first two are duplicate detection. The third one in object.c should stay, though. We've been fed a commit buffer to parse, and we want to know whether we should attach it as the cached buffer for that commit. But if we already have a cached buffer, there's no point in doing so. And that's what we're checking there. Though I think it would be equally correct to have set_commit_buffer() just throw away the existing cache entry and replace it with this one. I don't think there's a real reason to prefer the old to the new. And that might be worth doing if it would let us drop get_cached_commit_buffer() as a public function. But... > But this caller seems pretty important in pretty.c: > > /* > * Otherwise, we still want to munge the encoding header in the > * result, which will be done by modifying the buffer. If we > * are using a fresh copy, we can reuse it. But if we are using > * the cached copy from get_commit_buffer, we need to duplicate it > * to avoid munging the cached copy. > */ > if (msg == get_cached_commit_buffer(commit, NULL)) > out = xstrdup(msg); > else > out = (char *)msg Like the one in object.c, this really does want to know about the cached entry. And it should be unaffected by your patch, since we will have called get_commit_buffer() at the top of that function. If we wanted to write this one without get_cached_commit_buffer(), we'd really need a function to ask "did this pointer come from the cache, or was it freshly allocated?". That's the same thing we do for unuse_commit_buffer(). So in theory we could have a boolean function that would check that, and that would let us make get_cached_commit_buffer() private. In my opinion it's not really worth trying to make it private. The confusion you're fixing in the first two calls is not due to a bad API, but due to some subtly confusing logic in that code's use of the API. ;) So I'd probably do this: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index d94062bc84..3af56921c8 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -150,7 +150,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit,
Re: Question about get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 01:48:11PM -0500, Jeff King wrote: > > What confuses me about this behavior is that the OID is still shown on the > > repeat (and in the case of `git log --oneline` will not actually have a line > > break between two short-OIDs). I don't believe this behavior is something to > > preserve. > > I think that repeating the oid is intentional; the point is to dump how > the traversal code is hitting the endpoints, even if we do so multiple > times. > > The --oneline behavior just looks like a bug. I think --format is broken > with --show-all, too (it does not show anything!). I poked at one of the examples a little more closely. I actually think these are not repeats, but simply UNINTERESTING parents that we never needed to look at in our traversal (because we hit a point where everything was UNINTERESTING). So we are relying not on finish_commit() to have freed the buffer, but on the traversal code to have never parsed those commits in the first place. Which is doubly subtle. I think the rest of my email stands, though: we should just show the full headers for those commits. -Peff
Re: [PATCH v4 05/13] commit-graph: implement 'git-commit-graph write'
Derrick Stoleewrites: > +static int graph_write(int argc, const char **argv) > +{ > + ... > + graph_name = write_commit_graph(opts.obj_dir); > + > + if (graph_name) { > + printf("%s\n", graph_name); > + FREE_AND_NULL(graph_name); > + } > + > + return 0; > +} After successfully writing a graph file out, write_commit_graph() signals that fact by returning a non-NULL pointer, so that this caller can report the filename to the end user. This caller protects itself from a NULL return, presumably because the callee uses it to signal an error when writing the graph file out? Is it OK to lose that 1-bit of information, or should we have more like if (graph_name) { printf; return 0; } else { return -1; } > int cmd_commit_graph(int argc, const char **argv, const char *prefix) > { > static struct option builtin_commit_graph_options[] = { > - { OPTION_STRING, 'p', "object-dir", _dir, > + { OPTION_STRING, 'o', "object-dir", _dir, > N_("dir"), > N_("The object directory to store the graph") }, > OPT_END(), The same comment for a no-op patch from an earlier step applies here, and we have another one that we saw above in graph_write(). > @@ -31,6 +67,11 @@ int cmd_commit_graph(int argc, const char **argv, const > char *prefix) >builtin_commit_graph_usage, >PARSE_OPT_STOP_AT_NON_OPTION); > > + if (argc > 0) { > + if (!strcmp(argv[0], "write")) > + return graph_write(argc, argv); And if we fix "graph_write" to report an error with negative return, this needs to become something like return !!graph_write(argc, argv); as we do not want to return a negative value to be passed via run_builtin() to exit(3) in handle_builtin(). > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > new file mode 100755 > index 000..6a5e93c > --- /dev/null > +++ b/t/t5318-commit-graph.sh > @@ -0,0 +1,119 @@ > +#!/bin/sh > + > +test_description='commit graph' > +. ./test-lib.sh > + > +test_expect_success 'setup full repo' ' > + rm -rf .git && I am perfectly OK with creating a separate subdirectory called 'full' in the trash directory given by the test framework, but unless absolutely necessary I'd rather not to see "rm -rf", especially on ".git", in our test scripts. People can screw up doing various things (like copying and pasting). > + mkdir full && > + cd full && > + git init && > + objdir=".git/objects" > +' And I absolutely do not want to see "cd full" that leaves and stays in the subdirectory after this step is done. Imagine what happens if any earlier step fails before doing "cd full", causing this "setup full" step to report failure, and then the test goes on to the next step? We will not be in "full" and worse yet because we do not have "$TRASH_DIRECTORY/.git" (you removed it), the "git commit-graph write --object-dir" command we end up doing next will see the git source repository as the repository it is working on. Never risk trashing our source repository with your test. That is why we give you $TRASH_DIRECTORY to play in. Make use of it when you can. I'd make this step just a single git init full and then the next one git -C full commit-graph write --object-dir . In later tests that have multi-step things, I'd instead make them ( cd full && ... whatever you do && ... in that separate && ... 'full' repository ) if I were writing this test *and* if I really wanted to do things inside $TRASH_DIRECTORY/full/.git repository. I am not convinced yet about the latter. I know that it will make certain things simpler to use a separate /full hierarchy (e.g. cleaning up, having another unrelated test repository, etc.) while making other things more cumbersome (e.g. you need to be careful when you "cd" and the easiest way to do so is to ( do things in a subshell )). I just do not know what the trade-off would look like in this particular case. A simple rule of thumb I try to follow is not to change $(pwd) for the process that runs these test_expect_success shell functions. > + > +test_expect_success 'write graph with no packs' ' > + git commit-graph write --object-dir . > +' > + > +test_expect_success 'create commits and repack' ' > + for i in $(test_seq 3) > + do > + test_commit $i && > + git branch commits/$i > + done && > + git repack > +'
Re: [PATCH v4 02/13] graph: add commit graph design document
> +[3] > https://public-inbox.org/git/20170907094718.b6kuzp2uhvkmw...@sigill.intra.peff.net/t/#m7a2ea7b355aeda962e6b86404bcbadc648abfbba > +More discussion about generation numbers and not storing them inside > +commit objects. A valuable quote: Unlike the other public inbox links this links to a discussion with all messages on one page, https://public-inbox.org/git/20170908034739.4op3w4f2ma5s6...@sigill.intra.peff.net/ would have this be more inline with the other links. (this is a super small nit, which I am not sure if we care about at all; the rest of the doc is awesome!)
bug in HTTP protocol spec
Hello list, I had been banging my head all morning trying to figure out why I couldn’t get a little HTTP implementation to clone/push via the smart protocol (just wrapping git-receive-pack/git-upload-pack). I kept getting the following (likely familiar to some) error: ``` fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. ``` I didn’t get an insight until I ran with GIT_TRACE_PACKET=true on a known-good remote (i.e. GitHub), that the null packet-line `` has to follow the service line. This is not reflected in the example here: https://github.com/git/git/blob/6464679d9620d91b639e2681b9cc6473f3856d09/Documentation/technical/http-protocol.txt#L216 It is also not reflected in the BNF: https://github.com/git/git/blob/6464679d9620d91b639e2681b9cc6473f3856d09/Documentation/technical/http-protocol.txt#L279 (Note these links are from the most recent commit of this file as of this writing.) Just thought somebody would like to know. Regards, -- Dorian Taylor Make things. Make sense. https://doriantaylor.com
Re: [PATCH v4 01/13] commit-graph: add format document
>> >> [ so in small repos, where there are fewer than 256 objects, >> F[i] == F[i+1], for all i'th where there is no object starting with i >> byte] > > > Correct. I'm not sure this additional information is valuable for the > document, though. It is not, I was just making sure I'd understand correctly.
Re: [PATCH] submodule: indicate that 'submodule.recurse' doesn't apply to clone
Brandon Williamswrites: > Update the documentation for the 'submodule.recurse' config to identify > that the clone command does not respect it. > > Signed-off-by: Brandon Williams > --- > Documentation/config.txt | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Thanks. > diff --git a/Documentation/config.txt b/Documentation/config.txt > index f57e9cf10..59ff29d7a 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -3210,7 +3210,8 @@ submodule.active:: > > submodule.recurse:: > Specifies if commands recurse into submodules by default. This > - applies to all commands that have a `--recurse-submodules` option. > + applies to all commands that have a `--recurse-submodules` option, > + except `clone`. > Defaults to false. > > submodule.fetchJobs::
Re: [PATCH v4 06/13] commit-graph: implement git commit-graph read
Derrick Stoleewrites: > +'read':: > + > +Read a graph file given by the graph-head file and output basic > +details about the graph file. > ++ > +With `--file=` option, consider the graph stored in the file at > +the path /info/. > + A sample reader confusion after reading the above twice: What is "the graph-head file" and how does the user specify it? Is it given by the value for the "--file=" command line option? Another sample reader reaction after reading the above: What are the kind of "basic details" we can learn from this command is unclear, but perhaps there is an example to help me decide if this command is worth studying. > @@ -44,6 +53,12 @@ EXAMPLES > $ git commit-graph write > > > +* Read basic information from a graph file. > ++ > + > +$ git commit-graph read --file= > + > + And the sample reader is utterly disappointed at this point. > +static int graph_read(int argc, const char **argv) > +{ > + struct commit_graph *graph = 0; > + struct strbuf full_path = STRBUF_INIT; > + > + static struct option builtin_commit_graph_read_options[] = { > + { OPTION_STRING, 'o', "object-dir", _dir, > + N_("dir"), > + N_("The object directory to store the graph") }, > + { OPTION_STRING, 'H', "file", _file, > + N_("file"), > + N_("The filename for a specific commit graph file in > the object directory."), > + PARSE_OPT_OPTARG, NULL, (intptr_t) "" }, > + OPT_END(), > + }; The same comment as all the previous ones apply, wrt short options and non-use of OPT_STRING(). Also, I suspect that these two would want to use OPT_FILENAME instead, if we anticipate that the command might want to be sometimes run from a subdirectory. Otherwise wouldn't cd t && git commit-graph read --file=../.git/object/info/$whatever end up referring to a wrong place because the code that uses the value obtained from OPTION_STRING does not do the equivalent of parse-options.c::fix_filename()? The same applies to object-dir handling. > + argc = parse_options(argc, argv, NULL, > + builtin_commit_graph_read_options, > + builtin_commit_graph_read_usage, 0); > + > + if (!opts.obj_dir) > + opts.obj_dir = get_object_directory(); > + > + if (!opts.graph_file) > + die("no graph hash specified"); > + > + strbuf_addf(_path, "%s/info/%s", opts.obj_dir, opts.graph_file); Ahh, I was fooled by a misnamed option. --file does *not* name the file. It is a filename in a fixed place that is determined by other things. So it would be a mistake to use OPT_FILENAME() in the parser for that misnamed "--file" option. The parser for --object-dir still would want to be OPT_FILENAME(), but quite honestly, I do not see the point of having --object-dir option in the first place. The graph file is not relative to it but is forced to have /info/ in between that directory and the filename, so it is not like the user gets useful flexibility out of being able to specify two different places using --object-dir= option and $GIT_OBJECT_DIRECTORY environment (iow, a caller that wants to work on a specific object directory can use the environment, which is how it would tell any other Git subcommand which object store it wants to work with). But stepping back a bit, I think the way --file argument is defined is halfway off from two possible more useful ways to define it. If it were just "path to the file" (iow, what OPT_FILENAME() is suited for parsing it), then a user could say "I have this graph file that I created for testing, it is not installed in its usual place in $GIT_OBJECT_DIRECTORY/info/ at all, but I want you to read it because I am debugging". That is one possible useful extreme. The other possibility would be to allow *only* the hash part to be specified, iow, not just forcing /info/ relative to object directory, you would force the "graph-" prefix and ".graph" suffix. That would be the other extreme that is useful (less typing and less error prone). For a low-level command line this, my gut feeling is that it would be better to allow paths to the object directory and the graph file to be totally independently specified. > + if (graph_signature != GRAPH_SIGNATURE) { > + munmap(graph_map, graph_size); > + close(fd); > + die("graph signature %X does not match signature %X", > + graph_signature, GRAPH_SIGNATURE); > + } > + > + graph_version = *(unsigned char*)(data + 4); > + if (graph_version != GRAPH_VERSION) { > + munmap(graph_map, graph_size); > + close(fd); > + die("graph version %X does not match
[PATCH v2 1/3] t7006: add tests for how git config paginates
The next couple of commits will change how `git config` handles `pager.config`, similar to how de121ffe5 (tag: respect `pager.tag` in list-mode only, 2017-08-02) and ff1e72483 (tag: change default of `pager.tag` to "on", 2017-08-02) changed `git tag`. Similar work has also been done to `git branch`. Add tests in this area to make sure that we don't regress and so that the upcoming commits can be made clearer by adapting the tests. Add tests for simple config-setting, `--edit`, `--get`, `--get-urlmatch`, `get-all`, and `--list`. Those represent a fair portion of the various options that will be affected by the next two commits. Use `test_expect_failure` to document that we currently respect the pager-configuration with `--edit`. The current behavior is buggy since the pager interferes with the editor and makes the end result completely broken. See also b3ee740c8 (t7006: add tests for how git tag paginates, 2017-08-02). The next commit will teach simple config-setting and `--get` to ignore `pager.config`. Test the current behavior as "success", not "failure", since the currently expected behavior according to documentation would be to page. The next commit will change that expectation by updating the documentation on `git config` and will redefine those successful tests. Remove the test added in commit 3ba7e6e29a (config: run setup_git_directory_gently() sooner, 2010-08-05) since it has some overlap with these. We could leave it or tweak it, or place new tests like these next to it, but let's instead make the tests for `git config` as similar as possible to the ones for `git tag` and `git branch`, and place them after those. Signed-off-by: Martin Ågren--- t/t7006-pager.sh | 49 ++--- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh index f5f46a95b4..a46a079339 100755 --- a/t/t7006-pager.sh +++ b/t/t7006-pager.sh @@ -110,13 +110,6 @@ test_expect_success TTY 'configuration can disable pager' ' ! test -e paginated.out ' -test_expect_success TTY 'git config uses a pager if configured to' ' - rm -f paginated.out && - test_config pager.config true && - test_terminal git config --list && - test -e paginated.out -' - test_expect_success TTY 'configuration can enable pager (from subdir)' ' rm -f paginated.out && mkdir -p subdir && @@ -252,6 +245,48 @@ test_expect_success TTY 'git branch --set-upstream-to ignores pager.branch' ' ! test -e paginated.out ' +test_expect_success TTY 'git config respects pager.config when setting' ' + rm -f paginated.out && + test_terminal git -c pager.config config foo.bar bar && + test -e paginated.out +' + +test_expect_failure TTY 'git config --edit ignores pager.config' ' + rm -f paginated.out editor.used && + write_script editor <<-\EOF && + touch editor.used + EOF + EDITOR=./editor test_terminal git -c pager.config config --edit && + ! test -e paginated.out && + test -e editor.used +' + +test_expect_success TTY 'git config --get respects pager.config' ' + rm -f paginated.out && + test_terminal git -c pager.config config --get foo.bar && + test -e paginated.out +' + +test_expect_success TTY 'git config --get-urlmatch defaults to not paging' ' + rm -f paginated.out && + test_terminal git -c http."https://foo.com/".bar=foo \ + config --get-urlmatch http https://foo.com && + ! test -e paginated.out +' + +test_expect_success TTY 'git config --get-all respects pager.config' ' + rm -f paginated.out && + test_terminal git -c pager.config config --get-all foo.bar && + test -e paginated.out +' + +test_expect_success TTY 'git config --list defaults to not paging' ' + rm -f paginated.out && + test_terminal git config --list && + ! test -e paginated.out +' + + # A colored commit log will begin with an appropriate ANSI escape # for the first color; the text "commit" comes later. colorful() { -- 2.16.2.246.ga4ee8f
Re: [PATCH v4 01/13] commit-graph: add format document
On 2/21/2018 2:23 PM, Stefan Beller wrote: On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stoleewrote: +In order to allow extensions that add extra data to the graph, we organize +the body into "chunks" and provide a binary lookup table at the beginning +of the body. The header includes certain values, such as number of chunks, +hash lengths and types. + +All 4-byte numbers are in network order. + +HEADER: + + 4-byte signature: + The signature is: {'C', 'G', 'P', 'H'} + + 1-byte version number: + Currently, the only valid version is 1. + + 1-byte Object Id Version (1 = SHA-1) + + 1-byte number (C) of "chunks" + + 1-byte (reserved for later use) What should clients of today do with it? * ignore it completely [as they have no idea what it is] or * throw hands up in the air if it is anything other than 0 ? [because clearly we will increment the version or have new information in a new chunk instead of just sneaking in information here?] They should ignore it completely, which will allow using the value for something meaningful later without causing a version change (which we DO die() for). A user could downgrade from a version that uses this byte for something meaningful and not require a new commit-graph file. The "commit-graph read" subcommand does output this byte, so we can verify that the "write" subcommand places a 0 in this position. +CHUNK LOOKUP: + + (C + 1) * 12 bytes listing the table of contents for the chunks: + First 4 bytes describe chunk id. Value 0 is a terminating label. + Other 8 bytes provide offset in current file for chunk to start. offset [in bytes? I could imagine having a larger granularity here, because chunks don't sound small.] It is good to specify "offset in bytes". + (Chunks are ordered contiguously in the file, so you can infer + the length using the next chunk position if necessary.) + + The remaining data in the body is described one chunk at a time, and + these chunks may be given in any order. Chunks are required unless + otherwise specified. + +CHUNK DATA: + + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) + The ith entry, F[i], stores the number of OIDs with first + byte at most i. Thus F[255] stores the total + number of commits (N). [ so in small repos, where there are fewer than 256 objects, F[i] == F[i+1], for all i'th where there is no object starting with i byte] Correct. I'm not sure this additional information is valuable for the document, though. + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) + The OIDs for all commits in the graph, sorted in ascending order. + + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) +* The first H bytes are for the OID of the root tree. +* The next 8 bytes are for the int-ids of the first two parents + of the ith commit. Stores value 0x if no parent in that + position. If there are more than two parents, the second value + has its most-significant bit on and the other bits store an array + position into the Large Edge List chunk. +* The next 8 bytes store the generation number of the commit and + the commit time in seconds since EPOCH. The generation number + uses the higher 30 bits of the first 4 bytes, while the commit + time uses the 32 bits of the second 4 bytes, along with the lowest + 2 bits of the lowest byte, storing the 33rd and 34th bit of the + commit time. + + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] + This list of 4-byte values store the second through nth parents for + all octopus merges. The second parent value in the commit data stores + an array position within this list along with the most-significant bit + on. Starting at that array position, iterate through this list of int-ids + for the parents until reaching a value with the most-significant bit on. + The other bits correspond to the int-id of the last parent. + +TRAILER: + + H-byte HASH-checksum of all of the above. + -- 2.7.4 Makes sense so far, I'll read on. I agree with Junio, that I could read this documentation without the urge to point out nits. :) Thanks, Stefan
[PATCH v2 0/3] Re: t7006: add tests for how git config paginates
This is v2 of my series to teach `git config` to only respect `pager.config` when listing configuration, then changing the default to "on". Thanks to Duy and Junio for feedback on the first version. Based on Duy's feeback, I've changed the approach to more carefully divide the various getters into "may produce multiple lines, so let's page" vs "may not, so don't". Junio hesitated whether we should add tests using `test_expect_success`, then flip the test-definition, or whether we should start with a "failure" that we then flip to "success". I have not done anything about that, except to try and motivate the choice better in the commit message of the second patch. Martin Martin Ågren (3): t7006: add tests for how git config paginates config: respect `pager.config` in list/get-mode only config: change default of `pager.config` to "on" Documentation/git-config.txt | 6 ++ t/t7006-pager.sh | 49 +--- builtin/config.c | 10 + git.c| 2 +- 4 files changed, 59 insertions(+), 8 deletions(-) -- 2.16.2.246.ga4ee8f
Re: [PATCH v3 00/35] protocol version 2
On 02/06, Brandon Williams wrote: > Changes in v3: > * There were some comments about how the protocol should be designed >stateless first. I've made this change and instead of having to >supply the `stateless-rpc=true` capability to force stateless >behavior, the protocol just requires all commands to be stateless. > > * Added some patches towards the end of the series to force the client >to not request to use protocol v2 when pushing (even if configured to >use v2). This is to ease the roll-out process of a push command in >protocol v2. This way when servers gain the ability to accept >pushing in v2 (and they start responding using v2 when requests are >sent to the git-receive-pack endpoint) that clients who still don't >understand how to push using v2 won't request to use v2 and then die >when they recognize that the server does indeed know how to accept a >push under v2. > > * I implemented the `shallow` feature for fetch. This feature >encapsulates the existing functionality of all the shallow/deepen >capabilities in v0. So now a server can process shallow requests. > > * Various other small tweaks that I can't remember :) > > After all of that I think the series is in a pretty good state, baring > any more critical reviewing feedback. > > Thanks! I'm hoping to get some more in depth review before I do any more re-rolls, but for those interested I will need to do a re-roll to eliminate the prelude from the http transport. This is the prelude which includes the service line followed by any number of packet lines culminating in a flush-pkt like so: # service=git-upload-pack some other optional lines With this eliminated all transports will be exactly the same, the only difference will be how the protocol is tunneled. > > Brandon Williams (35): > pkt-line: introduce packet_read_with_status > pkt-line: introduce struct packet_reader > pkt-line: add delim packet support > upload-pack: convert to a builtin > upload-pack: factor out processing lines > transport: use get_refs_via_connect to get refs > connect: convert get_remote_heads to use struct packet_reader > connect: discover protocol version outside of get_remote_heads > transport: store protocol version > protocol: introduce enum protocol_version value protocol_v2 > test-pkt-line: introduce a packet-line test helper > serve: introduce git-serve > ls-refs: introduce ls-refs server command > connect: request remote refs using v2 > transport: convert get_refs_list to take a list of ref patterns > transport: convert transport_get_remote_refs to take a list of ref > patterns > ls-remote: pass ref patterns when requesting a remote's refs > fetch: pass ref patterns when fetching > push: pass ref patterns when pushing > upload-pack: introduce fetch server command > fetch-pack: perform a fetch using v2 > upload-pack: support shallow requests > fetch-pack: support shallow requests > connect: refactor git_connect to only get the protocol version once > connect: don't request v2 when pushing > transport-helper: remove name parameter > transport-helper: refactor process_connect_service > transport-helper: introduce stateless-connect > pkt-line: add packet_buf_write_len function > remote-curl: create copy of the service name > remote-curl: store the protocol version the server responded with > http: allow providing extra headers for http requests > http: don't always add Git-Protocol header > remote-curl: implement stateless-connect command > remote-curl: don't request v2 when pushing > > .gitignore | 1 + > Documentation/technical/protocol-v2.txt | 338 + > Makefile| 7 +- > builtin.h | 2 + > builtin/clone.c | 2 +- > builtin/fetch-pack.c| 21 +- > builtin/fetch.c | 14 +- > builtin/ls-remote.c | 7 +- > builtin/receive-pack.c | 6 + > builtin/remote.c| 2 +- > builtin/send-pack.c | 20 +- > builtin/serve.c | 30 ++ > builtin/upload-pack.c | 74 > connect.c | 352 +- > connect.h | 7 + > fetch-pack.c| 319 +++- > fetch-pack.h| 4 +- > git.c | 2 + > http.c | 25 +- > http.h | 2 + > ls-refs.c | 96 + > ls-refs.h | 9 + > pkt-line.c | 149 +++- > pkt-line.h | 77 > protocol.c
Re: [PATCH v4 01/13] commit-graph: add format document
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stoleewrote: > Add document specifying the binary format for commit graphs. This > format allows for: > > * New versions. > * New hash functions and hash lengths. > * Optional extensions. > > Basic header information is followed by a binary table of contents > into "chunks" that include: > > * An ordered list of commit object IDs. > * A 256-entry fanout into that list of OIDs. > * A list of metadata for the commits. > * A list of "large edges" to enable octopus merges. > > The format automatically includes two parent positions for every > commit. This favors speed over space, since using only one position > per commit would cause an extra level of indirection for every merge > commit. (Octopus merges suffer from this indirection, but they are > very rare.) > > Signed-off-by: Derrick Stolee > --- > Documentation/technical/commit-graph-format.txt | 90 > + > 1 file changed, 90 insertions(+) > create mode 100644 Documentation/technical/commit-graph-format.txt > > diff --git a/Documentation/technical/commit-graph-format.txt > b/Documentation/technical/commit-graph-format.txt > new file mode 100644 > index 000..11b18b5 > --- /dev/null > +++ b/Documentation/technical/commit-graph-format.txt > @@ -0,0 +1,90 @@ > +Git commit graph format > +=== > + > +The Git commit graph stores a list of commit OIDs and some associated > +metadata, including: > + > +- The generation number of the commit. Commits with no parents have > + generation number 1; commits with parents have generation number > + one more than the maximum generation number of its parents. We > + reserve zero as special, and can be used to mark a generation > + number invalid or as "not computed". > + > +- The root tree OID. > + > +- The commit date. > + > +- The parents of the commit, stored using positional references within > + the graph file. > + > +== graph-*.graph files have the following format: > + > +In order to allow extensions that add extra data to the graph, we organize > +the body into "chunks" and provide a binary lookup table at the beginning > +of the body. The header includes certain values, such as number of chunks, > +hash lengths and types. > + > +All 4-byte numbers are in network order. > + > +HEADER: > + > + 4-byte signature: > + The signature is: {'C', 'G', 'P', 'H'} > + > + 1-byte version number: > + Currently, the only valid version is 1. > + > + 1-byte Object Id Version (1 = SHA-1) > + > + 1-byte number (C) of "chunks" > + > + 1-byte (reserved for later use) What should clients of today do with it? * ignore it completely [as they have no idea what it is] or * throw hands up in the air if it is anything other than 0 ? [because clearly we will increment the version or have new information in a new chunk instead of just sneaking in information here?] > +CHUNK LOOKUP: > + > + (C + 1) * 12 bytes listing the table of contents for the chunks: > + First 4 bytes describe chunk id. Value 0 is a terminating label. > + Other 8 bytes provide offset in current file for chunk to start. offset [in bytes? I could imagine having a larger granularity here, because chunks don't sound small.] > + (Chunks are ordered contiguously in the file, so you can infer > + the length using the next chunk position if necessary.) > + > + The remaining data in the body is described one chunk at a time, and > + these chunks may be given in any order. Chunks are required unless > + otherwise specified. > + > +CHUNK DATA: > + > + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) > + The ith entry, F[i], stores the number of OIDs with first > + byte at most i. Thus F[255] stores the total > + number of commits (N). [ so in small repos, where there are fewer than 256 objects, F[i] == F[i+1], for all i'th where there is no object starting with i byte] > + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) > + The OIDs for all commits in the graph, sorted in ascending order. > + > + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) > +* The first H bytes are for the OID of the root tree. > +* The next 8 bytes are for the int-ids of the first two parents > + of the ith commit. Stores value 0x if no parent in that > + position. If there are more than two parents, the second value > + has its most-significant bit on and the other bits store an array > + position into the Large Edge List chunk. > +* The next 8 bytes store the generation number of the commit and > + the commit time in seconds since EPOCH. The generation number > + uses the higher 30 bits of the first 4 bytes, while the commit > + time uses the 32 bits of the second 4 bytes, along with the lowest > + 2 bits of the lowest byte, storing the 33rd and 34th bit of the > + commit time. > + > + Large Edge List (ID: {'E',
[PATCH] commit: drop uses of get_cached_commit_buffer()
The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loadead and was not freed. Two places use this to inform how to output information about commits. log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line. rev-list uses this method for two reasons: - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content. - Second, if the revision walk visits a commit that was marked UNINTERESTING, the walk may never load a buffer and hence rev-list will not output the verbose information. These behaviors are undocumented, untested, and unlikely to be expected by users or other software attempting to parse this output. Helped-by: Jeff KingSigned-off-by: Derrick Stolee --- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9..d320b6f 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d..22b2fb6 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT; -- 2.7.4
Re: [PATCH v4 03/13] commit-graph: create git-commit-graph builtin
Junio C Hamanowrites: > Derrick Stolee writes: > >> +int cmd_commit_graph(int argc, const char **argv, const char *prefix) >> +{ >> +static struct option builtin_commit_graph_options[] = { >> +{ OPTION_STRING, 'p', "object-dir", _dir, >> +N_("dir"), >> +N_("The object directory to store the graph") }, > > I have a suspicion that this was modeled after some other built-in > that has a similar issue (perhaps written long time ago), but isn't > OPT_STRING() sufficient to define this element these days? > > Or am I missing something? > > Why squat on short-and-sweet "-p"? For that matter, since this is > not expected to be end-user facing command anyway, I suspect that we > do not want to allocate a single letter option from day one, which > paints ourselves into a corner from where we cannot escape. I suspect that exactly the same comment applies to patches in this series that add other subcommands (I just saw one in the patch for adding 'write').