Re: [PATCH 4/6] introduce a commit metapack

2013-03-18 Jeff King
On Sun, Mar 17, 2013 at 08:21:13PM +0700, Nguyen Thai Ngoc Duy wrote:
> On Thu, Jan 31, 2013 at 6:06 PM, Duy Nguyen wrote:
> > On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> >> Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> >> space/time trade-off.
> >
> > Follow

Re: [PATCH 4/6] introduce a commit metapack

2013-03-17 Duy Nguyen
On Thu, Jan 31, 2013 at 6:06 PM, Duy Nguyen wrote:
> On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
>> Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
>> space/time trade-off.
>
> Following the on-disk format experiment yesterday, I changed the
> format to:
>
> - a li

Re: [PATCH 4/6] introduce a commit metapack

2013-02-02 Junio C Hamano
Jeff King writes:

> On Thu, Jan 31, 2013 at 09:03:26AM -0800, Shawn O. Pearce wrote:
> ...
>> If we are going to change the index to support extension sections and
>> I have to modify JGit to grok this new format, it needs to be index v3
>> not index v2. If we are making index v3 we should just p

Re: [PATCH 4/6] introduce a commit metapack

2013-02-02 Duy Nguyen
On Fri, Feb 1, 2013 at 5:15 PM, Jeff King wrote:
> The short-sha1 is a clever idea. Looks like it saves us on the order of
> 4MB for linux-2.6 (versus the full 20-byte sha1). Not as big as the
> savings we get from dropping the other 3 sha1's to uint32_t, but still
> not bad. We could save anothe
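To make the size arithmetic in Jeff's remark more concrete, the sketch below compares two hypothetical per-commit record layouts. The first stores the tree and both parents as full 20-byte SHA-1s. The second stores them as uint32_t positions, in the spirit of "dropping the other 3 sha1's to uint32_t". The struct names and field order are illustrative assumptions, not the patch's on-disk format, and the sketch does not recompute the 4MB figure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: commit, tree and both parents as full 20-byte SHA-1s. */
struct meta_entry_full {
	unsigned char commit[20];
	uint32_t timestamp;
	unsigned char tree[20];
	unsigned char parent1[20];
	unsigned char parent2[20];
};

/*
 * Hypothetical record: the commit's own name lives in a separate lookup
 * table (full or abbreviated), and the tree and parents are uint32_t
 * positions in the pack's .idx SHA-1 table.
 */
struct meta_entry_compact {
	uint32_t timestamp;
	uint32_t tree;
	uint32_t parent1;
	uint32_t parent2;
};
```

On common ABIs the first struct is 84 bytes with no padding. Turning the three names into indices saves 3 × (20 − 4) = 48 bytes per commit. The commit's own key, moved to a separate table in the second layout, is where the smaller short-SHA-1 saving would apply.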

Re: [PATCH 4/6] introduce a commit metapack

2013-02-01 Jeff King
On Thu, Jan 31, 2013 at 06:06:56PM +0700, Nguyen Thai Ngoc Duy wrote:
> On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> > Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> > space/time trade-off.
>
> Following the on-disk format experiment yesterday, I changed the

Re: [PATCH 4/6] introduce a commit metapack

2013-02-01 Jeff King
On Thu, Jan 31, 2013 at 06:06:56PM +0700, Nguyen Thai Ngoc Duy wrote:
> On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> > Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> > space/time trade-off.
>
> Following the on-disk format experiment yesterday, I changed the

Re: [PATCH 4/6] introduce a commit metapack

2013-02-01 Jeff King
On Wed, Jan 30, 2013 at 08:56:07PM +0700, Nguyen Thai Ngoc Duy wrote:
> Another point, but not really important at this stage, I think we have
> memory leak somewhere (lookup_commit??). It used up to 800 MB RES on
> linux-2.6.git while generating the cache.

We generate (and then leak!) the linked

Re: [PATCH 4/6] introduce a commit metapack

2013-02-01 Jeff King
On Thu, Jan 31, 2013 at 09:03:26AM -0800, Shawn O. Pearce wrote:
> > Of course, it is more convenient to store this kind of things in a
> > separate file while experimenting and improving the mechanism, but I
> > do not think we want to see each packfile in a repository comes with
> > 47 auxiliary

Re: [PATCH 4/6] introduce a commit metapack

2013-02-01 Jeff King
On Tue, Jan 29, 2013 at 11:17:41PM -0800, Junio C Hamano wrote:
> > True, but it is even less headache if the file is totally separate and
> > optional.
>
> Once you start thinking about using an offset to some list of SHA-1,
> perhaps? A section inside the same file can never go out of sync.

Y

Re: [PATCH 4/6] introduce a commit metapack

2013-01-31 Shawn Pearce
On Wed, Jan 30, 2013 at 7:56 AM, Junio C Hamano wrote:
> Jeff King writes:
>
>>>From this:
>>
>>> Then it will be very natural for the extension data that store the
>>> commit metainfo to name objects in the pack the .idx file describes
>>> by the offset in the SHA-1 table.
>>
>> I guess your arg

Re: [PATCH 4/6] introduce a commit metapack

2013-01-31 Duy Nguyen
On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> space/time trade-off.

Following the on-disk format experiment yesterday, I changed the
format to:

 - a list of _short_ SHA-1 of cached commits
 - a list of cache entries,
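Duy's revised format splits the cache into a sorted table of short SHA-1 prefixes and a parallel list of entries. Below is a minimal lookup sketch. It rests on assumptions the thread does not state:

- the prefix is 8 bytes;
- prefixes are unique within one cache;
- each entry sits at the same index as its prefix.

`struct short_metapack`, `struct meta_entry`, `SHORT_SHA1_LEN`, and `short_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_SHA1_LEN 8	/* hypothetical prefix length, not from the thread */

/* Hypothetical cache entry; tree/parents are positions in the .idx table. */
struct meta_entry {
	uint32_t timestamp;
	uint32_t tree, parent1, parent2;
};

struct short_metapack {
	const unsigned char *short_names;	/* nr * SHORT_SHA1_LEN bytes, sorted */
	const struct meta_entry *entries;	/* nr entries, same order as prefixes */
	uint32_t nr;
};

/*
 * Binary-search the sorted prefix table for the first SHORT_SHA1_LEN
 * bytes of sha1. Returns the entry whose prefix matches, or NULL.
 * A hit only means the prefix matches; see the note below.
 */
static const struct meta_entry *
short_lookup(const struct short_metapack *mp, const unsigned char *sha1)
{
	uint32_t lo = 0, hi = mp->nr;

	assert(mp && sha1);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, mp->short_names + (size_t)mi * SHORT_SHA1_LEN,
				 SHORT_SHA1_LEN);

		if (!cmp)
			return &mp->entries[mi];
		if (cmp < 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return NULL;
}
```

Because only a prefix is stored, a hit says the prefix matches, not that the commit is cached. A commit outside the cache that shares a prefix with a cached one would produce a false hit. A real reader would therefore presumably still confirm the full name, for example against the pack's .idx. That check is plausibly part of the "time" side of the trade-off.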

Re: [PATCH 4/6] introduce a commit metapack

2013-01-30 Junio C Hamano
Jeff King writes:

>>From this:
>
>> Then it will be very natural for the extension data that store the
>> commit metainfo to name objects in the pack the .idx file describes
>> by the offset in the SHA-1 table.
>
> I guess your argument is that putting it all in the same file makes it
> more natu
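Junio's suggestion is that the metapack name a commit's tree and parents by their position in the .idx file's sorted SHA-1 table rather than by full name. Below is a minimal sketch. It assumes the table is available as a flat array of sorted 20-byte names; in a version 2 .idx file it follows the 256-entry fan-out table. `struct idx_names`, `idx_nth_name()`, and `idx_find_pos()` are hypothetical helpers, not Git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of an .idx SHA-1 table: nr sorted 20-byte names. */
struct idx_names {
	const unsigned char *names;
	uint32_t nr;
};

/* A cached uint32_t position turns back into a full object name in O(1). */
static const unsigned char *idx_nth_name(const struct idx_names *idx, uint32_t pos)
{
	assert(pos < idx->nr);
	return idx->names + (size_t)pos * 20;
}

/* Find the position of sha1 in the sorted table; returns -1 if absent. */
static int64_t idx_find_pos(const struct idx_names *idx, const unsigned char *sha1)
{
	uint32_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, idx_nth_name(idx, mi), 20);

		if (!cmp)
			return mi;
		if (cmp < 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1;
}
```

With this scheme, the writer stores `idx_find_pos()` results, and readers only ever need the constant-time `idx_nth_name()`. A real lookup would first use the fan-out table to narrow the range by the name's first byte; the plain binary search keeps the sketch short.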

Re: [PATCH 4/6] introduce a commit metapack

2013-01-30 Duy Nguyen
On Wed, Jan 30, 2013 at 8:56 PM, Duy Nguyen wrote:
> However, performance seems to suffer too. Maybe I do more lookups than
> necessary, I don't know.

Yes, I should have stored the position in the sha-1 <-> offset map instead of the position of the object in .pack file. Even so, performance does
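Duy's follow-up contrasts two ways of naming an object: by its position in the .idx SHA-1 table, or by its byte offset in the .pack file. The sketch below, using hypothetical names, shows why an offset costs more to resolve. Turning an offset back into a position needs a reverse index sorted by offset, plus one binary search per lookup. A stored position, by contrast, can be used directly. This illustrates one plausible source of extra lookups; it is not a diagnosis of Duy's measurements.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reverse-index entry: a pack offset and its .idx position. */
struct revindex_entry {
	uint64_t offset;
	uint32_t pos;
};

static int cmp_revindex(const void *a, const void *b)
{
	const struct revindex_entry *x = a, *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Build once per pack from offsets[] given in .idx (SHA-1) order:
 * O(n log n) time plus extra memory. Returns NULL on allocation failure.
 */
static struct revindex_entry *build_revindex(const uint64_t *offsets, uint32_t nr)
{
	struct revindex_entry *rev = malloc(((size_t)nr + 1) * sizeof(*rev));

	if (!rev)
		return NULL;
	for (uint32_t i = 0; i < nr; i++) {
		rev[i].offset = offsets[i];
		rev[i].pos = i;
	}
	qsort(rev, nr, sizeof(*rev), cmp_revindex);
	return rev;
}

/* Every cached offset costs an O(log n) search to get back to a position. */
static int64_t offset_to_pos(const struct revindex_entry *rev, uint32_t nr,
			     uint64_t offset)
{
	uint32_t lo = 0, hi = nr;

	assert(rev != NULL || nr == 0);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;

		if (rev[mi].offset == offset)
			return rev[mi].pos;
		if (rev[mi].offset < offset)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;
}
```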

Re: [PATCH 4/6] introduce a commit metapack

2013-01-30 Duy Nguyen
On Tue, Jan 29, 2013 at 04:16:11AM -0500, Jeff King wrote:
> When we are doing a commit traversal that does not need to
> look at the commit messages themselves (e.g., rev-list,
> merge-base, etc), we spend a lot of time accessing,
> decompressing, and parsing the commit objects just to find
> the
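The quoted motivation is that traversals such as rev-list and merge-base only need parents and timestamps. Below is a minimal sketch of a reachability count that touches nothing but cached parent links. `struct cached_commit`, `NO_PARENT`, and `count_reachable()` are hypothetical. Commits are identified by their table position. Commits with more than two parents are assumed to be absent from the cache, since the patch's lookup returns at most two parents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_PARENT UINT32_MAX	/* hypothetical sentinel for a missing parent */

/* Hypothetical in-memory form of a cache entry. */
struct cached_commit {
	uint32_t timestamp;		/* for date-ordered walks; unused here */
	uint32_t parent1, parent2;	/* parent positions, or NO_PARENT */
};

/*
 * Count commits reachable from `tip` (inclusive) using only cached
 * parent links, i.e. without inflating or parsing any commit object.
 * Returns -1 on allocation failure.
 */
static long count_reachable(const struct cached_commit *c, uint32_t nr, uint32_t tip)
{
	unsigned char *seen = calloc((size_t)nr + 1, 1);
	/* each commit is pushed at most once, so nr slots suffice */
	uint32_t *stack = malloc(((size_t)nr + 1) * sizeof(*stack));
	size_t top = 0;
	long count = 0;

	assert(tip < nr);
	if (!seen || !stack) {
		free(seen);
		free(stack);
		return -1;
	}
	seen[tip] = 1;
	stack[top++] = tip;
	while (top) {
		const struct cached_commit *cur = &c[stack[--top]];
		uint32_t parents[2] = { cur->parent1, cur->parent2 };

		count++;
		for (int i = 0; i < 2; i++) {
			uint32_t p = parents[i];

			if (p == NO_PARENT)
				continue;
			assert(p < nr);
			if (!seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}
	free(seen);
	free(stack);
	return count;
}
```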

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Junio C Hamano
> True, but it is even less headache if the file is totally separate and
> optional.

Once you start thinking about using an offset to some list of SHA-1,
perhaps? A section inside the same file can never go out of sync.

Also a longer-term advantage is that you can teach index-pack to do this.
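Junio's point is that a section inside the same file cannot go out of sync, whereas a separate file has to prove it still belongs to its pack. Below is a minimal sketch of one way to do that. It hypothetically assumes the metapack header records a copy of the 20-byte SHA-1 checksum that ends every .pack file. `metapack_matches_pack()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical staleness check for a *separate* metapack file: the
 * writer copies the 20-byte checksum that ends the .pack file into the
 * metapack header, and a reader ignores the metapack unless they match.
 * Returns 1 if the metapack belongs to this pack, 0 otherwise.
 */
static int metapack_matches_pack(const unsigned char *recorded_pack_sha1,
				 const unsigned char *pack, size_t pack_len)
{
	assert(recorded_pack_sha1 && pack);
	if (pack_len < 20)
		return 0;	/* too short to carry a checksum trailer */
	return !memcmp(recorded_pack_sha1, pack + pack_len - 20, 20);
}
```

With the data stored as a section of the .idx instead, this check, and the failure mode it guards against, would go away. That is the substance of Junio's argument.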

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Jeff King
On Wed, Jan 30, 2013 at 10:36:10AM +0700, Nguyen Thai Ngoc Duy wrote:
> On Tue, Jan 29, 2013 at 4:16 PM, Jeff King wrote:
> > +int commit_metapack(unsigned char *sha1,
> > +                    uint32_t *timestamp,
> > +                    unsigned char **tree,
> > +                    unsigned char

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Jeff King
On Tue, Jan 29, 2013 at 10:08:08AM -0800, Junio C Hamano wrote:
> > In order to reduce the disk footprint and I/O cost, the future
> > direction for this mechanism may want to point into an existing
> > store of SHA-1 hashes with a shorter file offset, and the .idx file
> > could be such a store,

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Jeff King
On Tue, Jan 29, 2013 at 09:38:10AM -0800, Junio C Hamano wrote:
> Jeff King writes:
>
> > +int commit_metapack(unsigned char *sha1,
> > +                    uint32_t *timestamp,
> > +                    unsigned char **tree,
> > +                    unsigned char **parent1,
> > +                    unsigned char **

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Duy Nguyen
On Tue, Jan 29, 2013 at 4:16 PM, Jeff King wrote:
> +int commit_metapack(unsigned char *sha1,
> +                    uint32_t *timestamp,
> +                    unsigned char **tree,
> +                    unsigned char **parent1,
> +                    unsigned char **parent2)
> +{

Nit picking. tree

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Junio C Hamano
Junio C Hamano writes:

> I am torn on this one.
>
> These cached properties of a single commit will not change no matter
> which pack it appears in, and it feels logically wrong, especially
> when you record these object names in the full SHA-1 form, to tie a
> "commit metapack" to a pack.

Logic

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Junio C Hamano
Jeff King writes:

> +int commit_metapack(unsigned char *sha1,
> +                    uint32_t *timestamp,
> +                    unsigned char **tree,
> +                    unsigned char **parent1,
> +                    unsigned char **parent2)
> +{
> +	struct commit_metapack *p;
> +
> +	prepare_co

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Jeff King
On Tue, Jan 29, 2013 at 11:24:45AM +0100, Michael Haggerty wrote:
> On 01/29/2013 10:16 AM, Jeff King wrote:
> > When we are doing a commit traversal that does not need to
> > look at the commit messages themselves (e.g., rev-list,
> > merge-base, etc), we spend a lot of time accessing,
> > decomp

Re: [PATCH 4/6] introduce a commit metapack

2013-01-29 Michael Haggerty
On 01/29/2013 10:16 AM, Jeff King wrote:
> When we are doing a commit traversal that does not need to
> look at the commit messages themselves (e.g., rev-list,
> merge-base, etc), we spend a lot of time accessing,
> decompressing, and parsing the commit objects just to find
> the parent and timesta

[PATCH 4/6] introduce a commit metapack

2013-01-29 Jeff King
When we are doing a commit traversal that does not need to
look at the commit messages themselves (e.g., rev-list,
merge-base, etc), we spend a lot of time accessing,
decompressing, and parsing the commit objects just to find
the parent and timestamp information. We can make a space-time tradeoff b
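The patch's `commit_metapack()` takes a commit name and fills in a timestamp plus pointers to the tree and up to two parent names. The sketch below shows one way such a lookup can work over an in-memory table of fixed-size records sorted by commit name. Several details are assumptions for illustration, not the patch's implementation:

- the record layout;
- the name `metapack_lookup()`;
- the 0/-1 return convention;
- the use of `const` pointers into a read-only table;
- an all-zero `parent2` meaning "no second parent".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: all names stored as full 20-byte SHA-1s. */
struct metapack_record {
	unsigned char commit[20];
	uint32_t timestamp;
	unsigned char tree[20];
	unsigned char parent1[20];
	unsigned char parent2[20];	/* all zeros if absent (assumption) */
};

/*
 * Look up `sha1` in `recs` (nr records sorted by commit name). On a hit,
 * fill the out-parameters with pointers into the table and return 0;
 * return -1 if the commit is not cached, so the caller can fall back to
 * parsing the commit object as usual.
 */
static int metapack_lookup(const struct metapack_record *recs, uint32_t nr,
			   const unsigned char *sha1, uint32_t *timestamp,
			   const unsigned char **tree,
			   const unsigned char **parent1,
			   const unsigned char **parent2)
{
	uint32_t lo = 0, hi = nr;

	assert(sha1 && timestamp && tree && parent1 && parent2);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, recs[mi].commit, 20);

		if (cmp < 0) {
			hi = mi;
		} else if (cmp > 0) {
			lo = mi + 1;
		} else {
			*timestamp = recs[mi].timestamp;
			*tree = recs[mi].tree;
			*parent1 = recs[mi].parent1;
			*parent2 = recs[mi].parent2;
			return 0;
		}
	}
	return -1;
}
```

Handing out pointers into the table, rather than copying names, keeps a cache hit to a binary search and a few assignments. That is the kind of saving the commit message is after.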