Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cy wrote: > On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote: >> Committed. > > Thanks Robert, > > And also sorry, one unfortunate thing happened in the last patch while > fixing one of the review comments a variable disappeared from the > equation > @_hash

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-04 Thread Mithun Cy
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote: > Committed. Thanks Robert, And also sorry, one unfortunate thing happened in the last patch while fixing one of the review comments a variable disappeared from the equation @_hash_spareindex. splitpoint_phases += - (((num

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-03 Thread Robert Haas
On Sat, Apr 1, 2017 at 3:29 AM, Mithun Cy wrote: > On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote: >> Also adding a patch which implements the 2nd way. > Sorry, I forgot to add sortbuild_hash patch, which also needs similar > changes for the hash_mask. Committed. -- Robert Haas EnterpriseDB:

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-01 Thread Mithun Cy
On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote: > Also adding a patch which implements the 2nd way. Sorry, I forgot to add sortbuild_hash patch, which also needs similar changes for the hash_mask. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com sortbuild_hash_A_2

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-04-01 Thread Mithun Cy
On Sat, Apr 1, 2017 at 7:05 AM, Robert Haas wrote: > Hmm, don't the changes to contrib/pageinspect/expected/hash.out > indicate that you've broken something? The hash index has only 4 > buckets, so the new code shouldn't be doing anything differently, but > you've got highmask and lowmask changin

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 1:15 AM, Mithun Cy wrote: > Thanks, I have tried to fix all of the comments. Thanks. Hmm, don't the changes to contrib/pageinspect/expected/hash.out indicate that you've broken something? The hash index has only 4 buckets, so the new code shouldn't be doing anything diff

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Mithun Cy
Thanks, I have tried to fix all of the comments. On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas wrote: > On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote: >> Thanks Robert, I have tried to fix the comments given as below. > > Thanks. > > Since this changes the on-disk format of hash indexes in an

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote: > Thanks Robert, I have tried to fix the comments given as below. Thanks. Since this changes the on-disk format of hash indexes in an incompatible way, I think it should bump HASH_VERSION. (Hopefully that doesn't preclude REINDEX?) pg_upgrade s

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Mithun Cy
Thanks Robert, I have tried to fix the comments given as below. On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas wrote: > I think that the macros in hash.h need some more work: > > - Pretty much any time you use the argument of a macro, you need to > parenthesize it in the macro definition to avoid s

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Robert Haas
On Wed, Mar 29, 2017 at 8:03 AM, Mithun Cy wrote: > Thanks, Amit for a detailed review. I think that the macros in hash.h need some more work: - Pretty much any time you use the argument of a macro, you need to parenthesize it in the macro definition to avoid surprises if the macros is called us

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Mithun Cy
On Thu, Mar 30, 2017 at 7:23 PM, Robert Haas wrote: > I think approach B is incorrect. Suppose we have 1536 buckets and > hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I > understand correctly, each of these values should be mapped either to > bucket 0 or to bucket 1, and th

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Robert Haas
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy wrote: > B. In tuple sort we can use hash function bucket = hash_key % > num_buckets instead of existing one which does bitwise "and" to > determine the bucket of hash key. This way we will not wrongly assign > buckets beyond max_buckets and sorted hash k

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-29 Thread Mithun Cy
Thanks, Amit for a detailed review. On Wed, Mar 29, 2017 at 4:09 PM, Amit Kapila wrote: > Okay, your current patch looks good to me apart from minor comments, > so marked as Read For Committer. Please either merge the > sort_hash_b_2.patch with main patch or submit it along with next > revision

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-29 Thread Mithun Cy
That means at every new +split point we double the existing number of buckets. Allocating huge chucks On Mon, Mar 27, 2017 at 11:56 PM, Jesper Pedersen wrote: > I ran some performance scenarios on the patch to see if the increased > 'spares' allocation had an impact. I havn't found any regressi

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-29 Thread Amit Kapila
On Wed, Mar 29, 2017 at 12:51 PM, Mithun Cy wrote: > On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote: >> Few other comments: >> +/* >> + * This is just a trick to save a division operation. If you look into the >> + * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will >> indi

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-29 Thread Mithun Cy
On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote: >> I wonder if we should consider increasing >> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split >> point 4 is responsible for allocating only 16 new buckets = 128kB; >> doing those in four groups of two (16kB) seems fairly p

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-28 Thread Amit Kapila
On Tue, Mar 28, 2017 at 8:56 PM, Robert Haas wrote: > On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >>> This will go wrong for split point group zero. In general, I feel if >>> you handle computation for split groups lesser than >>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then a

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-28 Thread Robert Haas
On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >> This will go wrong for split point group zero. In general, I feel if >> you handle computation for split groups lesser than >> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your >> macros will become much simpler and less error

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-28 Thread Mithun Cy
On Tue, Mar 28, 2017 at 12:21 PM, Amit Kapila wrote: > I think both way it can work. I feel there is no hard pressed need > that we should make the computation in sorting same as what you do > _hash_doinsert. In patch_B, I don't think new naming for variables is > good. > > Assert(!a->isnull1)

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Amit Kapila
On Tue, Mar 28, 2017 at 10:43 AM, Mithun Cy wrote: > On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > > > As you have said we can solve it if we allocate buckets always in > power-of-2 when we do hash index meta page init. But on other > occasions, when we try to double the existing buckets

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Mithun Cy
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > I think we can't change the number of buckets to be created or lowmask > and highmask calculation here without modifying _h_spoolinit() because > it sorts the data to be inserted based on hashkey which in turn > depends on the number of bucke

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Jesper Pedersen
Hi Mithun, On 03/26/2017 01:56 AM, Mithun Cy wrote: Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: I think one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Amit Kapila
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy > wrote: >> Thanks, Amit for the review. >> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >>> >>> I think one-dimensional patch has fewer places to touch, so that looks >>> better to me. Ho

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-26 Thread Amit Kapila
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy wrote: > Thanks, Amit for the review. > On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >> >> I think one-dimensional patch has fewer places to touch, so that looks >> better to me. However, I think there is still hard coding and >> assumptions in

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-25 Thread Mithun Cy
Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: > > I think one-dimensional patch has fewer places to touch, so that looks > better to me. However, I think there is still hard coding and > assumptions in code which we should try to improve. Great!, I will continu

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 10:13 AM, Mithun Cy wrote: > On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: >> Sure, I was telling you based on that. If you are implicitly treating >> it as 2-dimensional array, it might be easier to compute the array >>offsets. > > I think calculating spares offset

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-24 Thread Mithun Cy
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > Sure, I was telling you based on that. If you are implicitly treating > it as 2-dimensional array, it might be easier to compute the array >offsets. I think calculating spares offset will not become anyway much simpler we still need to calcul

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-23 Thread Mithun Cy
On Fri, Mar 24, 2017 at 1:22 AM, Mithun Cy wrote: > Hi Amit please find the new patch The pageinspect.sgml has an example which shows the output of "hash_metapage_info()". Since we increase the spares array and eventually ovflpoint, I have updated the example with corresponding values.. -- Tha

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-23 Thread Mithun Cy
Hi Amit please find the new patch On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > It is not only about above calculation, but also what the patch is > doing in function _hash_get_tbuckets(). By the way function name also > seems unclear (mainly *tbuckets* in the name). Fixed I have introd

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-20 Thread Amit Kapila
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: >> Hi Amit, Thanks for the review, >> >> On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >>> idea could be to make hashm_spares a two-dimensional array >>> hashm_spares[32][4] where the fi

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-20 Thread Amit Kapila
On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: > Hi Amit, Thanks for the review, > > On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >> idea could be to make hashm_spares a two-dimensional array >> hashm_spares[32][4] where the first dimension will indicate the split >> point and second wi

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-20 Thread Mithun Cy
Hi Amit, Thanks for the review, On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: > idea could be to make hashm_spares a two-dimensional array > hashm_spares[32][4] where the first dimension will indicate the split > point and second will indicate the sub-split number. I am not sure > whether

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-20 Thread Amit Kapila
On Sat, Mar 18, 2017 at 10:59 PM, Mithun Cy wrote: > On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote: >> It does apply with fuzz on 2b32ac2, so it looks like c11453c and >> subsequent commits are the cause. They represent a fairly substantial >> change to hash indexes by introducing WAL log

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-18 Thread Mithun Cy
On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote: > It does apply with fuzz on 2b32ac2, so it looks like c11453c and > subsequent commits are the cause. They represent a fairly substantial > change to hash indexes by introducing WAL logging so I think you should > reevaluate your patches to b

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-16 Thread David Steele
On 2/21/17 4:58 AM, Mithun Cy wrote: > Thanks, Amit > > On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote: >> How will high and lowmask calculations work in this new strategy? >> Till now they always work on doubling strategy and I don't see you >> have changed anything related to that code. Ch

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-02-21 Thread Mithun Cy
Thanks, Amit On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote: > How will high and lowmask calculations work in this new strategy? > Till now they always work on doubling strategy and I don't see you > have changed anything related to that code. Check below places. It is important that the ma

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-02-20 Thread Amit Kapila
On Fri, Feb 17, 2017 at 7:21 PM, Mithun Cy wrote: > > To implement such an incremental addition of bucket blocks I have to > increase the size of array hashm_spares in meta page by four times. > Also, mapping methods which map a total number of buckets to a > split-point position of hashm_spares a

[HACKERS] [POC] A better way to expand hash indexes.

2017-02-17 Thread Mithun Cy
Hi all, As of now, we expand the hash index by doubling the number of bucket blocks. But unfortunately, those blocks will not be used immediately. So I thought if we can differ bucket block allocation by some factor, hash index size can grow much efficiently. I have written a POC patch which does