Bucket seems like a rather good name for it. The method for generating the bucket could be a hash, a running sequence modded, etc. So HashBucket, RoundRobinBucket, etc.
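To make the naming concrete, here is a minimal Python sketch. It assumes a single prefix byte and uses zlib.crc32 as a stand-in hash (Phoenix's actual salting hash may differ). HashBucket and RoundRobinBucket are only the names floated above, not an existing API, and the helpers prefixed_key, split_points and bucket_scan_ranges are hypothetical. The sketch shows the hash being modded (not truncated) into N buckets, the pre-split boundaries that give one region per bucket, and how one logical key range turns into N range scans.

```python
import itertools
import zlib
from typing import List, Tuple


class HashBucket:
    """Hypothetical: derives the bucket from the row key itself (deterministic)."""

    def __init__(self, n_buckets: int) -> None:
        if not 1 <= n_buckets <= 256:
            raise ValueError("a single prefix byte holds at most 256 buckets")
        self.n_buckets = n_buckets

    def bucket(self, row_key: bytes) -> int:
        # zlib.crc32 is an assumed stand-in hash; the point is that the
        # hash is modded (not truncated) into n_buckets values.
        return zlib.crc32(row_key) % self.n_buckets


class RoundRobinBucket:
    """Hypothetical: assigns buckets from a running sequence, ignoring the key."""

    def __init__(self, n_buckets: int) -> None:
        if not 1 <= n_buckets <= 256:
            raise ValueError("a single prefix byte holds at most 256 buckets")
        self.n_buckets = n_buckets
        self._seq = itertools.count()

    def bucket(self, row_key: bytes) -> int:
        # Running sequence modded by the bucket count.
        return next(self._seq) % self.n_buckets


def prefixed_key(bucketer, row_key: bytes) -> bytes:
    """Prepend the single bucket byte to the original row key."""
    return bytes([bucketer.bucket(row_key)]) + row_key


def split_points(n_buckets: int) -> List[bytes]:
    """Region boundaries for pre-splitting the table: one region per bucket."""
    return [bytes([b]) for b in range(1, n_buckets)]


def bucket_scan_ranges(n_buckets: int, start: bytes,
                       stop: bytes) -> List[Tuple[bytes, bytes]]:
    """One logical key range [start, stop) becomes one range scan per bucket."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(n_buckets)]
```

For example, split_points(4) gives [b'\x01', b'\x02', b'\x03'], and a RoundRobinBucket(3) hands out buckets 0, 1, 2, 0, ... regardless of the key. One design consequence: a HashBucket can be recomputed from the row key at read time, so a point lookup touches a single bucket, while a RoundRobinBucket cannot, so even a point get has to check every bucket. Range scans over the original key order fan out across all N buckets either way.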
On Tuesday, October 22, 2013, James Taylor wrote:
> One thing I neglected to mention is that the table is pre-split at the
> "prepending-row-key-with-single-hashed-byte" boundaries, so the expectation
> is that you'd allocate enough buckets that you don't end up needing to
> split the regions. But if you under-allocate (i.e. allocate too small a
> SALT_BUCKETS value), then I see your point.
>
> Thanks,
> James
>
>
> On Mon, Oct 21, 2013 at 5:58 PM, Michael Segel <[email protected]> wrote:
> > James,
> >
> > It's evenly distributed, however... because it's a timestamp, it's a 'tail
> > end charlie' addition.
> > So when you split a region, the top half is never added to, so you end up
> > with all regions half filled except for the last region in each 'modded'
> > value.
> >
> > I wouldn't say it's a bad thing if you plan for it.
> >
> > On Oct 21, 2013, at 5:07 PM, James Taylor <[email protected]> wrote:
> >
> > > We don't truncate the hash, we mod it. Why would you expect that data
> > > wouldn't be evenly distributed? We've not seen this to be the case.
> > >
> > > On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[email protected]> wrote:
> > >
> > >> What do you call hashing the row key?
> > >> Or hashing the row key and then appending the row key to the hash?
> > >> Or hashing the row key, truncating the hash value to some subset and then
> > >> appending the row key to the value?
> > >>
> > >> The problem is that there is a specific meaning to the term salt. Re-using
> > >> it here will cause confusion because you're implying something you don't
> > >> mean to imply.
> > >>
> > >> You could say prepend a truncated hash of the key, however… is prepend a
> > >> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major.)
> > >>
> > >> So even outside of Phoenix, the concept is the same.
> > >> Even with a truncated hash, you will find that over time, all but the tail
> > >> N regions will only be half full.
> > >> This could be both good and bad.
> > >>
> > >> (Where N is your number of 8 or 16 allowable hash values.)
> > >>
> > >> You've potentially solved one problem… but you still have other issues that
> > >> you need to address.
> > >> I guess the simple answer is to double the region sizes and not care that
> > >> most of your regions will be 1/2 the max size (the size you really
> > >> want), while the 8-16 tail regions will be up to twice as big.
> > >>
> > >> On Oct 21, 2013, at 3:26 PM, James Taylor <[email protected]> wrote:
> > >>
> > >>> What do you think it should be called? Because
> > >>> "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> > >>> to it. :-)
> > >>>
> > >>> Agree that getting the row key design right is crucial.
> > >>>
> > >>> The range of "prepending-row-key-with-single-hashed-byte" is declarative
> > >>> when you create your table in Phoenix, so you typically declare an upper
> > >>> bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> > >>> the numbers and it's typically faster, but as with most things, not always.
> > >>>
> > >>> HTH,
> > >>> James
> > >>>
> > >>> On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[email protected]> wrote:
> > >>>
> > >>>> Then it's not a SALT. And please don't use the term 'salt' because it has
> > >>>> a specific meaning different from what you want it to mean. Just like
> > >>>> saying HBase has ACID because you write the entire row as an atomic element.
> > >>>> But I digress….
> > >>>>
> > >>>> Ok so to your point…
> > >>>>
> > >>>> 1 byte == 255 possible values.
> > >>>>
> > >>>> So which will be faster?
> > >>>>
> > >>>> Creating a list of the 1-byte truncated hash of each possible timestamp in
> > >>>> your range, or doing 255 separate range scans with the start and stop range
> > >>>> key set?
> > >>>>
> > >>>> Th
